
kokoro-speakd — persistent Kokoro TTS daemon

Loads the TTS model once and serves synthesis over a unix socket, giving sub-200 ms warm responses instead of a 3-5 s cold load on every call.

Python · ONNX Runtime · Kokoro 82M · launchd / systemd

Model load: once per daemon
Warm latency: < 200 ms
PyPI provenance: PEP 740 attestations

The problem

Loading the model on every request adds 3-5 s of latency to each TTS call. Voice feedback during a Claude Code session needs a sub-second response.

The solution

A daemon holds the model in memory and accepts requests over a unix socket, with optional GPU acceleration through ONNX Runtime. The package is distributed via PyPI Trusted Publishing with PEP 740 build-provenance attestations.
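
The load-once pattern can be illustrated with a minimal sketch that uses only the Python standard library. It is not the kokoro-speakd implementation: `StubModel`, `SynthesisServer`, `request_speech`, the one-JSON-line request format, and the length-prefixed reply are all hypothetical names and choices made for illustration, and the stub returns placeholder bytes rather than audio.

```python
import json
import os
import socket
import socketserver
import tempfile
import threading


class StubModel:
    """Hypothetical stand-in for the TTS model; the real load is the slow step."""

    load_count = 0  # counts how many times a "model" has been constructed

    def __init__(self):
        StubModel.load_count += 1

    def synthesize(self, text: str) -> bytes:
        # Placeholder output (reversed UTF-8 bytes), not real audio.
        return text.encode("utf-8")[::-1]


class SynthesisHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Assumed wire format: one JSON object per line, {"text": "..."}.
        line = self.rfile.readline()
        if not line:
            return
        request = json.loads(line)
        audio = self.server.model.synthesize(request["text"])
        # Reply: 4-byte big-endian length prefix, then the payload.
        self.wfile.write(len(audio).to_bytes(4, "big") + audio)


class SynthesisServer(socketserver.ThreadingMixIn, socketserver.UnixStreamServer):
    daemon_threads = True

    def __init__(self, socket_path, model):
        self.model = model  # loaded once, shared by every request
        super().__init__(socket_path, SynthesisHandler)


def request_speech(socket_path: str, text: str) -> bytes:
    """Client side: send one request and read the length-prefixed reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(json.dumps({"text": text}).encode("utf-8") + b"\n")
        with sock.makefile("rb") as reader:
            size = int.from_bytes(reader.read(4), "big")
            return reader.read(size)


def demo():
    """Start the server, send two requests, report how often the model loaded."""
    loads_before = StubModel.load_count
    with tempfile.TemporaryDirectory() as tmp:
        socket_path = os.path.join(tmp, "speakd.sock")
        server = SynthesisServer(socket_path, StubModel())
        threading.Thread(target=server.serve_forever, daemon=True).start()
        try:
            replies = [request_speech(socket_path, t) for t in ("hello", "world")]
        finally:
            server.shutdown()
            server.server_close()
    return replies, StubModel.load_count - loads_before
```

Calling `demo()` sends two requests to a single server instance. The model object is constructed once, before the server starts accepting connections, so only the first daemon start pays the load cost.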

Overview

The daemon loads the Kokoro 82M model once at startup and exposes a unix socket for repeated synthesis requests. It integrates natively with launchd on macOS and systemd on Linux, and is published on PyPI with PEP 740 attestations.