kokoro-speakd — persistent Kokoro TTS daemon
Loads the TTS model once and serves synthesis over a unix socket, giving sub-200 ms warm responses instead of a 3-5 s cold load on every call.
Python · ONNX Runtime · Kokoro 82M · launchd / systemd
The problem
Loading the model on every request adds 3-5 s of latency to each TTS call. Session-level voice feedback in Claude Code needs a sub-second response.
The solution
A daemon holds the model in memory and accepts requests over a unix socket, with optional GPU acceleration via ONNX Runtime. The package is distributed via PyPI Trusted Publishing with PEP 740 build-provenance attestations.
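The load-once, serve-many pattern can be illustrated with a minimal sketch using only the Python standard library. This is not the kokoro-speakd implementation: `StubModel`, `SpeakHandler`, `start_daemon`, `request`, and the one-line-in, bytes-out wire format are all hypothetical names and assumptions chosen for illustration, and the stub returns placeholder bytes rather than audio.

```python
import os
import socket
import socketserver
import tempfile
import threading


class StubModel:
    """Hypothetical stand-in for the TTS model; counts how often it is loaded."""

    loads = 0

    def __init__(self):
        StubModel.loads += 1  # the expensive step a daemon pays only once

    def synthesize(self, text: str) -> bytes:
        # Placeholder "audio"; a real model would return PCM or WAV bytes.
        return text.upper().encode("utf-8")


class SpeakHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Assumed wire format: one UTF-8 line in, raw bytes out, then close.
        text = self.rfile.readline().decode("utf-8").rstrip("\n")
        self.wfile.write(self.server.model.synthesize(text))


def start_daemon(sock_path: str) -> socketserver.UnixStreamServer:
    """Bind the socket, load the model once, and serve in a background thread."""
    server = socketserver.UnixStreamServer(sock_path, SpeakHandler)
    server.model = StubModel()  # held in memory and reused by every request
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def request(sock_path: str, text: str) -> bytes:
    """Send one synthesis request and read the reply until the server closes."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(sock_path)
        client.sendall(text.encode("utf-8") + b"\n")
        chunks = []
        while chunk := client.recv(4096):
            chunks.append(chunk)
        return b"".join(chunks)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "speakd.sock")
    daemon = start_daemon(path)
    try:
        for phrase in ("hello", "again"):
            print(request(path, phrase))
        print("model loads:", StubModel.loads)
    finally:
        daemon.shutdown()
        daemon.server_close()
        os.unlink(path)
```

Every call to `request` reuses the model created in `start_daemon`, so only the first request after startup would pay the load cost in a real daemon.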
Overview
The daemon loads the Kokoro 82M model once at startup and exposes a unix socket for repeated synthesis requests. It integrates natively with launchd on macOS and systemd on Linux, and is published on PyPI with PEP 740 attestations.