kokoro-speakd — persistent Kokoro TTS daemon

Loads the TTS model once and serves synthesis over a unix socket: sub-200ms warm response, instead of a 3-5s cold load per call.

Python · ONNX Runtime · Kokoro 82M · launchd / systemd

Model load: once per daemon
Warm latency: < 200 ms
PyPI provenance: PEP 740 attestations

The problem

Loading the model on every request adds 3-5 s of latency to each TTS call. Session-level voice feedback in Claude Code needs a sub-second response.

The solution

A daemon holds the model in memory and accepts synthesis requests over a Unix socket, with optional GPU acceleration through ONNX Runtime. The package is distributed through PyPI Trusted Publishing with PEP 740 build-provenance attestations.
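The load-once, serve-many pattern described above can be illustrated with a minimal sketch using only the Python standard library. This is not kokoro-speakd's actual code or wire protocol: `FakeModel`, `SynthesisHandler`, `start_daemon`, `request`, the socket path, and the "one UTF-8 line in, raw bytes out" framing are all hypothetical stand-ins chosen for illustration, and the fake model simply reverses bytes instead of producing audio.

```python
import os
import socket
import socketserver
import tempfile
import threading
import time


class FakeModel:
    """Hypothetical stand-in for the TTS model; loading is the slow part."""

    def __init__(self, load_seconds=0.05):
        time.sleep(load_seconds)  # simulates the one-time cold load

    def synthesize(self, text):
        # Placeholder "audio": a real model would return audio samples.
        return text.encode("utf-8")[::-1]


class SynthesisHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Assumed framing: one UTF-8 line in, raw bytes out, then close.
        text = self.rfile.readline().decode("utf-8").rstrip("\n")
        self.wfile.write(self.server.model.synthesize(text))


def start_daemon(socket_path):
    """Bind the Unix socket, load the model once, serve in the background."""
    server = socketserver.UnixStreamServer(socket_path, SynthesisHandler)
    server.model = FakeModel()  # loaded once, reused by every request
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def request(socket_path, text):
    """Send one line of text and read the reply until the server closes."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(socket_path)
        client.sendall(text.encode("utf-8") + b"\n")
        client.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            chunk = client.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.sock")  # hypothetical path
    daemon = start_daemon(path)
    print(request(path, "hello"))  # warm request: no model load on this path
    daemon.shutdown()
    daemon.server_close()
```

The point of the design is that the slow constructor runs once in `start_daemon`, while each `request` only pays for a local socket round trip plus synthesis. A production daemon would also need to handle stale socket files, concurrent clients, and a framing format that can carry audio metadata.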

Overview

kokoro-speakd loads the Kokoro 82M model once at daemon start and exposes a Unix socket for repeated synthesis requests. It integrates natively with launchd on macOS and systemd on Linux, and is published on PyPI with PEP 740 attestations.