Liveness probes tell you the process is up. Readiness probes tell you it can actually do its job. For a voice agent, the difference is whether the next call connects or drops.
The default health check is too generous
The standard FastAPI starter ships something like a bare /health route: a handler that runs no checks of its own.
This returns 200 as long as the Python interpreter is alive and the event loop is scheduling. It tells you nothing about whether the app can actually take a call:
- Is `TWILIO_AUTH_TOKEN` set?
- Is `ULTRAVOX_API_KEY` set?
- Is `N8N_WEBHOOK_URL` reachable?
- Is `PUBLIC_URL` populated so Twilio can call us back?
If any of these are missing, the process is “healthy” but every call will fail with a 500 or worse — a dial tone that never goes anywhere.
Liveness vs readiness — two different questions
| Probe | Question | Failure action |
|---|---|---|
| Liveness (/health) | “Is the process stuck?” | Kill and restart the container |
| Readiness (/ready) | “Should I send this pod traffic?” | Stop routing requests to it |
Conflating them is how you end up with a deployment that’s “100% healthy” while 100% of calls drop.
The shape of a useful readiness endpoint
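In VoxFlow it’s about ten lines: check that each required setting is present, then return the per-check results with a 200 if everything passed and a 503 otherwise.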
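A minimal sketch of such an endpoint, not VoxFlow's actual code. It assumes readiness means "every required environment variable is present and non-blank"; the helper name `readiness_report` and the response shape (`ready` plus a `checks` map) are hypothetical. The logic is kept as a plain function so it can be tested without a running server, with the FastAPI wiring shown in a comment.

```python
import os
from typing import Mapping

# The four settings named earlier in this post.
REQUIRED_ENV_VARS = (
    "TWILIO_AUTH_TOKEN",
    "ULTRAVOX_API_KEY",
    "N8N_WEBHOOK_URL",
    "PUBLIC_URL",
)


def readiness_report(env: Mapping[str, str] = os.environ) -> tuple[int, dict]:
    """Return (http_status, body); 503 if any required variable is missing or blank."""
    checks = {name: bool(env.get(name, "").strip()) for name in REQUIRED_ENV_VARS}
    ready = all(checks.values())
    return (200 if ready else 503), {"ready": ready, "checks": checks}


# Possible FastAPI wiring (assumes an existing `app = FastAPI()`):
#
#   from fastapi.responses import JSONResponse
#
#   @app.get("/ready")
#   def ready() -> JSONResponse:
#       status_code, body = readiness_report()
#       return JSONResponse(status_code=status_code, content=body)
```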
Two things matter here:
- It returns 503 when not ready. That’s the contract Kubernetes’ `readinessProbe`, AWS ALB target groups, and most reverse proxies all understand. Without the 503, the orchestrator can’t act on the answer.
- The body lists every check individually. When something fails, your on-call doesn’t need to grep logs: `curl /ready` tells them exactly which env var is missing.
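To illustrate both points from the client side, here is a small sketch of a script an on-call engineer might run instead of `curl`. It assumes the response body has a `checks` object mapping check names to booleans; that shape and the function names are hypothetical. Python's `urllib` raises `HTTPError` on a 503, so the sketch reads the body from the exception.

```python
import json
import urllib.error
import urllib.request


def fetch_ready(url: str, timeout: float = 2.0) -> tuple[int, dict]:
    """GET the readiness endpoint and return (status_code, parsed_body)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        # A 503 arrives as an exception, but its JSON body is still readable.
        return err.code, json.load(err)


def failing_checks(body: dict) -> list[str]:
    """Names of the checks that reported False, sorted for stable output."""
    return sorted(name for name, ok in body.get("checks", {}).items() if not ok)
```

Calling `failing_checks` on the body returned by `fetch_ready` gives the list of settings to fix, with no log access needed.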
Why we don’t ping Twilio/Ultravox here
You’ll see “deep health check” tutorials that recommend pinging every downstream from /ready. Don’t.
- Cost — Ultravox charges per API call; a probe every 5s adds up.
- Cascading failures — if Ultravox has a 2-minute blip, every replica suddenly fails readiness and your traffic dies, even though calls in flight would still complete.
- Truth — “the env var is set” is a meaningful production invariant; “the downstream answered our probe in the last second” is noise.
If you want downstream health, gauge it from real call outcomes (your Prometheus metrics, the next blog post in this series) — not from a synthetic probe.
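As a rough illustration of measuring from real outcomes, the sketch below tracks the failure rate over the most recent calls. It is a stdlib stand-in, not Prometheus instrumentation; the class name `CallOutcomeWindow` and the window size are hypothetical.

```python
from collections import deque


class CallOutcomeWindow:
    """Failure rate over the last `size` real calls."""

    def __init__(self, size: int = 100) -> None:
        # Old outcomes fall off automatically once the window is full.
        self._outcomes: deque[bool] = deque(maxlen=size)

    def record(self, succeeded: bool) -> None:
        """Call once per finished call, from the real request path."""
        self._outcomes.append(succeeded)

    def failure_rate(self) -> float:
        """Fraction of failed calls in the window; 0.0 when nothing is recorded."""
        if not self._outcomes:
            return 0.0
        return self._outcomes.count(False) / len(self._outcomes)
```

An alert on this rate reflects what callers actually experienced, and it costs nothing extra because no synthetic request is sent.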
Kubernetes wiring
Liveness is loose (don’t kill the container for transient stalls). Readiness is tight (pull the pod from the load balancer fast when config goes bad).
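A sketch of probe settings in that spirit. The container port (8000), delays, and thresholds are assumptions chosen for illustration, not values from the original deployment; only the 5-second readiness period echoes the interval mentioned earlier.

```yaml
# Illustrative values only; the port and thresholds are assumptions.
livenessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 6    # tolerate about a minute of stalls before restarting
readinessProbe:
  httpGet:
    path: /ready
    port: 8000
  periodSeconds: 5
  timeoutSeconds: 2
  failureThreshold: 2    # leave the load balancer after about 10s of failures
```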
A war story
The bug this would have prevented: a rolling deploy on a Friday afternoon. New Secret resource was missing one env var. Pods came up healthy. /health returned 200. Twilio dispatched calls. Every call returned 500 because the missing var blew up inside the request handler. Pager went off four minutes later, after callers had already complained.
With /ready checking env-var presence, the orchestrator would never have routed a single call to those pods. The bad deploy would have stalled at “0/3 pods ready” — visible, contained, fixable before any user noticed.
Takeaway
/health answers “am I alive?” /ready answers “should you trust me?” Voice agents need both. The readiness endpoint is about ten lines of code that turn a class of production outages into a deployment-time error. Add it before you ship.