ARCHITECTURE & APPROACH
Why human testing and LLM evals aren't enough.
Evaluating the brain doesn't test the mouth or the ears. You need full-pipe telephony testing.
Manual Human QA
- ❌ Inconsistent: Humans can't replicate the exact same interruption timing twice.
- ❌ Unscalable: You cannot run 1,000 concurrent human calls to test load.
- ❌ Expensive: Paying humans to dial phone numbers is a waste of engineering budget.
LLM-to-LLM Text Evals
- ❌ Ignores Latency: Text evals never touch audio, so they can't measure Text-to-Speech (TTS) delay or the pause the caller actually hears.
- ❌ Misses Interruptions: Endpointing failures (when the bot cuts you off mid-sentence) only show up over live audio.
- ❌ No PSTN Realities: Ignores packet loss, background noise, and bad cell reception.
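
To make the latency and interruption points concrete, here is a minimal Python sketch of the two audio-level measurements a text-only eval has no way to produce: the response gap the caller hears, and whether the bot started talking before the caller finished. The `Turn` type, its field names, and the sample timestamps are hypothetical and purely illustrative; they are not taken from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """Timestamps for one caller/bot exchange (hypothetical field names)."""
    caller_speech_end_ms: int  # when the caller actually stopped talking
    bot_audio_start_ms: int    # when the first bot audio reached the caller


def response_gap_ms(turn: Turn) -> int:
    """Silence the caller hears before the bot replies (negative = overlap)."""
    return turn.bot_audio_start_ms - turn.caller_speech_end_ms


def is_cutoff(turn: Turn) -> bool:
    """True if the bot began speaking before the caller had finished."""
    return response_gap_ms(turn) < 0


# Made-up sample data: a normal turn, a cut-off, and a slow reply.
turns = [Turn(1000, 1850), Turn(5000, 4700), Turn(9000, 11400)]
gaps = [response_gap_ms(t) for t in turns]
cutoffs = [is_cutoff(t) for t in turns]
print(gaps)     # [850, -300, 2400]
print(cutoffs)  # [False, True, False]
```

Both numbers depend on when audio actually arrives at the caller's end, which is why they can only be captured on a real call.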
Vera Telephony Testing
- ✅ Full-Pipe: Tests speech-to-text (STT), the LLM, and TTS over actual SIP/PSTN networks.
- ✅ Programmable: Define the exact dialect, interruption frequency, and mood of the caller.
- ✅ Scalable: Hit your IVR with 10 or 10,000 calls simultaneously.
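
As a rough illustration of what "programmable" and "scalable" could look like in a test harness, the Python sketch below defines a caller profile and fans out a batch of simulated calls with a concurrency cap. Everything here is an assumption for illustration: `CallerProfile`, `place_call`, and `run_batch` are hypothetical names, not Vera's API, and `place_call` only simulates a call rather than dialing anything.

```python
import asyncio
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class CallerProfile:
    """Hypothetical description of the synthetic caller."""
    dialect: str              # e.g. "en-IN"
    mood: str                 # e.g. "impatient"
    interruption_rate: float  # chance of interrupting on each bot turn


async def place_call(profile: CallerProfile, call_id: int, turns: int = 5) -> dict:
    """Simulated call: a real harness would dial over SIP/PSTN here."""
    rng = random.Random(call_id)  # seeded so every run replays identically
    interruptions = 0
    for _ in range(turns):
        await asyncio.sleep(0)  # stand-in for waiting on live audio
        if rng.random() < profile.interruption_rate:
            interruptions += 1
    return {"call_id": call_id, "interruptions": interruptions}


async def run_batch(profile: CallerProfile, n_calls: int, max_concurrent: int) -> list:
    """Run n_calls calls, with at most max_concurrent in flight at once."""
    sem = asyncio.Semaphore(max_concurrent)

    async def guarded(call_id: int) -> dict:
        async with sem:
            return await place_call(profile, call_id)

    return await asyncio.gather(*(guarded(i) for i in range(n_calls)))


profile = CallerProfile(dialect="en-IN", mood="impatient", interruption_rate=0.3)
results = asyncio.run(run_batch(profile, n_calls=10, max_concurrent=4))
print(len(results))
```

Seeding each simulated caller by its call ID is the design choice that matters here: it is what lets the same interruption pattern be replayed exactly, which manual QA cannot do.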