Vera

ARCHITECTURE & APPROACH

Why human testing and LLM evals aren't enough.

Evaluating the brain doesn't test the mouth or the ears. You need full-pipe telephony testing.

Manual Human QA

  • Inconsistent: Humans can't replicate the exact same interruption timing twice.
  • Unscalable: You cannot run 1,000 concurrent human calls to test load.
  • Expensive: Paying humans to dial phone numbers is a waste of engineering budget.

LLM-to-LLM Text Evals

  • Ignores Latency: Text evals never synthesize audio, so they can't measure Text-to-Speech (TTS) delay.
  • Misses Interruptions: Endpointing failures (when the bot cuts you off) only show up over live audio.
  • No PSTN Realities: Ignores packet loss, background noise, and bad cell reception.
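
To make the latency and interruption points concrete, here is a minimal sketch of how both could be derived from timestamps on a recorded call. The `Turn` type and its field names are hypothetical and chosen for illustration; they are not taken from any Vera API. The point is that both numbers depend on when audio starts and stops, which a text transcript does not record.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One caller/bot exchange. Field names are hypothetical."""
    caller_speech_end: float  # seconds from call start when the caller stopped talking
    bot_audio_start: float    # seconds from call start when bot audio began playing


def response_latencies(turns):
    """Seconds between the caller finishing and the bot's audio starting."""
    return [t.bot_audio_start - t.caller_speech_end for t in turns]


def cut_offs(turns):
    """Turns where the bot started speaking before the caller had finished."""
    return [t for t in turns if t.bot_audio_start < t.caller_speech_end]


if __name__ == "__main__":
    turns = [Turn(1.0, 1.8), Turn(5.0, 4.6)]
    print(response_latencies(turns))
    print(len(cut_offs(turns)))
```

A negative latency in this sketch means the bot talked over the caller, which is the endpointing failure described above.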

Vera Telephony Testing

  • Full-Pipe: Tests Speech-to-Text (STT), the LLM, and TTS over actual SIP/PSTN networks.
  • Programmable: Define the exact dialect, interruption frequency, and mood of the caller.
  • Scalable: Hit your IVR with 10 or 10,000 calls simultaneously.
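
As a rough illustration of what "programmable" and "scalable" could look like in code, the sketch below defines a caller scenario and runs many simulated calls with a concurrency cap. Every name here (`CallerScenario`, `place_call`, `run_load_test`) is hypothetical and is not Vera's actual API. `place_call` is a stand-in that places no real SIP/PSTN call.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class CallerScenario:
    """Hypothetical description of a synthetic caller."""
    dialect: str              # e.g. "en-IN"
    mood: str                 # e.g. "impatient"
    interruption_rate: float  # fraction of bot turns the caller interrupts, 0.0 to 1.0


async def place_call(scenario, call_id):
    """Stand-in for a real call: yields to the event loop and returns a stub result."""
    await asyncio.sleep(0)
    return {"call_id": call_id, "dialect": scenario.dialect, "completed": True}


async def run_load_test(scenario, total_calls, max_concurrent):
    """Run total_calls simulated calls, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def one_call(call_id):
        async with semaphore:
            return await place_call(scenario, call_id)

    # gather returns results in the same order the calls were submitted
    return await asyncio.gather(*(one_call(i) for i in range(total_calls)))


if __name__ == "__main__":
    scenario = CallerScenario(dialect="en-IN", mood="impatient", interruption_rate=0.3)
    results = asyncio.run(run_load_test(scenario, total_calls=10, max_concurrent=5))
    print(len(results))
```

Raising `total_calls` and `max_concurrent` is the only change needed to move from a smoke test to a load test in this sketch; a real harness would also need to account for carrier and trunk limits.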