How to Cut Voice AI Costs: Lessons from 15M+ Calls on Vapi
Hi, I’m Piotr. I’ve run over 15 million Voice AI calls on Vapi, and I want to share a few quick, practical tips that can save you a significant amount of money on your Voice AI application.
The Wrong Places People Look First
When teams try to cut Voice AI costs, they usually reach for one of these:
- “Vapi is expensive, let’s migrate off it.” (It’s maybe 5 cents per minute, which is rarely the real problem.)
- “Let’s re-architect the pipeline” into complex multi-agent setups that carefully trim which context gets passed where.
- “Let’s rework the app” to push fewer tokens through the LLM.
Most of my clients start here. It’s almost never where the biggest savings are.
Where the Money Actually Goes
A typical Voice AI cost breakdown looks like this:
- Provider cost per minute (e.g. Vapi)
- LLM
- Text-to-speech
- Speech-to-text
- Telephony
- Your own infrastructure / deployments
People tend to optimize roughly in that order: LLM first, then the voice provider, then TTS and STT. In reality, telephony is usually the biggest culprit, and it’s hiding in plain sight.
The Telephony Trap
Almost everyone uses Twilio or Telnyx. Both round every call up to the next full minute, so a 5-second call is billed as sixty seconds.
In our case, ~95% of calls ended up being voicemail, and voicemail detection typically took around 8 seconds. So the vast majority of our calls were 8 seconds long, and we were paying for a full minute on every single one of them.
When I plotted our cost pie chart, telephony dominated it. I did not expect that, and it’s not intuitive.
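To see why this dominates, here’s a quick sketch of the billing math. The rate below is illustrative, not any provider’s actual pricing:

```python
import math

def per_minute_cost(call_seconds: float, rate_per_minute: float) -> float:
    """Twilio/Telnyx-style billing: every call rounds up to a full minute."""
    return math.ceil(call_seconds / 60) * rate_per_minute

def per_second_cost(call_seconds: float, rate_per_minute: float) -> float:
    """Per-second billing at the same nominal per-minute rate."""
    return call_seconds * rate_per_minute / 60

RATE = 0.014  # $/minute, illustrative only

minute_billed = per_minute_cost(8, RATE)  # an 8-second voicemail call
second_billed = per_second_cost(8, RATE)
print(round(minute_billed / second_billed, 1))  # -> 7.5
```

An 8-second call billed per minute costs 7.5x what it costs per second at the same nominal rate, which lines up with the roughly 8x reduction described below.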
The Fix
We switched to a local telephony provider (for us, based in South Africa) that billed per second, at per-second rates roughly on par with Twilio’s per-minute rates. The result: roughly an 8x reduction in telephony cost.
What I’d Do Differently
Don’t flip 100% of traffic to a new provider overnight. We did, and as soon as we hit around 10,000 accounts, things started breaking. A less-known provider simply didn’t handle our load as well as Twilio or Telnyx would have, and most of our calls started failing on random issues.
At scale, roll out any cost-saving switch gradually. Keep the incumbent as the bulk of traffic while you verify the new provider holds up.
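A gradual rollout can be as simple as a weighted coin flip per call. A minimal sketch, where the provider names and the 5% starting share are placeholders:

```python
import random

def pick_telephony_provider(new_share: float = 0.05) -> str:
    """Route a small share of calls to the new provider and keep the
    incumbent (e.g. Twilio/Telnyx) for the rest. Increase new_share
    only after failure rates on the new provider stay flat."""
    return "new_provider" if random.random() < new_share else "incumbent"
```

Ramp the share up in stages (say 5% → 25% → 100%) while watching call-failure rates at each step.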
The Next Chart: Verification Is Eating Your Minutes
After the telephony fix, telephony shrank to a much smaller slice and the LLM became the next largest one. But before touching the LLM, I looked at what our calls were actually spending time on.
Roughly 80% of call time was spent on verification: confirming the day, month, and year of the caller’s date of birth.
A typical failure looked like this:
- Agent: “Can you give me your date of birth?”
- Caller: “10th of June, 1958.”
- STT mis-transcribes the month. LLM says, “Can you repeat it?”
- Caller repeats, gets the day slightly wrong this time.
- LLM errors again. Loop continues.
You can never trust speech-to-text to transcribe this kind of structured data correctly. It will get something wrong almost every time. So the job is to design around it.
Three Things That Worked
1. Send an SMS link for details. Massively underappreciated UX. The caller taps a link, fills in the data in a form, and the result is returned to the LLM. Zero transcription ambiguity.
2. Use the keypad / DTMF as the source of truth. Where SMS isn’t appropriate, the keypad solves the same problem for most cases. I see very few Voice AI apps leaning on this hard enough.
3. Return partial-failure responses from verification tools.
This is the one I’ve never seen anywhere else, and it’s the most impactful. Instead of a boolean pass/fail, have your custom verification tool return which fields matched and which didn’t, e.g. { day: true, month: true, year: false }.
The LLM can then say, “Can you repeat the year for me, please?” and only re-collect the broken field. No more re-asking the entire date of birth.
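A minimal sketch of such a tool, assuming a custom verification function that compares the transcribed date against the record on file (field names are illustrative):

```python
def verify_dob(heard: dict, on_file: dict) -> dict:
    """Compare each date-of-birth field independently and report
    per-field results instead of a single pass/fail boolean."""
    fields = ("day", "month", "year")
    result = {f: heard.get(f) == on_file[f] for f in fields}
    result["verified"] = all(result[f] for f in fields)
    return result

# Caller said "10th of June, 1958" but STT garbled the year.
print(verify_dob(
    heard={"day": 10, "month": 6, "year": 1985},
    on_file={"day": 10, "month": 6, "year": 1958},
))
# -> {'day': True, 'month': True, 'year': False, 'verified': False}
```

The LLM then re-asks only for the fields marked false.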
For us this made verification about 50% faster. Since 80% of our minutes were going into verification, that was a massive overall saving.
Only Now: Optimize LLM, TTS, and STT
With telephony and verification solved, the per-component provider swaps become the easy part.
- LLM: Look at the τ²-bench (Tau-squared) benchmark. It’s the most useful signal I’ve found for whether an LLM is actually viable for voice AI. Moving from our previous model to GPT-4.1-mini, and then to GPT-5-nano, made the LLM several times cheaper at the same latency.
- Text-to-speech: Benchmark it on your own production cases, such as number pronunciation, malformed input handling, etc. Deepgram Aura came out ~3x cheaper than ElevenLabs for our workload.
- Speech-to-text: Same rule. Test on the utterances you actually see in production.
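A provider-agnostic harness for that kind of benchmark might look like the sketch below. `synthesize` stands in for whichever TTS client call you use (Deepgram, ElevenLabs, ...); the cases and per-character rate are placeholders:

```python
import time
from typing import Callable

# Placeholder production cases: numbers, currency, messy input.
PROD_CASES = [
    "Your reference number is 4 0 2 9 1 7.",
    "The total comes to $1,058.20.",
    "Dr. O'Neill-Smythe, born 10/06/1958",
]

def benchmark_tts(synthesize: Callable[[str], bytes],
                  cost_per_char: float) -> dict:
    """Run every production case through one TTS provider and report
    average latency plus estimated cost for the batch."""
    latencies, chars = [], 0
    for text in PROD_CASES:
        start = time.perf_counter()
        synthesize(text)
        latencies.append(time.perf_counter() - start)
        chars += len(text)
    return {
        "avg_latency_s": sum(latencies) / len(latencies),
        "est_cost_usd": chars * cost_per_char,
    }
```

Run it once per candidate provider with the same cases, then actually listen to the audio for the hard cases before trusting the numbers.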
The Takeaway
The biggest Voice AI savings are almost never in the places people look first:
- Fix telephony billing (per-second provider, gradual rollout).
- Fix verification UX (SMS links, keypad, partial-failure tool responses).
- Then swap LLM, TTS, and STT providers using real benchmarks.
Once telephony and call UX are in order, the provider swaps are easy. Together, these changes saved us well over $100k per year.
I hope you can apply some of this in your own setup. I’m Piotr, I consult on Voice AI solutions. See you next time.
Ready for a production cost audit? Get a free consultation.