
How xAI actually charges you
Pricing pages for voice AI tend to bury the unit you're actually billed on. xAI doesn't: the Voice API section of its pricing page states the billable unit up front, and it's minutes of audio, not tokens, not characters, not "credits."
Here's the full rate card, every line item xAI publishes, not just the headline number:
| Line item | Price |
|---|---|
| Realtime voice agent | $0.05 / min ($3.00 / hr) |
| Phone number (telephony) | +$0.01 / min |
| Realtime text input | $0.004 / message |
| Text to speech | $15.00 / 1M characters |
| Speech to text | $0.10 / hr (REST), $0.20 / hr (streaming) |
| Web search | $5 / 1,000 calls |
| X search | $5 / 1,000 calls |
| Document / collections search (RAG) | $2.50 / 1,000 calls |
| File attachment search | $10 / 1,000 calls |
| File storage | $0.025 / GiB / day |
| Collection storage | $0.10 / GiB / day |
| Batch API | 20-50% off standard token rates (text/language models only) |
| Priority processing | 2x premium over standard rates |

Two things jump out once you see the whole table. First, the voice rate and the telephony rate are the only two costs you can budget with total confidence; everything else scales with how the agent actually behaves on a call. Second, there's no line for a monthly platform fee or seat license anywhere on it. Grok's billing is pure consumption, which is unusual for a voice-agent builder and worth noting if you've priced out subscription-tier competitors.
What a real call actually costs
The rate card only tells you the price per unit. What you actually want to know is what a call costs, so I built it out line by line for a realistic 5-minute support call on a provisioned number, one that fires a couple of tool calls to look something up.

- Voice: 5 min × $0.05 = $0.25
- Phone number: 5 min × $0.01 = $0.05
- Two web searches: 2 × $0.005 = $0.01
- One document/collections search: 1 × $0.0025 = $0.0025

That's ~$0.31 for a 5-minute call that actually looks something up, versus $0.30 in voice and telephony alone. In other words, the tool-call gotcha that xAI's fine print warns about is real, but on a single call it adds just over a cent, not a multiplier. The underlying Grok model also bills its own token usage for the reasoning behind those tool calls ($1.25-$2.50 per 1M tokens depending on the model), which is typically a rounding error next to the per-minute charges unless the agent is reasoning through something unusually long.
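For anyone who wants to rerun that arithmetic with their own call length and tool mix, here is a minimal Python sketch. The rates are copied from the rate card above; the function and constant names are my own illustration, not anything from xAI's API, and model token charges are deliberately left out.

```python
# Rates copied from the rate card above (USD). Names are illustrative only.
VOICE_PER_MIN = 0.05
TELEPHONY_PER_MIN = 0.01
WEB_SEARCH_FEE = 5 / 1000             # $5 per 1,000 calls
COLLECTIONS_SEARCH_FEE = 2.50 / 1000  # $2.50 per 1,000 calls


def call_cost(minutes, web_searches=0, collections_searches=0, on_phone_number=True):
    """Return (base, tools) in USD for one call. Token charges are excluded."""
    per_min = VOICE_PER_MIN + (TELEPHONY_PER_MIN if on_phone_number else 0.0)
    base = minutes * per_min
    tools = (web_searches * WEB_SEARCH_FEE
             + collections_searches * COLLECTIONS_SEARCH_FEE)
    return base, tools


# The 5-minute support call from the worked example.
base, tools = call_cost(5, web_searches=2, collections_searches=1)
print(f"voice + telephony: ${base:.4f}")          # $0.3000
print(f"tool calls:        ${tools:.4f}")         # $0.0125
print(f"total:             ${base + tools:.4f}")  # $0.3125
```

Returning the base and tool portions separately keeps the fixed per-minute spend visibly apart from the part that depends on agent behavior.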
Where this compounds is volume, not any single call.
What it costs once you're actually running it
A single call's math is reassuring. A month of calls is where the number starts to matter for a budget conversation.

Assuming a 4-minute average call on a provisioned number ($0.06/min for voice plus telephony), before any tool-call spend:
| Monthly call volume | Voice + telephony only |
|---|---|
| 200 calls/month | ~$48 |
| 1,000 calls/month | ~$240 |
| 5,000 calls/month | ~$1,200 |

Add a light layer of tool calls (a couple of lookups per call, per the worked example above) and each figure moves up by roughly 5%, so the 5,000-call tier lands closer to $1,260/month than $1,200. There's no volume discount published anywhere on the pricing page at any of these tiers; the per-minute rate stays flat whether you're doing 200 calls or 50,000. That's a fair trade for predictability, but it does mean scale doesn't earn you a better rate the way some enterprise-tier competitors quietly offer.
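The monthly table is a straight multiplication, so it is easy to reproduce for your own volume. A rough sketch, assuming the same 4-minute average and $0.06/min combined rate, with the worked example's $0.0125 of tool fees per call as an optional add-on (the function name and parameters are hypothetical):

```python
def monthly_cost(calls, avg_minutes=4, per_minute=0.06, tool_fees_per_call=0.0):
    """Projected monthly spend in USD at a flat per-minute rate."""
    return calls * (avg_minutes * per_minute + tool_fees_per_call)


for calls in (200, 1_000, 5_000):
    base = monthly_cost(calls)
    with_tools = monthly_cost(calls, tool_fees_per_call=0.0125)
    uplift = (with_tools / base - 1) * 100
    print(f"{calls:>5} calls: ${base:,.2f} base, "
          f"${with_tools:,.2f} with tools (+{uplift:.1f}%)")
```

Because the rate is flat, the uplift percentage is the same at every tier; only the absolute dollar gap grows with volume.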
How the price compares to OpenAI and ElevenLabs
The reason xAI leads with "$0.05/min, industry-leading" is that most competing voice stacks don't publish a comparably clean number.

| Provider | Billing unit | Effective rate |
|---|---|---|
| Grok Voice Agent | Per minute (flat) | $0.05/min |
| ElevenLabs Agents | Per minute (subscription tiers) | ~$0.08/min, consistent from the $6 Starter plan (75 min) up to the $990 Business plan (12,375 min) |
| ElevenLabs API (pay-per-use) | Per minute | $0.08/min speech engine rate |
| OpenAI gpt-realtime-2 | Per token | $32/1M input, $64/1M output audio tokens, no blended per-minute rate published |
ElevenLabs is the closer of the two comparisons because it also publishes a real per-minute number, and every one of its subscription tiers, from the $6 Starter plan's 75 minutes to the $990 Business plan's 12,375 minutes, works out to almost exactly $0.08 per minute. Grok undercuts that by 37.5% on the base voice rate alone.
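Both figures in that paragraph can be checked in a few lines. This sketch uses only the plan prices and minute allowances quoted above; the helper names are illustrative.

```python
def effective_rate(plan_price_usd, included_minutes):
    """Per-minute rate implied by a subscription tier."""
    return plan_price_usd / included_minutes


def percent_cheaper(rate, baseline):
    """How far `rate` sits below `baseline`, as a percentage of baseline."""
    return (baseline - rate) / baseline * 100


starter = effective_rate(6, 75)          # $6 Starter plan, 75 minutes
business = effective_rate(990, 12_375)   # $990 Business plan, 12,375 minutes
print(f"Starter:  ${starter:.3f}/min")
print(f"Business: ${business:.3f}/min")
print(f"Grok at $0.05/min is {percent_cheaper(0.05, starter):.1f}% cheaper")
```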
OpenAI is the harder one to pin down. Its current pricing page bills gpt-realtime-2 by audio tokens, not minutes, and doesn't publish a blended per-minute estimate for the model at all; only two narrower models (gpt-realtime-translate at $0.034/min and gpt-realtime-whisper at $0.017/min) get a per-minute figure. That's exactly why xAI's own comparison leans on an estimate rather than a quote: xAI states that "$0.10/min is a highly conservative blended estimate. In production, pricing typically exceeds $0.10/min." That's xAI's number about a competitor, not OpenAI's own published rate, so treat it as directional, but the fact that OpenAI doesn't publish a comparable per-minute figure at all is itself telling.
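If you want your own per-minute number for a token-billed model, the conversion is simple once you know how many audio tokens a minute of your traffic consumes. That count is something you would have to measure from your own usage logs; nothing below asserts what it is. A sketch using only the $32 and $64 per 1M rates quoted above, with hypothetical function names:

```python
INPUT_PER_TOKEN = 32 / 1_000_000   # $32 per 1M input audio tokens
OUTPUT_PER_TOKEN = 64 / 1_000_000  # $64 per 1M output audio tokens


def per_minute_from_tokens(input_tokens_per_min, output_tokens_per_min):
    """Blended USD per minute, given token counts measured from real usage."""
    return (input_tokens_per_min * INPUT_PER_TOKEN
            + output_tokens_per_min * OUTPUT_PER_TOKEN)


def output_tokens_within(budget_per_min):
    """Output audio tokens per minute that a per-minute budget buys (input ignored)."""
    return budget_per_min / OUTPUT_PER_TOKEN


# How many output tokens per minute fit under each per-minute figure in this article.
for budget in (0.05, 0.10):
    print(f"${budget:.2f}/min buys {output_tokens_within(budget):,.2f} output tokens")
```

The second helper turns the question around: rather than guessing a blended rate, it tells you the token budget a given per-minute price implies, which you can then compare against what your calls actually use.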
Developers who've actually run the comparison aren't just talking about latency. One of the more measured takes from the community, after Grok topped the Big Bench Audio leaderboard, was a reminder that price and speed are still moving targets:
"Good job on getting it to #1 in the benchmark but cost and speed needs work. Though I expect xAI to deliver on cost and speed soon enough."
The gotchas to budget for
Three things the headline rate doesn't tell you, worth checking before you commit a budget line to this:
- No free tier. xAI's pricing page lists paid rates only. You get one free phone number to test with and browser-based testing, but there's no usage allotment to prototype against before you're billed.
- No published volume discount. The per-minute rate is identical at 200 calls and 50,000 calls a month, per the worked table above. If you're negotiating enterprise volume, that conversation happens off the public page.
- Tool calls scale with agent behavior, not call count. An agent configured to search aggressively on every turn racks up $5-$10 per 1,000 tool calls in fees regardless of how short the phone calls are, so the real lever on your bill is how the agent is prompted and configured, not just how many minutes it talks.
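To put a number on that last point, this sketch computes what share of a call's cost comes from web-search fees. It assumes the $0.005 web-search fee and the $0.06/min provisioned-number rate from earlier; the search counts in the loop are hypothetical agent configurations, not measurements.

```python
def tool_fee_share(minutes, searches, per_search=0.005, per_minute=0.06):
    """Fraction of a call's cost that comes from search fees (tokens excluded)."""
    base = minutes * per_minute
    tools = searches * per_search
    total = base + tools
    return tools / total if total else 0.0


# A short 1-minute call with increasingly search-happy agent configurations.
for searches in (1, 4, 12):
    share = tool_fee_share(1, searches)
    print(f"{searches:>2} searches on a 1-min call: {share:.0%} of cost is tool fees")
```

On a 1-minute call, 12 web searches cost as much as the voice and telephony minutes themselves, which is the scenario the prompt and tool configuration should be tuned to avoid.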
If your backlog is text, not phone calls
Everything above is about phone and voice minutes, which is a genuinely different budget line than the one most support teams are actually trying to shrink: the pile of email and chat tickets sitting in Zendesk or Freshdesk. That's the queue I build automation for at eesel, and the pricing comparison there isn't per-minute at all; it's per resolved ticket.
eesel's pricing is $0.40 per resolved conversation, no per-seat fee, no platform minimum, with the first $50 of usage free to test against your own tickets. It plugs into your existing helpdesk in minutes, learns from tickets you've already solved, and runs a simulation against your ticket history before it ever answers a live customer, which is the same "don't let it act on a guess" problem that voice-agent builders like Grok's are still working through. Smava runs a fully automated agent on 100,000+ tickets a month at that rate, and Gridwise resolved 73% of tier-1 requests in its first month.
Match the spend to the channel: Grok Voice for a phone line, an AI helpdesk agent for the ticket queue. You can try eesel free before deciding either way.









