Blog / AI news

Claude Sonnet 5 pricing: the real cost in 2026

Written by

Kurnia Kharisma Agung Samiadjie

Reviewed by

Katelin Teen

Last edited July 2, 2026

Expert Verified

Claude Sonnet 5 pricing illustration with the Anthropic mark

TL;DR

Claude Sonnet 5 pricing is $3 per million input tokens and $15 per million output, with introductory rates of $2/$10 running through 31 August 2026. That is exactly the same per-token rate as Sonnet 4.6, and roughly half the price of Opus 4.8 ($5/$25). It keeps the 1M-token context window.

Two catches make the sticker misleading. First, Sonnet 5 uses a new tokenizer that counts about 30% more tokens for the same text, so per-token parity with 4.6 does not mean per-request parity. Second, at higher effort settings it generates a lot of thinking tokens, and independent testing found it can cost more per task than Opus 4.8 despite the lower rate.

For anyone pricing a support bot, my honest take after years of running AI on live ticket queues: the token rate is the cheapest part of the bill. The expensive 80% is the retrieval, guardrails, actions, and testing wrapped around the model, and that is the number you should actually be forecasting.

Claude Sonnet 5 pricing at a glance

Anthropic launched Claude Sonnet 5 on 30 June 2026 as the mid-tier ("Sonnet-class") model, and the headline it leads with is value: near-Opus quality on coding and agentic work at Sonnet cost. Here is the number everyone comes for.

Period	Input / MTok	Output / MTok
Introductory (through 31 Aug 2026)	$2	$10
Standard (from 1 Sep 2026)	$3	$15

The standard $3/$15 rate is identical to Sonnet 4.6 (and 4.5, and Sonnet 4 before it). Anthropic set the introductory discount deliberately, so that migrating to Sonnet 5 stays "roughly cost-neutral" while the discount is live, despite the new tokenizer. More on why that caveat matters below.

For context on where this sits, Sonnet 5 is the everyday default: it is the default model for Free and Pro plans on Claude.ai, and it is also available to Max, Team, and Enterprise users, plus Claude Code and Cowork. So for most people, "which Claude am I using" is now Sonnet 5 unless they switch.

The full Claude Sonnet 5 pricing table

The base rate is only the start. Prompt caching and batch processing change the real per-request math a lot, especially for a support workload that reuses the same knowledge base context on every ticket.

Rate category	Intro (through 31 Aug 2026)	Standard (from 1 Sep 2026)
Base input	$2 / MTok	$3 / MTok
5-minute cache write	$2.50 / MTok	$3.75 / MTok
1-hour cache write	$4 / MTok	$6 / MTok
Cache hits & refreshes (read)	$0.20 / MTok	$0.30 / MTok
Output	$10 / MTok	$15 / MTok
Batch API input (50% off)	$1 / MTok	$1.50 / MTok
Batch API output (50% off)	$5 / MTok	$7.50 / MTok

All figures are from Anthropic's own pricing docs. The cache multipliers are consistent across models: a 5-minute write is 1.25x the base input rate, a 1-hour write is 2x, and a cache read is 0.1x. Anthropic's product page cites "up to 90% cost savings with prompt caching and 50% cost savings with batch processing." A couple of modifiers stack on top: US-only inference (inference_geo: "us") adds a 1.1x multiplier on all token categories, and the full 1M-token window is billed at the standard per-token rate with no long-context surcharge.

How Sonnet 5 pricing compares across the Claude 5 family

The point of Sonnet 5 is the price-to-capability ratio, so it only makes sense next to the rest of the lineup. Here is where the standard rates land.

Model	Input / MTok	Output / MTok	Context
Claude Fable 5	$10	$50	1M
Claude Mythos 5	$10	$50	1M
Claude Opus 4.8	$5	$25	1M
Claude Sonnet 5	$3 ($2 intro)	$15 ($10 intro)	1M
Claude Sonnet 4.6	$3	$15	1M
Claude Haiku 4.5	$1	$5	200K

The reason the mid-tier price is interesting is that Sonnet 5's quality is not mid-tier. Anthropic's own benchmark table puts it a step below Opus 4.8 but a clear jump over Sonnet 4.6, at 40% of Opus's output price.

Claude Sonnet 5 benchmark table showing Sonnet 5 against Sonnet 4.6 and Opus 4.8 across agentic coding, reasoning, and computer use, as taken from Anthropic

On SWE-bench Pro agentic coding, Sonnet 5 scores 63.2% versus Sonnet 4.6's 58.1% and Opus 4.8's 69.2%. On OSWorld-Verified computer use it hits 81.2%, within striking distance of Opus 4.8's 83.4%. Anthropic's framing is that Sonnet 5 "covers a much wider range of cost-performance options than Opus 4.8" and its higher-effort runs "can match Opus 4.8 on some tasks." That is the whole pitch: you dial effort up when you need it, and pay Sonnet rates the rest of the time.

The tokenizer catch: same price per token, bigger bill

Here is the line item most spend models miss. Sonnet 5 uses a new tokenizer (the same family introduced with Opus 4.7) that "produces approximately 30% more tokens for the same text." The launch post's footnote gives a finer range of roughly 1.0x to 1.35x depending on content type.

Diagram showing the same prompt billed as 1,000 tokens on Sonnet 4.6 but about 1,300 tokens on Sonnet 5 because of the new tokenizer

So the sticker price matches Sonnet 4.6, but per-token parity is not per-request parity. The exact same prompt and response get billed on more tokens. Anthropic designed the introductory discount to absorb this so the transition is "roughly cost-neutral" while it is active. Once standard pricing kicks in on 1 September 2026, that token inflation becomes a real cost increase per equivalent request versus Sonnet 4.6. If you are building a spend forecast, apply the ~1.0x to 1.35x token multiplier on top of the rate rather than comparing sticker $/MTok directly. This is exactly the kind of thing that quietly wrecks a naive cost-savings estimate.

The effort dial is your real cost lever

Sonnet 5's effort parameter (low / medium / high / xhigh / max) is the practical knob for cost. Higher effort means more thinking tokens, which means higher cost but higher capability. It defaults to high on the Claude API and Claude Code, and Sonnet 5 is the first Sonnet-tier model to get the new xhigh level.

A dial from low to max showing that higher effort means more thinking tokens and higher cost, with Sonnet 5 at medium roughly equal to Sonnet 4.6 at high

The efficiency mapping Anthropic gives is worth knowing before you set a default: Sonnet 5 at medium is roughly Sonnet 4.6 at high, and Sonnet 5 at high is roughly Sonnet 4.6 at max. In plain terms, you can often drop an effort level versus 4.6 and get the same result for fewer tokens. For a high-volume workload like automated ticket resolution, that dial is where your real monthly number is decided, not the headline rate. Anthropic even raised rate limits across Chat, Cowork, Claude Code, and the Platform specifically to accommodate higher-effort usage.

What Sonnet 5 actually costs per task

This is where the value story gets a useful argument attached to it. Anthropic and early boosters framed Sonnet 5 as "Opus-level work at Sonnet price," and on the sticker that is true. But the community pushed back fast, and the pushback is the genuinely useful part for a buyer.

Artificial Analysis, a named third-party benchmark aggregator, reported Sonnet 5 scoring 53 on its Intelligence Index and costing around $2.29 per task on that run, and noted that without the promotional pricing it can cost more per task than Opus 4.8 on that index, because higher-effort Sonnet 5 runs burn a lot of tokens.

Claude Sonnet 5 achieves 53 on the Artificial Analysis Intelligence Index, but without promotional pricing will cost more per task than Opus 4.8.

Artificial Analysis on X

That is the same tokenizer-and-effort dynamic showing up in a real number. Not everyone agrees on the read, and named operators still like it for day-to-day work. Wade Foster, CEO of Zapier, put the practical case for it plainly.

Claude Sonnet 5 is live. It does Opus-level work at Sonnet-level pricing. Here's when to use it.

Wade Foster on LinkedIn

Both things are true, and the resolution is the effort dial. Run Sonnet 5 at low or medium effort and it is clearly cheaper per task than Opus 4.8 for most work. Crank it to xhigh or max on hard problems and the per-task cost can cross over. Verify the exact figures at the source before you cite them, but the shape of the trade is clear: sticker $/MTok is not the whole cost story, and for a cost comparison you need the tokens-per-task number too.

The pricing question that actually matters for support teams

Here is where I have to be blunt, because this is the mistake I watch teams make most. A cheaper, near-Opus model like Sonnet 5 makes "we'll just build our own support bot on the API" sound more tempting than it has in years. And the model really is cheap now. But the model was never the hard part.

Comparison of the small Claude Sonnet 5 token cost against the much taller stack of a real support agent's cost: retrieval, guardrails, actions, and escalation

I have watched a confident-sounding bot quietly hand customers wrong answers, which is why every serious rollout now gets simulated against historical tickets before it ever touches a live queue. A support agent your customers can trust is retrieval over your docs and past tickets, confidence-based routing, real actions inside your helpdesk, clean escalation to humans, and testing. Sonnet 5 gives you a smart engine for pennies. It does not give you any of that 80%.

So the pricing question that matters is not "$2 or $3 per million tokens." It is the fully loaded cost per resolved ticket once you add the wrapper, and the engineering time to build and maintain it. That is the number that decides whether building beats buying, and it is almost always where the raw-API plan falls apart. If you want to sanity-check your own math, the guide on AI support agent cost and the ROI of AI customer service both work through it, and the wider AI for customer service roundup covers the buy side.

Try eesel

If you are pricing Sonnet 5 because you want an AI agent resolving tickets, eesel is the 80% the API rate does not include. It plugs into your existing helpdesk, learns from your past tickets and help center, and runs on usage-based pricing with no per-seat fees, so your cost tracks resolved tickets rather than a fixed license. The part I would flag most: before you go live, eesel simulates the agent against thousands of your real historical tickets so you see the resolution rate and the exact cost per ticket up front, instead of finding out in production.

eesel AI reports dashboard showing resolution and usage analytics

It is free to try, and you can point it at your own docs in a few minutes to see what a real resolution rate looks like on your tickets, whichever Claude model runs underneath.

Frequently Asked Questions

How much does Claude Sonnet 5 cost?

Claude Sonnet 5 pricing is $3 per million input tokens and $15 per million output, with introductory rates of $2/$10 running through 31 August 2026. That is the raw API cost. If you are budgeting a full support bot, the guide on AI support agent cost shows what else lands on the invoice.

Is Claude Sonnet 5 pricing cheaper than Opus 4.8?

Yes on the sticker. Sonnet 5 is $3/$15 versus Opus 4.8 at $5/$25 per million tokens. But at standard pricing and high effort, Sonnet 5 can generate enough tokens per task to cost more than Opus 4.8 on some benchmarks, so model the tokens, not just the rate. See the wider Claude Sonnet 5 overview.

Why is Claude Sonnet 5 pricing the same as Sonnet 4.6 but my bill went up?

Sonnet 5 uses a new tokenizer that counts roughly 30% more tokens for the same text, so the same prompt is billed on more tokens even though the per-token rate is identical. Re-check your real per-conversation cost rather than reusing old estimates. More on model choice in the AI chatbot platform guide.

Can I run a customer support agent on Claude Sonnet 5 pricing alone?

The API rate is only the model. A production agent also needs retrieval, confidence-based routing, actions inside your helpdesk, escalation, and testing before go-live, which is why teams on the raw API often rebuild what an AI for customer service platform already ships. This roundup of AI support agents covers the buy side.

Does Claude Sonnet 5 pricing include cheaper batch and cache rates?

Yes. The Batch API is 50% off (input and output), and cache reads are billed at 0.1x the base input rate, which Anthropic says can cut costs up to 90% on repeated context. For high-volume support that reuses the same knowledge base, both matter for cost savings.