Blog / AI

Claude Sonnet 5 review: is it the one to actually use?

Written by

Kurnia Kharisma Agung Samiadjie

Reviewed by

Katelin Teen

Last edited July 2, 2026

Expert Verified

TL;DR

Claude Sonnet 5 is Anthropic's new mid-tier model (launched June 30, 2026), and the pitch is blunt: near-Opus quality on coding and agentic work, at Sonnet's price. After going through the benchmarks and the pricing docs, I mostly buy it. On Anthropic's own numbers it beats Sonnet 4.6 across the board and lands within a point or two of the much pricier Opus 4.8 on several tests, and on one knowledge-work benchmark it actually edges it.

The catch is cost. The sticker price ($3 / $15 per million tokens) is identical to Sonnet 4.6, but a new tokenizer generates more tokens per request and the model thinks harder by default, so the real bill can drift up. One independent index even found it costing more per task than Opus 4.8 at standard pricing. So: excellent model, watch the effort dial.

If you're evaluating Sonnet 5 to power AI support, the model is only half the story: what determines whether it's safe to let loose on tickets is the layer around it. That's the part eesel AI handles, and I'll get to it at the end.

Anthropic's Claude Sonnet 5 product page, as taken from Anthropic

What Claude Sonnet 5 is, in one minute

I build on these models for a living, so I'll keep the spec recital short. Claude Sonnet 5 (claude-sonnet-5) is the direct successor to Sonnet 4.6 and the new default model on Claude.ai. Anthropic calls it "our most agentic Sonnet yet," and the headline specs back the framing: a 1M-token context window, up to 128K tokens of output, high-resolution vision, and adaptive thinking on by default.

The two things worth internalizing before anything else:

The effort parameter (low / medium / high / xhigh / max) is new territory for a Sonnet model, which is the first to get xhigh. It's the knob that decides how hard the model thinks, and it's the knob that decides your bill.
A new tokenizer that produces roughly 1.0-1.35x more tokens for the same text versus Sonnet 4.6, which matters more than it sounds. More on that below.

It shipped everywhere on day one: Claude.ai (web and mobile), Claude Code and Cowork, the Claude Platform, plus AWS, Google Cloud, and Microsoft Foundry. For a fuller rundown of what changed, my colleague wrote up what Claude Sonnet 5 is separately; this piece is the review.

The benchmarks: how good is it, really?

Here's the table Anthropic published, and it's the clearest way to see where Sonnet 5 actually lands.

Claude Sonnet 5 benchmark table comparing Sonnet 5, Sonnet 4.6, and Opus 4.8, as taken from Anthropic

A few numbers jump out. On agentic coding, Sonnet 5 hits 63.2% on SWE-bench Pro (up from 58.1% for Sonnet 4.6, and closing on Opus 4.8's 69.2%) and 80.4% on Terminal-Bench 2.1, nearly matching Opus 4.8's 82.7%. On computer use (OSWorld-Verified) it scores 81.2%, a hair under Opus 4.8's 83.4%. And on knowledge work (GDPval-AA v2) it posts 1618, edging out Opus 4.8's 1615, which is the kind of result that makes the "near-Opus" claim ring true rather than marketing.

The pattern holds on Humanity's Last Exam: with tools, Sonnet 5 hits 57.4% against Opus 4.8's 57.9%, basically a tie. Where Opus still clearly leads is tools-off reasoning (49.8% vs 43.2%), which is the tell for what Opus is still for.

What makes the story more interesting than a single-number leaderboard is how the gains show up as you spend more. Anthropic plotted pass rate against cost per task on BrowseComp, the agentic-search benchmark:

Claude Sonnet 5 agentic search performance by effort level on BrowseComp, as taken from Anthropic

The orange Sonnet 5 line sits well above old Sonnet 4.6 at every price point, and it tracks the Opus 4.8 curve closely until the very top end. In plain terms: for most of the cost range, Sonnet 5 gives you Opus-shaped results for Sonnet money. That's the whole review in one chart.

Where Sonnet 5 sits in the Claude 5 lineup

Anthropic's naming is a bit of a maze right now (the flagship is called Fable 5, while Opus, Sonnet, and Haiku keep their own version numbers), so here's how the tiers actually stack by price and capability.

A price-and-capability ladder of the Claude 5 family, from Fable 5 at the top down to Haiku 4.5, with Sonnet 5 highlighted in the middle

Sonnet 5 sits squarely in the middle: cheaper and faster than Opus 4.8, more capable than Haiku 4.5, and well below the Fable 5 frontier tier. Anthropic's own framing is that the biggest recent capability jumps have come from its Opus-class models, and that "Sonnet 5 narrows the gap." The benchmarks agree. The practical upshot: for a lot of teams, the "which Claude do I default to?" question now has a boring answer, and it's this one.

Model	Role	Input / output per MTok	Context
Claude Fable 5	Frontier flagship	$10 / $50	1M
Claude Opus 4.8	Most capable Opus-tier	$5 / $25	1M
Claude Sonnet 5	Best speed + intelligence balance	$3 / $15	1M
Claude Haiku 4.5	Fastest and cheapest	$1 / $5	200K

The effort dial is the real story

If you take one thing from this review, make it this: with Sonnet 5, you stop switching models to control cost and start turning a dial instead.

The Claude Sonnet 5 effort dial from low to max, showing that Sonnet 5 at medium performs like Sonnet 4.6 at high

Anthropic's rough mapping is genuinely useful for budgeting: Sonnet 5 at medium performs roughly like Sonnet 4.6 did at high, and Sonnet 5 at high lands around where 4.6 sat at max. So the same quality bar now costs you fewer thinking tokens, which is a real efficiency win. The flip side is that the default is high, so out of the box Sonnet 5 thinks harder (and costs more per call) than you might expect if you're used to older defaults. Anthropic even raised rate limits across Chat, Cowork, Claude Code, and the API specifically to absorb the higher token usage at higher effort.

For anyone wiring Sonnet 5 into a product, the effort dial is where your cost engineering now lives. Set it too low and you leave quality on the table; leave it on max for simple tasks and you torch budget for no reason.

The value controversy: is it actually cheaper?

This is where the launch-week consensus split, and it's the most useful part of the review to sit with.

The sticker price is unchanged from Sonnet 4.6: $2 / $10 per million tokens as an introductory rate through August 31, 2026, then $3 / $15 after that. But two things quietly push the real cost up. First, that new tokenizer bills the same text on roughly 1.0-1.35x more tokens. Second, the higher default effort means more thinking tokens per request. Per-token price parity is not per-request price parity, and Anthropic is candid that the intro discount exists to keep the migration "roughly cost-neutral" while it lasts.

The independent benchmark aggregator Artificial Analysis put a sharp point on it:

"Claude Sonnet 5 achieves 53 on the Artificial Analysis Intelligence Index, but without promotional pricing will cost more per task than Opus 4.8."
Artificial Analysis on X

They clocked it at about $2.29 per task on that index. On the enthusiast side, the reaction was the mirror image, Wade Foster, Zapier's CEO, summed up the bull case:

"Claude Sonnet 5 is live. It does Opus-level work at Sonnet-level pricing."
Wade Foster on LinkedIn

Both are right, which is the point. At low and medium effort on typical work, Sonnet 5 is a clear value. Push it to xhigh or max on a hard task and you can spend past Opus 4.8 for a result Opus would have given you more cheaply. The threads on r/singularity have been chewing on exactly this. The verdict isn't "cheap" or "expensive," it's "cheap if you manage the dial."

Sonnet 5 vs Opus 4.8: which should you use?

Because they overlap so much (both 1M context, both strong on agents), the decision is less "better vs worse" and more "which one for this job."

A decision guide for choosing between Claude Sonnet 5 and Opus 4.8 based on task type, volume, and cost

My rule of thumb after reading the numbers: default to Sonnet 5 for the bulk of coding, tool use, and everyday agent work, especially anything you run at volume where cost compounds. Reach for Opus 4.8 when the task is genuinely hard reasoning, needs long stretches of autonomy, or is high-stakes enough that the marginal quality is worth the premium (and where Opus's tools-off reasoning edge actually matters). For most people, most of the time, Sonnet 5 is now the right default, and Opus is the escalation path, not the starting point.

Is it safe to put in front of customers?

For a chatbot demo, safety is a footnote. For an agent that autonomously touches real customer tickets, it's the whole ballgame, so I looked closely here.

The good news: Sonnet 5 is the safest Sonnet to date. Anthropic reports lower hallucination, less sycophancy, stronger prompt-injection robustness, and cleaner refusals of malicious requests than Sonnet 4.6. On the automated behavioral audit it scored better (safer) than its predecessor.

Rates of misaligned behavior across Claude models, with Sonnet 5 lower than Sonnet 4.6 but higher than Opus 4.8, as taken from Anthropic

The honest wrinkle: Sonnet 5's misaligned-behavior score (2.53) is better than Sonnet 4.6 (2.89) but worse than Opus 4.8 (2.10) and Mythos Preview (1.95). So it's safer than the model it replaces, not the safest in the family. Anthropic's system card is upfront that Sonnet 5 poses "very low alignment risk (though higher than for previous Sonnet models)."

On offensive-cyber capability, though, Sonnet 5 is reassuringly weak: on Anthropic's Firefox 147 exploit-development test (built with Mozilla, all vulnerabilities since patched), it never produced a working exploit.

Scores measuring Claude models' success at developing Firefox 147 exploits, with Sonnet 5 at 0.0 working exploits, as taken from Anthropic

That combination (capable, cheaper, and not a cyber-offense powerhouse) is close to ideal for customer-facing automation. But "the model is safe" and "the deployment is safe" are two different sentences, which brings me to the part that actually matters if you run a support team.

What Sonnet 5 means if you run support

I work at eesel, where we've spent years putting AI agents on live support queues, and the thing that survives every model launch is this: a smarter, cheaper model lowers the cost of a correct answer, but it doesn't decide which tickets are safe to answer. We've watched confident-sounding bots quietly give wrong answers, which is exactly why the guardrails matter more than the raw benchmark.

A DTC supplements CX lead put the customer-side version of this to us more bluntly than any spec sheet could: "the AI will never be able to answer 100% of the questions. I need an AI who is only handling the tickets that it's confident to handle, and all the other ones, leave them alone." That instinct doesn't change when the underlying model gets a version bump.

Sonnet 5 makes the economics better, that's real. But the deciding factors for support automation, confidence-based routing, training on your own historical tickets, and the ability to simulate a rollout before it ever touches a customer, live in the layer around the model, not in the model itself. That's the part I'd tell anyone to focus on once they've picked a model they trust.

Try eesel

The nice thing about picking a support tool that's model-agnostic is that launches like Sonnet 5 just make it better underneath, with nothing for you to swap out. eesel AI plugs into your existing helpdesk, trains on your past tickets and help center, and runs on frontier models like Claude Sonnet 5, so you get the capability gains without re-plumbing anything.

The eesel AI reports dashboard showing resolution and analytics

Where it earns its keep is the part a raw model can't do: a simulation mode that replays thousands of your historical tickets so you see the resolution rate before going live, and confidence-based rules so the AI only auto-replies where it's sure and leaves the rest for a human. One customer, Gridwise, saw eesel resolve 73% of their tier-1 requests in the first month. You can try eesel free and simulate on your own tickets in a few minutes.

Frequently Asked Questions

Is Claude Sonnet 5 good?

Yes, with a caveat. On Anthropic's own benchmarks Sonnet 5 posts clear gains over Sonnet 4.6 in coding, agentic search, computer use, and reasoning, and it lands within a point or two of Opus 4.8 on several of them. The caveat is cost: a new tokenizer and higher default effort mean the real per-request price can creep up, so Sonnet 5 is 'good' as long as you watch the effort dial.

How much does Claude Sonnet 5 cost?

Standard API pricing is $3 per million input tokens and $15 per million output tokens, identical to Sonnet 4.6, with an introductory rate of $2/$10 running through August 31, 2026 per Anthropic's pricing docs. The sticker is the same as before, but the new tokenizer produces roughly 1.0-1.35x more tokens for the same text, so budget for that. See our Claude Sonnet 5 pricing breakdown for the full math.

Claude Sonnet 5 vs Opus 4.8: which should I use?

Use Sonnet 5 for most coding and agent work and anything high-volume or cost-sensitive, and reach for Opus 4.8 only for the hardest reasoning tasks that need maximum autonomy. Sonnet 5 gets most of the way there at a lower price, and you can push its effort level up to close the gap on demanding jobs.

What is the Claude Sonnet 5 effort parameter?

It's a dial (low, medium, high, xhigh, max) that controls how much the model thinks before answering. Higher effort means more reasoning tokens, higher cost, and better answers. Sonnet 5 is the first Sonnet-tier model to offer xhigh, and it defaults to high on the API and in Claude Code.

Is Claude Sonnet 5 safe to put in front of customers?

It's the safest Sonnet yet, with lower hallucination, less sycophancy, and stronger prompt-injection resistance than Sonnet 4.6, and it never produced a working exploit on Anthropic's Firefox 147 cyber test. That said, no model resolves every ticket correctly, which is why a support layer like eesel AI simulates on your past tickets and only auto-replies where it's confident.

Which Claude model does Claude.ai use by default now?

Claude Sonnet 5 is the default model for Free and Pro plans on Claude.ai across web, iOS, and Android, and it's also available to Max, Team, and Enterprise, plus Claude Code and Cowork. Opus 4.8 remains the higher-capability option and Haiku 4.5 the fast, cheap one.