Claude Sonnet 5 review: is it the one to actually use?

Written by Kurnia Kharisma Agung Samiadjie
Reviewed by Katelin Teen

Last edited July 2, 2026


What Claude Sonnet 5 is, in one minute

I build on these models for a living, so I'll keep the spec recital short. Claude Sonnet 5 (claude-sonnet-5) is the direct successor to Sonnet 4.6 and the new default model on Claude.ai. Anthropic calls it "our most agentic Sonnet yet," and the headline specs back the framing: a 1M-token context window, up to 128K tokens of output, high-resolution vision, and adaptive thinking on by default.

The two things worth internalizing before anything else:

  • The effort parameter (low / medium / high / xhigh / max) is new territory for Sonnet: this is the first Sonnet-tier model to get xhigh. It's the knob that decides how hard the model thinks, and it's the knob that decides your bill.
  • A new tokenizer produces roughly 1.0-1.35x as many tokens for the same text as Sonnet 4.6 did, which matters more than it sounds. More on that below.

It shipped everywhere on day one: Claude.ai (web and mobile), Claude Code and Cowork, the Claude Platform, plus AWS, Google Cloud, and Microsoft Foundry. For a fuller rundown of what changed, my colleague wrote up what Claude Sonnet 5 is separately; this piece is the review.

The benchmarks: how good is it, really?

Here's the table Anthropic published, and it's the clearest way to see where Sonnet 5 actually lands.

Claude Sonnet 5 benchmark table comparing Sonnet 5, Sonnet 4.6, and Opus 4.8, as taken from Anthropic

A few numbers jump out. On agentic coding, Sonnet 5 hits 63.2% on SWE-bench Pro (up from 58.1% for Sonnet 4.6, and closing on Opus 4.8's 69.2%) and 80.4% on Terminal-Bench 2.1, nearly matching Opus 4.8's 82.7%. On computer use (OSWorld-Verified) it scores 81.2%, a hair under Opus 4.8's 83.4%. And on knowledge work (GDPval-AA v2) it posts 1618, edging out Opus 4.8's 1615, which is the kind of result that makes the "near-Opus" claim ring true rather than read as marketing.

The pattern holds on Humanity's Last Exam: with tools, Sonnet 5 hits 57.4% against Opus 4.8's 57.9%, basically a tie. Where Opus still clearly leads is tools-off reasoning (49.8% for Opus 4.8 vs 43.2% for Sonnet 5), which is the tell for what Opus is still for.

What makes the story more interesting than a single-number leaderboard is how the gains show up as you spend more. Anthropic plotted pass rate against cost per task on BrowseComp, the agentic-search benchmark:

Claude Sonnet 5 agentic search performance by effort level on BrowseComp, as taken from Anthropic

The orange Sonnet 5 line sits well above old Sonnet 4.6 at every price point, and it tracks the Opus 4.8 curve closely until the very top end. In plain terms: for most of the cost range, Sonnet 5 gives you Opus-shaped results for Sonnet money. That's the whole review in one chart.

Where Sonnet 5 sits in the Claude 5 lineup

Anthropic's naming is a bit of a maze right now (the flagship is called Fable 5, while Opus, Sonnet, and Haiku keep their own version numbers), so here's how the tiers actually stack by price and capability.

A price-and-capability ladder of the Claude 5 family, from Fable 5 at the top down to Haiku 4.5, with Sonnet 5 highlighted in the middle

Sonnet 5 sits squarely in the middle: cheaper and faster than Opus 4.8, more capable than Haiku 4.5, and well below the Fable 5 frontier tier. Anthropic's own framing is that the biggest recent capability jumps have come from its Opus-class models, and that "Sonnet 5 narrows the gap." The benchmarks agree. The practical upshot: for a lot of teams, the "which Claude do I default to?" question now has a boring answer, and it's this one.

Model | Role | Input / output per MTok | Context
Claude Fable 5 | Frontier flagship | $10 / $50 | 1M
Claude Opus 4.8 | Most capable Opus-tier | $5 / $25 | 1M
Claude Sonnet 5 | Best speed + intelligence balance | $3 / $15 | 1M
Claude Haiku 4.5 | Fastest and cheapest | $1 / $5 | 200K

The effort dial is the real story

If you take one thing from this review, make it this: with Sonnet 5, you stop switching models to control cost and start turning a dial instead.

The Claude Sonnet 5 effort dial from low to max, showing that Sonnet 5 at medium performs like Sonnet 4.6 at high

Anthropic's rough mapping is genuinely useful for budgeting: Sonnet 5 at medium performs roughly like Sonnet 4.6 did at high, and Sonnet 5 at high lands around where 4.6 sat at max. So the same quality bar now costs you fewer thinking tokens, which is a real efficiency win. The flip side is that the default is high, so out of the box Sonnet 5 thinks harder (and costs more per call) than you might expect if you're used to older defaults. Anthropic even raised rate limits across Chat, Cowork, Claude Code, and the API specifically to absorb the higher token usage at higher effort.

For anyone wiring Sonnet 5 into a product, the effort dial is where your cost engineering now lives. Set it too low and you leave quality on the table; leave it on max for simple tasks and you torch budget for no reason.
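
One way to treat the dial as a cost control is to pick the effort level per task type rather than accept the default everywhere. The sketch below is illustrative only. The effort names and the high default come from this review, but the task categories, the mapping, and the pick_effort helper are my own assumptions, not Anthropic guidance. The function just returns a string that you would pass as the effort setting in whatever client you use.

```python
# Effort names as listed in this review, ordered from cheapest to most expensive.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")

# Hypothetical mapping from task type to effort level (illustrative assumptions).
EFFORT_BY_TASK = {
    "classification": "low",
    "summarization": "low",
    "support_reply": "medium",
    "code_edit": "high",
    "multi_step_agent": "xhigh",
}

DEFAULT_EFFORT = "high"  # the review notes that Sonnet 5 defaults to high


def pick_effort(task_type: str, ceiling: str = "xhigh") -> str:
    """Return an effort level for a task, never exceeding `ceiling`."""
    if ceiling not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {ceiling}")
    wanted = EFFORT_BY_TASK.get(task_type, DEFAULT_EFFORT)
    # Clamp to the ceiling so an unmapped task can't silently run at max.
    capped = min(EFFORT_LEVELS.index(wanted), EFFORT_LEVELS.index(ceiling))
    return EFFORT_LEVELS[capped]
```

Calling pick_effort("classification") returns "low", while an unmapped task type falls back to the default, capped at whatever ceiling you set. The ceiling is the useful part: it makes "leave it on max" something you have to opt into.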

The value controversy: is it actually cheaper?

This is where the launch-week consensus split, and it's the most useful part of the review to sit with.

The sticker price is unchanged from Sonnet 4.6: $3 / $15 per million tokens, with an introductory rate of $2 / $10 through August 31, 2026. But two things quietly push the real cost up. First, the new tokenizer turns the same text into roughly 1.0-1.35x as many billable tokens. Second, the higher default effort means more thinking tokens per request. Per-token price parity is not per-request price parity, and Anthropic is candid that the intro discount exists to keep the migration "roughly cost-neutral" while it lasts.
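
To make the per-token versus per-request gap concrete, here is a rough back-of-the-envelope sketch. The prices and the 1.35x worst-case multiplier are the figures quoted above; the request size is made up, the request_cost_usd helper is hypothetical, and the extra thinking tokens from higher default effort are deliberately left out, so treat the results as a floor rather than a forecast.

```python
def request_cost_usd(input_tokens, output_tokens, input_price, output_price,
                     token_multiplier=1.0):
    """Cost of one request in USD; prices are in USD per million tokens."""
    # The tokenizer change scales both input and output token counts.
    scaled_in = input_tokens * token_multiplier
    scaled_out = output_tokens * token_multiplier
    return (scaled_in * input_price + scaled_out * output_price) / 1_000_000


# Hypothetical request, sized in Sonnet 4.6 tokens.
IN_TOK, OUT_TOK = 20_000, 2_000

old_cost = request_cost_usd(IN_TOK, OUT_TOK, 3, 15)           # Sonnet 4.6 at $3/$15
intro_worst = request_cost_usd(IN_TOK, OUT_TOK, 2, 10, 1.35)  # intro rate, 1.35x tokens
std_worst = request_cost_usd(IN_TOK, OUT_TOK, 3, 15, 1.35)    # standard rate, 1.35x tokens

print(f"Sonnet 4.6:                {old_cost:.4f} USD")
print(f"Sonnet 5 intro, worst:     {intro_worst:.4f} USD")
print(f"Sonnet 5 standard, worst:  {std_worst:.4f} USD")
```

At the worst-case multiplier, the intro rate works out to about 0.9x the old per-request cost (2 x 1.35 / 3), which fits the "roughly cost-neutral" framing. At the standard rate the same request costs 1.35x as much, before any additional thinking tokens.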

The independent benchmark aggregator Artificial Analysis put a sharp point on it:

"Claude Sonnet 5 achieves 53 on the Artificial Analysis Intelligence Index, but without promotional pricing will cost more per task than Opus 4.8."

They clocked it at about $2.29 per task on that index. On the enthusiast side, the reaction was the mirror image. Wade Foster, Zapier's CEO, summed up the bull case on LinkedIn:

"Claude Sonnet 5 is live. It does Opus-level work at Sonnet-level pricing."

Both are right, which is the point. At low and medium effort on typical work, Sonnet 5 is a clear value. Push it to xhigh or max on a hard task and you can spend past Opus 4.8 for a result Opus would have given you more cheaply. The threads on r/singularity have been chewing on exactly this. The verdict isn't "cheap" or "expensive," it's "cheap if you manage the dial."
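
A quick way to see where the crossover sits is to ask how many more output tokens Sonnet 5 can burn than Opus 4.8 on the same task before it becomes the pricier option. This is a sketch under stated assumptions: it uses the standard list prices from the table above, assumes both models see the same number of input tokens (which ignores tokenizer differences and extra agent turns), and the function names are mine.

```python
# Standard list prices in USD per million tokens, as quoted in this review.
SONNET_5 = {"input": 3.0, "output": 15.0}
OPUS_4_8 = {"input": 5.0, "output": 25.0}


def task_cost_usd(prices, input_tokens, output_tokens):
    """Cost of one task in USD for a given price card."""
    return (input_tokens * prices["input"]
            + output_tokens * prices["output"]) / 1_000_000


def breakeven_output_multiplier(input_tokens, opus_output_tokens):
    """Ratio of Sonnet 5 output tokens to Opus 4.8 output tokens at equal cost.

    Assumes both models are billed for the same number of input tokens.
    """
    opus_cost = task_cost_usd(OPUS_4_8, input_tokens, opus_output_tokens)
    sonnet_input_cost = input_tokens * SONNET_5["input"] / 1_000_000
    # Output tokens Sonnet 5 can produce before matching the Opus 4.8 bill.
    sonnet_output_budget = (opus_cost - sonnet_input_cost) * 1_000_000 / SONNET_5["output"]
    return sonnet_output_budget / opus_output_tokens


# Hypothetical task: 50K input tokens, and Opus 4.8 needs 10K output tokens.
print(round(breakeven_output_multiplier(50_000, 10_000), 2))
```

For an output-dominated task the multiplier approaches 25 / 15, or about 1.67x. Once high effort plus the tokenizer overhead push Sonnet 5 past that many more tokens than Opus would need, Opus is the cheaper call.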

Sonnet 5 vs Opus 4.8: which should you use?

Because they overlap so much (both 1M context, both strong on agents), the decision is less "better vs worse" and more "which one for this job."

A decision guide for choosing between Claude Sonnet 5 and Opus 4.8 based on task type, volume, and cost

My rule of thumb after reading the numbers: default to Sonnet 5 for the bulk of coding, tool use, and everyday agent work, especially anything you run at volume where cost compounds. Reach for Opus 4.8 when the task is genuinely hard reasoning, needs long stretches of autonomy, or is high-stakes enough that the marginal quality is worth the premium (and where Opus's tools-off reasoning edge actually matters). For most people, most of the time, Sonnet 5 is now the right default, and Opus is the escalation path, not the starting point.

Is it safe to put in front of customers?

For a chatbot demo, safety is a footnote. For an agent that autonomously touches real customer tickets, it's the whole ballgame, so I looked closely here.

The good news: Sonnet 5 is the safest Sonnet to date. Anthropic reports lower hallucination, less sycophancy, stronger prompt-injection robustness, and cleaner refusals of malicious requests than Sonnet 4.6. On the automated behavioral audit it scored better (safer) than its predecessor.

Rates of misaligned behavior across Claude models, with Sonnet 5 lower than Sonnet 4.6 but higher than Opus 4.8, as taken from Anthropic

The honest wrinkle: Sonnet 5's misaligned-behavior score (2.53, where lower is better) beats Sonnet 4.6 (2.89) but trails Opus 4.8 (2.10) and Mythos Preview (1.95). So it's safer than the model it replaces, not the safest in the family. Anthropic's system card is upfront that Sonnet 5 poses "very low alignment risk (though higher than for previous Sonnet models)."

On offensive-cyber capability, though, Sonnet 5 is reassuringly weak: on Anthropic's Firefox 147 exploit-development test (built with Mozilla, all vulnerabilities since patched), it never produced a working exploit.

Scores measuring Claude models' success at developing Firefox 147 exploits, with Sonnet 5 at 0.0 working exploits, as taken from Anthropic

That combination (capable, cheaper, and not a cyber-offense powerhouse) is close to ideal for customer-facing automation. But "the model is safe" and "the deployment is safe" are two different sentences, which brings me to the part that actually matters if you run a support team.

What Sonnet 5 means if you run support

I work at eesel, where we've spent years putting AI agents on live support queues, and the thing that survives every model launch is this: a smarter, cheaper model lowers the cost of a correct answer, but it doesn't decide which tickets are safe to answer. We've watched confident-sounding bots quietly give wrong answers, which is exactly why the guardrails matter more than the raw benchmark.

A DTC supplements CX lead put the customer-side version of this to us more bluntly than any spec sheet could: "the AI will never be able to answer 100% of the questions. I need an AI who is only handling the tickets that it's confident to handle, and all the other ones, leave them alone." That instinct doesn't change when the underlying model gets a version bump.

Sonnet 5 makes the economics better, and that's real. But the deciding factors for support automation (confidence-based routing, training on your own historical tickets, and the ability to simulate a rollout before it ever touches a customer) live in the layer around the model, not in the model itself. That's the part I'd tell anyone to focus on once they've picked a model they trust.
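
To show what "only handle the tickets it's confident about" looks like in its simplest form, here is a minimal sketch of threshold-based routing. It is not eesel's implementation. The DraftReply type, the route function, the 0.85 threshold, and the idea that you already have a trustworthy confidence score are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DraftReply:
    ticket_id: str
    text: str
    confidence: float  # 0.0 to 1.0, from whatever scoring method you trust


def route(draft: DraftReply, threshold: float = 0.85) -> str:
    """Return "auto_reply" only when confidence clears the threshold."""
    if not 0.0 <= draft.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Anything below the bar is left for a human, whatever the model wrote.
    return "auto_reply" if draft.confidence >= threshold else "handoff_to_human"


# Hypothetical drafts for two tickets.
drafts = [
    DraftReply("T-1", "Your order shipped yesterday.", 0.93),
    DraftReply("T-2", "You can take it with your medication.", 0.41),
]
decisions = {d.ticket_id: route(d) for d in drafts}
```

The point of the sketch is that the routing decision never looks at which model wrote the draft. Swapping in a better model should raise the share of tickets that clear the bar, not remove the bar.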

Try eesel

The nice thing about picking a support tool that's model-agnostic is that launches like Sonnet 5 just make it better underneath, with nothing for you to swap out. eesel AI plugs into your existing helpdesk, trains on your past tickets and help center, and runs on frontier models like Claude Sonnet 5, so you get the capability gains without re-plumbing anything.

The eesel AI reports dashboard showing resolution and analytics

Where it earns its keep is the part a raw model can't do: a simulation mode that replays thousands of your historical tickets so you see the resolution rate before going live, and confidence-based rules so the AI only auto-replies where it's sure and leaves the rest for a human. One customer, Gridwise, saw eesel resolve 73% of their tier-1 requests in the first month. You can try eesel free and simulate on your own tickets in a few minutes.

Frequently Asked Questions

Is Claude Sonnet 5 good?
Yes, with a caveat. On Anthropic's own benchmarks Sonnet 5 posts clear gains over Sonnet 4.6 in coding, agentic search, computer use, and reasoning, and it lands within a point or two of Opus 4.8 on several of them. The caveat is cost: a new tokenizer and higher default effort mean the real per-request price can creep up, so Sonnet 5 is 'good' as long as you watch the effort dial.
How much does Claude Sonnet 5 cost?
Standard API pricing is $3 per million input tokens and $15 per million output tokens, identical to Sonnet 4.6, with an introductory rate of $2/$10 running through August 31, 2026 per Anthropic's pricing docs. The sticker is the same as before, but the new tokenizer produces roughly 1.0-1.35x as many tokens for the same text, so budget for that. See our Claude Sonnet 5 pricing breakdown for the full math.
Claude Sonnet 5 vs Opus 4.8: which should I use?
Use Sonnet 5 for most coding and agent work and anything high-volume or cost-sensitive, and reach for Opus 4.8 only for the hardest reasoning tasks that need maximum autonomy. Sonnet 5 gets most of the way there at a lower price, and you can push its effort level up to close the gap on demanding jobs.
What is the Claude Sonnet 5 effort parameter?
It's a dial (low, medium, high, xhigh, max) that controls how much the model thinks before answering. Higher effort means more reasoning tokens, higher cost, and better answers. Sonnet 5 is the first Sonnet-tier model to offer xhigh, and it defaults to high on the API and in Claude Code.
Is Claude Sonnet 5 safe to put in front of customers?
It's the safest Sonnet yet, with lower hallucination, less sycophancy, and stronger prompt-injection resistance than Sonnet 4.6, and it never produced a working exploit on Anthropic's Firefox 147 cyber test. That said, no model resolves every ticket correctly, which is why a support layer like eesel AI simulates on your past tickets and only auto-replies where it's confident.
Which Claude model does Claude.ai use by default now?
Claude Sonnet 5 is the default model for Free and Pro plans on Claude.ai across web, iOS, and Android, and it's also available to Max, Team, and Enterprise, plus Claude Code and Cowork. Opus 4.8 remains the higher-capability option and Haiku 4.5 the fast, cheap one.
