What can Claude Fable 5 do? A capability-by-capability guide

Written by

Riellvriany Indriawan

Reviewed by

Katelin Teen

Last edited July 8, 2026

Update — July 8, 2026: Anthropic is extending Fable 5 access. Per Claude on X: "We’re extending access to Claude Fable 5 on all paid plans through July 12." — @claudeai

TL;DR

Claude Fable 5 is Anthropic's most powerful model, and the short answer to "what can it do" is: real, multi-stage work that runs for days without a human re-prompting at every turn. It plans, writes and ships code, reads enormous documents, delegates to sub-agents, and checks its own output. On Anthropic's own benchmarks it's the strongest model the public can use, and early testers call it the best coding model available.

Two honest caveats sit next to that. It's slow and expensive, at $10 per million input tokens and $50 per million output, twice the price of Opus 4.8. And it won't always answer at full strength: for certain sensitive topics, a safety layer routes the response to the weaker Opus 4.8 instead.

If you run a support or operations team, the practical takeaway is that a model this capable is the engine, not the car. What you actually deploy is an AI agent built on top of one, with the knowledge, guardrails, and testing the raw model doesn't include.

So what can Claude Fable 5 actually do?

Claude Fable 5 is Anthropic's fifth model generation and a new "Mythos-class" tier that sits above Claude Opus 4.8, which in turn sits above Sonnet 4.6. If you've read our Claude overview, this is the new ceiling. It launched on 9 June 2026 and runs on claude.ai, the Claude API, Claude Code, AWS, and Microsoft Foundry.

But specs and tiers aren't really what people mean when they ask what it can do. They mean: what work can I hand it and trust it to finish? Here's the honest map of its concrete capabilities, then we'll go through each one.

A map of what Claude Fable 5 can do: work for days unattended, write and ship code, read 1M-token documents, check its own work, and run sub-agents in parallel

It runs autonomously for days, then checks its own work

This is the capability Anthropic actually built Fable 5 around, and it's the one that matters most. Run it inside a harness like Claude Code or Claude Managed Agents and, in Anthropic's words, it can "work for days at a time: planning across stages, delegating to sub-agents, and checking its own work."

That loop (plan, delegate, work, check) is the part that's actually new. Earlier models lost the thread on long, multi-stage tasks; this one keeps its footing and, crucially, marks its own homework. Anthropic describes it as "thorough, proactive, and tests its own work," and the cloud providers describe a built-in plan, check, refine loop. Self-correction is the difference between an agent you babysit and one you can leave running overnight.

How Claude Fable 5 works as a long-horizon agent: plan across stages, delegate to sub-agents, do the work, check its own work, then loop until done
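To make the loop concrete, here is a minimal Python sketch of plan, work, check, and retry. It is an illustration only, not Anthropic's harness: `run_agent_loop`, `toy_plan`, `toy_work`, and `toy_check` are hypothetical names, and the "work" step is a stub standing in for whatever a real sub-agent would do.

```python
from typing import Callable, Dict, List, Tuple


def run_agent_loop(
    goal: str,
    plan: Callable[[str], List[str]],
    work: Callable[[str], str],
    check: Callable[[str, str], bool],
    max_rounds: int = 3,
) -> Tuple[Dict[str, str], List[str]]:
    """Plan once, then work and check each stage, retrying failures.

    Returns (accepted outputs by stage, stages still failing).
    """
    accepted: Dict[str, str] = {}
    pending = plan(goal)                 # plan across stages
    for _ in range(max_rounds):
        if not pending:
            break
        failed: List[str] = []
        for stage in pending:
            output = work(stage)         # delegate / do the work
            if check(stage, output):     # check its own work
                accepted[stage] = output
            else:
                failed.append(stage)     # retry failed stages next round
        pending = failed
    return accepted, pending


# Toy stand-ins (hypothetical) so the sketch runs with no model behind it.
attempts: Dict[str, int] = {}


def toy_plan(goal: str) -> List[str]:
    return ["inventory", "migrate", "verify"]


def toy_work(stage: str) -> str:
    attempts[stage] = attempts.get(stage, 0) + 1
    return f"{stage}: attempt {attempts[stage]}"


def toy_check(stage: str, output: str) -> bool:
    # Pretend the first "migrate" attempt fails its check.
    return not (stage == "migrate" and attempts[stage] < 2)


done, unfinished = run_agent_loop(
    "migrate the codebase", toy_plan, toy_work, toy_check
)
print(done)
print(unfinished)
```

The design point is the `check` call: a stage only counts as done once it passes, and anything that fails goes back into the queue, so nobody has to notice the failure and re-prompt.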

The scale this unlocks is real. In early testing, Stripe pointed Fable 5 at a 50-million-line Ruby codebase and ran a migration across the whole thing in a day, and community reports describe sessions spinning up as many as 1,000 parallel sub-agents for codebase-scale work. That ability to hold a goal, break it into stages, and grind through them is exactly what separates an AI agent from a rule-based chatbot: one finishes the job, the other waits for the next instruction.

It writes and ships production-grade code

The flashiest thing Claude Fable 5 can do is write software that actually works. On Anthropic's published comparison it scores 80.3% on SWE-Bench Pro for agentic coding, against 69.2% for Opus 4.8, with GPT 5.5 at 58.6% and Gemini 3.1 Pro at 54.2%. On the tougher FrontierCode (Diamond) benchmark it more than doubles Opus, jumping to 29.3% from 13.4%. CNBC reported the gap as more than 10% higher than Opus 4.8 on some tests.

Anthropic's benchmark table comparing Claude Mythos 5 / Fable 5 against Claude Opus 4.8, GPT 5.5, and Gemini 3.1 Pro, as taken from Anthropic

Numbers are one thing; a full day of real work is another. Developer Simon Willison pointed Fable at his open-source LLM library, and it identified and implemented four separate fixes, then shipped a new release that was almost entirely written by the model. His verdict captures the productivity ceiling:

"I'm really impressed with the quality of API design, tests, code and documentation that Fable put together for this. I spent several hours on it today, but it feels like several days' worth of work." - Simon Willison

He wasn't alone. Andrej Karpathy called it a step change worth a major-version bump, and one developer running the FrontierCode benchmark posted a striking progression: Opus 4.7 at 5.2%, Opus 4.8 at 13.4%, Fable 5 at 29.3%. If you're weighing up where it sits against the rest of the field, our roundups of the best AI coding assistant tools and the best Claude AI developer tools are good next reads.

It reads the long, messy documents you already have

Plenty of business work isn't code, it's documents, and this is where the 1,000,000-token context window earns its keep. Fable 5 "understands diagrams, charts, and tables nested in files and PDFs," which Anthropic frames around finance, legal, and analytics work, and there's no price premium for filling that full context.

The concrete proof came from a Hacker News user who handed it a 50-page PDF of dense, interconnected specs and got back a correct breakdown of what was done, partly done, and missing:

"I gave it a 50 page PDF of fairly dense and interconnected specs and asked it which had been implemented... it correctly identified what was done, what was partially done, and what was missing." - Hacker News commenter

For any team sitting on a pile of contracts, policy docs, or a sprawling knowledge base, that's more useful day-to-day than another point on a coding leaderboard. It's also the same muscle a support agent uses when it reads your help docs and past tickets to answer a customer, just pointed at internal documents instead.

What it costs to do all this

Here's the part that tempers the excitement. Everything above runs at frontier-tool prices: $10 per million input tokens and $50 per million output, exactly twice Opus 4.8. Cached input tokens get a 90% discount, and there's a 1.1x surcharge for US-only inference, but the headline rate is what you'll feel. For how Fable 5 stacks up against the rest of the lineup, our Claude pricing guide breaks down every tier, and the Claude Pro plan is where most individuals first meet it.

Spec	Claude Fable 5
Launched	9 June 2026
Model class	"Mythos-class", a tier above Opus 4.8
Context window	1,000,000 tokens
Max output	128,000 tokens
Knowledge cut-off	January 2026
Input price	$10 / 1M tokens ($1 cached)
Output price	$50 / 1M tokens
Long-context surcharge	None
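The rates in the table can be turned into a quick back-of-envelope estimator. This is a rough sketch, not an official billing formula: the function name and the example token counts are made up, and it assumes the 1.1x US-only surcharge multiplies the whole bill, which the figures above don't spell out.

```python
# Rates taken from the spec table above (USD per 1M tokens).
INPUT_USD_PER_MTOK = 10.00
CACHED_INPUT_USD_PER_MTOK = 1.00   # the "$1 cached" rate, i.e. 90% off
OUTPUT_USD_PER_MTOK = 50.00
US_ONLY_MULTIPLIER = 1.1           # assumption: applied to the total


def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    cached_input_tokens: int = 0,
    us_only: bool = False,
) -> float:
    """Estimate the cost of one request in USD.

    `cached_input_tokens` is the part of `input_tokens` served from cache.
    """
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    fresh_input = input_tokens - cached_input_tokens
    cost = (
        fresh_input * INPUT_USD_PER_MTOK
        + cached_input_tokens * CACHED_INPUT_USD_PER_MTOK
        + output_tokens * OUTPUT_USD_PER_MTOK
    ) / 1_000_000
    return cost * US_ONLY_MULTIPLIER if us_only else cost


# Hypothetical request: 200k tokens in, 20k tokens out.
uncached = estimate_cost_usd(200_000, 20_000)
mostly_cached = estimate_cost_usd(200_000, 20_000, cached_input_tokens=150_000)
print(f"uncached:      ${uncached:.2f}")
print(f"mostly cached: ${mostly_cached:.2f}")
```

In this made-up example, caching three quarters of the input takes the request from $3.00 to $1.65, and the $1.00 of output is the part no cache discount touches.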

How much you actually spend depends almost entirely on how hard you let it think. Simon Willison ran his "draw a pelican on a bicycle" test across all five thinking-effort levels, and the cost for a single image ranged from under 10 cents at "low" to about 72 cents at "max". The effort level is a dial you set, and it's the main lever on your bill.

Claude Fable 5's thinking-effort dial: the same task costs about 10 cents at low effort and about 72 cents at max, a 7x swing you control

The bill adds up fast at the high end. Willison tracked a single day of testing at $110.42 of token spend. But there's a real counterweight: Canva's evaluations lead found Fable used about half the tokens of Opus 4.8 in their internal agentic harnesses, so a smarter model that finishes in fewer steps can land at roughly the same real-world cost. The lesson isn't "Fable is unaffordable," it's that your costs depend entirely on how you run it.
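The Canva observation is easy to check with arithmetic. In the sketch below, the Opus 4.8 rates of $5 and $25 are derived from the "twice the price" figure rather than quoted from a price list, and the token counts are hypothetical, chosen only so that Fable uses exactly half of them.

```python
def task_cost_usd(
    input_tokens: int, output_tokens: int, input_rate: float, output_rate: float
) -> float:
    """Cost of one task in USD, with rates given per 1M tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Hypothetical agentic task: Opus 4.8 needs 400k in / 60k out,
# Fable 5 finishes the same task with half the tokens.
opus_cost = task_cost_usd(400_000, 60_000, input_rate=5.0, output_rate=25.0)
fable_cost = task_cost_usd(200_000, 30_000, input_rate=10.0, output_rate=50.0)

print(f"Opus 4.8: ${opus_cost:.2f}")
print(f"Fable 5:  ${fable_cost:.2f}")
```

At double the rate, half the tokens is the break-even point: use fewer than half and Fable comes out cheaper per task, use more and you pay a premium.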

What Claude Fable 5 won't do

Capabilities cut both ways, and there's one thing Fable 5 deliberately won't do at full strength. For cybersecurity, biology, chemistry, and model-distillation prompts, a new generation of classifiers detects the topic and routes your response to Opus 4.8 instead, and you're told it happened. Anthropic says at least 95% of sessions never trigger any fallback.

The catch is the false positives. Developers reported being switched to the weaker model mid-session for completely benign work, including one user who got refused on a basic liquid-handling protocol with nothing risky in it. AI-policy writer Nathan Lambert flagged a second, quieter mechanism for prompts that look like frontier AI research, where the model can get less effective without telling you. The practical advice: if your work sits in a technical vertical, test before you commit to it.

What all this means if you run a support team

This is where we live, so let's be specific. Given everything Fable 5 can do, should a support leader rush to wire it into their helpdesk? Mostly no, or at least not as urgently as the hype suggests.

Here's the uncomfortable truth about AI for customer service: for tier-1 tickets, the model is rarely the bottleneck. Most teams shopping for customer service automation are quietly over-indexing on which model sits underneath. A well-grounded Opus 4.8 or even Sonnet 4.6 already answers the overwhelming majority of "where's my order," "how do I reset my password," "what's your refund policy" questions correctly. Paying double for Fable 5 to answer them is like renting a Formula 1 car for the school run. What actually decides whether your AI helpdesk agent works is everything wrapped around the model. It's the same pattern that separates the strong tools in any AI helpdesk software roundup from the forgettable ones.

What turns a raw model into a customer-ready support agent: the frontier model is the engine, with knowledge, confidence routing, and integrations stacked on top

Three things matter more than the model tier. First, does it know your business? The win comes from training on your past tickets and help docs, not from a smarter base model. Second, does it know when to stay quiet? Raw models answer confidently even when wrong, which is precisely why chatbots give bad answers; production agents need confidence-based routing, the heart of any good ticket triage setup, so low-confidence tickets get drafted or escalated, not auto-sent. As one DTC supplements CX lead put it in a customer interview, the AI will never answer 100% of questions, so what they actually want is an agent that only handles the tickets it's confident about and leaves the rest alone. That's a product capability, not a model capability.

Third, can you trust it before it goes live? That points straight at the build-versus-buy question, which comes up constantly: "Anthropic just shipped an incredible model, why not build our support bot on the API?" You can. It's also a bigger project than it looks, because the model gives you intelligence but not the helpdesk connection, the guardrails, the simulation environment, or the reporting. Several technical teams who tried it switched to buying instead:

"We could try to write our own LLM application but we didn't want to invest our time into that. We wanted something that we would not have to maintain." - Karel, GENERAL BYTES

A frontier model is the bottom layer of the stack, not the whole stack. If your core product is AI, build. If it's anything else and you just want tickets answered well, buying the layers above the model is faster, cheaper, and less fragile. It's the same logic behind choosing any AI for ticket automation over a homegrown script.
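To show what one of those layers looks like, here is a minimal sketch of the confidence-based routing described earlier: auto-send only when the agent is confident, draft for a human when it's unsure, and escalate the rest. Everything in it is hypothetical, including the `DraftReply` shape, the thresholds, and the idea that a usable confidence score already exists. It is not how eesel or any particular product implements routing.

```python
from dataclasses import dataclass


@dataclass
class DraftReply:
    ticket_id: str
    text: str
    confidence: float  # hypothetical score between 0.0 and 1.0


def route(reply: DraftReply, auto_send_at: float = 0.90, draft_at: float = 0.60) -> str:
    """Decide what happens to a drafted reply based on its confidence.

    Thresholds are illustrative; a real team would tune them on past tickets.
    """
    if not 0.0 <= reply.confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if reply.confidence >= auto_send_at:
        return "auto_send"        # confident: reply goes straight to the customer
    if reply.confidence >= draft_at:
        return "draft_for_agent"  # unsure: a human reviews before sending
    return "escalate"             # low confidence: hand the ticket to a human


replies = [
    DraftReply("T-1", "Your order shipped yesterday.", 0.95),
    DraftReply("T-2", "You may be eligible for a refund.", 0.70),
    DraftReply("T-3", "I'm not sure what this error means.", 0.30),
]
decisions = {r.ticket_id: route(r) for r in replies}
print(decisions)
```

The function itself is trivial. The hard part is everything it assumes: a trustworthy confidence score, thresholds tested against real tickets, and a helpdesk connection to act on the decision.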

Try eesel

eesel AI is the layer that sits on top of frontier models like Claude, so you get the capability without the engineering project. It plugs into your existing helpdesk (Zendesk, Freshdesk, HubSpot, Gorgias, and 100+ integrations), learns from your past tickets and help docs on day one, and answers across triage, drafting, and resolution.

eesel AI helpdesk dashboard overview, showing how it connects to your existing tools and resolves support tickets

The differentiator is the part Fable 5 can't give you on its own: a simulation mode that runs the agent against thousands of your past tickets so you see exactly how it would have responded, and what your resolution rate would be, before a single customer talks to it. That's how Gridwise got to 73% of tier-1 requests resolved in their first month. And because pricing is usage-based at $0.40 per resolved ticket with no per-seat fees, you pay for outcomes, not for tokens you can't predict. You can try eesel free with $50 of usage and no credit card.

Frequently Asked Questions

What can Claude Fable 5 do that older models couldn't?

The headline thing Claude Fable 5 can do is sustain long, multi-stage work: Anthropic says it can work for days, planning across stages, delegating to sub-agents, and checking its own work, where earlier models tapped out. It also tops agentic coding benchmarks and reads 1M-token documents. If you want that capability pointed at support tickets, that's what an AI helpdesk agent is built for.

Can Claude Fable 5 write and run code on its own?

Yes. It scores 80.3% on SWE-Bench Pro for agentic coding, and developer Simon Willison had it write almost an entire release of his open-source LLM library in several hours of work. For a wider view of the field, see our roundup of the best AI coding assistant tools.

Can Claude Fable 5 handle customer support questions?

It can answer them, but you rarely want a raw frontier model wired straight into a queue. The model is rarely the bottleneck for tier-1 tickets; grounding and routing decide whether answers are right. A platform like eesel AI adds the knowledge, confidence routing, and testing a model alone doesn't give you.

How much does it cost to run Claude Fable 5?

Claude Fable 5 costs $10 per million input tokens and $50 per million output tokens, twice the price of Opus 4.8, and there's no premium for using the full context. If you'd rather pay per outcome than per token, eesel's usage-based pricing starts at $0.40 per resolved ticket.

What can't Claude Fable 5 do?

It won't always answer at full strength: for cybersecurity, biology, chemistry, and model-distillation prompts its safety classifiers route the response back to Opus 4.8, and some benign technical work gets caught too. It also won't manage your helpdesk, guardrails, or reporting; that's the job of a layer like an AI agent built on top of it.

Article by

Riellvriany Indriawan

Riell is a designer and writer at eesel AI with about two years of experience researching CX platforms, AI chatbots, and helpdesk software. She combines her design background with a sharp eye for how these tools actually look and feel in practice — making her comparisons unusually visual and user-focused.