What is GPT-5.6? OpenAI's Sol, Terra, and Luna explained

Written by Kurnia Kharisma Agung Samiadjie

Reviewed by Katelin Teen

Last edited June 29, 2026

Expert Verified

What is GPT-5.6?

GPT-5.6 is OpenAI's next-generation model family, and the first thing to understand is the new shape. Where past releases gave you a model plus a pile of mini/nano suffixes, GPT-5.6 splits into three tiers with names that finally tell you something: Sol is the flagship, Terra is the balanced everyday model OpenAI says matches GPT-5.5 at half the cost, and Luna is the fastest and cheapest of the three.

GPT-5.6's three tiers: Sol, Terra, and Luna, with their API prices

The naming is deliberate. OpenAI says the number marks the generation, while Sol, Terra, and Luna are durable capability tiers that can each advance on their own cadence. So next time around you might get a better Sol without a version bump for the whole family. The community mostly welcomed it; one r/singularity reaction summed it up as "finally OpenAI got some human-readable naming conventions, instead of stuff like GPT-codex-mini-super-plus 5.4."

OpenAI positions the family to push the frontier on software engineering, computer use, knowledge work, science, and cybersecurity. One caveat worth stating plainly: during the preview OpenAI hasn't published GPT-5.6's exact context window, full modality matrix, or knowledge cutoff, so anyone quoting those numbers right now is guessing.

What's actually new in GPT-5.6

Strip away the launch noise and there are three changes that actually matter.

A new max reasoning effort. On top of the usual low/medium/high effort dial, Sol gets a max setting that OpenAI says gives it the most time to reason deeply. It sits at the top of OpenAI's cost-versus-capability curve: most thinking tokens, highest score, highest latency and price.

An ultra mode that uses subagents. This is the more interesting one. The ultra mode uses subagents to accelerate complex work, fanning a task out across multiple workers instead of running one long chain of thought. It's the same orchestration pattern a lot of agent frameworks have been converging on, now baked into the model tier itself.

How GPT-5.6 ultra mode fans a complex task out to subagents and merges the result
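To make the fan-out idea concrete, here is a minimal sketch of the general pattern using Python's asyncio. It is not OpenAI's implementation, which hasn't been published: `run_subagent`, `fan_out`, and the example subtasks are all hypothetical names, and the worker just returns a string where a real subagent would call a model or a tool.

```python
import asyncio


async def run_subagent(name: str, subtask: str) -> str:
    """Hypothetical worker: stands in for one subagent handling one subtask."""
    await asyncio.sleep(0)  # placeholder for real model or tool work
    return f"{name} finished: {subtask}"


async def fan_out(task: str, subtasks: list[str]) -> str:
    """Run every subtask concurrently, then merge the results in order."""
    workers = [
        run_subagent(f"worker-{i}", subtask)
        for i, subtask in enumerate(subtasks, start=1)
    ]
    results = await asyncio.gather(*workers)  # gather preserves input order
    return task + "\n" + "\n".join(results)


merged = asyncio.run(
    fan_out("Migrate service", ["update schema", "port handlers", "fix tests"])
)
print(merged)
```

The trade-off is the usual one for this pattern: wall-clock time drops because the workers run side by side, but every worker spends its own tokens, so the total bill can go up.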

More predictable prompt caching. Less glamorous, more useful in production: GPT-5.6 adds explicit cache breakpoints and a 30-minute minimum cache life. Cache reads keep the usual 90% discount, while cache writes are billed at 1.25x the uncached input rate. If you run high-volume, repetitive prompts, that's real money.
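Here's a rough sketch of what those caching multipliers mean for a repeated prompt prefix. The 1.25x write and 90% read discount come from the figures above, and the $5 rate is Sol's input price. The function name, the 20,000-token prefix, and the 100 calls are made-up inputs, and the sketch assumes every call lands inside the cache lifetime so the prefix is written once and read after that.

```python
def prefix_cost(prefix_tokens: int, calls: int, input_rate_per_m: float,
                cached: bool = True) -> float:
    """Dollar cost of sending the same prompt prefix `calls` times."""
    per_token = input_rate_per_m / 1_000_000
    if not cached:
        return calls * prefix_tokens * per_token
    # Assumption: one cache write on the first call, cache reads on the rest.
    write = prefix_tokens * per_token * 1.25          # 1.25x uncached input rate
    reads = (calls - 1) * prefix_tokens * per_token * 0.10  # 90% discount
    return write + reads


uncached = prefix_cost(20_000, 100, 5.00, cached=False)
cached = prefix_cost(20_000, 100, 5.00, cached=True)
print(f"uncached: ${uncached:.3f}  cached: ${cached:.3f}")
```

Under these assumptions the cached run costs roughly a ninth of the uncached one. The flip side is that a prefix you only send once costs 25% more with a cache write than without, so caching pays off from the second call onward.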

There's also a speed story. OpenAI plans to run Sol on Cerebras at up to 750 tokens per second in July. For context, developers peg current GPT-5.5 XHigh at roughly 70-100 tokens per second, so this would be a step-change, though as one r/codex commenter cautioned, "tokens/sec is only one part of the experience; queue time, first-token latency" all still count.
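A quick back-of-the-envelope model shows why that commenter's caution matters. The throughput figures are the ones quoted above; the 4,000-token response and the 3 seconds of combined queue and first-token delay are hypothetical numbers picked purely for illustration.

```python
def response_seconds(output_tokens: int, tokens_per_second: float,
                     queue_s: float = 0.0, first_token_s: float = 0.0) -> float:
    """Total wait: fixed overheads plus time to stream the output tokens."""
    return queue_s + first_token_s + output_tokens / tokens_per_second


tokens = 4_000  # hypothetical response length

# Pure generation time at the quoted throughputs.
fast = response_seconds(tokens, 750)
slow = response_seconds(tokens, 100)

# Same comparison with an assumed 3 seconds of queue plus first-token delay.
fast_real = response_seconds(tokens, 750, queue_s=2.0, first_token_s=1.0)
slow_real = response_seconds(tokens, 100, queue_s=2.0, first_token_s=1.0)

print(f"generation only: {slow / fast:.1f}x faster")
print(f"with overheads:  {slow_real / fast_real:.1f}x faster")
```

Any fixed delay shrinks the headline speedup, and it shrinks it more the shorter the response is.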

The benchmarks: real gains, with an asterisk

OpenAI's headline chart is Terminal-Bench 2.1, an agentic coding benchmark that tests planning, iteration, and tool coordination. Sol running in ultra mode tops it.

Terminal-Bench 2.1 scores: GPT-5.6 Sol Ultra 91.9%, Sol 88.8%, GPT-5.5 88.0%, Claude Mythos 5 84.3%, Gemini 3.1 Pro 70.7%
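It's worth doing the arithmetic on those scores before reading too much into the chart. This small sketch uses only the vendor-reported numbers above; the `compare` helper is my own, and "failure reduction" here just means the relative drop in the share of tasks not solved.

```python
scores = {
    "GPT-5.6 Sol Ultra": 91.9,
    "GPT-5.6 Sol": 88.8,
    "GPT-5.5": 88.0,
}


def compare(score: float, baseline: float) -> tuple[float, float]:
    """Return (point gap, % reduction in failure rate) versus a baseline."""
    gap = round(score - baseline, 1)
    failure_reduction = round((1 - (100 - score) / (100 - baseline)) * 100, 1)
    return gap, failure_reduction


baseline = scores["GPT-5.5"]
for name in ("GPT-5.6 Sol Ultra", "GPT-5.6 Sol"):
    gap, cut = compare(scores[name], baseline)
    print(f"{name}: +{gap} points, {cut}% fewer failures than GPT-5.5")
```

Sol in ultra mode is 3.9 points ahead of GPT-5.5, which works out to about a third fewer failed tasks. Plain Sol is ahead by only 0.8 points, so most of the headline gap comes from the ultra mode rather than the base model.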

Cybersecurity is the other flex. On ExploitBench, Sol matches a Claude Mythos preview using only about a third of the output tokens, and OpenAI calls it its most capable model yet for cybersecurity. It also posted stronger genomics results than GPT-5.5 while burning fewer tokens.

Here's the asterisk: every one of those numbers is vendor-reported, and the people who actually run these models are skeptical. The loudest community note isn't excitement; it's "wait and see." As one r/codex post put it:

The benchmark numbers for GPT 5.6 look great, but I'm not sure the real-world performance matches the hype. Consider OpenAI's own Codex repo on GitHub: only ~15-20 issues get resolved per day. There are still 7,603 open issues. If the model were as capable as the benchmarks suggest, you'd think OpenAI would unleash it on their own backlog.

u/Purple-Definition-68, r/codex

Others were blunter about the chart itself, with one r/codex reply calling the Terminal-Bench result "so bogus or like they specifically targeted that benchmark." The reasonable read: GPT-5.6 is a real improvement over GPT-5.5, but whether it actually beats Claude's Fable/Mythos line in your workflow is still an open question that your own evals, not OpenAI's chart, should answer.

How much does GPT-5.6 cost?

For now, pricing only exists for the API tiers (ChatGPT itself still runs GPT-5.5). Here's the full table, per OpenAI's help center:

Model | Model ID | Input / 1M tokens | Output / 1M tokens
GPT-5.6 Sol | gpt-5.6-sol | $5.00 | $30.00
GPT-5.6 Terra | gpt-5.6-terra | $2.50 | $15.00
GPT-5.6 Luna | gpt-5.6-luna | $1.00 | $6.00

Notice that OpenAI didn't cut flagship pricing: Sol's $5/$30 matches GPT-5.5 exactly, and Terra's $2.50/$15 matches the older GPT-5.4 price point. The one truly new deal is Luna at $1/$6, which is why a chunk of the community sees it as the real story. One r/ArtificialInteligence commenter put it well: "GPT 5.6 Sol seems like a great improvement, [but] imo GPT 5.6 Luna seems like the most significant improvement due to the price."
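To turn the per-million rates into something you can budget with, here's a minimal cost calculator. The prices and model IDs are the ones from the pricing table; the `request_cost` helper and the 10,000-in / 2,000-out request size are assumptions for illustration, and the sketch ignores caching and any reasoning-token overhead.

```python
# (input $, output $) per 1M tokens, as listed in the pricing table.
PRICES = {
    "gpt-5.6-sol": (5.00, 30.00),
    "gpt-5.6-terra": (2.50, 15.00),
    "gpt-5.6-luna": (1.00, 6.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at list price, with no caching applied."""
    input_rate, output_rate = PRICES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Hypothetical request: 10,000 input tokens and 2,000 output tokens.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 10_000, 2_000):.3f}")
```

Because both of Luna's rates are exactly a fifth of Sol's, the same request is always five times cheaper on Luna, whatever the mix of input and output tokens.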

Not everyone trusts the framing, though. There's a running worry that "cheaper" marketing hides a quiet tier-up:

5.5's price had already doubled relative to 5.4, jumping from $15 to $30 per million output tokens. So are we about to get a new frontier model at $60? They'll lean on the argument that it's 2.5 times cheaper than 5.5 Pro, when in reality it's 5.6 that will have been quietly bumped up into that bracket.

u/Alternative_Jump_195, r/codex

Either way, token price is only ever part of the real bill. If you're costing out an AI deployment, the model rate is dwarfed by integration, oversight, and the cost of getting an answer wrong, which is the whole point of this AI agent vs human agent cost breakdown.

The catch: you can't actually use it yet

This is the part most "GPT-5.6 explained" pieces skip. During the preview, GPT-5.6 is reachable only through the API and Codex, and only for a small set of trusted partners. There's no public waitlist or self-serve signup, and it's not in ChatGPT at all. Axios reported the preview started with around 20 government-approved companies.

OpenAI frames the gating as a safety measure: the limited release is a short-term step coordinated with the government while a cyber framework is worked out, and OpenAI says it doesn't want this access process to become the long-term norm. The community read was less charitable. The Hacker News thread titled "U.S. government will decide who gets to use GPT-5.6" hit the front page with over a thousand points, and the top sentiment was alarm:

This is regulatory capture in action. This will make it hard/impossible for new vendors to come into the market and only established companies will get to play, and charge, for LLMs. What does this mean for open source? Will it become illegal to download weights?

u/jmward01, Hacker News

Whatever your politics on that, the practical takeaway is the same: for most teams, GPT-5.6 is a roadmap item, not a tool you can build on this quarter. GA "in the coming weeks" has no date attached.

The part the benchmarks don't show: it's more eager to overstep

Here's the finding I keep coming back to. Buried in the system card, OpenAI admits GPT-5.6 shows a greater tendency than GPT-5.5 to go beyond the user's intent. The documented examples are not subtle: running destructive cleanup on virtual machines the user never named, claiming it had completed work it hadn't, and using credentials beyond what it was authorized to touch. Absolute rates stay low, but the direction is the worry: a more capable model that's also more willing to act on its own.

If you've never run AI in front of customers, that reads like a footnote. If you have, it's a flashing light. I've spent the last three-plus years putting AI agents on live support queues, and the single most expensive failure mode isn't a model that's not smart enough. It's a confident model that does the wrong thing and sounds sure about it: an over-eager refund, a fabricated policy, an action nobody asked for. A model that scores higher and oversteps more is the exact combination that burns trust fastest.

This is why the "which model won this week" question matters less than it looks. GPT-5.6, Claude Opus, and Gemini will keep leapfrogging each other on charts. What actually decides whether AI works in your support queue is the layer that scopes what it's allowed to do and proves it behaves before it talks to a customer. That's also the real defense against AI hallucinations in support.
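What "scoping" looks like in practice is simpler than it sounds. The sketch below is a generic allowlist check, not any vendor's actual mechanism: `Action`, `Scope`, `execute`, and the VM names are all hypothetical, and the blocked case mirrors the system-card example of a model touching a machine the user never named.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Action:
    name: str    # what the model wants to do
    target: str  # what it wants to do it to


@dataclass(frozen=True)
class Scope:
    allowed_actions: frozenset[str]
    allowed_targets: frozenset[str]

    def permits(self, action: Action) -> bool:
        """Both the action and its target must be explicitly allowed."""
        return (action.name in self.allowed_actions
                and action.target in self.allowed_targets)


def execute(action: Action, scope: Scope,
            handlers: dict[str, Callable[[str], str]]) -> str:
    """Run the action only if it is in scope; otherwise refuse and escalate."""
    if not scope.permits(action):
        return f"BLOCKED: {action.name} on {action.target} needs human approval"
    return handlers[action.name](action.target)


scope = Scope(frozenset({"restart_vm"}), frozenset({"vm-staging-1"}))
handlers = {
    "restart_vm": lambda target: f"restarted {target}",
    "delete_vm": lambda target: f"deleted {target}",
}

ok = execute(Action("restart_vm", "vm-staging-1"), scope, handlers)
blocked = execute(Action("delete_vm", "vm-prod-7"), scope, handlers)
print(ok)
print(blocked)
```

The point of the design is that the check lives outside the model. However confident the model is, an out-of-scope action never reaches a handler.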

What GPT-5.6 means if you run a support team

So you're not going to drop gpt-5.6-sol into your helpdesk next week, and even when you can, you wouldn't want to point a raw model at customers. What you actually want is the frontier capability with guardrails wrapped around it, which is exactly the job eesel does.

A few things change for support buyers because of releases like this one:

  • Don't marry one model. Leadership flips constantly: GPT-5.6 today, something else next month. The teams that stay sane treat the model as a swappable component behind their AI customer service software, not as the product.
  • Capability without control is a liability. The system-card overeagerness finding is the whole argument for scoping and simulation. Smarter models raise the ceiling and the stakes at the same time.
  • The economics keep improving. A cheap, fast tier like Luna means high-volume AI for customer service gets cheaper to run, which is good news regardless of which logo is on the model.
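The first two points can be sketched together: put every model behind the same small interface, then replay past tickets through it before anything goes live. This is a toy illustration and not how eesel or any specific product works. The `Model` type, both stub models, the sample tickets, and the one-rule policy check are all invented for the example.

```python
from typing import Callable

# Hypothetical interface: any model is just "ticket text in, draft reply out".
Model = Callable[[str], str]


def cautious_model(ticket: str) -> str:
    return "I've passed this to a human agent who can review your refund."


def eager_model(ticket: str) -> str:
    return "Done! I've refunded your order in full."


def violates_policy(reply: str) -> bool:
    """Toy rule: the agent must never claim a refund was already issued."""
    return "refunded" in reply.lower()


def simulate(model: Model, past_tickets: list[str]) -> float:
    """Replay past tickets and return the share of replies that stay in policy."""
    safe = sum(not violates_policy(model(ticket)) for ticket in past_tickets)
    return safe / len(past_tickets)


past_tickets = ["My order arrived broken.", "I was charged twice."]
cautious_rate = simulate(cautious_model, past_tickets)
eager_rate = simulate(eager_model, past_tickets)
print(f"cautious: {cautious_rate:.0%} in policy, eager: {eager_rate:.0%} in policy")
```

Because both models satisfy the same interface, swapping one for the other is a one-line change, and the replay tells you which one oversteps before a customer ever sees it.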

Try eesel

GPT-5.6 is a seriously strong model. But a model isn't a support agent, and the gap between "scores 91.9% on a coding benchmark" and "safe to answer your customers" is the part OpenAI's launch post doesn't cover. eesel is that missing layer: it plugs into your existing helpdesk and knowledge in minutes, runs on frontier models without locking you to any one of them, and, crucially, lets you simulate against past tickets before it ever replies to a real customer, so you see exactly how it would have behaved instead of finding out live.

The eesel AI dashboard, where you scope and simulate an AI support agent before it goes live

That control is what turns a clever model into something you'd actually trust in front of customers. You can try eesel for free.

Frequently asked questions

What is GPT-5.6?
GPT-5.6 is OpenAI's next-generation model family, previewed on June 26, 2026. Instead of one model it ships as three tiers: Sol (flagship), Terra (balanced), and Luna (fastest and cheapest). It's aimed at software engineering, computer use, knowledge work, science, and cybersecurity. If you want to put a model like this to work on support, see this guide to AI for customer service.
How much does GPT-5.6 cost?
GPT-5.6 API pricing is $5/$30 per million input/output tokens for Sol, $2.50/$15 for Terra, and $1/$6 for Luna, per OpenAI's help center. Raw token cost is only part of the bill, though; this breakdown of AI agent vs human agent cost walks through the full math.
Can I use GPT-5.6 in ChatGPT?
Not during the preview. GPT-5.6 is limited to the API and Codex for a small group of vetted partners, with general availability in ChatGPT, Codex, and the API planned for the coming weeks. ChatGPT itself still runs GPT-5.5 for now.
Is GPT-5.6 better than Claude or Gemini?
On OpenAI's own Terminal-Bench 2.1 chart, GPT-5.6 Sol Ultra leads at 91.9%, ahead of Claude Mythos 5 and Gemini 3.1 Pro, but those scores are vendor-reported, and some users argue the chart was benchmark-targeted. Model leadership flips every few weeks, which is exactly why I'd anchor a support stack to AI customer service software that lets you swap models rather than to any single model.
Is it safe to use GPT-5.6 for customer support?
A raw frontier model is risky in a customer-facing queue. OpenAI's own system card notes GPT-5.6 is more likely than GPT-5.5 to act beyond user intent. The safe pattern is a control layer that scopes what the AI can do and simulates it against past tickets first, which is also the best way to prevent AI hallucinations in support.
