Our complete GPT 5.3 Codex review: A new era for agentic AI

Written by Stevia Putri
Reviewed by Katelin Teen
Last edited February 6, 2026

On February 5, 2026, OpenAI released GPT-5.3-Codex, its newest coding model. The release coincided with Anthropic's Opus 4.6, highlighting the competitive pace of AI development.

OpenAI is positioning this as more than a minor update. They are shifting Codex from a powerful code generator into a general-purpose agent that can operate a computer and handle professional workflows from start to finish. The concept moves from a tool toward an AI teammate.

This article will break down what’s new, review its performance, and analyze what this means for developers and businesses.

What is GPT 5.3 Codex?

At its core, GPT-5.3-Codex is what OpenAI calls its "most capable agentic coding model to date." It follows GPT-5.2-Codex, but with a significantly expanded scope.

According to OpenAI's official announcement, the new model is built on three main principles:

  1. Top-tier agentic skills: The model is designed to handle long, complex tasks across the software development lifecycle and other professional domains.
  2. Improved efficiency: It is reportedly 25% faster and uses fewer tokens than the previous version, which enhances user experience and reduces operational costs.
  3. Self-improvement: Notably, OpenAI states the model helped "create itself." It assisted engineers with tasks like debugging its own training and managing deployments.
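
The "25% faster" figure in the list above can be read in two ways, and the article does not say which one OpenAI means. The sketch below works through both readings for a hypothetical 60-minute task. The baseline duration and the function names are assumptions made for illustration, not figures from OpenAI.

```python
def minutes_if_throughput_rises(baseline_minutes: float, pct_faster: float) -> float:
    """Reading 1: the model does pct_faster% more work per unit of time."""
    return baseline_minutes / (1 + pct_faster / 100)


def minutes_if_duration_falls(baseline_minutes: float, pct_faster: float) -> float:
    """Reading 2: the same task takes pct_faster% less time."""
    return baseline_minutes * (1 - pct_faster / 100)


baseline = 60.0  # hypothetical task length on the previous model, in minutes
print(minutes_if_throughput_rises(baseline, 25))  # 48.0
print(minutes_if_duration_falls(baseline, 25))    # 45.0
```

Either way the saving is real but modest, and the gap between the two readings grows as the quoted percentage gets larger.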
The concept is to provide an interactive partner rather than a tool that simply follows commands. This positions it as a teammate that can be guided in real-time, not just an assistant for task delegation.
An infographic detailing the core principles of the GPT 5.3 Codex review: top-tier agentic skills, improved efficiency, and self-improvement.

New capabilities of GPT 5.3 Codex

Let's get into the details of how this new model performs. We’ve dug into OpenAI's claims and the early analysis to see what’s really going on.

Benchmark performance: A leap in agentic skills

OpenAI backed up its release with new scores on key industry benchmarks. These numbers show a significant jump in what the AI can do on its own.

Here’s a look at the data from their blog post, visualized for clarity:
A bar chart infographic for our GPT 5.3 Codex review, comparing its benchmark scores against GPT-5.2-Codex on SWE-Bench Pro, Terminal-Bench 2.0, and OSWorld-Verified.
Benchmark | GPT-5.3-Codex | GPT-5.2-Codex | Improvement
SWE-Bench Pro | 56.8% | 56.4% | A slight edge in multi-language software engineering.
Terminal-Bench 2.0 | 77.3% | 64.0% | A massive leap in command-line proficiency.
OSWorld-Verified | 64.7% | 38.2% | A huge jump in general computer productivity tasks.

The improvements in Terminal-Bench and OSWorld are significant. This suggests the model has improved capabilities for operating within a digital environment and using tools like a person would.

However, the competitive landscape is strong. Community analysis shows that while Codex's 77.3% on Terminal-Bench 2.0 beats Anthropic's Opus 4.6 (65.4%), the tables turn on OSWorld. There, Opus 4.6 scores 72.7% to Codex's 64.7%. This indicates that neither model currently leads across all agentic skills.
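
To make the comparison concrete, here is a minimal sketch that computes the percentage-point gains and the leading model per benchmark, using only the scores quoted in this article. It assumes the Opus 4.6 "OSWorld" score is comparable to the OSWorld-Verified figures, which the article implies but does not confirm. The function names are illustrative.

```python
# Scores in percent, as quoted in this article.
# No SWE-Bench Pro score for Opus 4.6 is given here, so it is left out.
scores = {
    "SWE-Bench Pro": {"GPT-5.3-Codex": 56.8, "GPT-5.2-Codex": 56.4},
    "Terminal-Bench 2.0": {"GPT-5.3-Codex": 77.3, "GPT-5.2-Codex": 64.0, "Opus 4.6": 65.4},
    "OSWorld-Verified": {"GPT-5.3-Codex": 64.7, "GPT-5.2-Codex": 38.2, "Opus 4.6": 72.7},
}


def point_gain(benchmark: str, new: str = "GPT-5.3-Codex", old: str = "GPT-5.2-Codex") -> float:
    """Percentage-point difference between two models on one benchmark."""
    row = scores[benchmark]
    return round(row[new] - row[old], 1)


def leader(benchmark: str) -> str:
    """Model with the highest quoted score on one benchmark."""
    row = scores[benchmark]
    return max(row, key=row.get)


for name in scores:
    print(f"{name}: +{point_gain(name)} points over GPT-5.2-Codex, leader: {leader(name)}")
```

The gains come out at 0.4, 13.3, and 26.5 points respectively, and the leader changes from Codex on Terminal-Bench 2.0 to Opus 4.6 on OSWorld, which matches the mixed picture described above.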

One community reaction: "Yes. And this is from someone who has always hated codex and only used 5.2 high and xhigh. But 5.3-codex-xhigh is amazing, I’ve build more in 4 hours than I have in the last week."

From coding assistant to professional collaborator

OpenAI is clearly positioning Codex as more than just a tool for developers. They are showing off its ability to manage entire professional workflows.

For example, they shared demos where Codex created a 10-slide PowerPoint presentation for a financial advisor and built fully functional racing and diving games from scratch. This capability extends far beyond suggesting the next line of code.

As for the claim that the model helped "create itself," it means the model was capable enough to speed up its own development. OpenAI's engineers used it to help data scientists build new data pipelines and even had it dynamically scale GPU clusters during the launch. It is a proof of concept for how agentic AI can accelerate complex technical work.

The practical gap for businesses

This capability is impressive. For many businesses, however, this serves as a foundational technology that requires further development for specific applications.

It still takes a lot of technical know-how and engineering time to turn it into a reliable tool for a specific job, like customer support or sales.

Many companies require AI solutions tailored to specific business functions, such as an AI teammate that can learn their products, understand refund policies, and begin handling support tickets. This highlights the gap between a general-purpose model and a business-ready solution.

User experience and accessibility

Beyond its raw power, how does it feel to use GPT-5.3-Codex? And more importantly, who can get access to it?

A more interactive and steerable AI

One of the notable new features is called "steering." It lets you interact with the model while it's working on a task. You can jump in to ask questions, give feedback, and nudge it in the right direction in real time.

This is a significant shift from the typical "black box" approach where a user provides a prompt and waits for the final output. It adds a layer of transparency and control, letting you see the agent's "thought process" and fix its course before it goes too far down the wrong path. It feels less like giving instructions and more like actual collaboration.

Another community comment: "Exactly I wouldn't mind if it needed to work 20 hours instead of 1 hour if it could deliver same quality of code I can write myself."

The biggest limitation: No API access

So, how can you try it out? GPT-5.3-Codex is available through the Codex app, a CLI, IDE extensions, and the web interface for paid ChatGPT users.

However, a significant limitation for businesses is that API access is not yet available. OpenAI says it's "rolling out soon," but for now, that's the main roadblock preventing companies from building this power into their own products or internal workflows. Without an API, it remains a powerful but standalone tool, not a scalable part of your tech stack.

This delay is a challenge for companies that want to build custom solutions. In the meantime, other platforms offer ready-to-deploy applications. For instance, eesel AI provides an AI teammate designed to integrate with help desks like Zendesk, Gorgias, and Intercom. The eesel AI Agent learns from a company's data and can begin handling customer support issues without requiring custom development.
A view of the eesel AI Agent, an alternative solution mentioned in this GPT 5.3 Codex review, handling customer support tickets autonomously.

Pricing and the new cybersecurity model

The last pieces of the puzzle are cost and security.

How much does it cost?

Right now, OpenAI hasn't announced any specific pricing for GPT-5.3-Codex. Access is included with paid ChatGPT plans.

Because there’s no API access yet, there's no API pricing available either. This creates uncertainty for businesses planning their AI initiatives, as the cost at scale is unknown, making budgeting difficult.

Some platforms provide more predictable pricing structures. For example, eesel AI's pricing is based on a pay-per-interaction model. This model is not tied to the number of user seats, which can help businesses forecast costs and calculate ROI as they scale their use of AI for customer support.

A "high capability" model for cybersecurity

OpenAI has labeled GPT-5.3-Codex as a "High capability" model for cybersecurity under its Preparedness Framework. This is because it was trained to find software vulnerabilities, making it a strong tool for security professionals.

To manage the risks, OpenAI has rolled out safety measures like the "Trusted Access for Cyber" program, which gives access to vetted cybersecurity experts, and a $10M grant to speed up cyber defense research.

This level of capability has significant security implications. While it is a powerful tool for defense, it also introduces risks that businesses must manage. A managed platform can help address these concerns by offering built-in security and compliance features. For example, eesel AI states that customer data is isolated and never used for training, providing AI capabilities with established security protocols.

A glimpse into the future

GPT-5.3-Codex is a significant step forward for agentic AI. Its performance, speed, and wider skill set make it a powerful tool for developers and other tech professionals. It offers a glimpse into a future where AI agents are our daily collaborators.

However, for many businesses, its current limitations are significant. The missing API access, unknown costs, and the work required to turn a general model into a specific business tool mean it is more of a preview of future capabilities than a solution for immediate implementation.

To see GPT-5.3-Codex in action and hear more detailed first-hand experiences, the following review provides a comprehensive look at its new features and what they mean for the future of AI-assisted development.

A detailed review of OpenAI's GPT-5.3-Codex, covering its new features, performance benchmarks, and its impact on the software world.

How to deploy an AI agent today

A key challenge is that a powerful foundational model like Codex is the engine, but businesses still need to build the application around it. These models are not designed for direct, out-of-the-box business use.

This is where a platform like eesel AI can provide a complete solution. Instead of setting up a tool, you "hire" an AI teammate. The eesel AI Agent connects to the tools you already use, learns your business in minutes, and starts working with your team to handle customer support tickets on its own.

This allows businesses to start using AI agents without waiting for foundational models to become fully productized. Explore how the eesel AI Agent can be applied to customer service operations.

Frequently Asked Questions

What is the main takeaway from this GPT 5.3 Codex review?
The main takeaway is that GPT-5.3-Codex is a significant step forward for agentic AI, especially for developers. However, its lack of an API and undefined pricing make it more of a future-facing tool than a practical business solution you can implement today.

How does GPT 5.3 Codex compare to Anthropic's Opus 4.6?
The comparison is mixed. Codex beats Opus 4.6 on the Terminal-Bench 2.0 benchmark, showing better command-line skills. But Opus 4.6 scores higher on OSWorld, indicating better performance on general computer tasks. Neither model is the clear winner across the board.

What is the biggest limitation highlighted in the GPT 5.3 Codex review?
The single biggest limitation for businesses is the lack of API access. Without an API, companies can't integrate Codex's capabilities into their own products or internal systems, making it a standalone tool for now.

Who should be most excited about this release?
Developers and technical professionals are the primary audience for this release, given the model's capabilities in coding, debugging, and infrastructure management.

What does the "steering" feature mentioned in this review allow users to do?
"Steering" is an interactive feature that lets you guide the model while it's working. You can ask questions, provide feedback, and correct its course in real time, making it feel more like a collaborative partner than a black-box tool.

Article by Stevia Putri

Stevia Putri is a marketing generalist at eesel AI, where she helps turn powerful AI tools into stories that resonate. She’s driven by curiosity, clarity, and the human side of technology.
