Our complete GPT-5.3-Codex review: A new era for agentic AI

Written by Kenneth Pangan
Reviewed by Katelin Teen
Last edited February 6, 2026

On February 5, 2026, OpenAI released GPT-5.3-Codex, its newest coding model. The release coincided with Anthropic's launch of Opus 4.6, underscoring the competitive pace of AI development.

OpenAI is positioning this as more than a minor update. It is shifting Codex from a powerful code generator into a general-purpose agent that can operate a computer and handle professional workflows from start to finish. The framing moves Codex from a tool toward an AI teammate.

This article breaks down what's new, reviews the model's performance, and analyzes what the release means for developers and businesses.

What is GPT-5.3-Codex?

At its core, GPT-5.3-Codex is what OpenAI calls its "most capable agentic coding model to date." It follows GPT-5.2-Codex, but with a significantly expanded scope.

According to OpenAI's official announcement, the new model is built on three main principles:

Top-tier agentic skills: The model is designed to handle long, complex tasks across the software development lifecycle and other professional domains.
Improved efficiency: It is reportedly 25% faster and uses fewer tokens than the previous version, which enhances user experience and reduces operational costs.
Self-improvement: Notably, OpenAI states the model helped "create itself." It assisted engineers with tasks like debugging its own training and managing deployments.

The idea is to provide an interactive partner rather than a tool that simply follows commands: a teammate you can guide in real time, not just an assistant you delegate tasks to.

An infographic detailing the core principles of the GPT 5.3 Codex review: top-tier agentic skills, improved efficiency, and self-improvement.

New capabilities of GPT-5.3-Codex

Let's get into the details of how this new model performs. We’ve dug into OpenAI's claims and the early analysis to see what’s really going on.

Benchmark performance: A leap in agentic skills

OpenAI backed up its release with new scores on key industry benchmarks. On two of the three, the numbers show a significant jump in what the model can do on its own.

Here’s a look at the data from their blog post, visualized for clarity:

A bar chart infographic for our GPT 5.3 Codex review, comparing its benchmark scores against GPT-5.2-Codex on SWE-Bench Pro, Terminal-Bench 2.0, and OSWorld-Verified.

Benchmark	GPT-5.3-Codex	GPT-5.2-Codex	Improvement
SWE-Bench Pro	56.8%	56.4%	A slight edge in multi-language software engineering.
Terminal-Bench 2.0	77.3%	64.0%	A massive leap in command-line proficiency.
OSWorld-Verified	64.7%	38.2%	A huge jump in general computer productivity tasks.

The gains on Terminal-Bench 2.0 (13.3 percentage points) and OSWorld-Verified (26.5 points) are substantial, while SWE-Bench Pro moved by only 0.4 points. This suggests the biggest improvement is in operating within a digital environment and using tools the way a person would.
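
To make the size of each jump concrete, here is a minimal sketch that computes the gap between the two Codex versions. The scores are copied from the table above; the function name and the choice to report both percentage points and relative change are illustrative assumptions, not part of OpenAI's reporting.

```python
# Scores copied from the table above: (GPT-5.3-Codex, GPT-5.2-Codex), in percent.
SCORES = {
    "SWE-Bench Pro": (56.8, 56.4),
    "Terminal-Bench 2.0": (77.3, 64.0),
    "OSWorld-Verified": (64.7, 38.2),
}


def improvement(new: float, old: float) -> tuple[float, float]:
    """Return (gain in percentage points, gain relative to the old score in percent)."""
    points = round(new - old, 1)
    relative = round((new - old) / old * 100, 1)
    return points, relative


if __name__ == "__main__":
    for name, (new, old) in SCORES.items():
        points, relative = improvement(new, old)
        print(f"{name}: +{points} points ({relative}% relative)")
```

Reporting both figures matters here: the OSWorld-Verified gain is 26.5 points, but because the starting score was low, that is roughly a 69% relative improvement.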

However, the competitive landscape is strong. Community analysis shows that while Codex's 77.3% on Terminal-Bench 2.0 beats Anthropic's Opus 4.6 (65.4%), the tables turn on OSWorld. There, Opus 4.6 scores 72.7% to Codex's 64.7%. This indicates that neither model currently leads across all agentic skills.
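
The split result is easy to check with a short sketch. The scores are the ones quoted in the paragraph above. The data layout and the `leaders` helper are hypothetical, and only the two benchmarks cited for both models are included.

```python
# Scores as quoted in the paragraph above, in percent.
HEAD_TO_HEAD = {
    "Terminal-Bench 2.0": {"GPT-5.3-Codex": 77.3, "Opus 4.6": 65.4},
    "OSWorld": {"GPT-5.3-Codex": 64.7, "Opus 4.6": 72.7},
}


def leaders(results: dict[str, dict[str, float]]) -> dict[str, tuple[str, float]]:
    """Map each benchmark to (leading model, margin in percentage points)."""
    out = {}
    for benchmark, scores in results.items():
        # Sort models from highest to lowest score on this benchmark.
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        (best, best_score), (_, runner_up_score) = ranked[0], ranked[1]
        out[benchmark] = (best, round(best_score - runner_up_score, 1))
    return out


if __name__ == "__main__":
    for benchmark, (model, margin) in leaders(HEAD_TO_HEAD).items():
        print(f"{benchmark}: {model} leads by {margin} points")
```

Each model leads on exactly one of the two benchmarks, which is why a single overall ranking would be misleading.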

One community commenter put it this way: "Yes. And this is from someone who has always hated codex and only used 5.2 high and xhigh. But 5.3-codex-xhigh is amazing, I've built more in 4 hours than I have in the last week."

From coding assistant to professional collaborator

OpenAI is clearly positioning Codex as more than just a tool for developers. They are showing off its ability to manage entire professional workflows.

For example, they shared demos where Codex created a 10-slide PowerPoint presentation for a financial advisor and built fully functional racing and diving games from scratch. This capability extends far beyond suggesting the next line of code.

Regarding the claim that the model helped "create itself," it means the model was capable enough to accelerate its own development. OpenAI's engineers used it to help data scientists build new data pipelines and even had it dynamically scale GPU clusters during the launch. It serves as a proof of concept for how agentic AI can speed up complex technical work.

The practical gap for businesses

This capability is impressive. For many businesses, however, it is a foundational technology that still needs further development before it fits a specific application.

It still takes a lot of technical know-how and engineering time to turn it into a reliable tool for a specific job, like customer support or sales.

Many companies require AI solutions tailored to specific business functions, such as an AI teammate that can learn their products, understand refund policies, and begin handling support tickets. This highlights the gap between a general-purpose model and a business-ready solution.

User experience and accessibility

Beyond its raw power, how does it feel to use GPT-5.3-Codex? And more importantly, who can get access to it?

A more interactive and steerable AI

One of the notable new features is called "steering." It lets you interact with the model while it's working on a task. You can jump in to ask questions, give feedback, and nudge it in the right direction in real time.

This is a significant shift from the typical "black box" approach where a user provides a prompt and waits for the final output. It adds a layer of transparency and control, letting you see the agent's "thought process" and fix its course before it goes too far down the wrong path. It feels less like giving instructions and more like actual collaboration.
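
To illustrate the general idea of steering, here is a toy sketch of an agent loop that checks for user feedback between steps instead of running to completion unattended. Everything in it is hypothetical: the `SteerableAgent` class, the `steer` method, and the "skip <step>" convention are assumptions made for illustration, and this is not how OpenAI implements the feature.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SteerableAgent:
    """Toy agent that checks for user guidance between steps (hypothetical)."""

    plan: list[str]
    inbox: deque[str] = field(default_factory=deque)
    log: list[str] = field(default_factory=list)

    def steer(self, message: str) -> None:
        """Queue feedback from the user while the task is still running."""
        self.inbox.append(message)

    def run_step(self) -> bool:
        """Apply pending feedback, then execute one step. Returns False when done."""
        while self.inbox:
            note = self.inbox.popleft()
            self.log.append(f"feedback: {note}")
            # Assumed convention for this sketch: "skip <step>" drops a planned step.
            if note.startswith("skip "):
                target = note.removeprefix("skip ")
                self.plan = [step for step in self.plan if step != target]
        if not self.plan:
            return False
        self.log.append(f"done: {self.plan.pop(0)}")
        return True


if __name__ == "__main__":
    agent = SteerableAgent(plan=["write tests", "refactor module", "update docs"])
    agent.run_step()                      # first step runs unattended
    agent.steer("skip refactor module")   # the user intervenes mid-task
    while agent.run_step():
        pass
    print("\n".join(agent.log))
```

The key design choice is that feedback is read before every step, so a correction takes effect at the next step rather than after the final output.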

Another commenter agreed: "Exactly, I wouldn't mind if it needed to work 20 hours instead of 1 hour if it could deliver the same quality of code I can write myself."

The biggest limitation: No API access

So, how can you try it out? GPT-5.3-Codex is available through the Codex app, a CLI, IDE extensions, and the web interface for paid ChatGPT users.

However, a significant limitation for businesses is that API access is not yet available. OpenAI says it's "rolling out soon," but for now, that's the main roadblock preventing companies from building this power into their own products or internal workflows. Without an API, it remains a powerful but standalone tool, not a scalable part of your tech stack.

This delay is a challenge for companies that want to build custom solutions. In the meantime, other platforms offer ready-to-deploy applications. For instance, eesel AI provides an AI teammate designed to integrate with help desks like Zendesk, Gorgias, and Intercom. The eesel AI Agent learns from a company's data and can begin handling customer support issues without requiring custom development.

A view of the eesel AI Agent, an alternative solution mentioned in this GPT 5.3 Codex review, handling customer support tickets autonomously.

Pricing and the new cybersecurity model

The last pieces of the puzzle are cost and security.

How much does it cost?

Right now, OpenAI hasn't announced any specific pricing for GPT-5.3-Codex. Access is included with paid ChatGPT plans.

Because there's no API access yet, there's no API pricing either. For businesses planning AI initiatives, the cost at scale is therefore unknown, which makes budgeting difficult.

Some platforms provide more predictable pricing structures. For example, eesel AI's pricing is based on a pay-per-interaction model. This model is not tied to the number of user seats, which can help businesses forecast costs and calculate ROI as they scale their use of AI for customer support.

A "High capability" model for cybersecurity

OpenAI has labeled GPT-5.3-Codex as a "High capability" model for cybersecurity under its Preparedness Framework. This is because it was trained to find software vulnerabilities, making it a strong tool for security professionals.

To manage the risks, OpenAI has rolled out safety measures like the "Trusted Access for Cyber" program, which gives access to vetted cybersecurity experts, and a commitment of $10M in API credits to speed up cyber defense research.

This level of capability has significant security implications. While it is a powerful tool for defense, it also introduces risks that businesses must manage. A managed platform can help address these concerns by offering built-in security and compliance features. For example, eesel AI states that customer data is isolated and never used for training, providing AI capabilities with established security protocols.

A glimpse into the future

GPT-5.3-Codex is a significant step forward for agentic AI. Its performance, speed, and wider skill set make it a powerful tool for developers and other tech professionals. It offers a glimpse into a future where AI agents are our daily collaborators.

However, for many businesses, its current limitations are significant. The missing API access, unknown costs, and the work required to turn a general model into a specific business tool mean it is more of a preview of future capabilities than a solution for immediate implementation.

To see GPT-5.3-Codex in action and hear more detailed first-hand experiences, the following review provides a comprehensive look at its new features and what they mean for the future of AI-assisted development.


A detailed review of OpenAI's GPT-5.3-Codex, covering its new features, performance benchmarks, and its impact on the software world.

How to deploy an AI agent today

A powerful foundational model like Codex is the engine, but businesses still need to build the application around it. These models are not designed for direct, out-of-the-box business use.

This is where a platform like eesel AI can provide a complete solution. Instead of setting up a tool, you "hire" an AI teammate. The eesel AI Agent connects to the tools you already use, learns your business in minutes, and starts working with your team to handle customer support tickets on its own.

This allows businesses to start using AI agents without waiting for foundational models to become fully productized. Explore how the eesel AI Agent can be applied to customer service operations.

Frequently Asked Questions

The main takeaway is that GPT-5.3-Codex is a significant step forward for agentic AI, especially for developers. However, its lack of an API and undefined pricing make it more of a future-facing tool than a practical business solution you can implement today.

The comparison with Anthropic's Opus 4.6 is mixed. Codex beats Opus 4.6 on the Terminal-Bench 2.0 benchmark (77.3% to 65.4%), showing better command-line skills. But Opus 4.6 scores higher on OSWorld (72.7% to 64.7%), indicating better performance on general computer tasks. Neither model is the clear winner across the board.

GPT-5.3-Codex cannot be used for customer support directly. While powerful, it is a general-purpose model that requires significant engineering to be turned into a specialized support tool. For that, a ready-to-use platform like eesel AI, which is specifically built for this purpose, may be a more direct solution.

The single biggest limitation for businesses is the lack of API access. Without an API, companies can't integrate Codex's capabilities into their own products or internal systems, making it a standalone tool for now.

Developers and technical professionals are the primary audience for this release, given the model's capabilities in coding, debugging, and infrastructure management.

"Steering" is an interactive feature that lets you guide the model while it's working. You can ask questions, provide feedback, and correct its course in real time, making it feel more like a collaborative partner than a black-box tool.

Article by

Kenneth Pangan

Writer and marketer for over ten years, Kenneth Pangan splits his time between history, politics, and art with plenty of interruptions from his dogs demanding attention.
