A practical guide to the OpenAI Batch API reference

Written by Kenneth Pangan

Reviewed by Amogh Sarda

Last edited October 13, 2025

Ever hit a rate limit right when you have a mountain of data to process? It’s a classic developer headache. You have a massive job to run, but sending thousands of API requests one by one is slow, drains your budget, and is a surefire way to get throttled. But what if you don't need all the answers this very second?

This is where the OpenAI Batch API fits in. It’s a tool designed specifically for asynchronous tasks, letting you submit huge jobs, walk away, and come back to the results later. The best part? It comes in at half the cost and with much higher rate limits.

In this guide, we’ll walk through what the Batch API is, how it actually works, and where it really shines. We'll also get into the pricing and, most importantly, discuss when a real-time AI solution is a much better fit, especially for things like customer support.

What is the Batch API?

Before we jump in, let’s quickly clear up the difference between synchronous and asynchronous APIs. A synchronous API call is like a phone call: you ask a question and have to wait on the line for an immediate answer. An asynchronous call, like the Batch API, is more like sending an email. You send your request, get back to your other work, and get a notification when the response is ready.

The OpenAI Batch API is built for exactly that kind of large-scale, non-urgent work. According to OpenAI's own documentation, it processes these jobs within a 24-hour window and gives you a nice 50% discount compared to its real-time cousins.

This makes it incredibly useful for a few reasons:

  • It saves you money: That 50% discount is a pretty big deal when you’re classifying thousands of product reviews or embedding a huge library of content.

  • It has higher rate limits: The Batch API runs on a separate, more generous quota based on the number of tokens you send its way. This means your big, offline jobs won't get in the way of your application's day-to-day real-time API calls.

  • It’s made for bulk tasks: If you need to run evaluations, generate content for an entire website, or chew through a massive dataset, doing it in one go is way more straightforward than building out a complicated queueing system for synchronous calls.

How the OpenAI Batch API works step-by-step

Getting started with the Batch API is a pretty simple, five-step workflow. Let's break it down.

Step 1: Prepare your batch file in JSONL format

First up, you need to get all your individual requests bundled into a single file. The Batch API uses the JSON Lines format, or ".jsonl", which is really just a text file where each line is its own valid JSON object. Think of each object as a single API request you want to make.

Here’s what two requests would look like in a ".jsonl" file for the "/v1/chat/completions" endpoint:


{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the capital of France?"}]}}  

{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Summarize the plot of 'Dune'."}]}}  

Pro Tip
Don't ignore the `custom_id`. The Batch API doesn't promise to return results in the same order you sent them. This ID is how you’ll match each response back to its original request, so make sure it's unique for every single line.
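If you're generating this file from a script, here's a minimal Python sketch of what that might look like (the filename and prompts are just placeholders), writing one request per line with a unique "custom_id":

import json

prompts = [
    "What is the capital of France?",
    "Summarize the plot of 'Dune'.",
]

with open("batch_input.jsonl", "w") as f:
    for i, prompt in enumerate(prompts):
        request = {
            "custom_id": f"request-{i + 1}",  # must be unique on every line
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-4o-mini",
                "messages": [
                    {"role": "system", "content": "You are a helpful assistant."},
                    {"role": "user", "content": prompt},
                ],
            },
        }
        f.write(json.dumps(request) + "\n")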

Step 2: Upload your file

Once your ".jsonl" file is ready, you upload it using the OpenAI Files API. The important part here is to set the "purpose" parameter to "batch". This tells OpenAI that the file is meant for a batch processing job.

Step 3: Create and run the batch job

With your file uploaded, you can now kick off the batch job. You'll use the "input_file_id" you got from the file upload step. The "completion_window" is fixed at "24h" for now, so you just need to point it at the endpoint you’re targeting, like "/v1/chat/completions".
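Continuing the same Python sketch, creating the job looks like this (assuming "batch_input_file" from the upload step):

# Start the batch job against the chat completions endpoint
batch = client.batches.create(
    input_file_id=batch_input_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",  # currently the only supported value
)

print(batch.id)  # keep this ID around so you can check on the job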

Step 4: Check the job status

After you create the job, it doesn't start immediately. It has to go through a few stages. You can check on its progress anytime by pinging the batch endpoint with your job ID. The status will be one of these:

  • validating: The input file is being checked for errors.

  • in_progress: The job is up and running.

  • finalizing: Processing is finished and the results are being prepared.

  • completed: It's all done, and your results are ready.

  • expired: The job couldn't finish within the 24-hour window.

  • failed: Something went wrong during validation or processing.

  • cancelled: You (or someone on your team) stopped the job manually.
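In practice, checking the status is just a polling loop. Here's a rough sketch continuing the earlier Python example (the 60-second interval is arbitrary; for big jobs you'd likely poll far less often):

import time

while True:
    batch = client.batches.retrieve(batch.id)
    if batch.status in ("completed", "failed", "expired", "cancelled"):
        break  # the job has reached a terminal state
    time.sleep(60)

print(batch.status)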

Step 5: Get your results

When the status finally hits "completed", the batch object will contain two new file IDs: an "output_file_id" for all the successful requests and an "error_file_id" for any that failed along the way.

You can then download the content of the output file. It will be another ".jsonl" file, where each line holds the result of one of your original requests, conveniently matched up with its "custom_id".
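Here's how that last step might look in the same Python sketch, pairing each result back up with its "custom_id":

import json

# Download the output file (assumes the job completed successfully)
output = client.files.content(batch.output_file_id)

# Each line is one result; index them by custom_id
results = {}
for line in output.text.splitlines():
    item = json.loads(line)
    results[item["custom_id"]] = item["response"]["body"]

# e.g., read the model's answer to the first request
print(results["request-1"]["choices"][0]["message"]["content"])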

Key use cases for the OpenAI Batch API (and when to find another tool)

The Batch API is a great tool, but it's not the right tool for every job. Knowing when to use it, and when not to, is half the battle.

Perfect fit use cases

The Batch API is your best friend for any large-scale task where you don't need an answer right away. Think of things like:

  • Large-scale data classification: Running sentiment analysis on thousands of customer reviews overnight while you sleep.

  • Offline content generation: Whipping up SEO meta descriptions for every page of a website or product summaries for an entire e-commerce catalog.

  • Model evaluations: Testing a fine-tuned model against a huge dataset to see how well it performs.

  • Data pre-processing: Cleaning, formatting, or translating massive text datasets before you feed them into another system.

When not to use the Batch API: The need for real-time answers

The biggest drawback of the Batch API is that it's asynchronous by design. The 24-hour turnaround window, even though jobs often finish sooner, makes it a non-starter for any task that needs an immediate, conversational response.

This is especially true for customer support. If a customer is in a live chat asking for help, they can't wait hours, let alone a whole day, for an answer. This is where the Batch API approach just doesn't work and a purpose-built, real-time solution is the only way to go.

Trying to build a support automation system with the Batch API is a heavy lift. It involves a lot of custom code, file wrangling, and managing a multi-step API workflow. It’s certainly not a plug-and-play solution that a support manager could set up on their own.

For tasks that demand instant interaction, like powering a live chatbot, drafting agent replies in the moment, or triaging tickets as they come in, you need a platform designed for those real-time conversations. That’s where a solution like eesel AI comes into the picture. It’s built from the ground up for the exact use cases where the Batch API can't compete, offering instant, autonomous support right inside the tools you already use.

Understanding pricing and rate limits

One of the most appealing things about the Batch API is how much money it can save you. Here’s a quick look at how it works.

A breakdown of the pricing model

The pricing is refreshingly simple: you get a 50% discount compared to the standard synchronous API endpoints. On large jobs, those savings can really add up.

Let's look at a quick comparison for "gpt-4o-mini", which is a popular and very capable model:

Model         | Tier     | Input (per 1M tokens) | Output (per 1M tokens)
"gpt-4o-mini" | Standard | $0.15                 | $0.60
"gpt-4o-mini" | Batch    | $0.075                | $0.30

Source: OpenAI Pricing Page

As you can see, the costs are literally cut in half. That makes batch processing a very attractive option for any non-urgent, high-volume task you can think of.
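To put real numbers on it, here's a quick back-of-the-envelope calculation in Python for a hypothetical job with 10 million input tokens and 2 million output tokens:

# gpt-4o-mini rates from the table above, in dollars per 1M tokens
input_tokens = 10_000_000
output_tokens = 2_000_000

standard_cost = (input_tokens / 1e6) * 0.15 + (output_tokens / 1e6) * 0.60
batch_cost = (input_tokens / 1e6) * 0.075 + (output_tokens / 1e6) * 0.30

print(f"Standard: ${standard_cost:.2f}")  # Standard: $2.70
print(f"Batch: ${batch_cost:.2f}")        # Batch: $1.35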

Navigating rate limits

Another big plus is that the Batch API rate limits are completely separate from your standard API limits. This means you can kick off a massive batch job without worrying about it blocking the real-time requests that keep your main application running.

The limits for the Batch API are mostly based on:

  1. Per-batch limits: You can stuff up to 50,000 requests into a single file.

  2. Enqueued tokens per model: Each model has a cap on the total number of tokens you can have "in the queue" at any one time.

You can always find your organization’s specific rate limits on your OpenAI Platform Settings page.

Automating customer support: Batch API vs. a dedicated AI agent

So, could you build a customer support automation system using the Batch API? In theory, yes. But should you? Probably not. Let's compare the two approaches.

The Batch API approach

To automate support with the Batch API, a developer would have to stitch together a fairly complex and manual workflow:

  • First, you'd need to periodically export new support tickets from your helpdesk.

  • Then, you’d write a script to format them all into the required ".jsonl" file.

  • You'd submit the batch job to OpenAI.

  • Then you wait, potentially for up to 24 hours.

  • Once it’s done, you download the results and write another script to parse them.

  • Finally, you import the generated responses back into your helpdesk.

The limitations here are pretty clear. The whole process is slow, clunky, and completely misses the point of real-time customer service. It can't handle a live chat, resolve an urgent ticket, or give customers the quick answers they've come to expect.

The eesel AI approach

Now, let's look at how a platform like eesel AI, which was built for this exact problem, handles it. It’s designed to get you live in minutes.

  • You can set it up yourself: Forget about booking demos or sitting through long sales calls. You can sign up and get your first AI agent running in just a few minutes, all on your own.

  • One-click integrations: eesel AI plugs directly into popular helpdesks like Zendesk, Freshdesk, and Intercom. It learns from your past tickets and knowledge bases automatically, with no manual file formatting or uploading needed.

  • Real-time and autonomous: eesel AI agents work right inside your helpdesk, responding to tickets on their own as they arrive, 24/7. It’s built for live interaction, not overnight batch jobs.

  • Total control and simulation: Before you even go live, you can run a simulation on thousands of your past tickets. This shows you exactly how the AI will perform and what your resolution rate will be, so you can launch with confidence. That kind of risk-free testing is something you just can't get when building a custom solution from scratch.

Get started with real-time AI automation in minutes

The OpenAI Batch API is an excellent, budget-friendly tool for developers who need to process large, asynchronous jobs. For tasks like data analysis or offline content generation, it’s a fantastic option.

But when it comes to the fast-paced, conversational world of customer and employee support, you need a solution built for immediate action. Batch processing simply can't keep up.

If you need to automate support tickets, power a live chatbot, or give your team instant answers, a dedicated platform is the way to go. Ready to see what real-time support automation actually looks like? Get started with eesel AI for free.

Frequently asked questions

What is the OpenAI Batch API for?

The primary purpose is to process large volumes of non-urgent data asynchronously. It allows you to submit numerous API requests in one go and retrieve the results later, ideal for bulk tasks.

How much cheaper is the Batch API than standard API calls?

The Batch API offers a significant 50% discount compared to standard synchronous API calls. This makes it a very cost-effective option for processing massive datasets or generating content offline.

When should you avoid using the Batch API?

You should avoid the Batch API for any task requiring immediate, real-time responses, such as live customer support or interactive chatbots. Its asynchronous nature and potential 24-hour turnaround time make it unsuitable for instant interactions.

What format does the input file need to be in?

You need to prepare your batch file in JSON Lines (".jsonl") format. Each line in this file should be a valid JSON object representing an individual API request, including a unique "custom_id".

Do Batch API rate limits count against your standard API limits?

No, the rate limits for the Batch API are completely separate from, and more generous than, those for standard real-time API calls. This ensures that large batch jobs do not interfere with your application's immediate operational needs.

Can the Batch API power a live customer support chatbot?

While theoretically possible with extensive custom development, it is strongly not recommended. The inherent delays in batch processing are incompatible with the need for immediate responses in real-time customer service interactions.


Article by

Kenneth Pangan

A writer and marketer for over ten years, Kenneth Pangan splits his time among history, politics, and art, with plenty of interruptions from his dogs demanding attention.