
Voice is quickly becoming the way we interact with our devices, and real-time conversation is at the center of it all. If you're a developer looking to build an app that talks back, you've probably come across the OpenAI Realtime API. It's a seriously powerful tool that gives you direct access to models like GPT-4o for incredibly fast, speech-to-speech experiences.
But here’s the thing about working with a raw, powerful API: it comes with its own set of headaches. You’re not just plugging something in; you’re managing complex connections, handling audio streams, and trying to make the user experience feel seamless.
This guide is a practical walkthrough of the OpenAI Realtime API Reference. We’ll break down its key parts, what you can do with it, and the real-world hurdles you'll face. We'll also look at how other platforms can handle all that complexity for you, so you can focus on building something cool instead of wrestling with infrastructure.
What is the OpenAI Realtime API?
At its core, the OpenAI Realtime API is built for one thing: fast, multimodal conversations. Unlike the APIs you might be used to, which work on a simple request-and-response basis, this one keeps a connection open to stream data back and forth. This is what makes a genuine, flowing speech-to-speech conversation possible.
Instead of chaining together separate services for Speech-to-Text (STT), a Large Language Model (LLM), and Text-to-Speech (TTS), the Realtime API uses a single, multimodal model like GPT-4o. This all-in-one approach means the model can listen to audio, understand what's being said, figure out a reply, and stream synthesized speech back to the user in one continuous flow.
The whole thing is built around a system of events. You send "client events" to tell the API what to do, and you listen for "server events" to react to what's happening on the other end. It’s a great setup for building things like live transcription services or interactive voice agents, but as we'll get into, managing that constant back-and-forth takes a lot of work.
How to connect to the API
To get started, you need to establish a connection that stays open. You have two main options: WebSockets and WebRTC. The one you pick really depends on what you're trying to build.
WebSockets
WebSockets create a two-way communication channel over a single, long-running connection. This is generally the best choice for server-to-server applications, like a backend service that hooks into a phone system.
- Best for: Server-side setups, like a voice agent that answers phone calls.
- How it works: Your server connects to the API endpoint ("wss://api.openai.com/v1/realtime") using your standard OpenAI API key. From there, it's up to you to manage everything, including encoding raw audio into base64 and juggling the 37+ different events that manage the session (see the connection sketch after this list).
- Limitation: WebSockets run on TCP, which can sometimes introduce lag if packets need to be resent. This makes them a bit less reliable for apps running on a user's device where network conditions can be all over the place.
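Here's roughly what that first step looks like from Node.js. This is a minimal sketch using the ws package; the model name in the URL and the session settings are assumptions you'd swap for whatever the current docs recommend.

```typescript
// Minimal Node.js sketch using the "ws" package. The endpoint and the
// "OpenAI-Beta: realtime=v1" header follow OpenAI's docs; the model name
// is an assumption -- check the current docs for the one you want.
import WebSocket from "ws";

const url = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview";

const ws = new WebSocket(url, {
  headers: {
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    "OpenAI-Beta": "realtime=v1",
  },
});

ws.on("open", () => {
  // Configure the session as soon as the socket is up.
  ws.send(
    JSON.stringify({
      type: "session.update",
      session: { modalities: ["audio", "text"], voice: "alloy" },
    })
  );
});

ws.on("message", (raw) => {
  const event = JSON.parse(raw.toString());
  // Every server event carries a "type" field you have to route on.
  console.log("server event:", event.type);
});
```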
WebRTC
WebRTC is the technology that powers most real-time video and audio calls on the web. It's designed for peer-to-peer connections and is the way to go for any application running on the client side.
- Best for: Web or mobile apps running directly on a user's device.
- How it works: The user's browser connects directly to the Realtime API. You’d typically have your backend server generate a short-lived token for this, which keeps your main API key safe. WebRTC is much better at handling the messy reality of user networks, automatically adjusting for things like jitter and packet loss (a browser-side sketch follows this list).
- Benefit: It just works better for end-user devices. The connection is more stable and the latency is generally lower because it's built for streaming media.
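In the browser, the setup looks roughly like the sketch below. The /token route is a hypothetical endpoint on your own backend that mints the short-lived key, and the SDP-exchange URL follows OpenAI's published WebRTC example, so verify both against the current docs before relying on them.

```typescript
// Browser-side sketch. Assumes a hypothetical /token route on your backend
// that returns { ephemeralKey } after calling OpenAI to mint a short-lived key.
async function connectRealtime(): Promise<RTCPeerConnection> {
  const { ephemeralKey } = await (await fetch("/token")).json();

  const pc = new RTCPeerConnection();

  // Send the microphone to the model and play whatever audio comes back.
  const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
  pc.addTrack(mic.getTracks()[0]);
  pc.ontrack = (e) => {
    const audio = new Audio();
    audio.srcObject = e.streams[0];
    audio.play();
  };

  // JSON events (session updates, tool calls, etc.) travel over a data channel.
  const events = pc.createDataChannel("oai-events");
  events.onmessage = (e) => console.log("server event:", JSON.parse(e.data).type);

  // Standard SDP offer/answer exchange, with the answer fetched over HTTPS.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  const resp = await fetch(
    "https://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview",
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${ephemeralKey}`,
        "Content-Type": "application/sdp",
      },
      body: offer.sdp,
    }
  );
  await pc.setRemoteDescription({ type: "answer", sdp: await resp.text() });
  return pc;
}
```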
Core features and use cases
The Realtime API is about more than just speed; it opens the door to a whole new type of interactive app. Let's dig into what it can actually do.
Speech-to-speech conversation
This is the main event. The API can listen to a stream of audio, understand it, and generate a spoken reply almost instantly. And because it's using an "omni-model" like GPT-4o, it can pick up on the user's tone and even respond with its own personality.
- Use case: Building voice-first personal assistants, creating interactive stories, or designing hands-free controls for devices.
- How it works: You send audio from a microphone and get audio back from the model. The API does all the heavy lifting in between, which makes it much faster than a clunky STT -> LLM -> TTS pipeline (see the streaming sketch after this list).
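Over a WebSocket, that exchange boils down to two event types: you append base64-encoded PCM16 chunks to the input buffer, and you decode the audio deltas the server streams back. A rough sketch, with event names taken from the beta API reference (double-check them, since they have shifted between versions):

```typescript
import WebSocket from "ws";

// Assumes "ws" is an already-open Realtime connection like the one above,
// and that you capture 24kHz 16-bit mono PCM ("pcm16" below) yourself.
function streamMicChunk(ws: WebSocket, pcm16: Buffer): void {
  ws.send(
    JSON.stringify({
      type: "input_audio_buffer.append",
      audio: pcm16.toString("base64"), // raw audio must be base64-encoded
    })
  );
}

function collectModelAudio(ws: WebSocket, onAudio: (chunk: Buffer) => void): void {
  ws.on("message", (raw) => {
    const event = JSON.parse(raw.toString());
    // Synthesized speech streams back as base64 PCM16 deltas.
    if (event.type === "response.audio.delta") {
      onAudio(Buffer.from(event.delta, "base64"));
    }
  });
}
```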
Live transcription
You don't have to use the voice generation part. The API works great as a pure transcription service. As you stream audio in, the server sends back text as it recognizes words and phrases.
- Use case: Adding live captions to meetings, building dictation software, or monitoring customer support calls as they happen.
- How it works: You just have to enable transcription when you set up the session. The API will then start sending "conversation.item.input_audio_transcription.delta" events with the transcribed text (see the configuration sketch after this list).
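Enabling it is a single session.update. A minimal sketch, assuming the beta-style configuration shape; the transcription model name here is an assumption, so pick whichever one the current docs list:

```typescript
import WebSocket from "ws";

// Sketch: turn on input transcription for an open Realtime session and
// print partial transcripts as they arrive.
function enableTranscription(ws: WebSocket): void {
  ws.send(
    JSON.stringify({
      type: "session.update",
      session: {
        input_audio_transcription: { model: "whisper-1" }, // assumed model name
      },
    })
  );

  ws.on("message", (raw) => {
    const event = JSON.parse(raw.toString());
    if (event.type === "conversation.item.input_audio_transcription.delta") {
      // Partial transcript text for the audio you've streamed in so far.
      process.stdout.write(event.delta);
    }
  });
}
```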
Function calling and tool use
Just like the main Chat Completions API, the Realtime API can use external tools. This lets the AI do things in other systems. Based on the conversation, the model can decide it needs to call a function, figure out the right arguments, and then use the result to give a better answer.
- Use case: A voice agent that can check a customer's order status in your database, pull up the latest weather forecast, or book an appointment in a calendar.
- How it works: You tell the API what tools are available when you start the session. If the model wants to use one, it sends a "function_call" event. Your app does the work, sends the result back with a "function_call_output" event, and the model uses that info to carry on the conversation (see the sketch after this list).
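Here's what that round trip can look like in practice. It's a sketch under the beta event shapes, and lookup_order_status / lookupOrderStatus are hypothetical names standing in for your own tool and code:

```typescript
import WebSocket from "ws";

// Sketch of the tool-use round trip: register a tool, run it when the model
// asks for it, hand the result back, and let the model keep talking.
function registerTools(ws: WebSocket): void {
  ws.send(
    JSON.stringify({
      type: "session.update",
      session: {
        tools: [
          {
            type: "function",
            name: "lookup_order_status",
            description: "Look up the status of a customer's order by ID",
            parameters: {
              type: "object",
              properties: { order_id: { type: "string" } },
              required: ["order_id"],
            },
          },
        ],
      },
    })
  );

  ws.on("message", async (raw) => {
    const event = JSON.parse(raw.toString());
    // The model signals a tool call once it has finished streaming the arguments.
    if (event.type === "response.function_call_arguments.done") {
      const args = JSON.parse(event.arguments);
      const result = await lookupOrderStatus(args.order_id); // your own code

      // Hand the result back, then ask the model to continue the conversation.
      ws.send(
        JSON.stringify({
          type: "conversation.item.create",
          item: {
            type: "function_call_output",
            call_id: event.call_id,
            output: JSON.stringify(result),
          },
        })
      );
      ws.send(JSON.stringify({ type: "response.create" }));
    }
  });
}

async function lookupOrderStatus(orderId: string): Promise<{ status: string }> {
  return { status: `Order ${orderId}: shipped` }; // placeholder implementation
}
```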
The challenges of building with the raw API
While the API is incredibly capable, building a production-ready voice agent with it from scratch is a serious engineering project. It's definitely not a plug-and-play solution, and it’s easy to underestimate the amount of work involved.
1. Connection and audio management
Just keeping a WebSocket or WebRTC connection stable is a challenge. You have to build logic to handle random disconnects, retries, and flaky networks. You're also responsible for wrangling raw audio formats like PCM16, which means capturing, encoding (to base64), and sending audio in just the right-sized chunks. A single voice chat can involve over 37 different server and client events you have to listen for and respond to. That's a ton of boilerplate code before you even get to the fun part.
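Just the reconnect half of that plumbing looks something like this. It's a minimal sketch with exponential backoff; real code also has to re-send the session configuration and decide what conversation state, if any, can be recovered after a drop.

```typescript
import WebSocket from "ws";

// Sketch: reconnect with capped exponential backoff. Resuming the
// conversation after a drop is a separate (and harder) problem.
function connectWithRetry(
  url: string,
  headers: Record<string, string>,
  attempt = 0
): void {
  const ws = new WebSocket(url, { headers });

  ws.on("open", () => {
    attempt = 0; // reset the backoff once the connection is healthy
  });

  ws.on("close", () => {
    const delayMs = Math.min(30_000, 1_000 * 2 ** attempt);
    setTimeout(() => connectWithRetry(url, headers, attempt + 1), delayMs);
  });

  ws.on("error", (err) => {
    console.error("realtime socket error:", err.message);
    ws.close();
  });
}
```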
2. Latency and interruption handling
For a conversation to feel natural, you need the end-to-end response time to be under about 800 milliseconds. The API is fast, but its own processing eats most of that budget, leaving you only around 300ms for everything else: the time it takes for data to travel over the network, audio processing on your end, and Voice Activity Detection (VAD). Even a Bluetooth headset can eat up 100-200ms of that budget.
Then there's the problem of interruptions. If a user starts talking while the AI is responding, you need to instantly stop the AI's audio, tell the server to forget what it was about to say, and process the user's new input. Getting this logic to work perfectly every single time is a massive headache.
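Barge-in handling usually hangs off the server's voice-activity events. A sketch under the beta event names (stopLocalPlayback is hypothetical: it's whatever your audio layer uses to cut off speech that's already buffered, and it returns how many milliseconds the user actually heard):

```typescript
import WebSocket from "ws";

// Sketch: when the user starts talking over the assistant, stop local
// playback, cancel the in-flight response, and truncate the item so the
// model's memory matches what the user actually heard.
function handleInterruptions(ws: WebSocket, stopLocalPlayback: () => number): void {
  let currentItemId: string | undefined;

  ws.on("message", (raw) => {
    const event = JSON.parse(raw.toString());

    if (event.type === "response.audio.delta") {
      currentItemId = event.item_id; // remember which item is being spoken
    }

    // Server-side VAD noticed the user talking over the assistant.
    if (event.type === "input_audio_buffer.speech_started" && currentItemId) {
      const playedMs = stopLocalPlayback(); // hypothetical helper in your audio layer

      ws.send(JSON.stringify({ type: "response.cancel" }));
      ws.send(
        JSON.stringify({
          type: "conversation.item.truncate",
          item_id: currentItemId,
          content_index: 0,
          audio_end_ms: playedMs,
        })
      );
    }
  });
}
```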
3. Context and state management
The API is pretty good at remembering the conversation history within a single session, but sessions are capped at 15 minutes. If you need a conversation to last longer or be picked up later, you're on your own. You have to build your own system to save and reload the chat history. The message format is also different from the standard Chat Completions API, so you can't easily reuse context between the two without transforming the data first.
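If you do need persistence, you end up writing glue like the sketch below, which flattens Realtime conversation items into plain Chat Completions-style messages. The item shape shown here is simplified and assumed; real items also carry audio, transcripts, and tool calls that you'd have to decide how to store.

```typescript
// Sketch: convert Realtime conversation items into Chat Completions-style
// messages so a conversation can be saved and resumed later.
interface RealtimeItem {
  role: "user" | "assistant";
  content: Array<{ type: string; text?: string; transcript?: string }>;
}

interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

function toChatMessages(items: RealtimeItem[]): ChatMessage[] {
  return items.map((item) => ({
    role: item.role,
    // Prefer plain text; fall back to the audio transcript when that's all we have.
    content: item.content
      .map((c) => c.text ?? c.transcript ?? "")
      .join(" ")
      .trim(),
  }));
}
```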
4. Cost unpredictability
The API charges you per minute for both input and output audio. OpenAI does some caching to lower the cost of repeated text, but for long conversations, the bill can get big, fast. A 10-minute chat can cost around $2.68. That might not sound like a lot, but at scale, it becomes a significant and unpredictable expense without some serious optimization work, like summarizing context or converting audio to text.
These challenges mean that building directly on the API isn't a weekend project. It requires a team with real experience in real-time communication, audio engineering, and state management.
A simpler, more powerful alternative: eesel AI
After reading about all those hurdles, you might be thinking there has to be an easier way. And you're right. For businesses that want to use AI agents for customer support or internal help, a platform like eesel AI handles all that underlying grunt work, letting you focus on the actual user experience.
Here’s how eesel AI sidesteps the challenges of the raw API:
- Go live in minutes, not months: Instead of fighting with WebSockets, audio encoding, and a maze of events, eesel AI has one-click integrations for help desks like Zendesk and Freshdesk, plus chat platforms like Slack. You can get a working AI agent up and running yourself in a few minutes.
- Total control without the complexity: eesel AI gives you a simple UI with a powerful workflow engine. You can decide which tickets the AI handles, tweak its personality with a prompt editor, and set up custom actions (like looking up order info) without having to write a bunch of code to manage function calls.
- Unified knowledge, instantly: One of the biggest wins is that eesel AI automatically learns from your existing knowledge. It can sync with your past support tickets, help center articles, and other docs living in places like Confluence or Google Docs. It pulls everything together into one brain, which is something the Realtime API just doesn't do.
- Transparent and predictable pricing: With eesel AI, you get plans based on a set number of AI interactions, with no extra fees per resolution. This makes your costs predictable, so you're not penalized for having a busy month. It's a lot easier to budget for than the raw API's per-minute pricing.
(Infographic: how eesel AI unifies knowledge from sources like Zendesk, Freshdesk, and Slack to simplify building AI agents, bypassing the complexity of the raw OpenAI Realtime API.)
Building a good voice agent is about more than just wiring up an API. It's about creating a system that's reliable, smart, and understands context. The OpenAI Realtime API gives you the engine, but a platform like eesel AI gives you the whole car, ready to go.
OpenAI Realtime API pricing
Let's break down the numbers. The OpenAI Realtime API is priced based on how many minutes of audio are processed, with different rates for input and output. Based on what developers in the community have shared, the costs shake out to something like this:
- Audio Input: ~$0.06 per minute
- Audio Output: ~$0.24 per minute
OpenAI automatically caches input tokens, which can cut the cost of repeated context in a long conversation by around 80%. But even with that discount, the costs add up. A 10-minute conversation where people are talking 70% of the time can cost about $2.68. For a business, this usage-based model can make your monthly bill a bit of a guessing game.
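As a rough sanity check, here's what a naive per-minute estimate looks like using the figures above. Treat it as a floor rather than a forecast: real bills also include text tokens and the conversation context that gets re-processed as the session grows, which accounts for most of the gap between this kind of back-of-the-envelope math and figures like the ~$2.68 quoted above.

```typescript
// Back-of-the-envelope estimate using the community-reported per-minute rates.
function estimateAudioCost(inputMinutes: number, outputMinutes: number): number {
  const INPUT_RATE = 0.06; // USD per minute of audio in
  const OUTPUT_RATE = 0.24; // USD per minute of audio out
  return inputMinutes * INPUT_RATE + outputMinutes * OUTPUT_RATE;
}

// e.g. a 10-minute call where the user and the assistant each speak ~3.5 minutes:
// estimateAudioCost(3.5, 3.5) ≈ $1.05 -- before context re-processing is counted.
```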
Final thoughts on the OpenAI Realtime API Reference
The OpenAI Realtime API is a fantastic tool for building voice-first AI apps. It has the speed and multimodal power needed for conversations that feel natural. However, a close look at the "OpenAI Realtime API Reference" shows it's a low-level tool that takes a lot of engineering work to use well. From managing connections and audio streams to handling interruptions and unpredictable costs, building a production-ready agent is a serious undertaking.
For businesses that just want to automate support and work more efficiently, a platform that hides all that complexity is a life-saver. eesel AI provides a fully-managed solution that lets you launch powerful, custom agents in minutes, all with pricing that makes sense.
Ready to see what a production-ready AI agent can do for your team? Start your free eesel AI trial today.
Frequently asked questions
What is the OpenAI Realtime API designed for?
The OpenAI Realtime API Reference describes an API built for fast, multimodal conversations. Its primary purpose is to enable genuine, flowing speech-to-speech interaction by keeping a continuous connection open and utilizing a single model like GPT-4o for STT, LLM, and TTS.
How do developers connect to the OpenAI Realtime API?
Developers typically connect to the OpenAI Realtime API Reference using either WebSockets or WebRTC. WebSockets are ideal for server-to-server applications, while WebRTC is recommended for client-side applications running on user devices due to its better handling of variable network conditions.
What are the key features of the OpenAI Realtime API?
The OpenAI Realtime API Reference highlights key features such as speech-to-speech conversation for interactive agents, live transcription for real-time text output, and function calling/tool use, allowing the AI to interact with external systems.
What challenges come with building on the raw OpenAI Realtime API?
Implementing solutions with the raw OpenAI Realtime API Reference presents challenges like managing complex connections and audio streams, handling latency and user interruptions, maintaining conversation context beyond short sessions, and dealing with potentially unpredictable costs.
How is the OpenAI Realtime API priced?
The OpenAI Realtime API Reference pricing is based on minutes of audio processed for both input and output, with different rates for each. While OpenAI caches input tokens to reduce costs, a 10-minute conversation can still cost around $2.68, making predictable budgeting a challenge without optimization.
Does the OpenAI Realtime API support function calling?
Yes, the OpenAI Realtime API Reference supports function calling, enabling the AI to interact with external tools and systems. For broader knowledge integration and simplified management, platforms like eesel AI offer managed solutions that connect to existing help centers and documents.