A practical guide to OpenAI Evaluation for LLM applications

Written by Stevia Putri

Reviewed by Amogh Sarda

Last edited October 13, 2025

So, you’re thinking about using a large language model (LLM) to help run your business. That's a great move. But there’s always that nagging question: how do you make sure it’s actually reliable and not just a ticking time bomb of weird answers? You can’t just flip a switch on an LLM and cross your fingers.

If you don't test it properly, your AI could start giving out wrong information, adopting a bizarre tone that’s totally off-brand, or just failing to follow simple instructions. All of that adds up to a terrible customer experience. This is why having a solid way to test your AI isn't just a nice-to-have; it's essential.

To tackle this, OpenAI created a framework called OpenAI Evaluation. This guide will walk you through what it is, how the tech folks use it, and why it's probably not the right tool for most business teams. We'll also look at how platforms like eesel AI give you a much more straightforward path to deploying AI you can actually trust.

What is OpenAI Evaluation?

In simple terms, OpenAI Evaluation (or "Evals," as it's often called) is a toolkit for developers to create and run tests on language models. It’s how they check if the prompts they’re writing or the models they’re tweaking are actually doing what they're supposed to. Think of it as a quality check for your AI, making sure that when you update something, you don't accidentally break five other things.

There are two main flavors of these tests:

  • Code-based checks: These are for the black-and-white stuff. A developer can write a test to see if the model's output includes a specific word, is formatted in a certain way (like JSON), or correctly sorts something into a category. It's perfect for when there’s a clear right or wrong answer.

  • AI-graded checks: This is where things get a bit more interesting. You can use a really powerful AI (like GPT-4o) to judge the work of another AI. For example, you could ask it to rate how "friendly" or "helpful" a customer support reply is. It’s basically like having an AI supervisor review another AI's homework.

The whole point of using OpenAI Evals is to get hard numbers on how your AI is performing. This helps teams see if they're making progress and, more importantly, catch any slip-ups before they affect your customers. It’s a crucial practice for anyone building serious AI tools, but it’s also deeply technical.
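
To make the AI-graded idea a bit more concrete, here's a rough sketch of what that "AI supervisor" pattern can look like in Python, using OpenAI's chat completions endpoint. The judge prompt, the 1-to-5 scale, and the sample ticket are illustrative assumptions, not part of the Evals framework itself.

from openai import OpenAI  # assumes the official OpenAI Python SDK and an OPENAI_API_KEY env var

client = OpenAI()

def grade_friendliness(customer_message: str, draft_reply: str) -> int:
    # Ask a stronger "judge" model to score a support reply from 1 (cold) to 5 (warm).
    judge_prompt = (
        "You are reviewing a customer support reply.\n"
        f"Customer message: {customer_message}\n"
        f"Support reply: {draft_reply}\n"
        "Rate how friendly and helpful the reply is on a scale of 1 to 5. "
        "Answer with the number only."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": judge_prompt}],
    )
    # Deliberately simple: a real grader would validate that the judge returned a bare number.
    return int(response.choices[0].message.content.strip())

score = grade_friendliness(
    "My order never arrived and I'm really frustrated.",
    "So sorry about that! We've reshipped your order and refunded the shipping fee.",
)
print(f"Friendliness score: {score}/5")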

How a standard OpenAI Evaluation works

Getting a standard OpenAI Evaluation up and running is a job for a developer. To give you a real sense of it, let’s walk through a common example from OpenAI’s own documentation: classifying IT support tickets.

Step 1: Get your test data ready

First, you need what’s called a "ground truth" dataset. This is just a fancy term for an answer key. It's a file full of sample questions paired with the perfect answers. The catch? This file needs to be in a very specific format called "JSONL" (JSON Lines).

For our ticket-sorting example, a couple of lines in that file might look like this:


{ "item": { "ticket_text": "My monitor won't turn on!", "correct_label": "Hardware" } }  

{ "item": { "ticket_text": "I'm in vim and I can't quit!", "correct_label": "Software" } }  

Now, creating this file isn't a one-and-done thing. Someone has to build it by hand, clean it up, and make sure it’s formatted perfectly. For a simple task, that might be fine. But if you're dealing with complex customer issues, building a good dataset can be a massive project all on its own.
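
If you're curious what that prep work looks like, here's a minimal sketch that writes a ground-truth file using Python's standard library. The file name and the two sample tickets are placeholders; a real dataset would need many more examples.

import json

examples = [
    {"ticket_text": "My monitor won't turn on!", "correct_label": "Hardware"},
    {"ticket_text": "I'm in vim and I can't quit!", "correct_label": "Software"},
]

# JSONL just means one JSON object per line, nothing more.
with open("tickets.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps({"item": example}) + "\n")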

Step 2: Set up the test rules

Next, a developer has to create a configuration file that tells the evaluation tool how to test the model. This file lays out the prompt that gets sent to the AI and the "grader" that will check the AI's response against your answer key.

For our ticket example, the test might use a simple grader that just checks whether the AI’s output exactly matches the "correct_label" in the dataset. This step involves knowing the framework's templating placeholders, which pull fields from the test file into the prompt and the grader.
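
The exact configuration syntax depends on which version of the Evals tooling you're using, so treat the snippet below as a rough sketch of the idea rather than the real format: a prompt template that injects the ticket text, plus an exact-match grader that compares the model's label against the answer key.

# Not the real Evals configuration syntax, just the two ingredients it captures.
PROMPT_TEMPLATE = (
    "You are an IT support triage assistant. "
    "Classify the following ticket as Hardware, Software, or Other. "
    "Reply with the category name only.\n\nTicket: {ticket_text}"
)

def exact_match_grader(model_output: str, correct_label: str) -> bool:
    # Pass only if the model's answer matches the answer key (ignoring case and whitespace).
    return model_output.strip().lower() == correct_label.strip().lower()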

Step 3: Run the evaluation and see what happened

Finally, the developer kicks off the evaluation from their command line. The system then goes through every item in your dataset, sends the prompt to the model, gets an answer back, and scores it.

The result is usually a log file, a wall of text filled with data and metrics like how many tests "passed", "failed", and the overall "accuracy". These numbers tell you what happened, but they don't give you much insight into why something failed without some serious digging. It's a powerful system, but it’s definitely not built for the average user.
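
Conceptually, the run boils down to a loop like the sketch below, which reuses the "tickets.jsonl" file, the prompt template, and the grader from the earlier sketches. The model name is arbitrary; swap in whichever model you're testing.

# Reuses PROMPT_TEMPLATE and exact_match_grader from the step 2 sketch.
import json
from openai import OpenAI

client = OpenAI()
passed = failed = 0

with open("tickets.jsonl", encoding="utf-8") as f:
    for line in f:
        item = json.loads(line)["item"]
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # the model under test
            messages=[{"role": "user", "content": PROMPT_TEMPLATE.format(**item)}],
        )
        answer = response.choices[0].message.content
        if exact_match_grader(answer, item["correct_label"]):
            passed += 1
        else:
            failed += 1

accuracy = passed / (passed + failed)
print(f"passed={passed} failed={failed} accuracy={accuracy:.0%}")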

Common reasons to use OpenAI Evaluation

Even though the setup is a bit of a headache, the reasons behind it are very practical. Proper testing is what turns a fun AI demo into a tool you can rely on for your business.

  • Keeping it factual: This is a big one. You need to make sure your AI is giving correct information based on your knowledge base, whether that’s about product details or your return policy. An eval can check if the AI's answers actually match your official documents.

  • Following instructions: Many AI workflows need the output to be structured in a specific way. Evals can confirm that your AI can do things like generate clean JSON for another system to use or tag a support ticket with the right category from your list (there's a small example of this kind of check right after this list).

  • Getting the tone right: A support answer can be 100% correct but still sound robotic and cold. AI-graded evals can help you check if the AI’s tone matches your brand voice. You can ask the grader, "Does this reply sound empathetic and professional?" to keep the customer experience consistent.

  • Staying safe and fair: On a larger scale, developers use these same methods to test for safety issues. Evals help make sure models aren’t generating harmful, biased, or inappropriate content, which is obviously critical for any responsible AI tool.
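
To make the "following instructions" case concrete, here's a tiny code-based check that only passes when the reply is valid JSON containing the fields a downstream system expects. The required field names are just an example.

import json

def is_clean_json(model_output: str, required_fields=("category", "priority")) -> bool:
    # Pass only if the reply parses as JSON and includes every required field.
    try:
        parsed = json.loads(model_output)
    except json.JSONDecodeError:
        return False
    return all(field in parsed for field in required_fields)

print(is_clean_json('{"category": "Hardware", "priority": "high"}'))  # True
print(is_clean_json("Sure! Here's the ticket info you asked for."))   # False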

The limits of OpenAI Evaluation for businesses

OpenAI Evaluation is a fantastic tool for the developers who are building AI. But for the business teams who have to manage that AI every day, it comes with some pretty big downsides.

Why OpenAI Evaluation is for developers, not your support team

The whole process, from making "JSONL" files to reading log data, is complicated and requires coding skills. You need engineers to set it up and keep it running. That’s a huge barrier for the support managers or IT leads who are actually in charge of the AI's performance. They need to know if the AI is doing its job, but you can’t expect them to learn to code just to find out.

What support teams actually need: Instead of a tool that lives in the command line, business teams need something designed for them. For instance, eesel AI has a simulation mode that lets you test your AI on thousands of your real, historical support tickets in just a few clicks. No code, no fuss. You get simple, visual reports showing you what you can expect to automate and can see exactly how the AI would have replied.

A screenshot of the eesel AI simulation mode, a user-friendly alternative to the technical OpenAI Evaluation process, showing how businesses can test their AI on real tickets without code.

Why creating test data by hand is a dead end

Building and updating a good test dataset is a never-ending chore. Your customers’ problems are always changing as you launch new products or change your policies. A static test file you made in January will be hopelessly out of date by March, which makes your tests pretty meaningless.

A better approach: Your AI should learn from reality, not a file someone made months ago. eesel AI plugs right into your help desk (like Zendesk or Freshdesk) and your knowledge sources. It trains and tests on your actual past tickets and help center articles from the very beginning. Your test dataset is your real, live data, so your tests are always relevant without any extra work.

A screenshot of the eesel AI platform connecting to live business data, which is a better approach than the static datasets required for OpenAI Evaluation.

Why just testing text isn't the full picture

A standard OpenAI Evaluation is great for checking if a text reply is correct. But in a real support situation, the words are just one piece of the puzzle. A great AI agent doesn't just answer a question; it does something. The standard eval can't tell you if the AI successfully did things like tagging a ticket as urgent, escalating it to a person, or looking up an order status in Shopify.

Test the whole workflow: You need to test the entire process, not just the words. With the customizable workflow engine in eesel AI, you can build and test these actions right inside the simulation. You can see not only what the AI would have said, but also what it would have done. This gives you a complete picture of its performance so you can feel good about automating entire processes, not just text snippets.

A workflow diagram showing how eesel AI tests the entire support process, addressing a key limitation of text-only OpenAI Evaluation.

Understanding the API pricing for OpenAI Evaluation

While the OpenAI Evals framework is open-source, running the tests will cost you. Every test you run uses API tokens, and that adds up on your bill. You pay for every prompt you send to the model you're testing and for every answer it generates. This is especially true when you use AI-graded evals, since you're paying for a second, more powerful model to do the grading.

Here’s a quick look at the pay-as-you-go costs for some of OpenAI's models:

Model | Input (per 1M tokens) | Output (per 1M tokens)
gpt-4o-mini | $0.15 | $0.60
gpt-4o | $5.00 | $15.00
gpt-5-mini | $0.25 | $2.00
gpt-5 | $1.25 | $10.00

Pricing can change, so it's always a good idea to check the official OpenAI pricing page for the latest details.
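
If you want a rough feel for the math, here's a back-of-the-envelope sketch using the list prices above. The token counts describe a hypothetical run of 1,000 test cases and are assumptions, not measurements.

PRICES_PER_1M = {  # (input, output) in USD per 1M tokens, from the table above
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (5.00, 15.00),
}

def run_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    input_price, output_price = PRICES_PER_1M[model]
    return (input_tokens / 1_000_000) * input_price + (output_tokens / 1_000_000) * output_price

# 1,000 test cases: ~500 input and ~100 output tokens each for the model under test,
# plus an AI judge (gpt-4o) reading ~700 tokens and writing ~20 tokens per case.
tested = run_cost("gpt-4o-mini", 1_000 * 500, 1_000 * 100)
judge = run_cost("gpt-4o", 1_000 * 700, 1_000 * 20)
print(f"Model under test: ${tested:.2f}, AI judge: ${judge:.2f}, total: ${tested + judge:.2f}")

In this hypothetical run, the judge model accounts for most of the bill, which is exactly why AI-graded evals get expensive quickly.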

A more predictable way: This token-based pricing can lead to some unpleasant surprises on your monthly bill, especially if you're running a lot of tests. In contrast, eesel AI offers predictable pricing. Plans are based on a set number of AI interactions per month, and all the testing you do in simulation mode is included. This makes budgeting for your AI tools much simpler, with no hidden costs for making sure your AI is ready to go.

A screenshot of eesel AI's pricing page, showing a predictable pricing model that contrasts with the variable API costs of OpenAI Evaluation.

Move beyond OpenAI Evaluation and start automating

OpenAI Evaluation is a big deal for developers building with LLMs. It proves that serious, methodical testing isn't just an extra step; it's at the core of building AI responsibly. However, because it's so technical and developer-focused, it’s just not practical for most business teams who need to manage AI for things like customer support or internal help desks.

The future of AI in business isn't just about raw power; it's about making that power safe, reliable, and easy for anyone to manage. That means you need testing tools that are built into your platform, easy to use, and designed for the people who will be using them every single day.

Instead of spending months trying to build a complex, code-heavy testing system, you can get all the benefits in just a few minutes. Sign up for eesel AI and run a free simulation on your own data. You'll see exactly what you can automate and can launch your AI agents feeling completely confident.

Frequently asked questions

What is OpenAI Evaluation, and what is it for?

OpenAI Evaluation, often called Evals, is a toolkit designed for developers to create and run tests on language models. Its primary purpose is to quality-check AI models, ensuring they perform as expected and identifying any regressions during updates.

Why is OpenAI Evaluation hard for non-technical teams to use?

The entire OpenAI Evaluation process, from creating specific "JSONL" files to interpreting complex log data, requires coding skills and technical expertise. This makes it challenging for non-technical business teams, like support managers, to set up, run, and manage effectively.

How does a standard OpenAI Evaluation work?

First, a developer prepares a "ground truth" dataset of questions and correct answers in "JSONL" format. Next, they create a configuration file defining the AI prompt and the grader rules. Finally, the evaluation is run from the command line, generating log files with performance metrics like accuracy.

What are the drawbacks of building test datasets by hand?

A significant limitation is the need to manually create and constantly update test datasets, which quickly become outdated as business needs change. This makes maintaining relevant and comprehensive tests a continuous, resource-intensive task for businesses.

Does running OpenAI Evaluation cost money?

Yes, running tests with OpenAI Evaluation incurs costs because it uses API tokens for every prompt sent and answer generated by the models. Pricing is typically pay-as-you-go, based on the number of input and output tokens, which can lead to unpredictable monthly bills.

Can OpenAI Evaluation test full workflows, or just text replies?

Standard OpenAI Evaluation is excellent for checking text replies but doesn't inherently test a complete workflow or actions an AI might take, like tagging tickets or looking up order statuses. It primarily focuses on the correctness of verbal or textual responses.

Article by Stevia Putri

Stevia Putri is a marketing generalist at eesel AI, where she helps turn powerful AI tools into stories that resonate. She’s driven by curiosity, clarity, and the human side of technology.