A practical guide to OpenAI evaluation best practices for support teams

Written by Stevia Putri

Reviewed by Amogh Sarda

Last edited October 14, 2025

So, you’ve brought an AI support agent onto the team. That's a big step. But how do you really know if it's helping your customers or just creating more headaches for human agents? Going with your "gut feeling" or spot-checking a few conversations isn't going to cut it. Without a solid way to measure performance, you're essentially flying blind. You need real data to feel confident that your AI is accurate, helpful, and staying on-brand.

This guide is here to clear up the confusion around OpenAI Evaluation Best Practices. We'll translate the developer-heavy concepts into a framework that actually makes sense for business and support leaders. We'll walk through the core ideas of AI evaluation and then show you a much more practical way to test and deploy AI confidently, right from your helpdesk.

What are OpenAI evaluation best practices?

Let's break it down. "Evals" are just structured tests to see how well an AI model is doing a specific job. Think of it as a report card for your AI, grading it on things like accuracy, relevance, and reliability.

According to OpenAI’s own documentation, running these evals is essential for improving any app that uses a large language model (LLM). It’s how you stop the AI from sending weird or wrong answers to customers, keep quality consistent, and track whether things are getting better over time, especially when the underlying models are updated.

But here’s the thing: frameworks like the OpenAI Evals API are built for developers. They involve writing code, formatting data in special files (like JSONL), and analyzing the results with scripts. For a business leader, the goal isn't to learn how to code. It's to move from "I think it's working" to "I have the data that proves our AI is hitting our goals and keeping customers happy."
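To make that concrete, here's a rough sketch of what the developer-style workflow involves: writing test cases into a JSONL file, one JSON object per line, before any eval can even run. The field names below are illustrative placeholders, not the exact schema any particular framework requires.

```python
import json

# Illustrative test cases. Treat the field names as placeholders rather than
# the exact schema a specific eval framework expects.
test_cases = [
    {
        "input": "How long do I have to return an item?",
        "ideal": "You can return any item within 30 days of delivery for a full refund.",
    },
    {
        "input": "Do you ship internationally?",
        "ideal": "Yes, we ship internationally; delivery usually takes 7-14 business days.",
    },
]

# Developer-focused frameworks typically want one JSON object per line (JSONL).
with open("support_eval_cases.jsonl", "w") as f:
    for case in test_cases:
        f.write(json.dumps(case) + "\n")
```

Even this tiny example shows why the standard approach lands on engineering's desk rather than the support team's.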

The core evaluation process

If you look at the guidelines from folks like OpenAI and Microsoft, a good evaluation process usually has four main steps. Following this cycle helps make sure your tests are actually useful and lead to real improvements.

1. Define your goal

First, you need to decide what "success" looks like for a specific task. And you have to be specific. "Answers questions well" is too vague. A better goal would be, "The AI should accurately explain our 30-day return policy by referencing the official help center article." Now that’s something you can actually measure.

2. Gather your data

To test your AI, you need a "ground truth" dataset. This is just a fancy term for a collection of questions paired with perfect, expert-approved answers. This data should look like the real questions your customers ask, covering the common stuff, the weird edge cases, and everything in between.
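For illustration, here's what a tiny ground-truth set might look like in Python, with a category label on each entry so you can sanity-check that common topics and messy edge cases are both represented. The questions, answers, and categories here are made up for the example.

```python
from collections import Counter

# A hypothetical ground-truth set: each entry pairs a customer question with an
# expert-approved answer and a category label for coverage checks.
ground_truth = [
    {
        "category": "returns",
        "question": "Can I return a sale item?",
        "approved_answer": "Sale items can be returned within 30 days unless marked final sale.",
    },
    {
        "category": "shipping",
        "question": "My order says delivered but it never arrived.",
        "approved_answer": "Sorry about that! We open a carrier investigation and reship or refund within 3 business days.",
    },
    {
        "category": "billing",
        "question": "Why was I charged twice?",
        "approved_answer": "Duplicate charges are usually temporary authorization holds that drop off within 5 business days.",
    },
]

# Quick coverage check: how many test cases exist per topic?
print(Counter(entry["category"] for entry in ground_truth))
```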

3. Choose your metrics

How are you going to score the AI's answers? It could be a simple pass/fail on whether the information is correct, a rating for how well it matches your brand's tone of voice, or checking if it did something specific, like tagging a ticket correctly. Whatever you choose, it should tie directly back to the goal you set in step one.
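As a sketch, a pass/fail metric can be as simple as checking that the answer mentions the facts you care about, or that the right tag was applied. The functions below are illustrative examples, not a standard library, and real scoring usually needs more nuance.

```python
def passes_fact_check(ai_answer: str, required_facts: list[str]) -> bool:
    """Pass/fail: the answer must mention every fact we care about (case-insensitive)."""
    answer = ai_answer.lower()
    return all(fact.lower() in answer for fact in required_facts)


def correct_tag(predicted_tag: str, expected_tag: str) -> bool:
    """Pass/fail: did the AI apply the ticket tag we expected?"""
    return predicted_tag.strip().lower() == expected_tag.strip().lower()


# Grading one answer against the return-policy goal from step 1.
print(passes_fact_check(
    "You have 30 days from delivery to return your order for a full refund.",
    required_facts=["30 days", "refund"],
))  # True
```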

4. Test, check, and repeat

The last step is to run your tests, look at the results, and use what you learn to tweak your AI. Maybe you need to adjust a prompt, point it to a better knowledge source, or change a workflow rule. Evaluation isn't something you do once; it's a loop of testing and improving that keeps your AI performing at its best.
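Here's a minimal sketch of that loop in Python. The get_ai_answer function is a hypothetical stub standing in for your actual AI agent, and the pass/fail check is the same style as the previous example. Run it, read the per-topic summary, adjust your prompts or knowledge sources, and run it again.

```python
def get_ai_answer(question: str) -> str:
    """Hypothetical stub; swap in a real call to your AI agent."""
    return "You can return items within 30 days of delivery for a full refund."


def passes_fact_check(answer: str, facts: list[str]) -> bool:
    return all(fact.lower() in answer.lower() for fact in facts)


test_cases = [
    {"category": "returns", "question": "How long do I have to return an item?",
     "required_facts": ["30 days", "refund"]},
    {"category": "returns", "question": "Can I return a final-sale swimsuit?",
     "required_facts": ["final sale"]},
]

# One pass of the loop: run the tests, read the summary, then go tweak prompts,
# knowledge sources, or workflow rules and run it again.
results: dict[str, list[bool]] = {}
for case in test_cases:
    passed = passes_fact_check(get_ai_answer(case["question"]), case["required_facts"])
    results.setdefault(case["category"], []).append(passed)

for category, outcomes in results.items():
    print(f"{category}: {sum(outcomes)}/{len(outcomes)} passed")
```

In this toy run the second case fails because the answer never mentions the final-sale exception, which is exactly the kind of gap the loop is meant to surface.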

Key evaluation strategies and metrics

There are a few different ways to grade an AI's performance, and each has its ups and downs. Knowing the options helps you pick the right tool for the job.

Human evaluation

This is the gold standard for quality. You have a human expert read the AI's response and grade it against a set of criteria. It’s fantastic for judging nuanced things like empathy or tone, but it's also incredibly slow, expensive, and a pain to scale. For everyday use, it’s just not practical.

Traditional metric-based evaluation (ROUGE/BLEU)

These are automated systems that score an AI's answer by comparing its text to a "perfect" reference answer. They basically count how many words and phrases overlap.

The catch: As many in the industry point out, these metrics aren't great at understanding meaning. An AI might give a perfectly correct answer using different words, but a ROUGE or BLEU test would fail it. That rigidity makes them less useful for judging conversational AI.
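A toy example makes the weakness obvious. The score below is a crude word-overlap calculation in the spirit of ROUGE/BLEU (not the real formulas), and it punishes a perfectly correct paraphrase simply because the wording differs.

```python
def unigram_overlap(candidate: str, reference: str) -> float:
    """Toy word-overlap score, in the spirit of ROUGE/BLEU but not the real formulas."""
    cand_words = set(candidate.lower().split())
    ref_words = set(reference.lower().split())
    return len(cand_words & ref_words) / len(ref_words)


reference = "You can return any item within 30 days for a full refund."
paraphrase = "Send it back inside a month and we'll give your money back."

# A perfectly correct paraphrase scores terribly because almost no words overlap.
print(round(unigram_overlap(paraphrase, reference), 2))  # ~0.08
```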

LLM-as-a-judge

This is a newer approach where you use a powerful AI model (like GPT-4) to act as a "judge" and grade the output of your support AI. It's faster and cheaper than using people, and it understands context way better than simple text-matching tools.

The catch: This method can have its own biases (for example, it sometimes prefers longer answers for no good reason) and still needs some careful setup to work well. It's a definite improvement, but it isn't a silver bullet and often still needs a technical eye on it.
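If you do have technical help, a minimal LLM-as-a-judge setup can look something like the sketch below, using the OpenAI Python SDK. The model name, rubric, and PASS/FAIL format are placeholders you'd tune for your own brand and policies.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

JUDGE_PROMPT = """You are grading a customer support reply.
Question: {question}
Reference answer: {reference}
Candidate answer: {candidate}

Reply with PASS if the candidate is factually consistent with the reference and
matches a friendly, professional tone. Otherwise reply FAIL. Then give one
sentence of reasoning."""


def judge(question: str, reference: str, candidate: str, model: str = "gpt-4o") -> str:
    """Ask a stronger model to grade the support AI's answer against a reference."""
    response = client.chat.completions.create(
        model=model,  # placeholder; pick whatever judge model you trust
        temperature=0,
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(
                question=question, reference=reference, candidate=candidate
            ),
        }],
    )
    return response.choices[0].message.content
```

Keeping temperature at 0 makes the judge more consistent from run to run, but you'll still want to spot-check its grades against a human reviewer now and then.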

| Evaluation Method | Speed | Cost | Scalability | Nuance |
|---|---|---|---|---|
| Human Evaluation | Slow | High | Low | High |
| Metric-based (ROUGE) | Fast | Low | High | Low |
| LLM-as-a-Judge | Fast | Medium | High | Medium |

Practical limitations of developer-focused OpenAI evaluation

While the theory behind OpenAI Evaluation Best Practices is solid, the tools themselves are often a poor fit for a busy support team. Here’s where the textbook approach tends to fall apart in the real world.

Requires developer expertise

To run evals with the standard frameworks, you have to be comfortable with APIs, command-line tools, and formatting data in JSONL. That's just not realistic for most support leaders, who need tools they can manage themselves without filing a ticket with the engineering team and waiting.

The process is slow and disconnected

The typical workflow involves pulling data out of your helpdesk, running tests in a completely separate place, and then trying to make sense of the results. It's clunky and doesn't give you feedback where you actually work: inside your helpdesk. This creates a gap between testing and actually running your support operations.

Test datasets are often too small or generic

Building a good set of test data is tough. A lot of teams end up either testing on a handful of examples they wrote themselves or using generic industry benchmarks. Neither one really captures the unique, and often messy, variety of your real customer conversations, which can give you a false sense of security.

A better approach: Business-focused evaluation with eesel AI

Instead of making you learn a developer's toolkit, some platforms build evaluation right into a simple workflow that anyone can use. eesel AI was designed from the ground up to solve these practical problems for support teams.

Get started in minutes: No-code evaluation

Forget about complicated setups. eesel AI is a truly self-serve platform with one-click helpdesk integrations. You can connect your knowledge from places like Zendesk or Confluence and start evaluating your AI's potential without writing a single line of code or sitting through a sales demo.

Test with confidence: Use past tickets for evaluation

This is where it gets really powerful. eesel AI's simulation mode can run your AI setup on thousands of your real, historical tickets. This gives you an accurate, data-backed forecast of how your AI would have performed on real customer issues. No more guessing and no more building test datasets by hand.

A screenshot of the eesel AI platform showing the simulation mode, a key feature for implementing OpenAI Evaluation Best Practices by testing on historical data.

Get clear next steps, not just a score

The actionable reporting in eesel AI does more than give you a pass/fail grade. It analyzes the simulation to show you which topics are prime for automation. Even better, it points out the gaps in your knowledge base, giving you a clear to-do list for what help articles to write next, all based on real customer questions.

This screenshot shows eesel AI's actionable reporting, which is a practical application of OpenAI Evaluation Best Practices for identifying knowledge gaps.

Roll out gradually and safely

With eesel AI, you can launch without the risk. After running a simulation, you can choose to automate just a small slice of tickets, like only inquiries about "order status." You can watch how it performs in real-time and expand the scope as you get more comfortable. This kind of careful control gives you a smooth, safe rollout that you just can't get with platforms that demand an all-or-nothing approach.

This image displays the customization rules in eesel AI, allowing for a safe, gradual rollout as part of OpenAI Evaluation Best Practices.

Stop guessing, start measuring

Putting AI to work in customer support isn't a matter of if anymore, but how. A huge part of the "how" is having a dependable way to evaluate it. While the concepts behind OpenAI Evaluation Best Practices point us in the right direction, the standard tools are often too technical and disconnected for business teams.

The right platform makes sophisticated evaluation a simple, built-in part of your operations. By embedding simulation and reporting directly into a self-serve workflow, eesel AI lets you test on your own data and deploy with confidence. You can finally stop hoping your AI works and start proving it.

Frequently asked questions

What are OpenAI evaluation best practices, and why do they matter for support teams?

OpenAI Evaluation Best Practices refer to structured tests used to measure an AI model's performance on specific tasks, like answering customer questions. They are crucial for ensuring your AI support agent is accurate, reliable, and consistent, preventing poor customer experiences and building trust.

Can a support team run these evaluations without a developer?

While many frameworks are developer-focused, platforms like eesel AI offer no-code solutions. These tools integrate directly with your helpdesk, allowing you to simulate AI performance on historical tickets and get actionable insights without technical expertise.

What are the main limitations of the standard evaluation frameworks?

Standard OpenAI Evaluation Best Practices often require coding skills, involve slow and disconnected workflows, and rely on potentially small or generic test datasets. These limitations make them challenging for busy support teams without dedicated developer resources.

What data do I need to evaluate my support AI?

To apply OpenAI Evaluation Best Practices, you need a "ground truth" dataset. This consists of real customer questions paired with expert-approved, perfect answers, reflecting the diverse inquiries your customers typically ask.

Is human evaluation still worth doing?

Yes, human evaluation is the gold standard within OpenAI Evaluation Best Practices for nuanced judgments like tone or empathy. However, it is slow, expensive, and difficult to scale for continuous, large-volume testing.

What is LLM-as-a-judge, and is it reliable?

LLM-as-a-Judge is a contemporary method within OpenAI Evaluation Best Practices where a powerful AI grades your support AI's output. It's faster and understands context better than traditional metrics, though it can have biases and requires careful setup.


Article by Stevia Putri

Stevia Putri is a marketing generalist at eesel AI, where she helps turn powerful AI tools into stories that resonate. She’s driven by curiosity, clarity, and the human side of technology.