Evaluating the Performance of AI Agents Using Zendesk QA: A 2025 Guide

Written by Kenneth Pangan

Reviewed by Amogh Sarda

Last edited November 12, 2025

As more companies use AI agents to handle customer support, the conversation has changed. It's no longer a question of if you should use them, but how well they're actually doing their job. Let's be honest: a bad AI interaction can be way more frustrating for a customer than just waiting in a queue.

This is where a tool like Zendesk QA comes into the picture. It’s built to give you a look under the hood at your AI's performance. But does it show you everything you need to see?

In this guide, we'll walk through how to use Zendesk QA for evaluating the performance of AI agents, covering what it does well and, more importantly, where its blind spots are. We'll also explore a more solid, proactive way to test your AI, making sure it's actually helpful before it ever interacts with a customer.

What is Zendesk QA for AI agents?

Zendesk QA for AI agents is a feature inside the Zendesk suite that helps managers check in on conversations handled by their bots. The goal is to hold your AI to the same standards as your human agents by checking for things like tone, accuracy, and whether it actually solved the problem.

It boils down to a few main functions:

  • Manual reviews: You can create custom scorecards and have your team grade bot conversations by hand. This is perfect for digging into specific interactions that just don't feel right.

  • Automated scoring (AutoQA): This feature uses AI to automatically score 100% of your conversations, whether they’re with a human or a bot. It checks for set categories like greetings, empathy, spelling, and tone.

  • Dedicated dashboards (BotQA): Zendesk gives you a special dashboard with reports on key AI metrics. You can see things like how often the bot gives up and escalates to a human, how often it gets stuck in a loop, and how often a conversation just has a negative vibe.

It’s a pretty good toolset if your team is already all-in on the Zendesk ecosystem. It's designed to work best with Zendesk's own conversation bots and a few other integrations, making it a powerful but somewhat fenced-in solution.

How to set up and use Zendesk QA

Getting started with Zendesk QA is less about a complicated technical setup and more about defining what a "good" interaction looks like for your team. From there, you use the tools to measure your AI against that standard.

Manual evaluations and scorecards

The starting point for manual reviews is the scorecard. This is where you lay out the rules your AI will be judged by. You can set up scorecards to look at different parts of a conversation, from the opening "hello" to the final resolution.

The process itself is pretty simple: you filter your conversations to find the ones handled by a specific bot, pick an interaction, and then have a team member grade the bot's performance using the scorecard.

Pro Tip
When you're building your scorecard, try to go beyond simple yes/no questions. Add categories that check if your AI can handle vague or complicated questions with multiple parts. These are the areas where many bots stumble, and a basic scorecard might miss them completely.
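To make that concrete, here's a minimal sketch of what such a scorecard could look like in code. The category names, the 1-5 scale, and the grading helper are all illustrative assumptions, not Zendesk's actual schema:

```python
# Hypothetical scorecard -- illustrative only, not Zendesk's actual data model.
# Per the tip above, it includes categories for vague and multi-part questions.
SCORECARD = {
    "greeting": "Did the bot open the conversation appropriately?",
    "accuracy": "Was the information in the answer correct?",
    "resolution": "Did the bot actually solve the customer's problem?",
    "handles_vague_input": "Did the bot ask clarifying questions when the request was vague?",
    "handles_multi_part": "Did the bot address every question in the message?",
}

def grade_conversation(conversation_id: str, ratings: dict[str, int]) -> dict:
    """Record a reviewer's 1-5 rating for each scorecard category."""
    missing = set(SCORECARD) - set(ratings)
    if missing:
        raise ValueError(f"Unrated categories: {sorted(missing)}")
    return {
        "conversation_id": conversation_id,
        "ratings": ratings,
        "overall": sum(ratings.values()) / len(ratings),
    }

# Example: grading one bot conversation by hand.
review = grade_conversation("conv-1042", {
    "greeting": 5, "accuracy": 3, "resolution": 2,
    "handles_vague_input": 2, "handles_multi_part": 4,
})
print(review["overall"])  # 3.2
```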

Using automated evaluation with AutoQA

Zendesk's AutoQA aims to cover all your bases by automatically evaluating every single conversation. It scans for specific, pre-set categories, including:

  • Greeting and closing

  • Spelling and grammar

  • Tone and empathy

  • Solution offered

This gives you a bird's-eye view of performance and can quickly flag conversations that seem off. But here's the catch: while it can tell you if a solution was offered, it often can't tell you if the right solution was offered for a tricky or technical problem. It's good at spotting style issues but can miss the mark on substance.
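Under the hood, an automated pass like this boils down to running a battery of checks over each transcript. Here's a toy sketch of the idea; the keyword heuristics are stand-ins we invented, not Zendesk's actual AutoQA logic, and they show why style is so much easier to check than substance:

```python
GREETINGS = ("hi", "hello", "hey", "welcome")
CLOSINGS = ("anything else", "have a great day", "glad i could help")

def auto_score(bot_messages: list[str]) -> dict[str, bool]:
    """Toy style checks in the spirit of automated scoring -- not Zendesk's logic."""
    first, last = bot_messages[0].lower(), bot_messages[-1].lower()
    return {
        "greeting": any(g in first for g in GREETINGS),
        "closing": any(c in last for c in CLOSINGS),
        # "Solution offered" here only checks that *an* answer was given --
        # it cannot tell whether it was the *right* answer, which is the catch.
        "solution_offered": any(len(m.split()) > 10 for m in bot_messages),
    }

print(auto_score([
    "Hi! How can I help you today?",
    "You can reset your password from Settings > Security > Reset password.",
    "Anything else I can help with? Have a great day!",
]))
# {'greeting': True, 'closing': True, 'solution_offered': True}
```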

BotQA dashboard metrics

The BotQA dashboard is your main hub for AI performance. It gathers all the data into a few key reports to give you a high-level view of how your bot is doing.

  • Bot-only conversation rate: This shows you the percentage of conversations your AI handled all by itself, without a human stepping in. It’s a decent way to measure your deflection rate.

  • Escalation rate: This is the one to keep an eye on. It tells you how often customers gave up on the bot and asked for a human. A high escalation rate is a clear sign that your AI isn't meeting customer needs.

  • Bot looping rate: We’ve all been there, stuck in a loop with a bot that keeps giving the same unhelpful answer. This metric tracks how often that happens, which is a huge source of customer frustration.

  • Bot negative sentiment rate: This report highlights conversations where the customer's language was flagged as negative, unhappy, or frustrated.
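If you want to sanity-check these numbers yourself, the math behind them is simple. A sketch, assuming each conversation record carries a few boolean flags (our own invented schema, not the BotQA data model):

```python
from dataclasses import dataclass

@dataclass
class Conversation:
    escalated: bool            # customer was handed to a human
    looped: bool               # bot repeated the same unhelpful answer
    negative_sentiment: bool   # customer language flagged as negative

def botqa_metrics(convos: list[Conversation]) -> dict[str, float]:
    """Compute the four dashboard rates as percentages of all conversations."""
    n = len(convos)
    return {
        "bot_only_rate": 100 * sum(not c.escalated for c in convos) / n,
        "escalation_rate": 100 * sum(c.escalated for c in convos) / n,
        "looping_rate": 100 * sum(c.looped for c in convos) / n,
        "negative_sentiment_rate": 100 * sum(c.negative_sentiment for c in convos) / n,
    }

logs = [Conversation(False, False, False),
        Conversation(True, True, True),
        Conversation(True, False, False),
        Conversation(False, False, True)]
print(botqa_metrics(logs))
# {'bot_only_rate': 50.0, 'escalation_rate': 50.0,
#  'looping_rate': 25.0, 'negative_sentiment_rate': 50.0}
```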

Key limitations of Zendesk QA

While Zendesk QA is a useful tool for reviewing what already happened, it doesn't really help you predict what will happen. It’s a reactive tool, and in the world of AI support, you really want to be proactive.

The real world is messy: A challenge for Zendesk QA

The biggest hurdle is that Zendesk QA analyzes conversations after they're over. It doesn’t give you a way to put your AI through its paces before you unleash it on your customers. Real customer questions are all over the place. People ask vague questions, cram multiple requests into one message, or need help with complex troubleshooting steps.

Think about the kinds of questions that really test an AI:

  • Vague questions: A customer might just type "it's broken." A smart AI should ask follow-up questions to figure out what "it" is. A less-than-smart AI will just reply with "I don't understand."

  • Multi-part questions: "How do I reset my password, can I change my shipping address, and what's your return policy for sale items?" Can your AI handle all three parts, or does it just grab onto the first one it recognizes?

  • Troubleshooting questions: These require a step-by-step process. Can the AI walk a user through a sequence of actions, or does it just send them a link to a help article and call it a day?

A simple "Solution Offered" checkbox on a Zendesk QA scorecard just doesn't capture this kind of nuance. You often only find out where your bot is failing when a real customer gets annoyed.
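One way to capture that nuance is to keep an explicit regression suite of these hard question types and check the bot's replies against expected behaviors, not just against a "solution offered" checkbox. A minimal sketch, where `ask_bot` is a hypothetical stand-in for however you query your bot:

```python
# Hypothetical test cases for the hard question types above.
# ask_bot() stands in for your bot's actual API -- an assumption, not a real call.
HARD_CASES = [
    {"question": "it's broken",
     "expect": lambda reply: "?" in reply},            # should ask a clarifying question
    {"question": "How do I reset my password, can I change my shipping "
                 "address, and what's your return policy for sale items?",
     "expect": lambda reply: all(k in reply.lower()
                                 for k in ("password", "address", "return"))},
    {"question": "My device won't turn on after the update.",
     "expect": lambda reply: "step" in reply.lower()}, # should walk through steps
]

def run_suite(ask_bot) -> float:
    """Return the fraction of hard cases the bot handles as expected."""
    passed = sum(case["expect"](ask_bot(case["question"])) for case in HARD_CASES)
    return passed / len(HARD_CASES)
```

Run a suite like this against every new bot configuration and track the pass rate over time; it will surface regressions long before an annoyed customer does.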

The "walled garden" problem and siloed knowledge

Zendesk QA is built to work best inside the Zendesk ecosystem. As some users on Reddit have pointed out, getting it to play nice with external tools can be a headache, and you often have to manually flag conversations from non-native bots.

But the bigger issue is that your company's knowledge isn't all in one place. It’s spread across Confluence, Google Docs, Slack, internal wikis, and old support tickets. An AI trained only on your official help center articles is missing most of the story.

Zendesk QA can tell you an answer was wrong, but it can't fix the root cause: the AI didn't have access to the right information. This is where a platform like eesel AI really shines. It connects all of your scattered knowledge sources, from your helpdesk to your internal docs, and creates a single brain for your AI to use.

Infographic: eesel AI connects multiple knowledge sources to give your AI comprehensive answers.

No way to practice without risk: A flaw in Zendesk QA

The biggest risk of launching a new AI agent is creating a terrible customer experience. The only real way to avoid that is to simulate its performance on your actual historical data before it goes live.

Zendesk QA doesn't have a true simulation mode. You can look at past conversations, but you can't take a new AI setup and test it against thousands of your old tickets to see how it would have done. This means you're basically forced to test on live customers, which can hurt their trust if the AI isn't ready.

In contrast, a platform like eesel AI offers a powerful simulation mode right from the start. You can test your AI on thousands of your past tickets, get a precise forecast of your resolution rate, and see exactly where it might get stuck, all before it affects a single customer.

Screenshot: eesel AI's simulation mode, which tests an AI agent against historical tickets before deployment.
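Conceptually, a simulation like this is just a replay loop over historical tickets with a judgment step at the end. Here's a rough sketch of the pattern; `candidate_bot` and `judge_resolution` are placeholders for whatever your platform provides, not eesel AI's actual API:

```python
def simulate(candidate_bot, historical_tickets, judge_resolution):
    """Replay past tickets through a new bot config and forecast its resolution rate.

    candidate_bot(question) -> draft reply (the bot setup under test)
    judge_resolution(ticket, reply) -> bool (e.g. compare against the human answer)
    """
    failures = []
    for ticket in historical_tickets:
        reply = candidate_bot(ticket["question"])
        if not judge_resolution(ticket, reply):
            failures.append(ticket["id"])  # tickets the bot would have fumbled
    rate = 1 - len(failures) / len(historical_tickets)
    return {"forecast_resolution_rate": rate, "failed_ticket_ids": failures}
```

The key point is that everything happens offline: no customer ever sees the draft replies, but you still get a data-backed forecast before launch.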

Confusing and unpredictable pricing

Finally, Zendesk QA isn't a simple, out-of-the-box feature. It’s an add-on, and to get the most out of it, you often need the "Advanced AI Agents" add-on too. This leads to a layered, complicated pricing structure that can be hard to pin down. As your team or ticket volume grows, you might find yourself paying for multiple add-ons on top of your main plan, making it tough to budget.

Beyond Zendesk QA: A better way with proactive simulation

The shortcomings of reactive QA tools point toward a more modern way of thinking. The goal shouldn't just be to review performance after the fact, but to build a smarter, more dependable AI from day one. This really comes down to two things: proactive simulation and unified knowledge.

See how you'll perform before you launch: A better approach

Imagine knowing your AI's resolution rate before it ever talks to a customer. That's what simulation gives you. It means testing your AI agent against thousands of your real, historical support tickets in a safe, offline space.

eesel AI has this built in as a core feature. You can sign up, connect your helpdesk, and in minutes, see exactly how the AI would have answered your past tickets. You get a data-backed report showing what its resolution rate would have been and which specific tickets it would have fumbled. This takes the guesswork out of the equation and lets you deploy with confidence.

Building a smarter AI by connecting all knowledge: An upgrade

An AI agent is only as smart as the information it can get to. While older tools often trap your AI in a single knowledge base, a modern approach means connecting everything.

Video: the first of three steps to implementing Zendesk QA.

eesel AI was designed for this. With one-click integrations for over 100 sources like Confluence, Google Docs, Notion, Slack, and of course, your past tickets, you can give your AI the full context it needs. This helps it answer tricky, detailed questions correctly, which cuts down on escalations and improves the quality of every conversation.
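The pattern underneath is a single retrieval index built from every source, so the bot searches one place at answer time. A simplified sketch (the source list and `fetch_documents` helper are illustrative, and a real system would use embeddings rather than keyword matching):

```python
# Illustrative knowledge-unification pattern; fetch_documents() is hypothetical.
SOURCES = ["help_center", "confluence", "google_docs", "slack", "past_tickets"]

def build_unified_index(fetch_documents):
    """Pull docs from every source into one searchable index: a 'single brain'."""
    index = []
    for source in SOURCES:
        for doc in fetch_documents(source):
            index.append({"source": source, "text": doc})
    return index

def retrieve(index, query: str, k: int = 3):
    """Naive keyword retrieval -- production systems would use embeddings."""
    terms = query.lower().split()
    scored = sorted(index,
                    key=lambda d: sum(t in d["text"].lower() for t in terms),
                    reverse=True)
    return scored[:k]
```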

Zendesk QA pricing

Zendesk QA isn't included in every plan; it’s usually a paid add-on. Its full power for AI agents is often tied to buying the Advanced AI add-on on top of a Zendesk Suite plan.

This can make figuring out the total cost a bit tricky. You're not just buying one thing, but a base plan plus one or two add-ons to get the tools you need.

| Plan | Price (per agent/month, billed annually) | Key Features Included | QA & Advanced AI Add-ons |
| --- | --- | --- | --- |
| Suite Team | $55 | Ticketing, Messaging, Help Center | Advanced AI: add-on; QA: add-on |
| Suite Professional | $115 | Everything in Team + advanced reporting, SLAs | Advanced AI: add-on; QA: add-on |
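To see how the layers stack up, here's a back-of-the-envelope estimate. The $50 per agent/month figure for the Advanced AI add-on is our assumption based on Zendesk's published pricing at the time of writing, and QA add-on pricing is typically quote-based, so treat this as a floor:

```python
# Back-of-the-envelope Zendesk cost estimate; the add-on price is an assumption.
def monthly_cost(agents: int, base: float = 115.0, advanced_ai: float = 50.0) -> float:
    """Suite Professional base plan plus the Advanced AI add-on, per month."""
    return agents * (base + advanced_ai)

print(monthly_cost(10))  # 1650.0 per month for a 10-agent team, before any QA add-on
```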

Move from looking back to planning ahead

Zendesk QA is a decent tool for teams who are fully committed to the Zendesk ecosystem and need a way to do quality checks on their AI agents after the fact. It gives you a dashboard and a process for reviewing what’s already happened.

However, its limitations (walled-off knowledge, no true simulation, and confusing pricing) mean it doesn't quite cut it for modern teams who want to deploy AI without crossing their fingers.

The future of AI quality control is proactive. It's about simulating performance before you launch and training your AI on all of your company's knowledge, not just what’s in the help center. By doing this, you can build an AI agent that actually understands your customers and solves their problems from day one.

eesel AI is a platform built for this proactive approach. It gives you the tools to test, build, and deploy with confidence.

Ready to stop guessing and start knowing how your AI will perform? Simulate your AI agent on your historical tickets with eesel AI for free. Go live in minutes, not months.

Frequently asked questions

How does Zendesk QA evaluate the performance of AI agents?

Zendesk QA allows you to review bot conversations through manual scorecards and automated scoring (AutoQA) for elements like tone and grammar. It also provides dedicated BotQA dashboards that show key metrics, such as escalation rates and bot looping rates, giving you insight into bot-handled conversations.

What metrics does the BotQA dashboard track?

The BotQA dashboard in Zendesk QA tracks several key metrics to gauge AI performance. You can monitor the bot-only conversation rate, escalation rate to human agents, bot looping rate, and the bot negative sentiment rate. These help identify where your AI might be struggling.

Are there limitations to using Zendesk QA for AI evaluation?

Yes, a primary limitation is that Zendesk QA is reactive, analyzing interactions after they've occurred rather than predicting future performance. It also lacks a true simulation mode to test AI against historical data before launch, and it operates best within the Zendesk ecosystem, limiting integration with external knowledge sources.

Can Zendesk QA simulate an AI agent's performance before it goes live?

No, Zendesk QA primarily focuses on reviewing actual conversations post-interaction. It does not offer a true simulation mode to test a new AI setup against thousands of historical tickets before it goes live. This means initial testing often happens with real customers.

How is Zendesk QA priced?

Zendesk QA is typically a paid add-on, not included in every base plan. To unlock its full capabilities for AI agents, it often requires purchasing the "Advanced AI add-on" in addition to a Zendesk Suite plan. This results in a layered and potentially complex pricing structure.

Does Zendesk QA work with bots and knowledge sources outside Zendesk?

Zendesk QA generally functions best with Zendesk's own conversation bots and integrated tools. Integrating it with external knowledge sources like Confluence, Google Docs, or Slack can be challenging and often requires manual flagging or workarounds.

Article by

Kenneth Pangan

A writer and marketer for over ten years, Kenneth Pangan splits his time between history, politics, and art, with plenty of interruptions from his dogs demanding attention.