Conversational AI for banking: what works and what breaks

Written by

Alicia Kirana Utomo

Reviewed by

Katelin Teen

Last edited July 4, 2026

Expert Verified


TL;DR

Conversational AI for banking has quietly become the default. All ten of the largest US commercial banks run a chatbot, and about 37% of the US population, over 98 million people, used one in 2022, per the CFPB's 2023 report. Bank of America's Erica alone has crossed 3 billion interactions.

But there's a split running right through it. The deployments that work are grounded in the bank's own documents, gated by a confidence check, and quick to hand off to a human. The ones that enrage customers loop, repeat themselves, and hide the human, sometimes even during a fraud case. That gap is the whole story of this post.

I build AI agents at eesel, and we've spent years running them on live support queues, including a German loan-comparison platform where our agent handles 100,000+ tickets a month. The single hardest lesson: a confident bot will fabricate an answer the moment its knowledge base comes up empty. That's why the safe pattern below matters more than any feature list.

What conversational AI actually means in banking

Strip away the marketing and conversational AI is just this: a customer types or says what they want the way they'd say it to a teller, and the software understands the intent and either answers or does the task. No phone tree, no "press 2 for balances."

The CFPB lays it out on a sophistication ladder, and the rungs matter:

Rule-based chatbots run on "decision tree logic or a database of keywords," so the user is "limited to predefined possible inputs." Think a menu of buttons. If you've read our breakdown of an AI agent vs a rule-based chatbot, this is the bottom rung.
NLU chatbots use natural language understanding to recognise the intent behind free text, not just keywords. NatWest describes its Cora assistant as handling queries "through natural language processing and machine learning."
LLM-backed agents are the new top rung. The CFPB notes banks "moving from simple, rule-based chatbots towards more sophisticated technologies such as large language models."

What makes LLMs safe for banking is retrieval-augmented generation (RAG), also called grounding: the model answers from passages retrieved from the bank's own knowledge base rather than from whatever it absorbed in training. DBS describes DBS Joy as integrating "large language models with the bank's proprietary knowledge base," letting it "move beyond pre-programmed static answers to dynamic responses." Wells Fargo goes further and architects its assistant so no personal data reaches the LLM at all.

Here's the shape of a well-built flow, and why the confidence check in the middle is non-negotiable:

How a banking AI answers a question safely: intent recognition, retrieval from an approved knowledge base, a confidence check, then either a grounded answer or a human handoff
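The flow in that diagram can be sketched in a few lines. This is a minimal illustration, not how any of the banks above built theirs: the knowledge base entries, the keyword lists standing in for an intent model, and the 0.5 threshold are all assumptions made up for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical approved knowledge base: intent -> vetted answer text.
APPROVED_KB = {
    "routing_number": "Your routing number is listed under Account details.",
    "card_freeze": "You can freeze your card under Cards > Manage > Freeze.",
}

# Hypothetical keyword lists standing in for a real intent model.
INTENT_KEYWORDS = {
    "routing_number": ("routing", "aba"),
    "card_freeze": ("freeze", "lock"),
}

CONFIDENCE_THRESHOLD = 0.5  # assumed value; tune against real transcripts


@dataclass
class Reply:
    action: str  # "answer" or "handoff"
    text: str


def recognise_intent(message: str) -> tuple[Optional[str], float]:
    """Return (intent, confidence) from a toy keyword-overlap score."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    best_intent, best_score = None, 0.0
    for intent, keywords in INTENT_KEYWORDS.items():
        score = sum(k in words for k in keywords) / len(keywords)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent, best_score


def answer(message: str) -> Reply:
    intent, confidence = recognise_intent(message)
    passage = APPROVED_KB.get(intent) if intent else None
    # Empty retrieval or low confidence means a handoff, never an improvised answer.
    if passage is None or confidence < CONFIDENCE_THRESHOLD:
        return Reply("handoff", "I'm connecting you with a colleague who can help.")
    return Reply("answer", passage)
```

The point of the shape is that the only text the customer ever sees is either a vetted passage or a handoff message. There is no branch where the model writes freely.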

The CFPB is blunt about the ceiling, too: chatbots "may be useful for resolving basic inquiries, but their effectiveness wanes as problems become more complex." That's not a knock on the tech, it's the design brief. The tools people actually like are the ones that know their own limits and reach for a person. Our guide to AI in customer service covers that boundary in general terms; banking just raises the stakes.

Where it's already working: the big bank deployments

The proof isn't in a vendor deck, it's in the banks' own numbers. Here are the flagship deployments, all sourced from primary newsrooms:

Bank	Assistant	Scale (from the bank)	Standout use case	Built or bought
Bank of America	Erica	3B+ interactions, ~50M users, 58M interactions/month	Balance-trend alerts, investment guidance	In-house
Wells Fargo	Fargo	1B+ interactions in under 3 years	Zelle payments, spending insights	Google Cloud LLMs
Capital One	Eno	SMS-first since March 2017	Fraud alerts, virtual card numbers	In-house
NatWest	Cora / Cora+	10.8M queries in 2023	Mortgage guidance, summarised handoff	Built with IBM
DBS	DBS Joy	120k+ chats, +23% CSAT	Corporate/SME servicing	In-house

A few things jump out. First, the use cases cluster around high-volume, low-risk lookups: Erica flags balance trends "in the next 7 days," while Eno proactively alerts on "a double charge, an abnormally large tip amount, or potential fraud" and generates merchant-specific virtual card numbers. Second, the smart ones treat the AI as a front door, not a wall. NatWest's Cora+ hands off with a summary so "the human agent can quickly understand what support the customer needs."

Third, multilingual reach is a genuine unlock, not a footnote. More than 3 million Spanish-speaking Wells Fargo customers have used Fargo over 160 million times. That's the kind of coverage that's brutally expensive to staff with humans and cheap to add with an agent trained on multilingual history.

The economics explain the rush. Juniper Research projected banking chatbots would save $7.3 billion globally by 2023, up from $209 million in 2019, equal to 862 million hours of work, with mobile apps carrying 79% of interactions. The CFPB puts the unit figure at $0.70 saved per interaction. If you want to sanity-check those numbers for your own team, our piece on AI vs human customer support walks through the comparison.
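For a quick back-of-envelope version, the unit figure can be turned into a rough estimator. This is only a sketch: the function name and the example volumes are hypothetical, and it returns a gross number that ignores build, licence, and human-review costs.

```python
CFPB_SAVING_PER_INTERACTION = 0.70  # USD, the unit figure quoted above


def estimated_monthly_saving(interactions_per_month: int,
                             share_resolved_by_bot: float,
                             saving_per_interaction: float = CFPB_SAVING_PER_INTERACTION) -> float:
    """Rough gross saving in USD for one month of bot-resolved interactions."""
    if not 0.0 <= share_resolved_by_bot <= 1.0:
        raise ValueError("share_resolved_by_bot must be between 0 and 1")
    resolved = interactions_per_month * share_resolved_by_bot
    return round(resolved * saving_per_interaction, 2)


# Hypothetical team: 40,000 contacts a month, 30% fully resolved by the bot.
print(estimated_monthly_saving(40_000, 0.30))
```

For that hypothetical team the estimate comes out at roughly $8,400 a month, which is the kind of number worth comparing against what the tool and its oversight actually cost.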

What customers actually think

Here's the part vendor pages skip. If you read where real banking customers talk, the frustrated voices are louder, sharper, and more specific than the praise. They're worth listening to, because they tell you exactly what to avoid.

The angriest complaint is the loop that won't escalate, even when money is on fire:

"I've had fraud happening on my card this week and I've never had such an excruciating experience with a bank... I had to threaten to reach out to KiFid [the Dutch financial ombudsman] for them to allow me to speak to a human. Also the AI will occasionally pretend to be a person too. It's all horrible."
Bitterboule80 on r/bunq

That's the exact "doom loop" the CFPB warned about, playing out in a fraud case. The follow-on theme is just as consistent: people want a visible human, and trust falls off a cliff the moment it's not simple. As one fintech operator put it watching their own customers:

"i've seen customers be fine with bots for simple stuff but get wary as soon as money or disputes are involved."
thepillowco on r/fintech

There's even a distinctly banking flavour of complaint: the bot as a downgrade of a feature people already had. A Bank of America customer's rant about being pushed to "ask Erica" instead of just filtering their own statements is a useful reminder that conversational AI is not automatically an upgrade over a good search box.

None of this says "don't." It says the bar is trust, and the failure mode is specific and avoidable. The practitioners who've shipped these agree on the fix. From the same r/fintech thread:

"The key though is avoiding generic bots and keeping it rules-based, built for a specific domain/process/problem (especially in regulated areas like disputes), integrating with back-office data, and making handover to humans seamless."

That's the recipe, in a practitioner's own words. Getting the handoff right is what separates the deployments people tolerate from the ones they rage-quit.

What to automate, and what to hand to a human

So where's the line? After watching a lot of these go live, my rule is simple: automate the lookup, escalate the decision. If answering the question could move money, deny a product, or invoke a legal right, a human owns it. Everything else is fair game for the agent.

A split showing what to automate (check balance, recent transactions, freeze or replace a card, find routing number, request a statement) versus what to route to a human (loan or credit decision, dispute a charge, fraud claim, formal complaint, anything with legal effect)

The left column is where the volume and the savings live, and it's exactly the tier-1 work that eats a support team's day. It maps cleanly to what a good AI helpdesk agent already does well: deflect the repetitive stuff, keep the answers consistent, log everything. The right column is where a wrong answer becomes a regulatory problem, so those flows should capture intent, then route, never guess.
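One way to encode that split is an explicit allow-list, so anything the agent doesn't recognise defaults to a person. A minimal sketch; the intent names are illustrative labels for the two columns above, not a real taxonomy.

```python
from enum import Enum


class Route(Enum):
    AUTOMATE = "automate"
    HUMAN = "human"


# Illustrative intent names mirroring the two columns above.
AUTOMATABLE = {"check_balance", "recent_transactions", "freeze_card",
               "replace_card", "routing_number", "request_statement"}
HUMAN_OWNED = {"loan_decision", "credit_decision", "dispute_charge",
               "fraud_claim", "formal_complaint"}


def route(intent: str) -> tuple[Route, str]:
    """Automate only allow-listed lookups; everything else goes to a person."""
    if intent in AUTOMATABLE:
        return Route.AUTOMATE, "allow-listed lookup"
    if intent in HUMAN_OWNED:
        return Route.HUMAN, "decision with financial or legal effect"
    # Unknown intents take the safe path too: capture, route, never guess.
    return Route.HUMAN, "unrecognised intent"
```

The design choice that matters is the default. A deny-list would let every new or misclassified intent fall through to the bot; an allow-list sends it to a human.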

The mistake I see most is teams trying to push the line rightward too fast, letting the bot attempt disputes or loan questions because a demo made it look capable. That's how you end up in the r/bunq threads above. Start narrow, prove it on the left column, and expand only when your deflection rate and your escalation quality both hold up.

The compliance surface banking adds

This is what makes banking different from generic support. A wrong answer here isn't an annoyed customer, it's a regulated institution potentially breaking federal law. The CFPB said it plainly: a "poorly deployed chatbot can lead to customer frustration, reduced trust, and even violations of the law."

So a banking bot carries a compliance surface a generic support bot doesn't:

What a banking bot must satisfy: redact card numbers and PII (PCI DSS), encrypt data in transit and at rest (GLBA), keep audit logs (EU AI Act), offer a human off-ramp (GDPR / CFPB), answer only from approved knowledge (accuracy)

Walking the stack:

PII and card-data handling. The Gramm-Leach-Bliley Act and the FTC Safeguards Rule require encryption of customer information in transit and at rest, plus a duty to notify the FTC within 30 days of a breach affecting 500 or more consumers. If the flow can touch card numbers, PCI DSS requires the primary account number to be masked when displayed and rendered unreadable wherever it is stored. That's why redaction in transcripts and logs isn't optional.
Human review of automated decisions. For EU customers, GDPR Article 22 gives a right "not to be subject to a decision based solely on automated processing" that has legal or similarly significant effects, plus a right to human intervention. Being denied a loan is the textbook example.
The EU AI Act's high-risk line. Under Annex III, AI used "to evaluate the creditworthiness of natural persons or establish their credit score" is classified high-risk, triggering human-oversight and logging obligations. Worth being precise here: a support bot answering "what's my balance" isn't automatically high-risk. The trigger is the credit-scoring use case. But the moment a flow influences a credit decision, it crosses the line.
The vendor bar. Banks buying a tool will expect a SOC 2 Type II report, which tests that controls actually operated over time, not just that they exist on paper.
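To make the redaction point concrete, here is a minimal sketch of masking card numbers before a transcript is logged. It assumes card numbers appear as 13 to 19 digits, optionally separated by spaces or hyphens, and uses a Luhn check so ordinary long reference numbers are left alone. A production redactor would need to cover far more formats and other PII types.

```python
import re

# 13-19 digits, optionally separated by single spaces or hyphens.
PAN_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Standard Luhn checksum used by payment card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_card_numbers(text: str) -> str:
    """Mask all but the last four digits of anything that looks like a card number."""
    def mask(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group())
        if not luhn_valid(digits):
            return match.group()  # probably an order or ticket number
        return "*" * (len(digits) - 4) + digits[-4:]

    return PAN_CANDIDATE.sub(mask, text)


print(redact_card_numbers("My card 4111 1111 1111 1111 was charged twice"))
```

Run the redaction before the text reaches the log store or the model, not after, so the raw number never lands anywhere it would need to be purged from later.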

The through-line across all of it is the same design pattern the customers were begging for: grounding, logging, and a human off-ramp. Three separate frameworks point to the escalation path independently: GDPR Article 22's right to human intervention, the EU AI Act's human-oversight obligation for high-risk systems, and the CFPB's warning about bots that block access to a person. If you're evaluating tools, our note on AI knowledge management for support teams covers keeping that approved-knowledge layer clean, which is where accuracy starts.

How to deploy one without enraging your customers

Put the customer voice and the compliance surface together and the playbook is clear. Here's what I'd do, in order.

Ground everything, then prove it before go-live. Restricting the agent to your approved help center and policy docs is what keeps it from inventing an answer, and it's the mitigation the CFPB implicitly asks for when it says generic bots are "ill-suited for tasks that require logic, specialized knowledge, or current data." I've watched this fail the hard way: a paying customer's bot fabricated a product claim and sent it to real customers because retrieval came back empty and the model filled the gap from training data. The fix isn't a smarter model, it's simulating the agent against thousands of real past tickets first, so you see where it would have hallucinated before a customer does.

eesel's reports dashboard, showing per-topic coverage and resolution analytics used to check an agent's behaviour before and after go-live
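The idea behind that kind of simulation can be sketched in a few lines: replay historical questions through the retrieval step and flag every ticket where nothing approved comes back. This is a toy illustration, not eesel's implementation; `simulate`, `toy_retrieve`, and the sample tickets are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SimulationReport:
    total: int
    grounded: int
    ungrounded_ids: list[str]

    @property
    def coverage(self) -> float:
        return self.grounded / self.total if self.total else 0.0


def simulate(tickets: list[dict],
             retrieve: Callable[[str], Optional[str]]) -> SimulationReport:
    """Replay past tickets and record each one where retrieval came back empty."""
    ungrounded = [t["id"] for t in tickets if not retrieve(t["question"])]
    return SimulationReport(len(tickets), len(tickets) - len(ungrounded), ungrounded)


def toy_retrieve(question: str) -> Optional[str]:
    """Stand-in for a real retriever: naive substring match on a tiny KB."""
    kb = {"fees": "See the published fee schedule.",
          "withdraw": "Withdrawals are made from the Transfers tab."}
    for key, passage in kb.items():
        if key in question.lower():
            return passage
    return None


past_tickets = [
    {"id": "T-1", "question": "What are the fees?"},
    {"id": "T-2", "question": "How do I withdraw?"},
    {"id": "T-3", "question": "Does the premium card include travel insurance?"},
]
report = simulate(past_tickets, toy_retrieve)
print(report.ungrounded_ids, round(report.coverage, 2))
```

The ungrounded list is the useful output. Each ID in it is either a gap to fill in the knowledge base or a topic to route to a human from day one.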

Gate on confidence and keep the human visible. Set a threshold: below it, the agent drafts for a human or hands off rather than replying live. Cap repeat attempts so you never build a doom loop. The single most-cited feature in banking-bot reviews is seamless escalation, and it's what the r/bunq customers were denied.
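As a sketch, the gate and the retry cap fit in one small decision function. The two thresholds and the cap of two unresolved turns are assumed values for illustration, and `unresolved_turns` is a hypothetical counter the caller would bump each time a customer signals the last reply didn't help.

```python
REPLY_THRESHOLD = 0.80     # assumed: at or above this, reply live
DRAFT_THRESHOLD = 0.50     # assumed: at or above this, draft for a human to approve
MAX_UNRESOLVED_TURNS = 2   # assumed cap before the bot must step aside


def next_action(confidence: float, unresolved_turns: int,
                asked_for_human: bool = False) -> str:
    """Return 'reply', 'draft_for_human' or 'handoff' for the next turn."""
    # An explicit request or a hit on the retry cap always wins, which breaks doom loops.
    if asked_for_human or unresolved_turns >= MAX_UNRESOLVED_TURNS:
        return "handoff"
    if confidence >= REPLY_THRESHOLD:
        return "reply"
    if confidence >= DRAFT_THRESHOLD:
        return "draft_for_human"
    return "handoff"
```

Note the ordering: the escape hatches are checked before confidence, so a highly confident bot still can't keep a customer who has asked for a person or already been failed twice.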

Keep the sensitive data where it belongs. When we onboard finance and healthcare teams, the hard gate is always data handling. One buyer needed assurance that ticket data with card numbers and passwords stayed in their environment; the answer is that the agent reasons over question type and response style, with custom retention and PII redaction, and no customer data is used to train models. Those are the questions your security review should be asking any vendor.

Start on tier-1, expand on evidence. The realistic scope, echoed by operators over and over, is tier-1 deflection: let the agent own the "what are the fees" and "how do I withdraw" questions that eat support time, and route the messy stuff to a person. Expand only when your first-contact resolution holds. If you're staffing up a team around this, our scaling guide for startups is a useful companion.

Try eesel for banking and fintech support

If you're a bank, a lender, or a fintech weighing this up, eesel is built for exactly the pattern above. It plugs into the helpdesk you already run, learns from your past tickets and help docs, and answers only from that approved knowledge, so it deflects the tier-1 lookups without wandering off-script. The part that matters most for a regulated team: you can simulate the agent against thousands of your real historical tickets before it replies to a single live customer, then turn on autonomy gradually with confidence-based routing and a clean human handoff.

It already runs at banking scale: our agent handles 100,000+ German-language tickets a month for a loan-comparison platform, with SOC 2 controls, GDPR and EU data residency, and PII redaction on the security side. Pricing is usage-based, about $0.40 per resolved ticket with no per-seat fees, so you're not paying for a platform you're still testing.

eesel AI helpdesk dashboard, showing an AI agent handling support tickets inside an existing helpdesk

You can try eesel free, or book a demo if you want to walk through the compliance and simulation setup with someone first.

Frequently Asked Questions

What is conversational AI for banking?

Conversational AI for banking is software that lets a customer interact with their bank in natural language, typing or speaking a request instead of navigating menus, and get an answer or complete a task. Modern versions use large language models grounded in the bank's own knowledge base, unlike a scripted rule-based chatbot. See our overview of the benefits of conversational AI for more.

Is conversational AI safe for banking customer service?

It is when it's built right: grounded in approved documents, gated by a confidence threshold, and wired to escalate to a human. The main risk is hallucination, which is why preventing AI hallucinations and testing against past tickets matter so much before any bot handles live banking traffic.

How much does conversational AI for banking cost?

The CFPB cites roughly $0.70 saved per customer interaction versus a human agent. On the vendor side, pricing ranges from per-conversation to flat platform fees. eesel's pricing is usage-based at about $0.40 per resolved ticket with no per-seat fees. More on the math in our guide to AI customer support cost savings.

What banking tasks should a chatbot handle versus a human?

Automate high-volume, low-risk lookups: balances, recent transactions, freezing a card, finding a routing number, requesting a statement. Route anything with a legal or financial consequence, such as a loan decision, a dispute, or a fraud claim, to a person. Getting the escalation boundary right is the whole game.

What compliance rules apply to conversational AI for banking?

In the US, the relevant rules are the CFPB's chatbot guidance, GLBA and the FTC Safeguards Rule, plus PCI DSS if card data is involved. In the EU, they are GDPR Article 22 (a right to human review of automated decisions) and the EU AI Act, which classifies credit-scoring AI as high-risk. Vendors are typically expected to hold a SOC 2 Type II report, which ties into wider AI knowledge management controls.

Can conversational AI for banking answer in multiple languages?

Yes. Wells Fargo's Fargo has been used over 160 million times by more than 3 million Spanish-speaking customers. Modern AI customer service tools for fintech handle dozens of languages out of the box, answering in the customer's language from multilingual ticket history.

How do I stop a banking chatbot from looping instead of helping?

The CFPB calls these "doom loops." You stop them by setting a confidence threshold, capping repeat attempts, and always exposing a visible path to a human. It's the same discipline behind a good tier-1 deflection setup: deflect what you can answer well, hand off everything else fast.

Bring safe conversational AI to your support queue

eesel grounds every answer in your own docs and simulates on past tickets before it ever replies live.



Article by

Alicia Kirana Utomo

Kira is a writer at eesel AI with a Computer Science background and over a year of hands-on experience evaluating AI-powered customer service tools. She focuses on breaking down how helpdesk platforms and AI agents actually work so that support teams can make better buying decisions.