Conversational AI for banking: what works and what breaks
Alicia Kirana Utomo
Katelin Teen
Last edited July 4, 2026

What conversational AI actually means in banking
Strip away the marketing and conversational AI is just this: a customer types or says what they want the way they'd say it to a teller, and the software understands the intent and either answers or does the task. No phone tree, no "press 2 for balances."
The CFPB lays it out on a sophistication ladder, and the rungs matter:
- Rule-based chatbots run on "decision tree logic or a database of keywords," so the user is "limited to predefined possible inputs." Think of a menu of buttons. If you've read our breakdown of an AI agent vs a rule-based chatbot, this is the bottom rung.
- NLU chatbots use natural language understanding to recognise the intent behind free text, not just keywords. NatWest describes its Cora assistant as handling queries "through natural language processing and machine learning."
- LLM-backed agents are the new top rung. The CFPB notes banks "moving from simple, rule-based chatbots towards more sophisticated technologies such as large language models."
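The bottom rung is easy to see in code. Here is a toy keyword bot in Python. The keywords and canned replies are invented for illustration and are not taken from any bank. The second call asks for the same thing as the first but falls through to the menu, which is what "limited to predefined possible inputs" means in practice.

```python
import re

# Hypothetical keyword table: the bot only "understands" these exact words.
KEYWORD_RESPONSES = {
    "balance": "Your balance is shown on the Accounts screen.",
    "card": "To freeze a card, open Cards and tap Freeze.",
    "hours": "Branches are open 9am to 5pm, Monday to Friday.",
}
FALLBACK = "Sorry, I didn't understand. Please type: balance, card or hours."


def rule_based_reply(message: str) -> str:
    """Return the canned answer for the first known keyword in the message."""
    words = re.findall(r"[a-z]+", message.lower())
    for keyword, response in KEYWORD_RESPONSES.items():
        if keyword in words:
            return response
    # Anything phrased differently falls through to the menu.
    return FALLBACK


print(rule_based_reply("What's my balance?"))         # keyword hit
print(rule_based_reply("How much money do I have?"))  # same intent, no keyword
```

The NLU and LLM rungs exist to close exactly that gap between what the customer typed and what the keyword table expected.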
What makes LLMs workable in banking is retrieval-augmented generation, or grounding: the model answers from the bank's own knowledge base instead of from whatever it absorbed during training. DBS describes DBS Joy as integrating "large language models with the bank's proprietary knowledge base," letting it "move beyond pre-programmed static answers to dynamic responses." Wells Fargo goes further and architects its assistant so no personal data reaches the LLM at all.
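Here's the shape of a well-built flow: the customer states what they want, the agent retrieves an answer from approved content, checks how confident it is in that answer, and then either replies or hands off to a person. The confidence check in the middle is non-negotiable, because it's what stops the agent from guessing when retrieval comes up short.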
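A minimal Python sketch of that flow, under loud assumptions: the two-article knowledge base is made up, simple word overlap stands in for real embedding retrieval and model confidence, and the 0.5 threshold is illustrative. What carries over to a real system is the order of operations: retrieve from approved content, score the match, and hand off instead of answering when retrieval is empty or weak.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical approved knowledge base: the only text the agent may answer from.
KNOWLEDGE_BASE = {
    "wire-fees": "Outgoing international wires cost $25; incoming wires are free.",
    "card-freeze": "You can freeze a debit card instantly in the app under Cards.",
}
CONFIDENCE_THRESHOLD = 0.5  # illustrative; tune against your own tickets


@dataclass
class Reply:
    route: str  # "answer" or "handoff"
    text: str


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9$]+", text.lower()))


def retrieve(question: str) -> tuple[str | None, float]:
    """Best-matching article id plus a crude 0-1 score (share of question words found)."""
    query = tokens(question)
    best_id, best_score = None, 0.0
    for article_id, body in KNOWLEDGE_BASE.items():
        score = len(query & tokens(body)) / max(len(query), 1)
        if score > best_score:
            best_id, best_score = article_id, score
    return best_id, best_score


def handle(question: str) -> Reply:
    article_id, confidence = retrieve(question)
    # Empty or weak retrieval: hand off instead of letting a model guess.
    if article_id is None or confidence < CONFIDENCE_THRESHOLD:
        return Reply("handoff", "Let me connect you with someone from our team.")
    # Grounded path: reply only with approved text.
    return Reply("answer", KNOWLEDGE_BASE[article_id])


print(handle("What do international wires cost?"))
print(handle("I want to dispute a mortgage payment"))
```

In production an LLM would phrase the reply from the retrieved article rather than returning it verbatim, and the confidence signal would come from retrieval scores or a model-based check, but the gate sits in the same place.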

The CFPB is blunt about the ceiling, too: chatbots "may be useful for resolving basic inquiries, but their effectiveness wanes as problems become more complex." That's not a knock on the tech; it's the design brief. The tools people actually like are the ones that know their own limits and reach for a person. Our guide to AI in customer service covers that boundary in general terms; banking just raises the stakes.
Where it's already working: the big bank deployments
The proof isn't in a vendor deck; it's in what the banks report themselves. Here are the flagship deployments, each sourced from the bank's own newsroom:
| Bank | Assistant | Scale (from the bank) | Standout use case | Built or bought |
|---|---|---|---|---|
| Bank of America | Erica | 3B+ interactions, ~50M users, 58M interactions/month | Balance-trend alerts, investment guidance | In-house |
| Wells Fargo | Fargo | 1B+ interactions in under 3 years | Zelle payments, spending insights | Google Cloud LLMs |
| Capital One | Eno | SMS-first since March 2017 | Fraud alerts, virtual card numbers | In-house |
| NatWest | Cora / Cora+ | 10.8M queries in 2023 | Mortgage guidance, summarised handoff | Built with IBM |
| DBS | DBS Joy | 120k+ chats, +23% CSAT | Corporate/SME servicing | In-house |
A few things jump out. First, the use cases cluster around high-volume, low-risk lookups: Erica flags balance trends "in the next 7 days," while Eno proactively alerts on "a double charge, an abnormally large tip amount, or potential fraud" and generates merchant-specific virtual card numbers. Second, the smart ones treat the AI as a front door, not a wall. NatWest's Cora+ hands off with a summary so "the human agent can quickly understand what support the customer needs."
Third, multilingual reach is a genuine unlock, not a footnote. More than 3 million Spanish-speaking Wells Fargo customers have used Fargo over 160 million times. That's the kind of coverage that's brutally expensive to staff with humans and cheap to add with an agent trained on multilingual history.
The economics explain the rush. Juniper Research projected that chatbots would save banks $7.3 billion globally by 2023, up from $209 million in 2019; the 2023 figure equates to 862 million hours of work, with mobile apps carrying 79% of interactions. The CFPB puts the unit figure at $0.70 saved per interaction compared with a human agent. If you want to sanity-check those numbers for your own team, our piece on AI vs human customer support walks through the comparison.
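For that sanity check, the back-of-envelope arithmetic fits in a few lines of Python. Treat it as a rough sketch: the function and its parameters are hypothetical, the default saving is the CFPB's $0.70 figure, and the automated share should be the fraction of interactions the bot resolves end to end without a human, not every interaction it touches.

```python
def estimated_monthly_savings(
    interactions_per_month: int,
    automated_share: float,
    saving_per_interaction: float = 0.70,
    tool_cost_per_interaction: float = 0.0,
) -> float:
    """Rough monthly savings from interactions the bot fully handles.

    saving_per_interaction defaults to the CFPB's $0.70 unit figure; swap in
    your own cost-per-contact gap. tool_cost_per_interaction is whatever a
    vendor charges per automated interaction (leave at 0 if unknown).
    """
    if not 0.0 <= automated_share <= 1.0:
        raise ValueError("automated_share must be between 0 and 1")
    automated = interactions_per_month * automated_share
    return automated * (saving_per_interaction - tool_cost_per_interaction)


# Example: 200,000 interactions a month, 30% handled end to end by the bot.
print(round(estimated_monthly_savings(200_000, 0.30), 2))
```

With those example inputs the estimate is 200,000 × 0.30 × $0.70, roughly $42,000 a month before any tooling cost, which is the kind of number worth checking against your own contact-centre costs rather than a vendor's.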
What customers actually think
Here's the part vendor pages skip. Read the forums where real banking customers talk and the frustrated voices are louder, sharper, and more specific than the praise. They're worth listening to, because they tell you exactly what to avoid.
The angriest complaint, from r/bunq, is the loop that won't escalate, even when money is on fire:
"I've had fraud happening on my card this week and I've never had such an excruciating experience with a bank... I had to threaten to reach out to KiFid [the Dutch financial ombudsman] for them to allow me to speak to a human. Also the AI will occasionally pretend to be a person too. It's all horrible."
That's the exact "doom loop" the CFPB warned about, playing out in a fraud case. The follow-on theme is just as consistent: people want a visible human, and trust falls off a cliff the moment a problem stops being simple. As one fintech operator put it on r/fintech, watching their own customers:
"i've seen customers be fine with bots for simple stuff but get wary as soon as money or disputes are involved."
There's even a distinctly banking flavour of complaint: the bot as a downgrade of a feature people already had. A Bank of America customer's rant about being pushed to "ask Erica" instead of just filtering their own statements is a useful reminder that conversational AI is not automatically an upgrade over a good search box.
None of this says "don't." It says the bar is trust, and the failure mode is specific and avoidable. The practitioners who've shipped these agree on the fix. From the same r/fintech thread:
"The key though is avoiding generic bots and keeping it rules-based, built for a specific domain/process/problem (especially in regulated areas like disputes), integrating with back-office data, and making handover to humans seamless."
That's the recipe, in a practitioner's own words. Getting the handoff right is what separates the deployments people tolerate from the ones they rage-quit.
What to automate, and what to hand to a human
So where's the line? After watching a lot of these go live, my rule is simple: automate the lookup, escalate the decision. If answering the question could move money, deny a product, or invoke a legal right, a human owns it. Everything else is fair game for the agent.

The left column is where the volume and the savings live, and it's exactly the tier-1 work that eats a support team's day. It maps cleanly to what a good AI helpdesk agent already does well: deflect the repetitive stuff, keep the answers consistent, log everything. The right column is where a wrong answer becomes a regulatory problem, so those flows should capture intent, then route, never guess.
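The "automate the lookup, escalate the decision" rule is simple enough to express as an explicit allowlist. In the Python sketch below the intent names are hypothetical, standing in for whatever labels an upstream NLU or LLM classifier emits. The design choice worth copying is the default: anything not on the lookup list goes to a person, so moving the line rightward becomes a deliberate edit rather than an accident.

```python
# Hypothetical intent labels, as an upstream NLU or LLM classifier might emit them.
LOOKUP_INTENTS = {"check_balance", "list_fees", "branch_hours", "how_to_withdraw"}
DECISION_INTENTS = {
    "dispute_charge",        # could move money
    "loan_application",      # could deny a product
    "credit_limit_change",   # could deny a product
    "data_rights_request",   # invokes a legal right
}


def route_intent(intent: str) -> str:
    """Automate the lookup, escalate the decision."""
    if intent in DECISION_INTENTS:
        return "human"
    if intent in LOOKUP_INTENTS:
        return "agent"
    # Unknown intents are captured and routed, never guessed at.
    return "human"


for intent in ("list_fees", "dispute_charge", "mortgage_rate_lock"):
    print(intent, "->", route_intent(intent))
```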
The mistake I see most is teams trying to push the line rightward too fast, letting the bot attempt disputes or loan questions because a demo made it look capable. That's how you end up in the r/bunq threads above. Start narrow, prove it on the left column, and expand only when your deflection rate and your escalation quality both hold up.
The compliance surface banking adds
This is what makes banking different from generic support. A wrong answer here isn't just an annoyed customer; it can be a regulated institution breaking federal law. The CFPB said it plainly: a "poorly deployed chatbot can lead to customer frustration, reduced trust, and even violations of the law."
So a banking bot carries a compliance surface a generic support bot doesn't:

Walking the stack:
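- PII and card-data handling. Under the Gramm-Leach-Bliley Act, the FTC Safeguards Rule (which covers non-bank financial institutions such as lenders and many fintechs; banks follow their federal regulators' equivalent guidelines) requires encryption of customer information in transit and at rest, plus notifying the FTC within 30 days of discovering a breach affecting at least 500 consumers. If the flow can touch card numbers, PCI DSS requires the account number to be masked when displayed and rendered unreadable wherever it is stored. That's why redaction in transcripts and logs isn't optional.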
- Human review of automated decisions. For EU customers, GDPR Article 22 gives a right "not to be subject to a decision based solely on automated processing" that produces legal or similarly significant effects (being denied a loan is the textbook example), plus a right to obtain human intervention.
- The EU AI Act's high-risk line. Under Annex III, AI used "to evaluate the creditworthiness of natural persons or establish their credit score" is classified as high-risk, triggering human-oversight and logging obligations. Worth being precise here: a support bot answering "what's my balance" isn't automatically high-risk, because the trigger is the credit-scoring use case. But the moment a flow influences a credit decision, it crosses the line.
- The vendor bar. Banks buying a tool will expect a SOC 2 Type II report, which tests that controls actually operated over time, not just that they exist on paper.
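Redaction is the most mechanical item on that list, so it's worth sketching. The Python below masks likely card numbers in a transcript before it's logged: it looks for runs of 13 to 19 digits (allowing single spaces or hyphens), keeps only the runs that pass the Luhn checksum card numbers use, and replaces all but the last four digits. Showing only the last four sits within PCI DSS's display-masking limit. Treat it as an illustration under assumptions, not a compliance control on its own: a real deployment also has to catch passwords, account numbers and other PII, ideally with a vetted DLP tool.

```python
import re

# Candidate card numbers: 13-19 digits, each optionally followed by a space or hyphen.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Standard Luhn checksum used by payment card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


def redact_card_numbers(text: str) -> str:
    """Mask Luhn-valid card numbers, keeping only the last four digits."""
    def mask(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group())
        if not luhn_valid(digits):
            return match.group()  # probably not a card number; leave it alone
        return "*" * (len(digits) - 4) + digits[-4:]

    return CARD_CANDIDATE.sub(mask, text)


print(redact_card_numbers("My card is 4111 1111 1111 1111, please help"))
```

The Luhn filter is there to cut false positives such as order or reference numbers; it won't catch a card number a customer mistypes, which is one more reason to pair it with broader PII detection.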
The through-line across all of it is the same design pattern the customers were begging for: grounding, logging, and a human off-ramp. GDPR's right to human intervention and the AI Act's human-oversight duty each point to that escalation path independently. If you're evaluating tools, our note on AI knowledge management for support teams covers keeping that approved-knowledge layer clean, which is where accuracy starts.
How to deploy one without enraging your customers
Put the customer voice and the compliance surface together and the playbook is clear. Here's what I'd do, in order.
Ground everything, then prove it before go-live. Restricting the agent to your approved help center and policy docs is what keeps it from inventing an answer, and it's the mitigation the CFPB implicitly asks for when it says generic bots are "ill-suited for tasks that require logic, specialized knowledge, or current data." I've watched this fail the hard way: a paying customer's bot fabricated a product claim and sent it to real customers because retrieval came back empty and the model filled the gap from training data. The fix isn't a smarter model; it's simulating the agent against thousands of real past tickets first, so you see where it would have hallucinated before a customer does.
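The pre-launch replay can be sketched in Python as follows. `toy_agent` is a hypothetical stand-in for whatever agent you're evaluating, and `Draft` is an assumed output shape (a route plus the ids of the approved articles the draft relied on). The replay runs offline, so nothing reaches a customer, and it surfaces answers that cite no approved source, the same pattern behind the fabricated product claim described above.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Draft:
    route: str                                         # "answer" or "handoff"
    sources: list[str] = field(default_factory=list)   # approved article ids used


def replay(tickets: list[str], agent: Callable[[str], Draft]) -> dict:
    """Run the agent offline over historical tickets; nothing reaches a customer."""
    routes = Counter()
    ungrounded: list[str] = []
    for ticket in tickets:
        draft = agent(ticket)
        routes[draft.route] += 1
        # The failure to catch before go-live: an answer citing no approved source.
        if draft.route == "answer" and not draft.sources:
            ungrounded.append(ticket)
    total = max(len(tickets), 1)
    return {
        "answer_rate": routes["answer"] / total,
        "handoff_rate": routes["handoff"] / total,
        "ungrounded_answers": ungrounded,
    }


def toy_agent(ticket: str) -> Draft:
    """Hypothetical stand-in for the agent under test."""
    text = ticket.lower()
    if "fee" in text:
        return Draft("answer", ["fees-article"])
    if "cashback" in text:
        return Draft("answer")  # retrieval found nothing, but it answered anyway
    return Draft("handoff")


past_tickets = ["What are your wire fees?", "Do you offer 5% cashback?", "Someone stole my card"]
report = replay(past_tickets, toy_agent)
print(report["ungrounded_answers"])
```

On real history you'd run thousands of tickets and review the ungrounded list by hand before letting the agent reply live.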

Gate on confidence and keep the human visible. Set a threshold: below it, the agent drafts for a human or hands off rather than replying live. Cap repeat attempts so you never build a doom loop. The single most-cited feature in banking-bot reviews is seamless escalation, and it's what the r/bunq customers were denied.
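The confidence gate and the repeat cap can live in one small routing function. This is a sketch: `Conversation`, `route_turn`, both thresholds and the pattern for spotting a request for a human are hypothetical, and a real system would detect that request more robustly. Note that escalation is sticky: once a conversation goes to a person, the bot doesn't pick it back up, which is the opposite of a doom loop.

```python
import re
from dataclasses import dataclass

MAX_BOT_TURNS = 2            # illustrative cap per issue
CONFIDENCE_THRESHOLD = 0.75  # illustrative; set it from your own replay results
ASKS_FOR_HUMAN = re.compile(r"\b(human|person|representative)\b", re.IGNORECASE)


@dataclass
class Conversation:
    bot_turns: int = 0
    escalated: bool = False


def route_turn(convo: Conversation, message: str, confidence: float) -> str:
    """Decide whether the bot may reply to this turn or a human takes over."""
    if (
        convo.escalated                          # once handed off, stay with a human
        or ASKS_FOR_HUMAN.search(message)        # an explicit request is always honoured
        or convo.bot_turns >= MAX_BOT_TURNS      # cap repeat attempts: no doom loop
        or confidence < CONFIDENCE_THRESHOLD     # unsure: hand off rather than guess
    ):
        convo.escalated = True
        return "human"
    convo.bot_turns += 1
    return "bot"


convo = Conversation()
for text in ("What are the fees?", "That didn't help", "Still stuck"):
    print(text, "->", route_turn(convo, text, confidence=0.9))
```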
Keep the sensitive data where it belongs. When we onboard finance and healthcare teams, the hard gate is always data handling. One buyer needed assurance that ticket data with card numbers and passwords stayed in their environment; the answer is that the agent reasons over question type and response style, with custom retention and PII redaction, and no customer data is used to train models. Those are the questions your security review should be asking any vendor.
Start on tier-1, expand on evidence. The realistic scope, echoed by operators over and over, is tier-1 deflection: let the agent own the "what are the fees" and "how do I withdraw" questions that eat support time, and route the messy stuff to a person. Expand only when your first-contact resolution holds. If you're staffing up a team around this, our scaling guide for startups is a useful companion.
Try eesel for banking and fintech support
If you're a bank, a lender, or a fintech weighing this up, eesel is built for exactly the pattern above. It plugs into the helpdesk you already run, learns from your past tickets and help docs, and answers only from that approved knowledge, so it deflects the tier-1 lookups without wandering off-script. The part that matters most for a regulated team: you can simulate the agent against thousands of your real historical tickets before it replies to a single live customer, then turn on autonomy gradually with confidence-based routing and a clean human handoff.
It already runs at banking scale: our agent handles 100,000+ German-language tickets a month for a loan-comparison platform, with SOC 2 controls, GDPR and EU data residency, and PII redaction on the security side. Pricing is usage-based, about $0.40 per resolved ticket with no per-seat fees, so you're not paying for a platform you're still testing.

You can try eesel free, or book a demo if you want to walk through the compliance and simulation setup with someone first.
Frequently Asked Questions
What is conversational AI for banking?
Is conversational AI safe for banking customer service?
How much does conversational AI for banking cost?
What banking tasks should a chatbot handle versus a human?
What compliance rules apply to conversational AI for banking?
Can conversational AI for banking answer in multiple languages?
How do I stop a banking chatbot from looping instead of helping?

Article by
Alicia Kirana Utomo
Kira is a writer at eesel AI with a Computer Science background and over a year of hands-on experience evaluating AI-powered customer service tools. She focuses on breaking down how helpdesk platforms and AI agents actually work so that support teams can make better buying decisions.








