Advanced · 30 min

Design a Hallucination-Free Banking Chatbot

A hellointerview-style system design deep dive into hallucination-free banking chatbots like JP Morgan's LLM Suite and Bank of America's Erica. Covers requirements, core entities, the no-hallucination architecture, and three production deep dives: hallucination prevention via LLM-as-router, audit trail and regulatory compliance, and graceful degradation with confidence-based fallback. Each deep dive walks through naive, better, and production-grade approaches with trade-offs.

Quick Reference

  • The LLM NEVER generates financial numbers — balances, rates, and transactions all come from structured banking APIs
  • Architecture: LLM understands intent, calls structured API, formats the response. The LLM is a router, not a data source
  • Template-based responses with API-filled slots for factual data — free-form generation is only for conversational framing
  • Every response has a full provenance chain: customer question → intent classification → API call → response template → final answer
  • Confidence thresholds at every stage — refusing to answer when uncertain is the correct behavior in banking
  • One hallucinated interest rate or account balance is a regulatory violation and a potential lawsuit
  • Full audit trail with 7-year retention is a regulatory requirement under SOX and banking regulations
  • Graceful degradation to human agents is the only acceptable failure mode — never guess a financial figure
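The template-with-API-filled-slots idea from the list above can be sketched in a few lines. This is a minimal illustration, not a production implementation; the template IDs, slot names, and payload shape are all hypothetical. The key property is that the factual values are copied verbatim from the API payload, and a missing slot fails loudly instead of being invented.

```python
# Sketch of template-based responses with API-filled slots.
# Template IDs and payload fields are illustrative, not from any real system.
# The LLM only selects the template; the numbers come from the banking API.

RESPONSE_TEMPLATES = {
    "check_balance": "Your {account_type} account balance is {balance}.",
    "interest_rate": "The current rate on {product_name} is {rate}.",
}

def render_response(intent: str, api_payload: dict) -> str:
    """Fill a fixed template with fields from the banking API payload.

    Raises KeyError if the payload is missing a required slot: failing
    loudly is safer than emitting a partially fabricated sentence.
    """
    template = RESPONSE_TEMPLATES[intent]
    return template.format(**api_payload)

print(render_response("check_balance",
                      {"account_type": "checking", "balance": "$1,234.56"}))
```

Note the design choice: because `str.format` raises on missing keys, an incomplete API response can never silently produce a sentence with a blank or guessed figure.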

Understanding the Problem

A banking chatbot handles account inquiries, transaction history, product information, branch lookups, and simple transactions on behalf of bank customers. The absolute requirement that defines every architectural decision: the chatbot must NEVER hallucinate financial data. One wrong balance, one incorrect interest rate, one fabricated transaction is a regulatory violation and a potential lawsuit. This is not a quality issue; it is a legal liability.

The fundamental architectural insight is that the LLM must be a router and formatter, never the source of truth. All factual data (account balances, interest rates, transaction amounts, product terms, fee schedules) comes from structured banking APIs. The LLM understands what the customer wants and formats the API response into natural language. It never generates a financial number from its weights. Products like Bank of America's Erica (serving 42 million users), JP Morgan's internal LLM tools, and various fintech chatbots have proven the model works at scale.

From a system design perspective, this is a rich problem because it touches zero-tolerance accuracy (hallucination is not a quality tradeoff; it is a hard constraint), regulatory compliance (every response must be auditable for 7 years), authentication and authorization (the chatbot accesses sensitive financial data), and graceful degradation (when uncertain, the only acceptable behavior is to say "I do not know" and escalate).
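The router-and-formatter flow described above can be condensed into one handler. This is a sketch under stated assumptions: `classify_intent` stands in for an LLM intent classifier, `call_banking_api` stands in for the bank's structured API, and the 0.9 confidence cutoff is an illustrative value, not a recommendation. The point is the shape of the control flow: no branch exists where a financial number originates from the model.

```python
# Minimal sketch of the LLM-as-router loop (all names are illustrative).
# The LLM only classifies intent; every figure in the reply comes from
# the structured banking API, and low confidence escalates to a human.

CONFIDENCE_THRESHOLD = 0.9  # assumed cutoff; tuned per intent in practice

def classify_intent(question: str) -> tuple[str, float]:
    """Stand-in for an LLM intent classifier returning (intent, confidence)."""
    if "balance" in question.lower():
        return "check_balance", 0.97
    return "unknown", 0.2

def call_banking_api(intent: str, customer_id: str) -> dict:
    """Stand-in for the bank's structured API: the only source of truth."""
    return {"account_type": "checking", "balance": "$1,234.56"}

def handle_message(question: str, customer_id: str) -> str:
    intent, confidence = classify_intent(question)
    if intent == "unknown" or confidence < CONFIDENCE_THRESHOLD:
        # Graceful degradation: never guess a financial figure.
        return "I'm not sure I can answer that. Connecting you to an agent."
    payload = call_banking_api(intent, customer_id)
    # Formatting step: the LLM frames the sentence, the API fills the facts.
    return f"Your {payload['account_type']} account balance is {payload['balance']}."

print(handle_message("What's my balance?", "cust-001"))
```

A real classifier would be an LLM call with a constrained output schema, but the escalation branch would look the same: uncertainty routes to a human, never to generation.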

Real project

Bank of America's Erica has served over 42 million users with 2 billion interactions since launch. JP Morgan has deployed internal LLM tools across its investment banking and asset management divisions with strict hallucination guardrails. The key insight from production deployments: the LLM-as-router architecture (where the model only selects APIs and formats responses, never generates factual data) achieves near-zero hallucination rates because there is no path through the system where a financial number originates from the model's weights.

The Core Framing

This is fundamentally about building a system where the LLM cannot hallucinate financial data by architectural design, not by prompt engineering. The three hardest sub-problems are: (1) ensuring every factual claim in a response traces back to a specific API response with zero exceptions, (2) maintaining a complete audit trail that satisfies banking regulators for 7 years, and (3) degrading gracefully when the system is uncertain rather than guessing at financial figures.
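Sub-problems (1) and (2) both reduce to writing one append-only record per response that links the final answer back to its API source. A minimal sketch of such a record follows; the field names, endpoint path, and template ID are hypothetical, and the retention constant simply restates the 7-year requirement from the text.

```python
# Sketch of an audit-trail record: every response stores its full
# provenance chain (question -> intent -> API call -> template -> answer)
# as one JSON-serializable entry. All field names are illustrative.

import json
import uuid
from datetime import datetime, timezone

RETENTION_YEARS = 7  # the regulatory retention window cited in the text

def make_audit_record(question, intent, api_endpoint, api_response,
                      template_id, final_answer):
    """Build one append-only audit entry tying the answer to its API source."""
    return {
        "record_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "intent": intent,
        "api_endpoint": api_endpoint,
        "api_response": api_response,  # verbatim payload, never regenerated
        "template_id": template_id,
        "final_answer": final_answer,
    }

record = make_audit_record(
    question="What's my balance?",
    intent="check_balance",
    api_endpoint="/v1/accounts/balance",
    api_response={"balance": "$1,234.56"},
    template_id="tmpl_balance_v1",
    final_answer="Your checking account balance is $1,234.56.",
)
print(json.dumps(record, indent=2))
```

Because the record captures the verbatim API payload alongside the rendered answer, an auditor can verify years later that every figure the customer saw matches what the bank's systems returned at that moment.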