Design an AI Customer Support System
A hellointerview-style system design deep dive into AI customer support systems like Klarna AI (replaced 700 agents), Sierra AI, and Salesforce Agentforce. Covers requirements, core entities, the conversation pipeline, and three production deep dives: intent classification and routing, action execution and tool use, and human handoff and escalation. Each deep dive walks through naive, better, and production-grade approaches with trade-offs.
Quick Reference
- →Cascading classifier: keyword rules (less than 1ms) then fine-tuned model (200ms) then frontier model (1s) then human agent
- →Klarna's AI handled 2.3M conversations in month one, replacing 700 agents with equivalent CSAT scores
- →The agent can DO things like refunds and order changes — permission boundaries and confirmation flows are non-negotiable
- →Intent classification is the entry point to the entire system — get it wrong and everything downstream fails
- →Human handoff must transfer full conversation context, classified intent, and attempted actions so the customer never repeats themselves
- →Frustration detection (caps lock, profanity, repeated complaints, loop detection) should trigger immediate escalation
- →Policy enforcement must live in code, not in prompts — prompt injection can override instructions but cannot override Python
- →Target 85 percent automation rate for tier-1 inquiries with CSAT parity to human agents
Understanding the Problem
An AI customer support system is a conversational agent that handles tier-1 customer inquiries across chat, email, and voice channels. It understands what the customer wants, retrieves relevant knowledge from help centers and policies, executes real actions like processing refunds or updating orders, and seamlessly escalates to human agents when it cannot resolve the issue. This is not a simple FAQ chatbot. The system classifies intent across dozens of categories, composes multi-step action chains with permission boundaries, defends against prompt injection from untrusted customer input, and maintains conversation state across turns and channels. Products like Klarna AI Assistant, Sierra AI, Salesforce Agentforce, and Intercom Fin have made this a mainstream product category. From a system design perspective, this is a rich problem because it touches natural language understanding (classifying intent from ambiguous customer messages), tool orchestration (the agent executes real actions with financial consequences), safety (customer messages are untrusted input), and human-AI collaboration (knowing when to escalate is as important as knowing how to resolve). The trade-offs are sharp: classifying too aggressively sends customers down the wrong path, executing actions without proper safeguards causes financial losses, and escalating too late damages customer trust irreparably.
Klarna's AI Assistant handled 2.3 million conversations in its first month, performing the equivalent work of 700 full-time human agents. It resolved issues in 2 minutes versus 11 minutes for humans, with equivalent customer satisfaction scores. Salesforce Agentforce reports 85 percent automation of tier-1 inquiries across enterprise customers. Sierra AI powers customer support for major consumer brands including WeightWatchers and SiriusXM. The key insight from all deployments: the economics become overwhelming once automation rate exceeds 70 percent, because each additional percentage point of automation eliminates thousands of human agent hours per month.
This is fundamentally about building a system that can understand customer intent, execute actions safely, and know its own limits. The three hardest sub-problems are: (1) classifying intent accurately enough that downstream handlers work correctly, (2) executing real-world actions like refunds with proper permission boundaries and rollback capabilities, and (3) detecting when the AI is failing and escalating to a human before the customer becomes frustrated.