Faysal Ahmed
Chapter 9

Case Studies

Real-world examples of human–AI interaction design across scheduling, customer support, content moderation, and personal finance.

Case Study 1: Meeting Scheduling Agent

Context: An AI agent that coordinates meeting times across participants’ calendars, sends invitations, and handles rescheduling.

This is the most common agent use case in enterprise settings. It illustrates nearly every design principle in this book — scope definition, progressive autonomy, graceful degradation, and feedback loops.

Design Decisions

  • Autonomy level: Confirm-by-exception (Level 2). The agent proposes times; if all participants are free and the meeting is within policy, it books automatically. Conflicts or policy violations escalate to the user.
  • Scope: Read free/busy for all participants. Create and modify events on the organiser’s calendar only.
  • Failure mode handling: If the agent cannot find a common time within the next two weeks, it escalates with a summary of attempted ranges and each participant’s availability density.
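The confirm-by-exception flow above can be sketched as a single decision function. This is a minimal illustration, not the agent's actual implementation. The working hours, 30-minute slot step, two-hour maximum meeting length and all names (`schedule`, `Booking`, `Escalation`) are hypothetical. "Availability density" is assumed here to mean the share of candidate slots in which a participant is free.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical policy values; the case study does not specify them.
SEARCH_HORIZON = timedelta(days=14)      # "within the next two weeks"
SLOT_STEP = timedelta(minutes=30)        # granularity of candidate start times
WORK_START_HOUR, WORK_END_HOUR = 9, 17   # in-policy meeting hours
MAX_DURATION = timedelta(hours=2)        # longest meeting the agent may book unaided

Interval = tuple[datetime, datetime]     # (start, end), end exclusive


@dataclass
class Booking:
    start: datetime
    end: datetime


@dataclass
class Escalation:
    reason: str
    searched: Interval                       # range the agent tried
    availability_density: dict[str, float]   # participant -> share of slots free


def is_free(busy: list[Interval], slot: Interval) -> bool:
    """True if `slot` overlaps none of the busy intervals."""
    return all(slot[1] <= b[0] or b[1] <= slot[0] for b in busy)


def candidate_slots(start: datetime, duration: timedelta) -> list[Interval]:
    """Weekday, working-hours slots from `start` to the end of the horizon."""
    slots, t, horizon_end = [], start, start + SEARCH_HORIZON
    while t + duration <= horizon_end:
        day_open = t.replace(hour=WORK_START_HOUR, minute=0, second=0, microsecond=0)
        day_close = t.replace(hour=WORK_END_HOUR, minute=0, second=0, microsecond=0)
        if t.weekday() < 5 and day_open <= t and t + duration <= day_close:
            slots.append((t, t + duration))
        t += SLOT_STEP
    return slots


def schedule(busy_by_person: dict[str, list[Interval]],
             start: datetime, duration: timedelta) -> Booking | Escalation:
    searched = (start, start + SEARCH_HORIZON)
    if duration > MAX_DURATION:  # policy violation: escalate, never book
        return Escalation("policy: meeting longer than allowed", searched, {})
    slots = candidate_slots(start, duration)
    for slot in slots:
        if all(is_free(busy, slot) for busy in busy_by_person.values()):
            return Booking(*slot)  # everyone free and in policy: book without asking
    # No common slot: summarise each participant's availability for the user.
    density = {
        person: sum(is_free(busy, s) for s in slots) / len(slots) if slots else 0.0
        for person, busy in busy_by_person.items()
    }
    return Escalation("no common free slot", searched, density)
```

Keeping the escalation payload structured, rather than a free-text apology, lets the user see at a glance whose calendar is the bottleneck.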

Observed Outcomes

| Metric | Pre-agent | With agent |
|---|---|---|
| Time to schedule | 4.2 minutes | 0.8 minutes |
| Rescheduling requests | 23% of meetings | 11% of meetings |
| User corrections | n/a | 7% of bookings |
| Escalation rate | n/a | 12% |

The 7% correction rate primarily involved location and agenda details, not timing. Users reported higher satisfaction when the agent handled timing and they focused on content.

Lesson

Progressive autonomy worked well here. The agent started in suggestion mode for two weeks, then graduated to confirm-by-exception. Users who went through the onboarding phase had higher trust scores and lower correction rates.
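The graduation rule described above is simple enough to express as a time-based gate. The sketch below assumes graduation depends only on time since onboarding, as the case study describes; the names `Autonomy` and `autonomy_for` are hypothetical.

```python
from datetime import datetime, timedelta
from enum import IntEnum


class Autonomy(IntEnum):
    SUGGESTION = 1            # Level 1: agent proposes times, user books
    CONFIRM_BY_EXCEPTION = 2  # Level 2: agent books unless a conflict or policy issue arises


SUGGESTION_PERIOD = timedelta(weeks=2)  # onboarding length from the case study


def autonomy_for(onboarded_at: datetime, now: datetime) -> Autonomy:
    """Autonomy level for a user, based only on time since onboarding."""
    if now - onboarded_at < SUGGESTION_PERIOD:
        return Autonomy.SUGGESTION
    return Autonomy.CONFIRM_BY_EXCEPTION
```

Using an ordered enum keeps the levels comparable (`level >= Autonomy.CONFIRM_BY_EXCEPTION`), which makes it easy to gate individual actions on a minimum level.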

Case Study 2: Customer Support Triage Agent

Context: An AI agent that categorises incoming support tickets, suggests responses, and auto-replies to routine inquiries.

Design Decisions

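  • Autonomy level: Full autonomy for Level 1 tickets (password resets, account status); confirm-by-exception for Level 2 tickets; suggestion only for Level 3 tickets (billing disputes, cancellations). These levels are support tiers, not the agent autonomy levels used in the other case studies.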
  • Transparency: Every auto-reply includes the tag “AI-generated” and a one-click path to speak with a human.
  • Failure mode: If confidence is below 85%, the ticket is routed to human support with the agent’s suggested category and draft response attached.
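The routing rules above combine a confidence gate with a per-tier autonomy level. A minimal sketch follows, assuming the model returns a support tier, a category, a confidence in [0, 1] and a draft reply. The 85% threshold comes from the case study; every name, and the wording of the auto-reply footer, is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

CONFIDENCE_THRESHOLD = 0.85  # value from the case study


class Action(Enum):
    AUTO_REPLY = "auto_reply"                        # tier 1: full autonomy
    SEND_UNLESS_EXCEPTION = "send_unless_exception"  # tier 2: confirm-by-exception
    SUGGEST_ONLY = "suggest_only"                    # tier 3: a human writes the reply
    ROUTE_TO_HUMAN = "route_to_human"                # low confidence at any tier


@dataclass
class Triage:
    tier: int          # support tier 1-3
    category: str
    confidence: float  # model confidence in [0, 1]
    draft: str


@dataclass
class Decision:
    action: Action
    category: str      # always attached, so a human starts with context
    draft: str


def route(t: Triage) -> Decision:
    if t.confidence < CONFIDENCE_THRESHOLD:
        action = Action.ROUTE_TO_HUMAN
    elif t.tier == 1:
        action = Action.AUTO_REPLY
    elif t.tier == 2:
        action = Action.SEND_UNLESS_EXCEPTION
    else:
        action = Action.SUGGEST_ONLY
    return Decision(action, t.category, t.draft)


def render_auto_reply(draft: str, handoff_link: str) -> str:
    """Append the transparency tag and a one-click path to a human."""
    return f"{draft}\n\n[AI-generated] Want a person instead? {handoff_link}"
```

Checking confidence before tier means a low-confidence password reset still reaches a human, with the draft attached to shorten their handle time.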

Observed Outcomes

| Metric | Value |
|---|---|
| Tickets auto-resolved | 34% |
| Human escalation rate | 18% |
| User satisfaction (auto-resolved) | 4.2 / 5.0 |
| Average handle time reduction | 2.3 minutes per ticket |

Lesson

The confidence threshold was critical. Initially set at 90%, it resolved only 22% of tickets. Lowering it to 85% increased auto-resolution to 34% with no measurable drop in satisfaction. The agent’s draft responses also reduced human agent handle time even when escalation occurred.
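Tuning a threshold from outcome data starts with knowing how many tickets each candidate value would clear. The sketch below assumes a log of per-ticket confidence scores is available and computes the share at or above each threshold. This is only a proxy for auto-resolution, since it ignores tier routing, and the satisfaction side of the trade-off still has to be measured on real outcomes after a change ships. The function names are hypothetical.

```python
def clearance_rate(confidences: list[float], threshold: float) -> float:
    """Share of tickets whose confidence is at or above `threshold`."""
    if not confidences:
        return 0.0
    return sum(c >= threshold for c in confidences) / len(confidences)


def sweep(confidences: list[float], thresholds: list[float]) -> dict[float, float]:
    """Clearance rate for each candidate threshold, for side-by-side comparison."""
    return {t: clearance_rate(confidences, t) for t in thresholds}
```

Running `sweep(logged_scores, [0.80, 0.85, 0.90])` on historical scores shows how sharply volume changes between neighbouring thresholds before any user is exposed to the change.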

Case Study 3: Content Moderation Assistant

Context: An AI assistant that flags potentially harmful user-generated content for human moderators, with suggested action labels.

Design Decisions

  • Autonomy level: Suggestion only (Level 1). The assistant never takes action — it surfaces content with a severity score, violation category, and suggested action. Human moderators make the final decision.
  • Feedback loop: Moderators can mark each of the assistant’s suggestions as correct, incorrect, or uncertain. Correct labels reinforce the model’s current behaviour; incorrect labels trigger a review by the ML team.
  • Fairness monitoring: Accuracy metrics are tracked per demographic group. Disparities trigger an automated alert to the governance team.
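The feedback loop and fairness monitoring above can share one log. A minimal sketch follows. It assumes each item carries a group label, that "uncertain" verdicts are excluded from accuracy, and that a gap of more than 5 percentage points between any two groups counts as a disparity. All three are hypothetical choices, as are the class and method names.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Literal

Verdict = Literal["correct", "incorrect", "uncertain"]
DISPARITY_ALERT_GAP = 0.05  # hypothetical: alert if group accuracies differ by >5 points


@dataclass
class FeedbackLog:
    confirmed: list[str] = field(default_factory=list)        # reinforcing examples
    ml_review_queue: list[str] = field(default_factory=list)  # sent to the ML team
    # group -> [correct_count, judged_count]
    counts: dict = field(default_factory=lambda: defaultdict(lambda: [0, 0]))

    def record(self, item_id: str, group: str, verdict: Verdict) -> None:
        if verdict == "uncertain":
            return  # assumption: uncertain verdicts do not count toward accuracy
        c = self.counts[group]
        c[1] += 1
        if verdict == "correct":
            c[0] += 1
            self.confirmed.append(item_id)
        else:
            self.ml_review_queue.append(item_id)

    def accuracy_by_group(self) -> dict[str, float]:
        return {g: ok / n for g, (ok, n) in self.counts.items() if n}

    def disparity_alert(self) -> bool:
        """True when the best and worst group accuracies differ by more than the gap."""
        acc = self.accuracy_by_group()
        return len(acc) >= 2 and max(acc.values()) - min(acc.values()) > DISPARITY_ALERT_GAP
```

In production the alert would be delivered to the governance team; here it is just a boolean so the check can be tested in isolation.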

Observed Outcomes

| Metric | Without assistant | With assistant |
|---|---|---|
| Content reviewed per hour | 45 items | 128 items |
| Moderator agreement rate | n/a | 89% |
| False positive rate | 4.1% | 3.4% |
| False negative rate | 6.7% | 4.2% |

Lesson

The suggestion-only pattern preserved moderator autonomy while increasing throughput nearly 3×. The key design choice was making disagreement easy — a single click to override — which kept moderators engaged rather than fatigued.

Case Study 4: Personal Finance Agent

Context: An agent that monitors spending, categorises transactions, alerts users to unusual activity, and initiates savings transfers.

Design Decisions

  • Autonomy level: Execute with oversight (Level 3) for routine savings transfers within limits; confirm-by-exception for unusual transaction alerts; suggestion only for budget advice.
  • Privacy: All transaction data processed locally where possible. Aggregated patterns used for model improvement with explicit opt-in.
  • Safety net: Automated actions are capped at $500 per day. Any transfer beyond that limit requires user confirmation.
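The safety net above can be enforced with a small guard in front of every automated transfer. This sketch reads the $500 figure as a cumulative daily cap on automated transfers; the case study's wording also admits a per-transfer reading, which would compare `amount` alone against the limit. `TransferGuard` and its return strings are hypothetical. `Decimal` is used so that currency arithmetic is exact.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal

DAILY_AUTOMATED_LIMIT = Decimal("500.00")  # from the case study


@dataclass
class TransferGuard:
    # Running total of automated transfers per calendar day.
    spent: dict[date, Decimal] = field(default_factory=dict)

    def decide(self, amount: Decimal, day: date) -> str:
        so_far = self.spent.get(day, Decimal("0"))
        if so_far + amount <= DAILY_AUTOMATED_LIMIT:
            self.spent[day] = so_far + amount
            return "execute"            # within the cap: act now, report in the digest
        return "needs_confirmation"     # would exceed the cap: ask the user first
```

Transfers that need confirmation are deliberately not added to the running total, since once approved they are user-authorised rather than automated.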

Observed Outcomes

| Metric | Value |
|---|---|
| Savings transfers initiated | 89% auto-approved |
| Fraud alerts caught | 94% |
| False fraud alerts | 12% (compared to 28% industry average) |
| User retention at 6 months | 71% |

Lesson

The explicit spending limit and the daily notification summary were the most valued features in user surveys. Users wanted to know what the agent did without being interrupted for every action. The daily digest pattern, a single notification instead of 20, was the highest-impact interaction design decision.
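The digest pattern amounts to buffering the agent's actions and emitting one summary per day. A minimal sketch follows, assuming each action is logged with a timestamp, a kind and an optional amount. The names and message format are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal


@dataclass
class AgentAction:
    at: datetime
    kind: str                       # e.g. "savings_transfer", "categorised"
    amount: Decimal = Decimal("0")  # money moved, if any


def build_digests(actions: list[AgentAction]) -> dict[date, str]:
    """Group actions by calendar day and render one summary message per day."""
    by_day: dict[date, list[AgentAction]] = defaultdict(list)
    for a in actions:
        by_day[a.at.date()].append(a)
    digests = {}
    for day, items in sorted(by_day.items()):
        n = len(items)
        lines = [f"{day.isoformat()}: {n} action{'s' if n != 1 else ''} taken"]
        for kind, count in sorted(Counter(a.kind for a in items).items()):
            lines.append(f"- {kind}: {count}")
        total = sum((a.amount for a in items), Decimal("0"))
        if total:
            lines.append(f"Total amount moved: ${total:.2f}")
        digests[day] = "\n".join(lines)
    return digests
```

One message per day replaces a notification per action, while every action still appears in the summary, so transparency is not traded away for quiet.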

Key Takeaways

  • Match autonomy level to task complexity and risk — one agent may operate at multiple levels for different sub-tasks.
  • Progressive onboarding (shadow → limited → full) builds trust and reduces correction rates.
  • Confidence thresholds are powerful levers — tune them based on outcome data, not intuition.
  • The suggestion-only pattern (no autonomous action) is effective for high-stakes decisions where human judgement is primary.
  • Batch feedback (daily digests) reduces interruption while maintaining transparency.

Next: Chapter 10 — Future Directions