
How to Systematically Analyze and Debug Errors in AI Products

A structured, data-driven workflow for improving AI product quality by logging real user traces, manually analyzing errors, creating targeted evaluations (evals), and iterating on prompts or models.

From How I AI

How I AI: Hamel Husain's Guide to Debugging AI Products & Writing Evals

with Claire Vo

Tools Used

ChatGPT

OpenAI's conversational AI assistant

Step-by-Step Guide

1

Log and Examine Real User Traces

Collect and review 'traces'—full, multi-turn conversations your AI has with real users. This reveals how people actually use your product, including typos and ambiguous questions, which is crucial for understanding real-world performance. Use platforms like Braintrust or Arize for this.
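
The sketch below shows one way a trace record might be captured locally, assuming a simple JSONL log; the field names (trace_id, messages, tool_calls, metadata) are illustrative, not a Braintrust or Arize schema.

```python
# Minimal sketch: log one record per full, multi-turn conversation.
import json, uuid, datetime, pathlib

def log_trace(messages, tool_calls=None, metadata=None, log_dir="traces"):
    """Append one complete conversation to a local JSONL file for later review."""
    record = {
        "trace_id": str(uuid.uuid4()),
        "logged_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "messages": messages,        # [{"role": "user"/"assistant", "content": ...}, ...]
        "tool_calls": tool_calls or [],
        "metadata": metadata or {},  # e.g. model version, feature, user segment
    }
    path = pathlib.Path(log_dir)
    path.mkdir(exist_ok=True)
    with open(path / "traces.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")
    return record["trace_id"]

# Example: log the conversation exactly as the user typed it, typos and all.
log_trace(
    messages=[
        {"role": "user", "content": "whats up with four month rent??"},
        {"role": "assistant", "content": "Your lease shows a four-month rent credit..."},
    ],
    metadata={"model": "gpt-4o", "feature": "lease_qa"},
)
```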

2

Perform Manual Error Analysis

Manually review a sample of traces (e.g., 100) and write a one-sentence note for the *very first error* you find in each conversation. This 'open coding' process helps identify the root cause of failures quickly.

Prompt:
Example error note: "Should have asked a follow-up question about 'What's up with four-month rent?' because the user's intent is unclear."
Pro Tip: Focusing on the most upstream error is a powerful heuristic. Fixing early intent clarification or tool call issues often resolves many downstream problems.
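
Here is a minimal sketch of the open-coding pass, assuming the JSONL trace format from the logging sketch above: sample roughly 100 traces and record one free-text note per conversation.

```python
# Open coding sketch: review a sample of traces and note the first error in each.
import json, random

def open_code(trace_path="traces/traces.jsonl", notes_path="error_notes.jsonl", n=100):
    with open(trace_path) as f:
        traces = [json.loads(line) for line in f]
    sample = random.sample(traces, min(n, len(traces)))
    with open(notes_path, "a") as out:
        for trace in sample:
            for msg in trace["messages"]:
                print(f"{msg['role']}: {msg['content']}")
            note = input("First error in this trace (one sentence, blank if none): ").strip()
            out.write(json.dumps({"trace_id": trace["trace_id"], "note": note}) + "\n")

if __name__ == "__main__":
    open_code()
```
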
3

Create a Custom Annotation System

To speed up manual review, build a simple internal app or custom view in your observability platform. The goal is to make it easy for annotators (like product managers) to quickly categorize and label issues.
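
An annotation view can be as small as a single-file web app. The sketch below uses Flask purely for illustration (the source does not prescribe a framework); the label set and file names are assumptions.

```python
# Minimal annotation app sketch: show one trace at a time, record a label and a note.
from flask import Flask, request, redirect
import json

app = Flask(__name__)
TRACES = [json.loads(line) for line in open("traces/traces.jsonl")]
LABELS = ["unclear_intent", "wrong_tool_call", "hallucination", "formatting", "no_error"]

@app.route("/annotate/<int:i>", methods=["GET", "POST"])
def annotate(i):
    if request.method == "POST":
        with open("annotations.jsonl", "a") as f:
            f.write(json.dumps({
                "trace_id": TRACES[i]["trace_id"],
                "label": request.form["label"],
                "note": request.form.get("note", ""),
            }) + "\n")
        return redirect(f"/annotate/{i + 1}")  # advance to the next trace
    convo = "<br>".join(f"<b>{m['role']}</b>: {m['content']}" for m in TRACES[i]["messages"])
    buttons = "".join(f'<button name="label" value="{l}">{l}</button>' for l in LABELS)
    return f'<p>{convo}</p><form method="post"><input name="note" placeholder="one-sentence note">{buttons}</form>'

if __name__ == "__main__":
    app.run(port=5000, debug=True)
```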

4

Categorize and Prioritize Errors

Group your manual error notes into common themes or categories, either manually or with help from an LLM like ChatGPT. Count the frequency of each category to create a clear, data-driven, and prioritized list of what to fix first.
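
An LLM can help merge near-duplicate labels first, but the prioritization itself is just a frequency count. A minimal sketch, assuming the annotations.jsonl file from the previous sketch:

```python
# Turn annotation labels into a prioritized, data-driven fix list.
import json
from collections import Counter

with open("annotations.jsonl") as f:
    labels = [json.loads(line)["label"] for line in f]

counts = Counter(label for label in labels if label != "no_error")
for label, n in counts.most_common():
    print(f"{label}: {n} occurrences ({n / len(labels):.0%} of annotated traces)")
```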

5

Write Targeted Evaluations (Evals)

Based on your prioritized error categories, write specific evals. Use code-based evals for objective checks (like data leaks) and LLM judges for subjective issues. Crucially, validate your LLM judges against human-labeled data to ensure they are accurate and produce simple binary (pass/fail) outcomes.
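
The sketch below illustrates both flavors: a code-based check for an objective failure (a hypothetical internal-ID leak) and a helper that measures how often a binary LLM judge agrees with human pass/fail labels. judge_fn and the human-labeled set are assumed inputs, not part of any specific library.

```python
# Code-based eval plus LLM-judge validation against human labels.
import re

INTERNAL_ID = re.compile(r"\bcust_[0-9]{8}\b")  # hypothetical internal ID format

def eval_no_data_leak(assistant_reply: str) -> bool:
    """Code-based eval: pass only if no internal customer ID appears in the reply."""
    return INTERNAL_ID.search(assistant_reply) is None

def judge_agreement(judge_fn, labeled_examples):
    """Fraction of traces where the binary LLM judge matches the human pass/fail label."""
    hits = sum(judge_fn(ex["trace"]) == ex["human_label"] for ex in labeled_examples)
    return hits / len(labeled_examples)

# Example: only trust the judge once its agreement with humans is high enough.
# agreement = judge_agreement(my_llm_judge, human_labeled_set)
# assert agreement >= 0.9, "Judge disagrees with humans too often to be useful"
```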

6

Iterate and Improve

With reliable evals in place, you can now confidently make changes. Use prompt engineering, fine-tuning on difficult examples, or improving your RAG system to address the identified errors and measure the impact with your evals.
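
A minimal sketch of the measurement loop: run the same eval suite against two prompt versions and compare pass rates. run_app, EVALS, and test_cases are placeholders for your own pipeline, not names from the source.

```python
# Compare eval pass rates before and after a prompt change.
def pass_rate(prompt_version, test_cases, evals, run_app):
    passed = 0
    for case in test_cases:
        reply = run_app(prompt_version, case["input"])
        passed += all(check(reply) for check in evals)  # a case passes only if every eval passes
    return passed / len(test_cases)

# before = pass_rate("prompt_v1", test_cases, EVALS, run_app)
# after = pass_rate("prompt_v2", test_cases, EVALS, run_app)
# print(f"pass rate: {before:.0%} -> {after:.0%}")
```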
