Using Analyze

A practical guide to surfacing subagent behavior, tool usage, and improvement opportunities with the /agent-flow:analyze command.

Overview

The analyze command parses Claude Code session transcripts — either offline from stored JSONL files or live via hook-captured events — and loads them into a local SQLite database. From there, you can generate reports, run ad-hoc SQL queries, label subagent outputs for recall evaluation, and export data to external tools. All data stays on your machine; no network calls are made.

The command exists because raw transcripts are hard to query. A structured store lets you ask "which tools does Loid actually use?" or "how often does the orchestrator block a subagent?" across many sessions at once.

Quick Start

# Slash command — load the current session and open an interactive report
/agent-flow:analyze

# CLI: load all past sessions, then generate a report
bash scripts/analyze.sh load --all-sessions && bash scripts/analyze.sh report

# CLI: report for a single session
bash scripts/analyze.sh report --session <session_id>

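An all-sessions report is written to .claude/observability/report.md and a single-session report to .claude/observability/<session_id>.md. In both cases the report is also printed to stdout.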

What You Will Learn from a Report

A report surfaces:

  • Tool usage by agent — which tools each subagent actually calls, and how often
  • MCP / skill invocations — frequency of mcp__* and Skill tool calls per agent
  • Iteration rate — how many times each subagent type is dispatched per session on average
  • Rejection rate — how often a subagent's output is blocked or denied by a hook
  • Thinking effort — total and average thinking-block characters per agent
  • Token use — input, output, cache-read, and cache-creation tokens per agent and model
  • Improvement opportunities (heuristic findings with explanations), for example:
      • High iteration rate on a specific agent (possible planning or scope issue)
      • Tool calls that fall outside the expected allowlist for an agent
      • Missing MCP usage in agents expected to use the knowledge graph
      • Model mismatches (Opus used where Sonnet was intended, or vice versa)
      • Orchestrator IO volume (flags the orchestrator doing persona-owned work, such as direct Read/Grep calls that should be delegated to Riko)
      • MCP-skipping per task type (architectural tasks that made zero graph calls when graphify was available)
      • Fan-out whitelist (suppresses known intentional fan-out patterns such as Semantic extract chunk * to reduce false positives)
      • Regression guards (flags a decision column that is 100% NULL despite PreToolUse events, or an iterations table that is empty despite ≥10 dispatches)
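
To make the iteration-rate metric concrete, here is a minimal sketch of one way it could be computed. The input shape (a list of (session_id, agent_type) dispatch pairs) and the choice to average over only the sessions in which an agent was dispatched at least once are assumptions for illustration; the actual report may define the denominator differently.

```python
from collections import Counter, defaultdict


def iteration_rate(dispatches):
    """Average dispatches per session for each agent type.

    `dispatches` is an iterable of (session_id, agent_type) pairs.
    Assumption: the average is taken over sessions where the agent
    was dispatched at least once, not over all loaded sessions.
    """
    total = Counter()
    sessions = defaultdict(set)
    for session_id, agent_type in dispatches:
        total[agent_type] += 1
        sessions[agent_type].add(session_id)
    return {agent: total[agent] / len(sessions[agent]) for agent in total}


if __name__ == "__main__":
    sample = [("s1", "Riko"), ("s1", "Riko"), ("s2", "Riko"), ("s1", "Loid")]
    print(iteration_rate(sample))
```

A value well above 1 for a given agent means it is typically re-dispatched within the same session, which is the pattern the "high iteration rate" finding points at.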

Subcommands

load

Parses transcript files and writes events into the SQLite store.

# Load every session found in the Claude Code data directory
bash scripts/analyze.sh load --all-sessions

# Load a single session by ID
bash scripts/analyze.sh load --session <session_id>

# Print summary statistics after loading without generating a full report
bash scripts/analyze.sh load --all-sessions --stats

# Redact sensitive values before writing (enabled by default for live-hook events)
bash scripts/analyze.sh load --all-sessions --redact

The loader is idempotent — re-running it updates existing rows rather than duplicating them.
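
Idempotent loading of this kind is commonly implemented with a SQLite upsert. The sketch below is illustrative only: the event_id key and the three-column table are hypothetical, not the real schema, and the actual loader may work differently.

```python
import sqlite3


def load_events(conn, events):
    """Insert events, updating rows whose event_id already exists.

    The table layout and the event_id key are hypothetical; they only
    illustrate the upsert pattern (requires SQLite 3.24 or newer).
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "event_id TEXT PRIMARY KEY, agent_type TEXT, tool_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO events (event_id, agent_type, tool_name) "
        "VALUES (:event_id, :agent_type, :tool_name) "
        "ON CONFLICT(event_id) DO UPDATE SET "
        "agent_type = excluded.agent_type, tool_name = excluded.tool_name",
        events,
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    batch = [{"event_id": "e1", "agent_type": "Riko", "tool_name": "Read"}]
    load_events(conn, batch)
    load_events(conn, batch)  # second run updates in place, no duplicate row
    print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])
```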

report

Generates a Markdown report from whatever is already in the database.

# Report across all loaded sessions
bash scripts/analyze.sh report

# Report for one session only
bash scripts/analyze.sh report --session <session_id>

The report respects the NO_COLOR environment variable: set it to any non-empty value to suppress ANSI colour codes in terminal output.
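
If you post-process report output in your own scripts, the same rule is easy to mirror. The following is a small sketch of the check as described above (any non-empty value disables colour); the function names are hypothetical and are not part of the analyze scripts.

```python
import os


def use_color(env=None):
    """Return False when NO_COLOR is set to any non-empty value."""
    env = os.environ if env is None else env
    return not env.get("NO_COLOR")


def paint(text, code, env=None):
    """Wrap text in an ANSI colour code unless colour is suppressed."""
    if use_color(env):
        return f"\033[{code}m{text}\033[0m"
    return text


if __name__ == "__main__":
    print(paint("12 findings", "33"))
```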

Output is written to .claude/observability/report.md (all sessions) or .claude/observability/<session_id>.md (single session).

sessions

Lists all sessions currently in the database with their start/end timestamps, branch, and event count.

bash scripts/analyze.sh sessions

Use this to find session IDs before running report --session or label.

sql

Runs an arbitrary SQL query against the database and prints results as a table.

bash scripts/analyze.sh sql "SELECT agent_type, tool_name, COUNT(*) n FROM events GROUP BY agent_type, tool_name ORDER BY n DESC LIMIT 20"

Note

By convention sql is read-only — there is no enforcement preventing writes, but mutating the store manually can corrupt report data.

Pre-built views are available (see Observability Schema) so you rarely need to write raw joins.
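
Because the sql subcommand does not enforce read-only access, one option for scripted queries is to open the database yourself in SQLite's read-only mode. The sketch below uses Python's standard sqlite3 module and assumes only the events table with the agent_type and tool_name columns used in the query above; the helper names are hypothetical.

```python
import sqlite3
from pathlib import Path


def open_read_only(db_path):
    """Open a SQLite file so that any write raises sqlite3.OperationalError."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def tool_usage(conn, limit=20):
    """Count tool calls per agent, most frequent first."""
    query = (
        "SELECT agent_type, tool_name, COUNT(*) AS n FROM events "
        "GROUP BY agent_type, tool_name ORDER BY n DESC LIMIT ?"
    )
    return conn.execute(query, (limit,)).fetchall()


if __name__ == "__main__":
    db = Path(".claude/observability/events.db")
    if db.exists():
        for row in tool_usage(open_read_only(db)):
            print(row)
```

With a connection opened this way, an accidental UPDATE or DELETE fails instead of silently altering the data that reports are built from.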

retention

Prunes old events and sessions from the database.

# Delete sessions older than 30 days
bash scripts/analyze.sh retention --days 30

# Delete all sessions from the database
bash scripts/analyze.sh retention --all

Warning

retention --all is irreversible. Export data first if you need it.

label

Interactive stdin labeling for evaluating subagent recall quality (M5 milestone). You step through subagent outputs one at a time and assign a verdict.

bash scripts/analyze.sh label <session_id>

Keys during the interactive session:

Key   Verdict
c     correct
m     missed
e     extra
w     wrong
s     skip
q     quit and save progress

The session is resumable — quitting and re-running picks up where you left off.

label export

Exports labels to CSV (.claude/observability/labels-export.csv) for offline analysis.

# Export labels for one session
bash scripts/analyze.sh label export <session_id>

# Export all labels
bash scripts/analyze.sh label export --all

The CSV includes label_id, session_id, agent_type, verdict, note, and ts columns. Two derived metrics are computed per agent type:

  • precision = correct / (correct + extra + wrong)
  • recall_proxy = correct / (correct + missed)
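
The two formulas above can be reproduced from the exported rows as follows. This is a minimal sketch: it takes (agent_type, verdict) pairs rather than reading the CSV, ignores skip verdicts, and returns None when a denominator is zero. The zero-denominator behaviour is an assumption, since the source does not say how that case is handled.

```python
from collections import Counter, defaultdict


def label_metrics(labels):
    """Compute precision and recall_proxy per agent type.

    `labels` is an iterable of (agent_type, verdict) pairs. Verdicts
    other than correct/missed/extra/wrong (for example skip) do not
    enter either formula. Zero denominators yield None (an assumption).
    """
    counts = defaultdict(Counter)
    for agent_type, verdict in labels:
        counts[agent_type][verdict] += 1

    metrics = {}
    for agent_type, c in counts.items():
        precision_den = c["correct"] + c["extra"] + c["wrong"]
        recall_den = c["correct"] + c["missed"]
        metrics[agent_type] = {
            "precision": c["correct"] / precision_den if precision_den else None,
            "recall_proxy": c["correct"] / recall_den if recall_den else None,
        }
    return metrics


if __name__ == "__main__":
    rows = [("Riko", "correct")] * 6 + [("Riko", "extra"), ("Riko", "wrong")]
    rows += [("Riko", "missed")] * 4 + [("Riko", "skip")]
    print(label_metrics(rows))
```

For the sample rows, precision is 6 / (6 + 1 + 1) and recall_proxy is 6 / (6 + 4).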

export

Exports events to an external sink. Behaviour is driven by .claude/observability.json.

bash scripts/analyze.sh export

Supported exporters:

  • jsonl (default, no extra dependencies): writes to .claude/observability/export.jsonl
  • mlflow (opt-in): requires the mlflow package to be installed; the import is guarded against ImportError, so a missing package does not break the default jsonl path

Configure exporters in .claude/observability.json:

{
  "exporters": [
    { "type": "jsonl" },
    { "type": "mlflow", "tracking_uri": "http://localhost:5000", "experiment": "agent-flow" }
  ]
}
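
The following sketch shows how a configuration like this might be resolved, including an ImportError guard of the kind described for mlflow. It is an illustration under assumptions, not the project's exporter code: the function names are hypothetical, and falling back to jsonl when no exporters are configured is inferred from jsonl being described as the default.

```python
import json


def mlflow_available():
    """Report whether the optional mlflow package can be imported."""
    try:
        import mlflow  # noqa: F401  (optional dependency)
    except ImportError:
        return False
    return True


def active_exporters(config_text):
    """Return the exporter types that can actually run.

    Assumption: an empty or missing "exporters" list falls back to jsonl.
    An mlflow entry is skipped, not treated as an error, when the
    package is not installed.
    """
    config = json.loads(config_text) if config_text.strip() else {}
    entries = config.get("exporters") or [{"type": "jsonl"}]
    active = []
    for entry in entries:
        if entry["type"] == "mlflow" and not mlflow_available():
            continue
        active.append(entry["type"])
    return active


if __name__ == "__main__":
    print(active_exporters('{"exporters": [{"type": "jsonl"}, {"type": "mlflow"}]}'))
```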

Privacy

The loader applies redaction before writing to the database. Patterns covered:

  • AWS access key IDs and secret keys
  • Anthropic API keys (sk-ant- prefix)
  • OpenAI API keys (sk- prefix)
  • GitHub personal access tokens (classic ghp_ and fine-grained github_pat_)
  • Slack bot/user tokens (xoxb-, xoxp-)
  • PEM private key blocks

Redacted values are replaced with a placeholder. To add your own patterns, edit scripts/analyze/redact.py — each pattern is a compiled regex with a named group secret.
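
As a rough guide to what a custom pattern could look like, here is a self-contained sketch of the named-group convention. The placeholder string, the internal_token pattern, and the helper names are all hypothetical; check scripts/analyze/redact.py for the actual placeholder and list structure before copying anything.

```python
import re

PLACEHOLDER = "[REDACTED]"  # hypothetical; the real placeholder may differ

PATTERNS = [
    # AWS access key ID: "AKIA" followed by 16 uppercase letters or digits
    re.compile(r"(?P<secret>AKIA[0-9A-Z]{16})"),
    # Hypothetical custom pattern: only the value after "=" is the secret
    re.compile(r"internal_token=(?P<secret>[A-Za-z0-9]+)"),
]


def _mask(match):
    """Replace only the `secret` group, keeping the surrounding match text."""
    start, end = match.span("secret")
    offset = match.start()
    whole = match.group(0)
    return whole[: start - offset] + PLACEHOLDER + whole[end - offset:]


def redact(text):
    """Apply every pattern in turn and return the redacted text."""
    for pattern in PATTERNS:
        text = pattern.sub(_mask, text)
    return text


if __name__ == "__main__":
    print(redact("internal_token=abc123 key=AKIAIOSFODNN7EXAMPLE"))
```

Keeping the secret in a named group lets a pattern match surrounding context (such as a key name) while only the sensitive value is replaced.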

Data Locations

Path                                      Description
.claude/observability/events.db           Primary SQLite WAL store (append-only by convention)
.claude/observability/events.jsonl        Fallback sink when the DB is locked (~30 ms p95 hook latency)
.claude/observability/report.md           Latest all-sessions report
.claude/observability/<session_id>.md     Per-session report
.claude/observability/export.jsonl        JSONL exporter output
.claude/observability/labels-export.csv   CSV from label export
.claude/observability.json                Exporter configuration

All paths under .claude/observability/ are gitignored.

Turning Off Live Hooks

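The live hook sink captures events automatically during every session. To stop collection and, optionally, discard what has already been captured:

  • Disable hooks: Remove the four observability entries from hooks/hooks.json (PreToolUse:Agent|Task, the matcherless PostToolUse, SubagentStop, SessionEnd). Standard JSON has no comment syntax, so delete the entries (keeping a backup copy if you plan to restore them) rather than commenting them out.
  • Remove the database: Delete .claude/observability/events.db to discard captured events. This alone does not stop collection; the hooks recreate the file on the next session unless they are disabled.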

The offline transcript parser (load subcommand) works independently of live hooks — you can still load historical JSONL transcripts with hooks disabled.