Design Decisions

Architectural Decision Records (ADRs) documenting the key design choices in Agent Flow.

ADR-001: Multi-Agent Architecture

Context

When building an AI-assisted development system, we need to decide between:

  1. A single general-purpose agent handling all tasks
  2. Multiple specialized agents with focused responsibilities

Decision

Use multiple specialized agents with distinct roles, tools, and behavioral guidelines.
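The decision can be sketched as data: each agent carries a role and an explicit tool allowlist. This is an illustrative sketch only; the agent names come from the ADRs below, and the exact tool sets are assumptions, not the plugin's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRole:
    name: str
    purpose: str
    tools: frozenset  # the only tools this agent may invoke

# Hypothetical roster based on the roles described in these ADRs.
ROLES = {
    "riko": AgentRole("riko", "exploration/research",
                      frozenset({"Read", "Grep", "WebSearch", "WebFetch"})),
    "senku": AgentRole("senku", "planning",
                       frozenset({"Read", "TodoWrite"})),
    "loid": AgentRole("loid", "implementation",
                      frozenset({"Read", "Write", "Edit", "Bash"})),
    "lawliet": AgentRole("lawliet", "verification",
                         frozenset({"Read", "Bash"})),
}


def can_use(agent: str, tool: str) -> bool:
    """A tool call is permitted only if the agent's role grants it."""
    return tool in ROLES[agent].tools
```

Because the restriction is data rather than prompt guidance, an explorer simply has no `Edit` capability to misuse.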

Rationale

  1. Separation of Concerns: Each agent has a clear responsibility
     • Exploration is separate from implementation
     • Implementation is separate from verification
     • This prevents shortcuts and ensures thoroughness

  2. Tool Restriction: Agents only have the tools they need
     • Explorers can't modify files (prevents accidental changes)
     • Implementers can't skip verification (no temptation)
     • Verifiers can't fix issues (forces handback)

  3. Model Optimization: Different tasks need different capabilities
     • Strategic planning benefits from deeper reasoning (Opus)
     • Execution benefits from speed (Sonnet)
     • Cost is optimized by matching model to task

  4. Independent Verification: Self-verification is unreliable
     • Agents reviewing their own work have inherent bias
     • Separate verifiers provide objective assessment

Consequences

  • Positive: Clear responsibilities, better verification, cost optimization
  • Negative: More complexity, context passing overhead

Alternatives Considered

  1. Single Agent with Modes: One agent switching between modes
     • Rejected: No clear boundaries, self-verification issues

  2. Parallel Agents with Consensus: Multiple agents vote on decisions
     • Rejected: Overhead without clear benefit for development tasks

ADR-002: Verification-First Philosophy

Context

LLMs can generate confident claims about task completion without actual verification. We need to determine how to handle this.

Decision

Require evidence, not claims. Every significant assertion must be backed by actual command output.
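One way to make "evidence, not claims" concrete is to accept a completion status only when it carries captured command output. This is a minimal sketch of the idea, not the plugin's implementation; the `Evidence` type and field names are illustrative.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Evidence:
    command: str
    exit_code: int
    output: str


def run_with_evidence(command: list) -> Evidence:
    """Run a verification command and capture its real output.

    A completion claim is accepted only with an Evidence record whose
    exit_code is 0 -- never on the agent's say-so.
    """
    proc = subprocess.run(command, capture_output=True, text=True)
    return Evidence(" ".join(command), proc.returncode,
                    proc.stdout + proc.stderr)

# Stand-in for a test run: the output exists because the command ran.
ev = run_with_evidence([sys.executable, "-c", "print('2 tests passed')"])
```

The point of the design: "tests pass" as prose is cheap to generate, but an `Evidence` record can only exist if the command actually executed.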

Rationale

  1. LLMs Hallucinate Completion: Models report success based on learned patterns, not actual results
     • "Tests pass" is easy to generate without running tests
     • Confidence doesn't correlate with accuracy

  2. Evidence is Verifiable: Command output can be checked
     • Test results show actual pass/fail
     • Type errors are explicit
     • Build failures are undeniable

  3. Prevention Over Detection: Better to require evidence than to detect lies
     • Evidence requirements prevent false claims
     • Detection is unreliable with capable models

Consequences

  • Positive: Reliable completion status, caught errors, user confidence
  • Negative: Slower workflows, more verbose output

Alternatives Considered

  1. Trust Agent Claims: Accept completion statements at face value
     • Rejected: High false positive rate, bugs ship

  2. Sample Verification: Randomly verify some claims
     • Rejected: Inconsistent quality, some bugs slip through

ADR-003: Model Tier Strategy

Context

Different AI models have different capabilities and costs. We need to decide how to allocate models to tasks.

Decision

Use two model tiers:

  • Opus for strategic/planning tasks (Riko, Senku)
  • Sonnet for execution/verification tasks (Loid, Lawliet, Alphonse)
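The tier assignment is a static role-to-tier lookup. A minimal sketch, assuming the tier names below; the concrete model identifiers are configuration details, not part of this ADR:

```python
# Hypothetical tier table: "opus"/"sonnet" stand in for whatever
# concrete model identifiers the deployment configures.
MODEL_TIERS = {
    "strategic": "opus",    # deep reasoning: exploration, planning
    "execution": "sonnet",  # speed and cost: implementation, verification
}

AGENT_TIER = {
    "riko": "strategic",
    "senku": "strategic",
    "loid": "execution",
    "lawliet": "execution",
    "alphonse": "execution",
}


def model_for(agent: str) -> str:
    """Resolve an agent's model through its role tier, never directly."""
    return MODEL_TIERS[AGENT_TIER[agent]]
```

Routing through the tier table (rather than mapping agents straight to models) keeps the model choice a one-line change per tier.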

Rationale

  1. Opus Strengths: Deep reasoning, complex analysis
     • Valuable for unfamiliar codebase exploration
     • Important for multi-step planning
     • Worth the cost for strategic decisions

  2. Sonnet Strengths: Speed, clear task execution
     • Sufficient for well-defined implementation
     • Fast iterations for review cycles
     • Cost-effective for command execution

  3. Cost Optimization: Not all tasks need maximum capability
     • Running tests doesn't require deep reasoning
     • Implementing a clear plan is straightforward
     • Save the expensive model for high-value decisions

Consequences

  • Positive: Cost efficiency, appropriate capability matching
  • Negative: Potential capability gaps, model switching overhead

Alternatives Considered

  1. All Opus: Use the best model everywhere
     • Rejected: Excessive cost for simple tasks

  2. All Sonnet: Use the fast model everywhere
     • Rejected: Insufficient for complex exploration/planning

  3. Dynamic Selection: Choose a model per task
     • Rejected: Complexity without clear benefit over role-based assignment

ADR-004: Hook-Based Lifecycle

Context

We need to inject behavior at specific points in the Claude Code lifecycle for validation and guidance.

Decision

Use hooks at lifecycle events:

  • UserPromptSubmit: Prompt refinement
  • PreToolUse: Operation validation
  • PostToolUse: Result verification
  • SessionStart: Context loading
  • Stop: Completion gates
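The dispatch pattern behind these events can be sketched as follows. This is a toy model of hook-style dispatch, not Claude Code's actual hook mechanism (which is configured externally); the event name is taken from the list above, and the sample handler is hypothetical.

```python
from collections import defaultdict

# Handlers register for a named lifecycle event; each runs in turn,
# and any handler returning False blocks the operation (a "gate").
_hooks = defaultdict(list)


def hook(event: str):
    """Decorator: register a handler for a lifecycle event."""
    def register(fn):
        _hooks[event].append(fn)
        return fn
    return register


def fire(event: str, payload: dict) -> bool:
    """Run every handler for the event; all must approve."""
    return all(fn(payload) for fn in _hooks[event])


@hook("PreToolUse")
def block_system_paths(payload: dict) -> bool:
    # Hypothetical validation: reject writes to system paths
    # before the operation ever runs.
    return not payload.get("file_path", "").startswith("/etc/")


allowed = fire("PreToolUse", {"file_path": "src/main.py"})
```

Note how extensibility falls out for free: adding a behavior means registering another handler, with no change to the dispatch core.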

Rationale

  1. Non-Invasive: Hooks augment without modifying core behavior
     • Claude Code remains unchanged
     • The plugin adds capabilities through hooks

  2. Targeted Intervention: Each hook has a specific purpose
     • Validation happens before operations
     • Verification happens after operations
     • Gates happen before completion

  3. Extensible: New behaviors can be added via hooks
     • No core changes needed
     • Multiple hooks can run at the same point

Consequences

  • Positive: Clean integration, extensible, targeted
  • Negative: Limited to supported hook points, execution overhead

Alternatives Considered

  1. Core Modification: Modify Claude Code directly
     • Rejected: Not maintainable, version coupling

  2. Wrapper Approach: Intercept all operations
     • Rejected: Too invasive, performance impact

ADR-005: State File Design

Context

Workflows span multiple agent interactions and need persistent state tracking.

Decision

Use YAML-frontmatter Markdown files in the .claude/ directory:

  • Human readable
  • Machine parseable
  • Git-ignorable
  • Session-scoped
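A state file and a minimal reader for it might look like the sketch below. The field names (`phase`, `iteration`) and the history format are illustrative assumptions, not the plugin's actual schema; the parser deliberately uses only the standard library.

```python
import re

# Hypothetical contents of a .claude/ state file.
STATE = """\
---
phase: implementation
iteration: 2
---
## History
- Iteration 1: tests failed (2 assertion errors)
- Iteration 2: in progress
"""


def parse_state(text: str):
    """Split machine state (frontmatter) from the human-readable log (body)."""
    m = re.match(r"---\n(.*?)\n---\n(.*)", text, re.DOTALL)
    front, body = m.group(1), m.group(2)
    state = {}
    for line in front.splitlines():
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()  # values kept as strings
    return state, body


state, log = parse_state(STATE)
```

The split mirrors the dual-purpose rationale below: tooling reads the frontmatter dict, humans read the Markdown body, and deleting the file simply resets the session.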

Rationale

  1. Dual-Purpose Format: YAML for structured data, Markdown for logs
     • Frontmatter holds machine-parseable state
     • Body holds human-readable history

  2. Local Storage: Files live in the .claude/ directory
     • No external dependencies
     • Easy to inspect and debug
     • Can be safely deleted

  3. Session Scope: Files are ephemeral
     • Not committed to git
     • Fresh state each session
     • No stale-state issues

Consequences

  • Positive: Simple, debuggable, no dependencies
  • Negative: No persistence across sessions (by design), file I/O overhead

Alternatives Considered

  1. In-Memory State: Keep state in conversation context
     • Rejected: Lost on context overflow, hard to debug

  2. Database Storage: Use SQLite or similar
     • Rejected: Overkill for session-scoped state

ADR-006: Skill System Architecture

Context

Domain expertise needs to be encoded and shared across agents.

Decision

Create skill modules with:

  • Owner agent (maintains the skill)
  • Consumer agents (reference the skill)
  • Reference materials (detailed documentation)
  • Examples (worked scenarios)
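The ownership model can be expressed as a small registry. A sketch under assumptions: the skill name, agents, and path below are invented examples, and the real skills are Markdown files, not Python objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    owner: str          # the one agent allowed to update the skill
    consumers: frozenset  # agents that reference but never modify it
    path: str           # SKILL.md, alongside references/ and examples/

# Hypothetical registry entry.
SKILLS = {
    "test-writing": Skill(
        name="test-writing",
        owner="lawliet",
        consumers=frozenset({"loid"}),
        path="skills/test-writing/SKILL.md",
    ),
}


def may_edit(agent: str, skill: str) -> bool:
    """Only the owning agent may change a skill's guidance."""
    return SKILLS[skill].owner == agent
```

Single ownership is what prevents conflicting guidance: consumers can read `path`, but any change must go through the owner.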

Rationale

  1. Ownership Model: Clear responsibility for each skill
     • One agent owns each skill
     • Consumers reference but don't modify
     • Prevents conflicting guidance

  2. Documentation as Code: Skills are Markdown files
     • Version controlled
     • Easy to update
     • Human readable

  3. Hierarchical Structure: SKILL.md + references + examples
     • Quick reference in the main file
     • Deep dives in references
     • Practical guidance in examples

Consequences

  • Positive: Clear ownership, extensible, documented
  • Negative: Potential inconsistency, maintenance burden

Alternatives Considered

  1. Inline Prompts: Embed all guidance in agent prompts
     • Rejected: Duplication, hard to maintain

  2. Shared Knowledge Base: Single document for all agents
     • Rejected: No ownership, conflicting guidance

ADR-007: Tool Access Control

Context

Agents need different capabilities for their roles. Unrestricted access enables shortcuts.

Decision

Restrict tools per agent role:

  • Only Loid can Write/Edit
  • Only Riko can WebSearch/WebFetch
  • Only Senku can TodoWrite
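Enforcement of the exclusive tools above can be sketched as a hard check rather than guidance. The table encodes only the restrictions named in this decision; where such a check would actually run (e.g. in a pre-tool-use validation step) is an assumption.

```python
# Tools reserved for exactly one agent, per this decision.
EXCLUSIVE_TOOLS = {
    "Write": "loid",
    "Edit": "loid",
    "WebSearch": "riko",
    "WebFetch": "riko",
    "TodoWrite": "senku",
}


class ToolAccessError(PermissionError):
    """Raised when an agent attempts a tool outside its role."""


def check_tool(agent: str, tool: str) -> None:
    """Raise instead of warn: the restriction is a capability, not advice."""
    owner = EXCLUSIVE_TOOLS.get(tool)
    if owner is not None and agent.lower() != owner:
        raise ToolAccessError(
            f"{tool} is reserved for {owner}; {agent} must hand off"
        )


check_tool("loid", "Edit")  # permitted: Loid owns file modification
```

Because the check raises, a verifier that tries to edit code has no path forward except handing back to the implementer, which is exactly the audit-trail property the rationale describes.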

Rationale

  1. Prevents Shortcuts: Agents can't do each other's jobs
     • The explorer can't "just fix it" while exploring
     • The verifier can't modify code to make tests pass
     • The planner can't skip to implementation

  2. Clear Boundaries: Role is enforced by capability
     • Not just guidance but actual restriction
     • Agents don't need to resist temptation

  3. Audit Trail: Actions map to responsible agents
     • File modifications came from Loid
     • Web research came from Riko
     • Plans came from Senku

Consequences

  • Positive: Clear boundaries, enforced specialization, accountability
  • Negative: Requires handoffs, potential delays

Alternatives Considered

  1. All Tools for All Agents: Trust agents to use them appropriately
     • Rejected: Temptation too strong, boundaries blur

  2. Request-Based Access: Agents request tools as needed
     • Rejected: Overhead without clear benefit

ADR-008: Iteration and Failure Handling

Context

Verification may fail, requiring iteration. We need to handle this gracefully.

Decision

Implement iteration loops with maximum bounds:

  • Failed verification returns to implementation
  • An iteration counter tracks attempts
  • Maximum iterations prevent infinite loops
  • State tracks iteration history
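The loop described above can be sketched as follows, with the implement and verify callables standing in for agent handoffs. A minimal sketch; the default of 10 comes from the rationale below, and the history list stands in for the state-file tracking.

```python
def run_bounded(implement, verify, max_iterations: int = 10):
    """Bounded implement/verify loop.

    Failed verification returns to implementation; the counter bounds
    the loop so it fails cleanly instead of running forever.
    Returns (success, history of attempted iteration numbers).
    """
    history = []
    for attempt in range(1, max_iterations + 1):
        history.append(attempt)  # each iteration is recorded for debugging
        implement(attempt)
        if verify():
            return True, history
    return False, history  # hit the limit: fail cleanly, history preserved


# Toy usage: verification first succeeds on the third attempt.
state = {"fixed": 0}
ok, history = run_bounded(
    implement=lambda n: state.update(fixed=n),
    verify=lambda: state["fixed"] >= 3,
)
```

Keeping the bound as a parameter matches the "configurable per task" point: a complex task can raise the limit without changing the loop.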

Rationale

  1. Reality of Development: Not all implementations pass on the first try
     • Tests may reveal bugs
     • Review may find issues
     • Iteration is normal

  2. Bounded Iteration: A maximum prevents runaway loops
     • Default of 10 iterations
     • Configurable per task
     • Fails cleanly at the limit

  3. State Tracking: History enables debugging
     • Each iteration is logged
     • Failure reasons are recorded
     • Pattern analysis is possible

Consequences

  • Positive: Handles reality, bounded, debuggable
  • Negative: May hit limit on complex tasks, overhead

Alternatives Considered

  1. No Iteration: Fail on the first error
     • Rejected: Too strict, wastes progress

  2. Unbounded Iteration: Keep trying until success
     • Rejected: Potential infinite loops, cost concerns

Summary

These decisions collectively create a system that:

  1. Specializes agents for clear responsibilities
  2. Requires evidence for reliable verification
  3. Optimizes costs with model tiers
  4. Integrates cleanly through hooks
  5. Tracks state with simple files
  6. Shares expertise through skills
  7. Enforces boundaries with tool restrictions
  8. Handles failure with bounded iteration

The overall philosophy: build in constraints that prevent problems rather than detecting them after the fact.