Design Decisions
Architectural Decision Records (ADRs) documenting the key design choices in Agent Flow.
ADR-001: Multi-Agent Architecture
Context
When building an AI-assisted development system, we need to decide between:
1. A single general-purpose agent handling all tasks
2. Multiple specialized agents with focused responsibilities
Decision
Use multiple specialized agents with distinct roles, tools, and behavioral guidelines.
Rationale
- Separation of Concerns: Each agent has a clear responsibility
  - Exploration is separate from implementation
  - Implementation is separate from verification
  - This prevents shortcuts and ensures thoroughness
- Tool Restriction: Agents only have the tools they need
  - Explorers can't modify files (prevents accidental changes)
  - Implementers can't skip verification (no temptation)
  - Verifiers can't fix issues (forces handback)
- Model Optimization: Different tasks need different capabilities
  - Strategic planning benefits from deeper reasoning (Opus)
  - Execution benefits from speed (Sonnet)
  - Cost is optimized by matching model to task
- Independent Verification: Self-verification is unreliable
  - Agents reviewing their own work have inherent bias
  - Separate verifiers provide objective assessment
Consequences
- Positive: Clear responsibilities, better verification, cost optimization
- Negative: More complexity, context passing overhead
Alternatives Considered
- Single Agent with Modes: One agent switching between modes
  - Rejected: No clear boundaries, self-verification issues
- Parallel Agents with Consensus: Multiple agents vote on decisions
  - Rejected: Overhead without clear benefit for development tasks
ADR-002: Verification-First Philosophy
Context
LLMs can generate confident claims about task completion without actual verification. We need to determine how to handle this.
Decision
Require evidence, not claims. Every significant assertion must be backed by actual command output.
Rationale
- LLMs Hallucinate Completion: Models report success based on learned patterns, not on a record of what actually ran
  - "Tests pass" is easy to generate without running tests
  - Confidence doesn't correlate with accuracy
- Evidence is Verifiable: Command output can be checked
  - Test results show actual pass/fail
  - Type errors are explicit
  - Build failures are undeniable
- Prevention Over Detection: Better to require evidence than detect lies
  - Evidence requirements prevent false claims
  - Detection is unreliable with capable models
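The "evidence, not claims" rule can be illustrated with a small sketch. The names `Evidence`, `run_with_evidence`, and `claim_is_supported` are hypothetical and do not come from Agent Flow; the sketch only assumes that a claim counts when it is backed by the captured output of a command that really ran and exited successfully.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Evidence:
    """Captured result of a command that was actually executed (hypothetical type)."""
    command: list
    exit_code: int
    output: str


def run_with_evidence(command: list) -> Evidence:
    """Run a command and keep its real exit code and combined output."""
    result = subprocess.run(command, capture_output=True, text=True)
    return Evidence(command, result.returncode, result.stdout + result.stderr)


def claim_is_supported(evidence: Optional[Evidence]) -> bool:
    """Accept a completion claim only if evidence exists and the command succeeded."""
    return evidence is not None and evidence.exit_code == 0


# A bare "tests pass" statement with no command output is rejected.
unsupported = claim_is_supported(None)

# Stand-ins for a passing and a failing test run.
passing = run_with_evidence([sys.executable, "-c", "print('2 passed')"])
failing = run_with_evidence([sys.executable, "-c", "import sys; sys.exit(1)"])
```

The point of the design is that the reviewer inspects `passing.output` and `passing.exit_code` rather than the agent's summary of them.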
Consequences
- Positive: Reliable completion status, caught errors, user confidence
- Negative: Slower workflows, more verbose output
Alternatives Considered
- Trust Agent Claims: Accept completion statements at face value
  - Rejected: High false positive rate, bugs ship
- Sample Verification: Randomly verify some claims
  - Rejected: Inconsistent quality, some bugs slip through
ADR-003: Model Tier Strategy
Context
Different AI models have different capabilities and costs. We need to decide how to allocate models to tasks.
Decision
Use two model tiers:
- Opus for strategic/planning tasks (Riko, Senku)
- Sonnet for execution/verification tasks (Loid, Lawliet, Alphonse)
Rationale
- Opus Strengths: Deep reasoning, complex analysis
  - Valuable for unfamiliar codebase exploration
  - Important for multi-step planning
  - Worth the cost for strategic decisions
- Sonnet Strengths: Speed, clear task execution
  - Sufficient for well-defined implementation
  - Fast iterations for review cycles
  - Cost-effective for command execution
- Cost Optimization: Not all tasks need maximum capability
  - Running tests doesn't require deep reasoning
  - Implementing a clear plan is straightforward
  - Save the expensive model for high-value decisions
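Role-based tiering amounts to a fixed lookup from agent to model tier. The sketch below uses the agent names from the Decision; the tier labels `"opus"` and `"sonnet"` are illustrative strings, not real model identifiers, and `model_for` is a hypothetical helper.

```python
# Static role-to-tier mapping, taken from the Decision above.
MODEL_TIER = {
    "Riko": "opus",       # strategic/planning
    "Senku": "opus",      # strategic/planning
    "Loid": "sonnet",     # execution
    "Lawliet": "sonnet",  # verification
    "Alphonse": "sonnet", # verification
}


def model_for(agent: str) -> str:
    """Return the tier for an agent; unknown agents are an error, not a default."""
    try:
        return MODEL_TIER[agent]
    except KeyError:
        raise ValueError(f"unknown agent: {agent}") from None
```

Because the mapping is static, there is no per-task selection logic to maintain, which is the trade-off the Dynamic Selection alternative would give up.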
Consequences
- Positive: Cost efficiency, appropriate capability matching
- Negative: Potential capability gaps, model switching overhead
Alternatives Considered
- All Opus: Use best model everywhere
  - Rejected: Excessive cost for simple tasks
- All Sonnet: Use fast model everywhere
  - Rejected: Insufficient for complex exploration/planning
- Dynamic Selection: Choose model per-task
  - Rejected: Complexity without clear benefit over role-based assignment
ADR-004: Hook-Based Lifecycle
Context
We need to inject behavior at specific points in the Claude Code lifecycle for validation and guidance.
Decision
Use hooks at lifecycle events:
- UserPromptSubmit: Prompt refinement
- PreToolUse: Operation validation
- PostToolUse: Result verification
- SessionStart: Context loading
- Stop: Completion gates
Rationale
- Non-Invasive: Hooks augment without modifying core behavior
  - Claude Code remains unchanged
  - Plugin adds capabilities through hooks
- Targeted Intervention: Each hook has a specific purpose
  - Validation happens before operations
  - Verification happens after operations
  - Gates happen before completion
- Extensible: New behaviors can be added via hooks
  - No core changes needed
  - Multiple hooks can run at the same point
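The idea that several hooks can attach to one lifecycle event without touching the core can be modeled in-process. This is only a conceptual sketch: it does not reproduce how Claude Code hooks are actually configured or invoked, and `register`, `fire`, and the payload shape are hypothetical.

```python
from collections import defaultdict
from typing import Callable

# Event names from the Decision above; the registry itself is illustrative.
HOOK_EVENTS = ("SessionStart", "UserPromptSubmit", "PreToolUse", "PostToolUse", "Stop")

_registry = defaultdict(list)  # event name -> list of hook callables


def register(event: str, hook: Callable[[dict], None]) -> None:
    """Attach a hook to a supported lifecycle event."""
    if event not in HOOK_EVENTS:
        raise ValueError(f"unsupported hook point: {event}")
    _registry[event].append(hook)


def fire(event: str, payload: dict) -> None:
    """Run every hook registered for the event, in registration order."""
    for hook in _registry[event]:
        hook(payload)


# Two independent hooks on the same event.
calls = []
register("PreToolUse", lambda payload: calls.append(("validate", payload["tool"])))
register("PreToolUse", lambda payload: calls.append(("audit", payload["tool"])))
fire("PreToolUse", {"tool": "Edit"})
fire("Stop", {})  # no hooks registered, so nothing runs
```

The `ValueError` on unknown events mirrors the stated negative consequence: behavior can only be injected at supported hook points.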
Consequences
- Positive: Clean integration, extensible, targeted
- Negative: Limited to supported hook points, execution overhead
Alternatives Considered
- Core Modification: Modify Claude Code directly
  - Rejected: Not maintainable, version coupling
- Wrapper Approach: Intercept all operations
  - Rejected: Too invasive, performance impact
ADR-005: State File Design
Context
Workflows span multiple agent interactions and need persistent state tracking.
Decision
Use YAML-frontmatter Markdown files in the .claude/ directory:
- Human readable
- Machine parseable
- Git-ignorable
- Session-scoped
Rationale
- Dual-Purpose Format: YAML for structured data, Markdown for logs
  - Frontmatter holds machine-parseable state
  - Body holds human-readable history
- Local Storage: Files in the .claude/ directory
  - No external dependencies
  - Easy to inspect and debug
  - Can be safely deleted
- Session Scope: Files are ephemeral
  - Not committed to git
  - Fresh state each session
  - No stale state issues
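To make the dual-purpose format concrete, here is a minimal sketch that splits a state file into frontmatter and body. The field names (`phase`, `iteration`) are invented for illustration, and the parser handles only flat `key: value` lines; a real YAML parser would be needed for nested state.

```python
def split_frontmatter(text: str):
    """Split '---' delimited frontmatter from the Markdown body.

    Returns (state, body). Only flat `key: value` lines are understood.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        return {}, text
    state = {}
    for line in lines[1:end]:
        if ":" in line:
            key, _, value = line.partition(":")
            state[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    return state, body


# Hypothetical state file contents.
example = """---
phase: verification
iteration: 2
---
## Log
- Iteration 1: tests failed
"""

state, body = split_frontmatter(example)
```

The frontmatter is what a script would read to decide the next step, while the body stays a plain log that a person can scan in any editor.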
Consequences
- Positive: Simple, debuggable, no dependencies
- Negative: No persistence across sessions (by design), file I/O overhead
Alternatives Considered
- In-Memory State: Keep state in conversation context
  - Rejected: Lost on context overflow, hard to debug
- Database Storage: Use SQLite or similar
  - Rejected: Overkill for session-scoped state
ADR-006: Skill System Architecture
Context
Domain expertise needs to be encoded and shared across agents.
Decision
Create skill modules with:
- Owner agent (maintains the skill)
- Consumer agents (reference the skill)
- Reference materials (detailed documentation)
- Examples (worked scenarios)
Rationale
- Ownership Model: Clear responsibility for each skill
  - One agent owns each skill
  - Consumers reference but don't modify
  - Prevents conflicting guidance
- Documentation as Code: Skills are markdown files
  - Version controlled
  - Easy to update
  - Human readable
- Hierarchical Structure: SKILL.md + references + examples
  - Quick reference in main file
  - Deep dive in references
  - Practical guidance in examples
Consequences
- Positive: Clear ownership, extensible, documented
- Negative: Potential inconsistency, maintenance burden
Alternatives Considered
- Inline Prompts: Embed all guidance in agent prompts
  - Rejected: Duplication, hard to maintain
- Shared Knowledge Base: Single document for all agents
  - Rejected: No ownership, conflicting guidance
ADR-007: Tool Access Control
Context
Agents need different capabilities for their roles. Unrestricted access enables shortcuts.
Decision
Restrict tools per agent role:
- Only Loid can Write/Edit
- Only Riko can WebSearch/WebFetch
- Only Senku can TodoWrite
Rationale
- Prevents Shortcuts: Agents can't do others' jobs
  - Explorer can't "just fix it" while exploring
  - Verifier can't modify code to make tests pass
  - Planner can't skip to implementation
- Clear Boundaries: Role is enforced by capability
  - Not just guidance but actual restriction
  - Agents don't need to resist temptation
- Audit Trail: Actions map to responsible agents
  - File modifications came from Loid
  - Web research came from Riko
  - Plans came from Senku
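The restriction can be pictured as a table of exclusively owned tools. The ownership entries below come from the Decision; `may_use` is a hypothetical helper, and the sketch assumes that tools not listed (such as Read) are unrestricted, which this document does not state.

```python
# Tool -> the only agent allowed to use it, per the Decision above.
EXCLUSIVE_TOOLS = {
    "Write": "Loid",
    "Edit": "Loid",
    "WebSearch": "Riko",
    "WebFetch": "Riko",
    "TodoWrite": "Senku",
}


def may_use(agent: str, tool: str) -> bool:
    """Allow a tool if it has no exclusive owner or the agent is that owner.

    Assumption: tools absent from EXCLUSIVE_TOOLS are open to every agent.
    """
    owner = EXCLUSIVE_TOOLS.get(tool)
    return owner is None or owner == agent
```

The same table doubles as the audit trail: any `Edit` call can only have come from Loid, so attribution needs no extra logging.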
Consequences
- Positive: Clear boundaries, enforced specialization, accountability
- Negative: Requires handoffs, potential delays
Alternatives Considered
- All Tools for All Agents: Trust agents to use appropriately
  - Rejected: Temptation too strong, boundaries blur
- Request-Based Access: Agents request tools as needed
  - Rejected: Overhead without clear benefit
ADR-008: Iteration and Failure Handling
Context
Verification may fail, requiring iteration. We need to handle this gracefully.
Decision
Implement iteration loops with maximum bounds:
- Failed verification returns to implementation
- Iteration counter tracks attempts
- Maximum iterations prevent infinite loops
- State tracks iteration history
Rationale
- Reality of Development: Not all implementations pass first try
  - Tests may reveal bugs
  - Review may find issues
  - Iteration is normal
- Bounded Iteration: Maximum prevents runaway
  - Default 10 iterations
  - Configurable per task
  - Fails cleanly at limit
- State Tracking: History enables debugging
  - Each iteration logged
  - Failure reasons recorded
  - Pattern analysis possible
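A bounded implement/verify loop with a recorded history might look like the sketch below. The default of 10 comes from the rationale above; `IterationLog`, `run_bounded`, and the callback signatures are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

MAX_ITERATIONS = 10  # default stated in the rationale


@dataclass
class IterationLog:
    """History of attempts plus the final outcome."""
    attempts: list = field(default_factory=list)
    succeeded: bool = False


def run_bounded(
    implement: Callable[[int], None],
    verify: Callable[[int], Optional[str]],
    max_iterations: int = MAX_ITERATIONS,
) -> IterationLog:
    """Alternate implement and verify until verify passes or the bound is hit.

    `verify` returns None on success or a failure reason string.
    """
    log = IterationLog()
    for attempt in range(1, max_iterations + 1):
        implement(attempt)
        failure = verify(attempt)
        if failure is None:
            log.attempts.append(f"iteration {attempt}: passed")
            log.succeeded = True
            return log
        log.attempts.append(f"iteration {attempt}: {failure}")
    return log  # bound reached: fail cleanly with the full history


# Verification passes on the third attempt.
result = run_bounded(lambda n: None, lambda n: None if n >= 3 else "tests failed")

# Verification never passes, so the loop stops at the configured limit.
exhausted = run_bounded(lambda n: None, lambda n: "tests failed", max_iterations=2)
```

Returning the log on both paths keeps the failure reasons available for the pattern analysis mentioned above.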
Consequences
- Positive: Handles reality, bounded, debuggable
- Negative: May hit limit on complex tasks, overhead
Alternatives Considered
- No Iteration: Fail on first error
  - Rejected: Too strict, wastes progress
- Unbounded Iteration: Keep trying until success
  - Rejected: Potential infinite loops, cost concerns
ADR-009: Graphify Knowledge Graph Integration
Context
Agents exploring large codebases rely heavily on Grep and Read, which require knowing what to search for. Structural questions — "what imports this module?", "what is the blast radius of changing X?", "which community does this file belong to?" — are expensive to answer with text search alone.
Decision
Integrate graphify as a read-only MCP server with access granted to Riko, Senku, and Lawliet only. Expose 7 graph query tools via the mcp__plugin_agent-flow_graphify__* prefix. Auto-detect graphify-out/graph.json at orchestration init via detect-graph-context.sh and write a graph: block into the state file.
Rationale
- One-writer invariant: Loid (writes code) and Alphonse (runs tests) must not query the graph. Loid's writes make the graph stale; Alphonse's verification cannot rely on graph freshness. Scoping graph access to read-only exploration/planning/review agents enforces this cleanly.
- MCP provides a clean integration boundary: Tools appear natively in agent tool lists, with no prompt engineering and no bash scripts in tool chains.
- Graph queries complement grep, don't replace it: Structural traversal (callers, communities, paths) belongs to graph tools; literal content and freshly-edited files belong to Grep/Read. The graphify-usage skill defines this boundary explicitly.
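The detection step and the agent scoping can be sketched together. This is not the contents of detect-graph-context.sh, which is a shell script not shown here; the Python below only illustrates the check for graphify-out/graph.json, and the returned keys (`available`, `path`) are invented rather than the real graph: block schema.

```python
import tempfile
from pathlib import Path

# Agents granted graph access in the Decision above.
GRAPH_AGENTS = frozenset({"Riko", "Senku", "Lawliet"})


def detect_graph(project_root: Path) -> dict:
    """Report whether a graphify output file exists under the project root."""
    graph_file = project_root / "graphify-out" / "graph.json"
    return {"available": graph_file.is_file(), "path": str(graph_file)}


def may_query_graph(agent: str) -> bool:
    """Only exploration, planning, and review agents may query the graph."""
    return agent in GRAPH_AGENTS


# Demonstrate detection before and after a graph file exists.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    before = detect_graph(root)
    (root / "graphify-out").mkdir()
    (root / "graphify-out" / "graph.json").write_text("{}")
    after = detect_graph(root)
```

Keeping Loid and Alphonse out of `GRAPH_AGENTS` is the whole enforcement mechanism for the one-writer invariant in this sketch.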
Consequences
- Positive: Structural queries (blast-radius, module clustering, dependency mapping) become cheap; agents orient faster in unfamiliar codebases
- Negative: Graph can become stale after edits within a session; requires a build step (/graphify) before first use
Alternatives Considered
- Embedding graph data in prompts
  - Rejected: graph.json contains thousands of nodes; token cost is prohibitive
- Giving all agents graph access
  - Rejected: Violates the one-writer invariant; Loid querying a stale graph during implementation would produce misleading structural information
ADR-010: Personal Knowledge Base Integration
Context
Each orchestration session starts without memory of prior sessions, projects, or decisions. Users accumulate patterns, anti-patterns, and preferences across projects that are not available in any single repository.
Decision
Integrate a second MCP server for the user's personal knowledge base graph, using the same 7-tool surface as graphify but pointed at $AGENT_FLOW_PERSONAL_KB_PATH/graphify-out/graph.json. Apply the same agent scoping as graphify: Riko, Senku, Lawliet have access; Loid and Alphonse do not.
Rationale
- Separate graph avoids mixing concerns: Project structure (current codebase) and personal notes (cross-project memory) are distinct knowledge domains. Keeping them in separate graphs allows independent refresh cycles and avoids contaminating structural queries with personal annotations.
- Same access pattern reduces cognitive load: Agents already know the 7-tool graph API from graphify-usage; personal-kb-usage reuses the same tool names via the mcp__personal-kb__* prefix, and only the query intent differs.
- Env-var config preserves portability: The path to the personal KB is user-specific and outside the project. An env var (AGENT_FLOW_PERSONAL_KB_PATH) keeps the plugin project-agnostic.
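Resolving the personal KB graph from the environment could be as small as the sketch below. The variable name and the graphify-out/graph.json suffix come from the Decision; `personal_kb_graph` is a hypothetical helper, and treating an unset or empty variable as "integration inactive" is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def personal_kb_graph(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the personal KB graph path, or None when the env var is unset.

    Assumption: an unset or empty variable means the integration is inactive.
    """
    base = env.get("AGENT_FLOW_PERSONAL_KB_PATH")
    if not base:
        return None
    return Path(base) / "graphify-out" / "graph.json"
```

Passing the environment as a parameter keeps the function testable without mutating the process environment.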
Consequences
- Positive: Cross-project recall (prior decisions, anti-patterns, style preferences) surfaces automatically during exploration and planning; pattern reuse across sessions
- Negative: Personal KB may be stale if the user hasn't re-run graphify on their notes recently; requires env var configuration to activate
Alternatives Considered
- Inline notes in prompts
  - Rejected: Personal knowledge bases can be large; pasting them into task prompts is token-expensive and brittle
- Shared project graph
  - Rejected: Mixing project structure with personal annotations makes both harder to query accurately; separate graphs keep each domain clean
Summary
These decisions collectively create a system that:
- Specializes agents for clear responsibilities
- Requires evidence for reliable verification
- Optimizes costs with model tiers
- Integrates cleanly through hooks
- Tracks state with simple files
- Shares expertise through skills
- Enforces boundaries with tool restrictions
- Handles failure with bounded iteration
- Queries structure through read-only graph MCP integration
- Recalls cross-project knowledge through personal KB MCP integration
The overall philosophy: build in constraints that prevent problems rather than detecting them after the fact.
Related Documentation
- Architecture Overview - System design
- Data Flows - Information flow
- The "Subagents LIE" Principle - Core problem
- Agent Specialization - Agent design