The "Subagents LIE" Principle
Understanding why AI agents cannot be trusted without verification, and how Agent Flow addresses this fundamental challenge.
The Core Problem
Large Language Models (LLMs) have a fundamental limitation that becomes dangerous in multi-agent systems: they can confidently claim to have completed work they never actually did. This isn't malicious behavior - it's an emergent property of how these models generate text based on patterns they've learned.
When an LLM agent is asked to "implement feature X," it may:
- Generate plausible-sounding code that doesn't actually work
- Claim tests pass without running them
- Report "implementation complete" without verifying the changes
- Describe files it modified without actually modifying them
This behavior is particularly insidious because the agent's response sounds completely confident and professional. Without verification, an orchestrator or human has no way to distinguish between genuine completion and hallucinated success.
Why This Happens
LLMs are trained to generate helpful, coherent responses. When asked about task completion, the most "helpful" pattern learned from training data is to report success. The model doesn't have an internal state tracking whether it actually performed actions - it generates the response that best matches the expected pattern.
User: "Did you implement the authentication?"
LLM Pattern Match: "Yes, I implemented authentication by..."
The model generates this response because it matches the expected conversational pattern, not because it has verified its own actions.
The Verification Decision Flowchart
Agent Flow uses this decision process to determine if work is actually complete:
flowchart TD
A[Agent claims task complete] --> B{Did agent provide<br/>command output?}
B -->|No| C[REJECT: No evidence]
B -->|Yes| D{Are outputs from<br/>verification commands?}
D -->|No| E[REJECT: Wrong evidence type]
D -->|Yes| F{Do outputs show<br/>zero errors?}
F -->|No| G[REJECT: Errors present]
F -->|Yes| H{Did agent run<br/>ALL required checks?}
H -->|No| I[REJECT: Incomplete verification]
H -->|Yes| J[ACCEPT: Verified complete]
C --> K[Request actual verification]
E --> K
G --> K
I --> K
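The flowchart's four checks can be sketched as a single function. This is a minimal illustration, not Agent Flow's actual implementation: the names `Verdict`, `CommandOutput`, `REQUIRED_CHECKS`, and `evaluate_claim` are hypothetical, the set of required checks is assumed from the gates named elsewhere on this page, and a zero exit code stands in for "zero errors".

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ACCEPT = "ACCEPT: Verified complete"
    NO_EVIDENCE = "REJECT: No evidence"
    WRONG_EVIDENCE = "REJECT: Wrong evidence type"
    ERRORS_PRESENT = "REJECT: Errors present"
    INCOMPLETE = "REJECT: Incomplete verification"


@dataclass
class CommandOutput:
    check: str       # hypothetical label such as "tests" or "lint"
    exit_code: int   # assumption: 0 means the command reported no errors
    output: str      # raw text the command printed


# Assumed list of required checks (tests, types, lint, build).
REQUIRED_CHECKS = frozenset({"tests", "types", "lint", "build"})


def evaluate_claim(evidence: list[CommandOutput]) -> Verdict:
    """Walk the flowchart top to bottom and return the first failing branch."""
    if not evidence:
        return Verdict.NO_EVIDENCE
    # Keep only output that came from a recognized verification command.
    verification = [e for e in evidence if e.check in REQUIRED_CHECKS]
    if not verification:
        return Verdict.WRONG_EVIDENCE
    if any(e.exit_code != 0 for e in verification):
        return Verdict.ERRORS_PRESENT
    if {e.check for e in verification} != REQUIRED_CHECKS:
        return Verdict.INCOMPLETE
    return Verdict.ACCEPT
```

Every verdict other than `ACCEPT` corresponds to a branch that leads to "Request actual verification" in the flowchart.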
How Agent Flow Addresses This
Agent Flow treats LLM output as fundamentally untrustworthy until verified. This philosophy is embedded in every layer of the system:
1. Explicit Verification Agents
Rather than trusting implementation agents to verify their own work, Agent Flow delegates verification to a specialized agent (Alphonse) whose sole purpose is running validation commands and reporting actual output.
Loid (Executor)   ->   Alphonse (Verifier)
       |                      |
 Claims "done"        Runs actual tests
       |                      |
  Not trusted         Provides evidence
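What "provides evidence" could mean in practice is sketched below: a helper that runs one verification command and returns its exit code together with the unedited output. The names `Evidence` and `run_check` are hypothetical and only illustrate the idea. The page does not show how Alphonse actually invokes commands.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Evidence:
    command: list[str]
    exit_code: int
    output: str  # combined stdout and stderr, exactly as printed


def run_check(command: list[str], timeout: float = 600.0) -> Evidence:
    """Run one verification command and capture what it actually printed."""
    proc = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so no error text is lost
        text=True,
        timeout=timeout,           # raises subprocess.TimeoutExpired if exceeded
    )
    return Evidence(command, proc.returncode, proc.stdout)


if __name__ == "__main__":
    # Stand-in command; a real verifier would run the project's test suite.
    evidence = run_check([sys.executable, "-c", "print('ok')"])
    print(evidence.exit_code, evidence.output.strip())
```

The point of the design is that the verdict is derived from `exit_code` and `output`, never from what the executing agent says about its own work.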
2. Evidence Requirements in Agent Definitions
Every agent in Agent Flow has explicit evidence requirements built into its system prompt:
Loid (Executor):
- Do NOT claim "done", "complete", "looks good", or "should work" without ACTUAL verification output
- Do NOT say "I believe this works" or "this appears correct" - RUN THE COMMANDS
- Do NOT summarize what you did - SHOW THE VERIFICATION OUTPUT
Alphonse (Verifier):
- Do NOT claim "VERIFIED" or "all tests pass" without ACTUAL command output proving it
- Do NOT summarize results - SHOW THE EXACT OUTPUT from each verification command
- Do NOT say "appears to work" or "should be fine" - only report what commands ACTUALLY returned
3. Hook-Based Verification Gates
The system enforces verification at multiple lifecycle points:
| Hook | Purpose | Enforcement |
|---|---|---|
| PreToolUse | Validate file operations before execution | Prevents invalid paths |
| PostToolUse | Verify delegation results by agent type | Context-aware guidance |
| Stop | Run verification before task completion | Tests, types, lint, build |
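As one illustration of the PreToolUse row, a path check might reject file operations that resolve outside the project directory. This is only a sketch under that assumption: the table does not define what counts as an "invalid path", and `is_path_allowed` is a hypothetical helper, not part of Agent Flow.

```python
from pathlib import Path


def is_path_allowed(candidate: str, project_root: str) -> bool:
    """Return True only if candidate resolves to a location inside project_root."""
    root = Path(project_root).resolve()
    # Joining with an absolute candidate discards root, so absolute paths
    # outside the project are rejected by the containment test below.
    target = (root / candidate).resolve()
    return target == root or root in target.parents
```

With a project root of `/tmp/project`, `src/app.py` would be accepted, while `../secrets.txt` and `/etc/passwd` would be rejected because they resolve outside the root.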
4. Structured Output Requirements
Agents must report verification results in a fixed format in which every gate has an explicit status, so a failure cannot be glossed over in a prose summary:
## Verification Results
### Tests
- Status: [PASS | FAIL]
- Output: [Exact command output]
### Type Check
- Status: [PASS | FAIL]
- Errors: [List if any]
### Lint
- Status: [PASS | FAIL]
- Warnings: [Count and details]
### Build
- Status: [PASS | FAIL]
- Issues: [Details if any]
### Overall: [VERIFIED | FAILED]
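One way to keep a report in this shape honest is to generate it from recorded gate results, so that the Overall line is computed and never typed by an agent. The sketch below assumes that approach. `GateResult` and `render_report` are hypothetical names, and for brevity every section uses an "Output" field instead of the per-section labels in the template above.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    name: str     # section heading, e.g. "Tests" or "Lint"
    passed: bool  # derived from the command's exit status
    output: str   # exact command output


def render_report(results: list[GateResult]) -> str:
    """Render results in the report layout and derive the overall status."""
    lines = ["## Verification Results"]
    for r in results:
        lines.append(f"### {r.name}")
        lines.append(f"- Status: {'PASS' if r.passed else 'FAIL'}")
        lines.append(f"- Output: {r.output}")
    # VERIFIED requires at least one gate and no failures.
    verified = bool(results) and all(r.passed for r in results)
    lines.append(f"### Overall: {'VERIFIED' if verified else 'FAILED'}")
    return "\n".join(lines)
```

Because an empty result list renders as FAILED, a report with no gates at all cannot be mistaken for a passing one.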
The Orchestrator's Responsibility
The orchestrator (the main Claude instance coordinating the workflow) must also follow the "subagents lie" principle:
- Never trust completion claims - Require evidence
- Verify evidence is actual command output - Not summaries
- Check all required gates passed - Tests, types, lint, build
- Loop back if any gate fails - Do not proceed to completion
The orchestrate command includes explicit behavioral constraints:
- Do NOT claim "task complete" or "looks good" without running verification commands
- Do NOT skip any phase or verification step
- Do NOT output the completion promise until ALL gates pass
- Do NOT assume success - verify with actual command output
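The loop-back behavior described above can be sketched as follows. This is an assumed control flow, not the orchestrate command itself: `orchestrate`, `Gate`, and `max_rounds` are hypothetical, and each gate is modeled as a callable that returns `True` only when its verification command passed.

```python
from typing import Callable

Gate = Callable[[], bool]  # True only if the gate's command actually passed


def orchestrate(
    execute: Callable[[list[str]], None],
    gates: dict[str, Gate],
    max_rounds: int = 3,
) -> bool:
    """Re-run execution until every gate passes or the round budget runs out."""
    failed: list[str] = []
    for _ in range(max_rounds):
        # The executor receives the names of the gates that failed last round.
        # Whatever it claims about its own success is ignored.
        execute(failed)
        failed = [name for name, gate in gates.items() if not gate()]
        if not failed:
            return True  # only now may completion be reported
    return False
```

Completion is reported from exactly one place, after the gate results have been checked, so there is no code path on which a bare claim of success leads to a completed task.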
Practical Implications
What This Means for Users
- Tasks take longer - Verification adds overhead, but it catches failures that would otherwise go unnoticed
- Failure is visible - You'll see actual error messages, not hidden problems
- Completion is reliable - When the system says "verified," it means verified
What This Means for Extending Agent Flow
When adding new agents or skills:
- Include evidence requirements - Specify what constitutes proof of completion
- Define verification commands - What commands prove success?
- Build in distrust - Assume claims need verification
- Separate execution from verification - Different agents for different roles
The Completion Promise
Agent Flow only outputs the completion marker when all verification gates have passed:
This marker should never appear if:
- Any tests are failing
- Type errors exist
- Lint errors exist
- Build fails
- Any verification gate is not confirmed PASS
The presence of this marker indicates that actual verification commands were run, actual output was captured, and all checks passed with zero errors.
Related Concepts
- Evidence-Based Verification - The verification methodology in detail
- Agent Specialization - Why verification is a separate role
- Verification Gates Reference - Hook implementation details
Key Takeaway
Never trust. Always verify.
This isn't pessimism - it's engineering discipline. By assuming agents can and will hallucinate completion, Agent Flow builds in the safeguards necessary to catch failures before they reach production.