A production multi-agent system for researching competitive landscapes. A coordinator spawns parallel subagents for company research, market analysis, and financial data gathering. Results are deduplicated, conflicts surfaced, coverage gaps annotated, and state persisted for fault-tolerant resumption. Covers D1, D2, and D5 exam content comprehensively.
| Domain | Name | Weight | Coverage |
|---|---|---|---|
| D1 | Agent Architecture | 27% | Hub-and-spoke topology, coordinator/subagent roles, parallel spawning, anti-abort pattern, context isolation |
| D2 | Tool & MCP Design | 18% | AgentDefinition config, allowed-tools least privilege, explicit context passing in Task prompts |
| D5 | Context & Reliability | 15% | Conflict detection (epistemic honesty), coverage annotations, state persistence with manifest.json |
Defines all shared data structures: AgentDefinition, ResearchTask, AgentResult, and ResearchReport. The AgentDefinition config pattern is the exam's primary tool design concept — it enforces least-privilege access at the structural level.
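
As a rough illustration of how these four structures might fit together, here is a minimal sketch using dataclasses. Only the four class names come from the description above; every field name, the `can_use` helper, and the tool names `WebSearch` and `Read` are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(frozen=True)
class AgentDefinition:
    """Static config for one subagent role (field names are assumed)."""
    name: str
    system_prompt: str
    allowed_tools: tuple[str, ...]  # least privilege: only what the role needs

    def can_use(self, tool: str) -> bool:
        # Hypothetical helper: anything not listed is denied.
        return tool in self.allowed_tools


@dataclass
class ResearchTask:
    task_id: str
    agent: AgentDefinition
    prompt: str  # must carry all context; subagents inherit nothing


@dataclass
class AgentResult:
    task_id: str
    status: str  # "completed" or "failed"
    data: dict[str, Any] = field(default_factory=dict)
    error: Optional[str] = None


@dataclass
class ResearchReport:
    key_findings: list[str] = field(default_factory=list)
    conflicts: list[dict[str, Any]] = field(default_factory=list)
    coverage_gaps: list[str] = field(default_factory=list)
    action_items: list[str] = field(default_factory=list)


# Example role: tool names here are placeholders, not a confirmed tool list.
financial_analyst = AgentDefinition(
    name="financial_analyst",
    system_prompt="You research company financials.",
    allowed_tools=("WebSearch", "Read"),
)
```

Freezing `AgentDefinition` and storing tools in a tuple means a role's permissions cannot be widened at runtime, which is one way to make least privilege structural rather than a convention.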
The SubagentExecutor class runs individual subagent tasks. The critical exam concept here is context isolation — each subagent is a fresh Claude instance that receives only its system prompt and the Task prompt. It has no memory of coordinator decisions, other subagents, or previous pipeline runs.
```python
# WRONG: assuming subagent inherits coordinator context
task_prompt = "Research the company's financials."

# RIGHT: inject all context explicitly in the Task prompt
task_prompt = f"""
Company: {company_name}
Industry: {industry}
Focus areas: revenue, growth rate, profitability
Task: Research recent financial performance.
Sources: SEC filings, earnings reports.
Return format: JSON with fields: revenue, growth, key_metrics
"""
```
The ResearchCoordinator is the hub. It orchestrates the pipeline: it checks persisted state, spawns parallel subagents via the Task tool, waits for results, handles failures via the anti-abort pattern, and then triggers synthesis. Parallel spawning and failure handling are the exam's most tested D1 concepts.
```python
# Parallel spawning: ALL Task calls in ONE response
# These run concurrently, not sequentially
results = await asyncio.gather(
    executor.run(company_task),    # Task call 1
    executor.run(market_task),     # Task call 2
    executor.run(financial_task),  # Task call 3
    return_exceptions=True,        # don't abort on single failure
)
```
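
With `return_exceptions=True`, `asyncio.gather` returns exception objects in place of results instead of raising, so the coordinator still has to separate the two. The following is a minimal sketch of that anti-abort step; `run_all` and `fake_agent` are hypothetical names standing in for the real coordinator and subagent calls.

```python
import asyncio


async def run_all(tasks):
    """tasks maps task_id -> coroutine. Returns (successes, failures)."""
    ids = list(tasks)
    # gather preserves input order, so outcomes line up with ids.
    outcomes = await asyncio.gather(*tasks.values(), return_exceptions=True)
    successes, failures = {}, {}
    for task_id, outcome in zip(ids, outcomes):
        if isinstance(outcome, BaseException):
            failures[task_id] = repr(outcome)  # record it, keep going
        else:
            successes[task_id] = outcome
    return successes, failures


async def fake_agent(name, fail=False):
    """Stand-in for a subagent call (assumption for this example)."""
    await asyncio.sleep(0)
    if fail:
        raise RuntimeError(f"{name} timed out")
    return {"agent": name}


successes, failures = asyncio.run(run_all({
    "company": fake_agent("company"),
    "market": fake_agent("market", fail=True),
    "financial": fake_agent("financial"),
}))
```

Here the simulated market failure ends up in `failures` while the other two results remain usable, so synthesis can proceed and annotate the gap instead of aborting the whole run.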
Detects contradictions between subagent results for the same factual claim. When two sources disagree, the system preserves both values rather than silently resolving to one. This is the "epistemic honesty" principle — annotated uncertainty is better than false confidence.
| Conflict Type | Example | Resolution |
|---|---|---|
| Numeric divergence >10% | Source A: revenue $2.1B; Source B: $3.4B | Preserve both, add conflict_flag, route to human |
| Temporal: sources 1+ year apart | 2022 report: 500 employees; 2024 report: 1,200 employees | Keep both, add temporal_explanation, both valid |
| Factual contradiction | Founded 2015 vs. Founded 2018 | Preserve both, flag conflict, don't guess |
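
The numeric and temporal rows of the table could be implemented roughly as below. This is a sketch under stated assumptions: the function name `classify_numeric`, the record keys other than `conflict_flag` and `temporal_explanation`, and the choice to measure divergence relative to the larger of the two values are all illustrative, since the text only states the 10% and one-year thresholds.

```python
from datetime import date

NUMERIC_THRESHOLD = 0.10        # 10% divergence
TEMPORAL_THRESHOLD_DAYS = 365   # sources a year or more apart


def classify_numeric(a, b, date_a, date_b):
    """Return a conflict record for two numeric claims, or None if they agree."""
    baseline = max(abs(a), abs(b))  # assumption: relative to the larger value
    if baseline == 0:
        return None
    divergence = abs(a - b) / baseline
    if divergence <= NUMERIC_THRESHOLD:
        return None
    # Both values are always preserved; nothing is silently resolved.
    record = {"values": [a, b], "divergence": divergence}
    if abs((date_a - date_b).days) >= TEMPORAL_THRESHOLD_DAYS:
        record["type"] = "temporal"
        record["temporal_explanation"] = (
            "sources are a year or more apart; both may be valid"
        )
    else:
        record["type"] = "numeric"
        record["conflict_flag"] = True  # route to human review
    return record


# Figures taken from the table above; the dates are made up for the example.
revenue = classify_numeric(2.1e9, 3.4e9, date(2024, 3, 1), date(2024, 6, 1))
headcount = classify_numeric(500, 1200, date(2022, 6, 1), date(2024, 6, 1))
agreement = classify_numeric(100, 105, date(2024, 1, 1), date(2024, 2, 1))
```

The revenue pair is flagged as a numeric conflict, the headcount pair is kept as a temporal difference with an explanation, and the last pair falls under the threshold and produces no record.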
Synthesizes results from all subagents into a ResearchReport. Implements coverage annotations for incomplete data and the "KEY FINDINGS at top, ACTION ITEMS at bottom" layout for long-context reliability.
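
The layout described above can be sketched as a plain-text renderer. The function name `render_report`, the `[INCOMPLETE]` marker, the section headings other than KEY FINDINGS and ACTION ITEMS, and all sample strings are assumptions for illustration only.

```python
def render_report(key_findings, body_sections, coverage_gaps, action_items):
    """Render a report with key findings first and action items last."""
    lines = ["KEY FINDINGS"]
    lines += [f"- {finding}" for finding in key_findings]
    for title, text in body_sections:
        lines += ["", title, text]
    if coverage_gaps:
        # Annotate missing data explicitly instead of omitting it.
        lines += ["", "COVERAGE GAPS"]
        lines += [f"- [INCOMPLETE] {gap}" for gap in coverage_gaps]
    lines += ["", "ACTION ITEMS"]
    lines += [f"- {item}" for item in action_items]
    return "\n".join(lines)


report = render_report(
    key_findings=["Two sources disagree on revenue"],
    body_sections=[("MARKET ANALYSIS", "Detailed findings go here.")],
    coverage_gaps=["No headcount data found"],
    action_items=["Send the revenue conflict to a human reviewer"],
)
```

Placing the most important content at the start and end of the document keeps it away from the middle of a long context, where details are more likely to be overlooked.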
Fault-tolerant state management. Saves each agent's result to a per-agent JSON file and tracks completion in manifest.json. On resumption, completed tasks are skipped, failed/running tasks are restarted, and tasks not in the manifest run fresh — enabling cheap incremental recovery from crashes.
results/{task_id}.json: the complete AgentResult object, loaded on resume.

```python
# Resume logic from manifest
for task_id, status in manifest.items():
    if status == "completed":
        results[task_id] = load_result(task_id)  # skip re-run
    elif status in ("running", "failed"):
        pending.append(task_id)  # crashed or failed; restart
# not in manifest → run fresh (appended to pending)
```
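
For the write side of the same mechanism, here is a minimal sketch of a file-backed store. The class name `StateStore` and its method names are hypothetical; only the `results/{task_id}.json` layout, `manifest.json`, and the status values follow the description above.

```python
import json
import tempfile
from pathlib import Path


class StateStore:
    """Per-agent result files plus a manifest of task statuses."""

    def __init__(self, root):
        self.root = Path(root)
        (self.root / "results").mkdir(parents=True, exist_ok=True)
        self.manifest_path = self.root / "manifest.json"

    def load_manifest(self):
        if self.manifest_path.exists():
            return json.loads(self.manifest_path.read_text())
        return {}

    def mark(self, task_id, status):
        manifest = self.load_manifest()
        manifest[task_id] = status
        self.manifest_path.write_text(json.dumps(manifest, indent=2))

    def save_result(self, task_id, result):
        # Write the result before marking it completed, so a crash in
        # between causes a re-run rather than a missing result file.
        path = self.root / "results" / f"{task_id}.json"
        path.write_text(json.dumps(result))
        self.mark(task_id, "completed")

    def load_result(self, task_id):
        path = self.root / "results" / f"{task_id}.json"
        return json.loads(path.read_text())

    def plan_resume(self, all_task_ids):
        """Split tasks into already-loaded results and tasks still to run."""
        manifest = self.load_manifest()
        results, pending = {}, []
        for task_id in all_task_ids:
            if manifest.get(task_id) == "completed":
                results[task_id] = self.load_result(task_id)
            else:  # running, failed, or not in the manifest
                pending.append(task_id)
        return results, pending


# Simulated crash after 2 of 3 agents finished (sample data is made up).
store = StateStore(tempfile.mkdtemp())
store.save_result("company", {"summary": "example"})
store.save_result("market", {"summary": "example"})
store.mark("financial", "running")
results, pending = store.plan_resume(["company", "market", "financial"])
```

On resume only the financial task is pending, so the two finished agents are not re-run.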
Pipeline configuration: max parallel agents (3), retry limit per agent (3), numeric conflict threshold (10%), temporal conflict threshold (365 days), and output paths. The 3-agent parallel limit helps avoid API rate limiting while still letting all three research agents in a typical pipeline run concurrently.
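
One plausible way to hold these settings and enforce the parallel limit is a frozen dataclass plus an `asyncio.Semaphore`. The field names, the `output_dir` default, and the `run_bounded` and `probe` helpers are assumptions for this sketch; only the numeric values come from the description above.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    max_parallel_agents: int = 3
    max_retries_per_agent: int = 3
    numeric_conflict_threshold: float = 0.10
    temporal_conflict_days: int = 365
    output_dir: str = "results"  # assumed default path


async def run_bounded(coros, config):
    """Run coroutines concurrently, at most max_parallel_agents at a time."""
    sem = asyncio.Semaphore(config.max_parallel_agents)

    async def guarded(coro):
        async with sem:
            return await coro

    return await asyncio.gather(
        *(guarded(c) for c in coros), return_exceptions=True
    )


# Probe that records peak concurrency, to show the limit being applied.
stats = {"active": 0, "peak": 0}


async def probe(i):
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0.01)
    stats["active"] -= 1
    return i


outputs = asyncio.run(run_bounded([probe(i) for i in range(6)], PipelineConfig()))
```

Six probes are submitted, but no more than three are active at once, and results still come back in submission order.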
Demonstrates the pipeline on 3 scenarios: fresh run (all agents succeed), resume (simulated crash after 2 of 3 agents), and conflict detection (company researcher and financial analyst return conflicting revenue figures).
Source: explanation Ex4.md