# Template-Guided Team Routing Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Let a root Main Agent choose Team execution on its first provider response whenever an activated Skill supplies a valid Team template, while preserving an intentional zero-extra-round single-agent path.

**Architecture:** Keep `ExecutionGraph`, `ExecutionNode`, `LocalAgentRunner`, and `run_agent_team` unchanged. Add a small Main-Agent routing state inside `AgentLoop`: it selects the first valid activated template, adds compact first-turn guidance, classifies the first provider response as `team` or `single`, persists a structured mode event, and prevents a later mid-run Team switch after single-agent work starts. Project that event into the existing Task process stream; no frontend work is included.

**Tech Stack:** Python 3.12, asyncio, dataclasses, pytest, existing `AgentLoop`, session event store, process projector, and Team tool runtime.

---

## File Structure

- `app-instance/backend/beaver/engine/loop.py`: primary-template selection, first-turn guidance, mode classification/lock, tool-call filtering, and persistent routing event.
- `app-instance/backend/beaver/services/process_service.py`: project the routing event into the existing task process stream.
- `app-instance/backend/tests/unit/test_agent_loop.py`: Main-Agent prompt, first-turn Team, first-turn Single, mixed-call, and no-template regression tests.
- `app-instance/backend/tests/unit/test_process_projection.py`: routing-event projection test.

No changes to Planner, Team scheduler/runtime, ToolAssembler, ToolExecutor, evidence gate, final synthesis gate, frontend, or Skill learning are required.
### Task 1: Select a Primary Template and Make First-Turn Routing Explicit

**Files:**
- Modify: `app-instance/backend/beaver/engine/loop.py`
- Modify: `app-instance/backend/tests/unit/test_agent_loop.py`

- [ ] **Step 1: Add a sequenced provider and a valid template fixture to the AgentLoop test module**

Add imports for `SkillContext` and `ToolCall`, then add a provider that records the messages and tools passed on each call and returns the supplied responses in sequence:

```python
class SequencedProvider(LLMProvider):
    def __init__(self, responses: list[LLMResponse]) -> None:
        super().__init__()
        self.responses = list(responses)
        self.calls: list[dict[str, Any]] = []

    async def chat(self, messages: list[dict], tools: list[dict] | None = None, **_: Any) -> LLMResponse:
        self.calls.append({"messages": messages, "tools": tools})
        return self.responses.pop(0)

    def get_default_model(self) -> str:
        return "stub-model"


def _team_template_skill(name: str = "finance-report") -> SkillContext:
    return SkillContext(
        name=name,
        content="# Finance report",
        team_template={
            "version": 1,
            "strategy": "dag",
            "nodes": [{"node_id": "collect", "task": "Collect official sources"}],
        },
    )
```

- [ ] **Step 2: Write failing first-turn guidance and deterministic-primary tests**

```python
def test_root_task_with_template_adds_first_turn_team_routing_guidance(tmp_path) -> None:
    provider = RecordingProvider()
    loop = AgentLoop(loader=EngineLoader(workspace=tmp_path))

    asyncio.run(loop.process_direct(
        "compare financial reports",
        session_id="session",
        task_id="task-1",
        task_mode=True,
        pinned_skill_contexts=[_team_template_skill(), _team_template_skill("ignored")],
        provider_bundle=_bundle(provider),
    ))

    system_content = "\n".join(
        str(message["content"])
        for message in provider.messages_by_call[0]
        if message["role"] == "system"
    )
    assert "choose one execution path in this first response" in system_content
    assert "run_agent_team" in system_content
    assert '"skill_name":"finance-report"' in system_content
    assert "ignored" not in system_content


def test_empty_template_nodes_do_not_enable_first_turn_team_routing(tmp_path) -> None:
    provider = RecordingProvider()
    loop = AgentLoop(loader=EngineLoader(workspace=tmp_path))
    empty = SkillContext(name="empty", content="# Empty", team_template={"nodes": []})

    asyncio.run(loop.process_direct(
        "single lookup",
        session_id="session",
        task_id="task-1",
        task_mode=True,
        pinned_skill_contexts=[empty],
        provider_bundle=_bundle(provider),
    ))

    assert "choose one execution path in this first response" not in provider.system_prompts[0]
```

Extend the existing `RecordingProvider` to retain `messages_by_call` and `system_prompts` rather than creating a second, nearly identical fixture.

- [ ] **Step 3: Run the focused tests to verify failure**

Run:

```bash
cd app-instance/backend && uv run pytest tests/unit/test_agent_loop.py -q
```

Expected: FAIL because no Main-Agent template selector or first-turn routing guidance exists.

- [ ] **Step 4: Add a private, immutable routing-selection value and selector in `loop.py`**

Place this near `AgentRunResult`:

```python
@dataclass(frozen=True, slots=True)
class _TeamTemplateRouting:
    skill_name: str
    template: dict[str, Any]
    ignored_skill_names: tuple[str, ...] = ()


def _select_main_agent_team_template(
    activated_skills: list[SkillContext],
) -> _TeamTemplateRouting | None:
    # A template is valid only if it is a dict with a non-empty `nodes` list.
    candidates = [
        skill
        for skill in activated_skills
        if isinstance(skill.team_template, dict)
        and isinstance(skill.team_template.get("nodes"), list)
        and bool(skill.team_template["nodes"])
    ]
    if not candidates:
        return None
    # The first valid template wins; the rest are recorded as ignored.
    return _TeamTemplateRouting(
        skill_name=candidates[0].name,
        template=dict(candidates[0].team_template or {}),
        ignored_skill_names=tuple(skill.name for skill in candidates[1:]),
    )
```

This intentionally mirrors, but does not alter, `TaskExecutionPlanner._select_team_template()`: planner adaptation metadata and Main-Agent first-turn routing have different lifecycles. Do not move the helper into Planner or use Planner as a runtime dependency.
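To make the selection rule concrete, here is a minimal, self-contained sketch of how it treats different template shapes. `FakeSkill` and `select_primary` are hypothetical names used only for illustration: `FakeSkill` stands in for `SkillContext` (just the two attributes the selector reads), and `select_primary` restates the same filter with plain tuples instead of `_TeamTemplateRouting`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class FakeSkill:
    """Hypothetical stand-in for SkillContext; only the fields the selector reads."""
    name: str
    team_template: Any = None


def select_primary(skills: list[FakeSkill]) -> tuple[str, tuple[str, ...]] | None:
    """Return (primary skill name, ignored skill names), or None if no template is valid."""
    valid = [
        s for s in skills
        if isinstance(s.team_template, dict)
        and isinstance(s.team_template.get("nodes"), list)
        and s.team_template["nodes"]
    ]
    if not valid:
        return None
    return valid[0].name, tuple(s.name for s in valid[1:])


node = {"node_id": "collect", "task": "Collect official sources"}
skills = [
    FakeSkill("no-template"),                      # team_template is None: skipped
    FakeSkill("empty", {"nodes": []}),             # empty node list: skipped
    FakeSkill("bad-shape", {"nodes": "collect"}),  # nodes is not a list: skipped
    FakeSkill("finance-report", {"nodes": [node]}),
    FakeSkill("secondary", {"nodes": [node]}),
]

selection = select_primary(skills)         # first valid template wins
no_selection = select_primary(skills[:3])  # only invalid templates remain
print(selection)
print(no_selection)
```

Because the result depends only on activation order, the same inputs always yield the same primary template, which is what lets the guidance test assert on a single skill name.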
- [ ] **Step 5: Build compact guidance only when a root Task can actually invoke the Team tool**

Replace the static-only Team section with a helper that accepts the routing value:

```python
@staticmethod
def _team_template_routing_prompt(routing: _TeamTemplateRouting) -> str:
    # Compact separators keep the embedded template small in the system prompt.
    template_payload = json.dumps(
        {"skill_name": routing.skill_name, "template": routing.template},
        ensure_ascii=False,
        separators=(",", ":"),
    )
    return (
        "# Task Agent Team Routing\n\n"
        "An active Skill provides this primary Team template:\n"
        f"{template_payload}\n\n"
        "Before beginning ordinary work, choose one execution path in this first response. "
        "For staged collection, extraction, validation, comparison, research, or reporting represented "
        "by this template, call `run_agent_team` now using task-only nodes derived from it. "
        "Choose single-agent execution only for a plainly one-step request, an explicit request not to "
        "delegate, or a template that does not fit the immediate request. Do not call ordinary tools "
        "before this choice. If choosing single-agent execution, call ordinary tools or answer normally "
        "without explaining the routing choice."
    )
```

In `_process_direct_impl()`, calculate the value after activated Skills are resolved. Pass it into `_extra_guidance_sections()` only when all are true:

```python
is_root_task = task_mode and not parent_session_id and not str(source or "").startswith("team:")
team_tool_available = any(spec.name == AGENT_TEAM_TOOL_NAME for spec in selected_tool_specs)
routing_template = _select_main_agent_team_template(activated_skills)
routing_enabled = is_root_task and team_tool_available and routing_template is not None
```

Keep `TASK_AGENT_TEAM_CAPABILITY_PROMPT` for ordinary root Task capability exposure. Do not add guidance for empty/invalid templates, child Team nodes, non-Task runs, or when `run_agent_team` is absent.
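The gating conditions and the compact JSON encoding can be exercised in isolation. The sketch below is an assumption-laden restatement, not the production code: `routing_enabled` here is a hypothetical pure function, `AGENT_TEAM_TOOL_NAME` is assumed to equal the tool name used throughout this plan, and the `source="web"` value is made up. It also shows why the Step 2 test can assert on the exact substring `"skill_name":"finance-report"`: that only matches when `separators=(",", ":")` is used.

```python
from __future__ import annotations

import json

AGENT_TEAM_TOOL_NAME = "run_agent_team"  # assumed value, taken from this plan's tool name


def routing_enabled(
    *,
    task_mode: bool,
    parent_session_id: str | None,
    source: str | None,
    tool_names: list[str],
    has_valid_template: bool,
) -> bool:
    """Hypothetical pure-function restatement of the three gating conditions."""
    is_root_task = task_mode and not parent_session_id and not str(source or "").startswith("team:")
    team_tool_available = AGENT_TEAM_TOOL_NAME in tool_names
    return is_root_task and team_tool_available and has_valid_template


# Baseline: a root Task with the Team tool and a valid template.
root = dict(
    task_mode=True,
    parent_session_id=None,
    source="web",  # made-up source value for illustration
    tool_names=["web_search", "run_agent_team"],
    has_valid_template=True,
)
cases = {
    "root task": routing_enabled(**root),
    "non-task run": routing_enabled(**{**root, "task_mode": False}),
    "child team node": routing_enabled(**{**root, "source": "team:collect"}),
    "sub-session": routing_enabled(**{**root, "parent_session_id": "parent"}),
    "tool absent": routing_enabled(**{**root, "tool_names": ["web_search"]}),
    "no template": routing_enabled(**{**root, "has_valid_template": False}),
}

body = {"skill_name": "finance-report", "template": {"nodes": [{"node_id": "collect", "task": "Collect"}]}}
compact = json.dumps(body, ensure_ascii=False, separators=(",", ":"))
default = json.dumps(body, ensure_ascii=False)  # default separators insert spaces

print(cases)
print(compact)
```

Only the baseline case enables routing; every other case falls back to the existing capability prompt with no first-turn guidance.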
- [ ] **Step 6: Run the focused tests to verify they pass**

Run:

```bash
cd app-instance/backend && uv run pytest tests/unit/test_agent_loop.py -q
```

Expected: PASS, including existing root-Team-tool visibility coverage.

### Task 2: Lock First-Turn Mode and Persist the Machine-Readable Decision

**Files:**
- Modify: `app-instance/backend/beaver/engine/loop.py`
- Modify: `app-instance/backend/tests/unit/test_agent_loop.py`

- [ ] **Step 1: Write failing Team, Single, mixed-call, and legacy behavior tests**

Use `ToolCall` objects in a `SequencedProvider`; use the normal registered `run_agent_team` only with a `tool_executor_override` stub so the test checks AgentLoop routing without starting a real Team.

```python
def test_first_turn_agent_team_call_records_team_mode_and_executes_only_team(tmp_path) -> None:
    provider = SequencedProvider([
        LLMResponse(
            content="",
            tool_calls=[
                ToolCall(id="team", name="run_agent_team", arguments={"nodes": [{"node_id": "collect", "task": "Collect"}]}),
                ToolCall(id="search", name="web_search", arguments={"query": "must not run"}),
            ],
            provider_name="stub",
            model="stub-model",
        ),
        LLMResponse(content="done", provider_name="stub", model="stub-model"),
    ])
    executor = CapturingToolExecutor()
    loop = AgentLoop(loader=EngineLoader(workspace=tmp_path))

    asyncio.run(loop.process_direct(
        "compare finance reports",
        session_id="session",
        task_id="task-1",
        task_mode=True,
        pinned_skill_contexts=[_team_template_skill()],
        provider_bundle=_bundle(provider),
        tool_executor_override=executor,
    ))

    assert [call.name for call in executor.calls] == ["run_agent_team"]
    decision = _event_payload(loop, "session", "execution_mode_selected")
    assert decision == {
        "task_id": "task-1",
        "execution_mode": "team",
        "routing_source": "main_agent_first_turn",
        "primary_template_skill": "finance-report",
        "ignored_template_skills": [],
    }


def test_first_turn_ordinary_tool_records_single_and_blocks_later_team_call(tmp_path) -> None:
    provider = SequencedProvider([
        LLMResponse(
            content="",
            tool_calls=[ToolCall(id="search", name="web_search", arguments={"query": "one step"})],
            provider_name="stub",
            model="stub-model",
        ),
        LLMResponse(
            content="",
            tool_calls=[ToolCall(id="team", name="run_agent_team", arguments={"nodes": [{"node_id": "late", "task": "Late"}]})],
            provider_name="stub",
            model="stub-model",
        ),
        LLMResponse(content="done", provider_name="stub", model="stub-model"),
    ])
    executor = CapturingToolExecutor()
    loop = AgentLoop(loader=EngineLoader(workspace=tmp_path))

    asyncio.run(loop.process_direct(
        "one-step lookup",
        session_id="session",
        task_id="task-1",
        task_mode=True,
        pinned_skill_contexts=[_team_template_skill()],
        provider_bundle=_bundle(provider),
        tool_executor_override=executor,
    ))

    assert [call.name for call in executor.calls] == ["web_search"]
    assert "run_agent_team" not in provider.tool_names_by_call[1]
    late_result = _tool_result_by_call_id(loop, "session", "team")
    assert late_result["error"] == "execution_mode_locked_single"
    assert _event_payload(loop, "session", "execution_mode_selected")["execution_mode"] == "single"
```

Also assert that a root Task with no template keeps `run_agent_team` in every provider schema, preserving legacy behavior.

- [ ] **Step 2: Run the test module to verify failure**

Run:

```bash
cd app-instance/backend && uv run pytest tests/unit/test_agent_loop.py -q
```

Expected: FAIL because AgentLoop has no decision event, no per-run mode state, and executes mixed/later Team calls normally.
- [ ] **Step 3: Add mode state and first-response classification immediately after the provider response**

Before the `while True` loop, set:

```python
routing_mode: str | None = None
```

After `response = await provider.chat(**chat_kwargs)` and before serializing/appending the assistant message, classify only once when `routing_enabled` is true:

```python
if routing_enabled and routing_mode is None:
    tool_names = {self._tool_call_name(tool_call) for tool_call in response.tool_calls}
    # Any Team call in the first response selects Team mode; everything else is single.
    routing_mode = "team" if AGENT_TEAM_TOOL_NAME in tool_names else "single"
    append_message(
        resolved_session_id,
        run_id=resolved_run_id,
        role="system",
        event_type="execution_mode_selected",
        event_payload={
            "task_id": task_id,
            "attempt_index": attempt_index,
            "execution_mode": routing_mode,
            "routing_source": "main_agent_first_turn",
            "primary_template_skill": routing_template.skill_name,
            "ignored_template_skills": list(routing_template.ignored_skill_names),
        },
        content=None,
        context_visible=False,
        source=source,
        title=title,
        model=final_model,
        user_id=user_id,
    )
```

Do not write this event for runs without `routing_enabled`. A no-tool first response selects `single` before the normal final-answer branch.

- [ ] **Step 4: Apply the no-mixed-mode and single-lock behavior at the call boundary**

Add two private helpers:

```python
@staticmethod
def _calls_for_execution_mode(tool_calls: list[Any], routing_mode: str | None) -> list[Any]:
    if routing_mode != "team":
        return list(tool_calls)
    return [call for call in tool_calls if AgentLoop._tool_call_name(call) == AGENT_TEAM_TOOL_NAME]

@staticmethod
def _team_locked_result(tool_call: Any) -> ToolResult:
    return ToolResult(
        success=False,
        content="Agent Team can only be selected in the first response of this Task run.",
        tool_name=AGENT_TEAM_TOOL_NAME,
        error="execution_mode_locked_single",
    )
```

Then use these rules in the loop:

1. If the first response selected `team`, serialize and execute only `run_agent_team`; ordinary calls from that response are not executed.
2. If `routing_mode == "single"` and the current iteration is after the first response, remove `run_agent_team` from `chat_kwargs["tools"]` before calling the provider.
3. If a later response nevertheless emits `run_agent_team`, do not call the executor. Record `_team_locked_result()` through the same `tool_result_recorded` and context-builder paths as ordinary tool failures.
4. Preserve the normal concurrent-execution decision for the remaining executable calls.

Keep original tool schemas and ToolExecutor behavior unchanged for no-template runs. Do not alter `allowed_tool_names` behavior or use it as a source of tools.

- [ ] **Step 5: Run focused AgentLoop tests**

Run:

```bash
cd app-instance/backend && uv run pytest tests/unit/test_agent_loop.py -q
```

Expected: PASS. The tests verify that no extra provider call is made solely for mode selection, that mixed first-turn calls execute only Team, and that late Team calls are rejected after Single mode.
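The classify-once and lock rules from Task 2 can be summarized as a small state machine. The following is a sketch under stated assumptions, not the `AgentLoop` implementation: `Call` and `ModeRouter` are hypothetical names, tool calls are reduced to an id and a name, and the Team-only filter is applied to the first response only, which is one reading of rule 1.

```python
from __future__ import annotations

from dataclasses import dataclass, field

TEAM_TOOL = "run_agent_team"  # assumed to match AGENT_TEAM_TOOL_NAME


@dataclass
class Call:
    """Hypothetical stand-in for a provider tool call."""
    id: str
    name: str


@dataclass
class ModeRouter:
    """Hypothetical per-run routing state: classify once, then enforce."""
    mode: str | None = None
    executed: list[str] = field(default_factory=list)
    rejected: list[str] = field(default_factory=list)

    def offered_tools(self, tools: list[str]) -> list[str]:
        # After a single-agent lock, hide the Team tool from later provider calls.
        if self.mode == "single":
            return [t for t in tools if t != TEAM_TOOL]
        return list(tools)

    def handle(self, calls: list[Call]) -> None:
        first_response = self.mode is None
        if first_response:
            names = {c.name for c in calls}
            self.mode = "team" if TEAM_TOOL in names else "single"
        for call in calls:
            if first_response and self.mode == "team" and call.name != TEAM_TOOL:
                continue  # mixed first response: ordinary calls are not executed
            if self.mode == "single" and call.name == TEAM_TOOL:
                # Stands in for recording an execution_mode_locked_single result.
                self.rejected.append(call.id)
                continue
            self.executed.append(call.name)


# Mixed first response: Team wins and the ordinary call is dropped.
team_run = ModeRouter()
team_run.handle([Call("team", TEAM_TOOL), Call("search", "web_search")])

# Ordinary first response: single mode locks, and a late Team call is rejected.
single_run = ModeRouter()
single_run.handle([Call("search", "web_search")])
later_tools = single_run.offered_tools(["web_search", TEAM_TOOL])
single_run.handle([Call("team", TEAM_TOOL)])

# A first response with no tool calls also selects single mode.
answer_only = ModeRouter()
answer_only.handle([])

print(team_run.mode, team_run.executed)
print(single_run.mode, single_run.executed, single_run.rejected, later_tools)
```

Note that classification happens on the response the provider already returned, so choosing a mode never costs an additional provider round.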
### Task 3: Project Routing Decisions into the Task Process Stream

**Files:**
- Modify: `app-instance/backend/beaver/services/process_service.py`
- Modify: `app-instance/backend/tests/unit/test_process_projection.py`

- [ ] **Step 1: Write a failing process-projection test**

```python
def test_process_projection_maps_main_agent_execution_mode_selection(tmp_path: Path) -> None:
    session = SessionManager(tmp_path)
    run_store = RunMemoryStore(tmp_path / "memory" / "runs")
    session.append_message(
        "web:test",
        run_id="main-run",
        role="system",
        event_type="execution_mode_selected",
        event_payload={
            "task_id": "task-1",
            "attempt_index": 1,
            "execution_mode": "team",
            "routing_source": "main_agent_first_turn",
            "primary_template_skill": "finance-report",
            "ignored_template_skills": ["secondary-template"],
        },
        context_visible=False,
    )

    projection = SessionProcessProjector(session, run_store).project("web:test")

    event = next(item for item in projection["events"] if item["kind"] == "execution_mode_selected")
    assert event["status"] == "done"
    assert event["metadata"]["execution_mode"] == "team"
    assert event["metadata"]["primary_template_skill"] == "finance-report"
    assert event["metadata"]["ignored_template_skills"] == ["secondary-template"]
```

- [ ] **Step 2: Run it to verify failure**

Run:

```bash
cd app-instance/backend && uv run pytest tests/unit/test_process_projection.py -q
```

Expected: FAIL with `StopIteration`, because the projector ignores `execution_mode_selected`.
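The expected `StopIteration` comes from standard Python behavior: calling `next()` on a generator that yields nothing raises it unless a default is supplied. A minimal illustration, using a made-up event list rather than real projector output:

```python
# Made-up projection events that lack the routing event, as before the fix.
events = [
    {"kind": "skill_activation_snapshotted"},
    {"kind": "tool_call_started"},
]

try:
    next(item for item in events if item["kind"] == "execution_mode_selected")
    outcome = "found"
except StopIteration:
    # No item matched, so the generator was exhausted immediately.
    outcome = "StopIteration"

# Supplying a default turns the same lookup into a soft check.
missing = next(
    (item for item in events if item["kind"] == "execution_mode_selected"),
    None,
)

print(outcome, missing)
```

The test deliberately uses the form without a default, so a missing projection fails loudly at the lookup instead of at a later assertion.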
- [ ] **Step 3: Add a narrow event branch in `SessionProcessProjector.project()`**

Place the branch after `skill_activation_snapshotted` and before Team-completion handling:

```python
elif record.event_type == "execution_mode_selected":
    run_id = record.run_id or root_run_id
    parent_run_id = root_run_id if run_id != root_run_id else None
    mode = str(payload.get("execution_mode") or "single")
    add_event(
        event_id=_event_id(record, "execution-mode"),
        run_id=str(run_id),
        parent_run_id=parent_run_id,
        kind="execution_mode_selected",
        actor_type="system",
        actor_id="main-agent-router",
        actor_name="Main Agent",
        text="Main Agent selected Team execution." if mode == "team" else "Main Agent selected single-agent execution.",
        created_at=created_at,
        status="done",
        metadata={
            # Explicit keys below override same-named keys from the raw payload.
            **dict(payload),
            "task_id": task_id,
            "attempt_index": attempt_index,
            "timeline_type": "execution_mode",
        },
    )
```

Do not add frontend rendering in this task. The projected event is enough for the existing API/process payload and future UI work.

- [ ] **Step 4: Run focused projection tests**

Run:

```bash
cd app-instance/backend && uv run pytest tests/unit/test_process_projection.py -q
```

Expected: PASS.

### Task 4: Regression Verification and Steven Docker Acceptance

**Files:**
- No new production files.
- Modify only test fixtures/assertions from Tasks 1–3 if a compatibility issue is exposed.

- [ ] **Step 1: Run all directly affected unit tests**

Run:

```bash
cd app-instance/backend && uv run pytest \
  tests/unit/test_agent_loop.py \
  tests/unit/test_process_projection.py \
  tests/unit/test_team_node_tool_policy.py \
  tests/unit/test_task_execution_planner.py \
  tests/unit/test_task_team_synthesis_outcome.py \
  -q
```

Expected: PASS. Do not change tests outside this feature to accommodate unrelated Python/TestClient cleanup behavior.
- [ ] **Step 2: Verify static quality for the scoped diff**

Run:

```bash
git diff --check -- \
  app-instance/backend/beaver/engine/loop.py \
  app-instance/backend/beaver/services/process_service.py \
  app-instance/backend/tests/unit/test_agent_loop.py \
  app-instance/backend/tests/unit/test_process_projection.py
```

Expected: no output and exit status 0.

- [ ] **Step 3: Deploy only after local tests pass and verify the real MGM/Galaxy route**

Run the established Steven deployment procedure:

```bash
docker cp app-instance/backend/beaver app-instance-steven:/opt/app/backend/
docker cp app-instance/backend/pyproject.toml app-instance-steven:/opt/app/backend/pyproject.toml
docker exec app-instance-steven sh -lc 'cd /opt/app/backend && uv pip install --system --no-deps -e .'
docker restart app-instance-steven
curl -fsS http://127.0.0.1:20000/api/ping
```

Create a fresh MGM/Galaxy finance-report Task and inspect its session/task process events. Acceptance requires this ordering:

```text
skill_activation_snapshotted
→ execution_mode_selected {execution_mode: team, primary_template_skill: mgm-galaxy-financial-chart-report-safe}
→ tool_call_started: run_agent_team
→ run_agent_team_debug: invoke_started
→ task_team_run_completed or task_team_run_failed
```

The first ordinary `web_search` must be emitted by a Team node, never by the root Main Agent. If the model intentionally selects Single for this known staged finance template, stop and inspect the captured first-turn system prompt/tool call before changing code.

- [ ] **Step 4: Report and stop**

Report modified files, focused test outputs, Docker health, real-task event ordering, `git diff --stat`, and remaining model-mediated routing risk. Do not stage or commit unless the user explicitly asks.
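The event ordering required for acceptance in Step 3 can be checked mechanically instead of by eye. The sketch below assumes the observed events have already been reduced to a flat list of event-type strings, which is a simplification: real events carry payloads such as the tool name and execution mode, and those still need a separate check. `occurs_in_order` and `acceptance_ok` are hypothetical helper names.

```python
REQUIRED_PREFIX = [
    "skill_activation_snapshotted",
    "execution_mode_selected",
    "tool_call_started",
    "run_agent_team_debug",
]
TERMINAL = ("task_team_run_completed", "task_team_run_failed")


def occurs_in_order(observed: list[str], required: list[str]) -> bool:
    """True if `required` appears in `observed` as an ordered subsequence."""
    remaining = iter(observed)
    # `step in remaining` consumes the iterator up to the first match, so each
    # later step can only match events that came after the previous one.
    return all(step in remaining for step in required)


def acceptance_ok(observed: list[str]) -> bool:
    """The required prefix must be followed by either terminal Team event."""
    return any(occurs_in_order(observed, REQUIRED_PREFIX + [end]) for end in TERMINAL)


good = [
    "skill_activation_snapshotted",
    "execution_mode_selected",
    "tool_call_started",
    "run_agent_team_debug",
    "tool_call_started",  # unrelated events in between are tolerated
    "task_team_run_failed",
]
bad = [
    "skill_activation_snapshotted",
    "tool_call_started",  # a tool call before the mode decision
    "execution_mode_selected",
    "run_agent_team_debug",
    "task_team_run_completed",
]

print(acceptance_ok(good), acceptance_ok(bad))
```

A subsequence check is used rather than exact equality because a real session interleaves other events between the required ones.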
## Plan Self-Review

- Scope coverage: primary template selection, first-turn guidance, mode selection without extra LLM round/reason text, mode lock, raw event persistence, process projection, and real MGM/Galaxy verification are covered.
- Compatibility: no-template runs keep existing Team-tool exposure; child Team nodes still cannot see the tool; graph/runtime/tool scope/evidence/synthesis behavior is untouched.
- Out-of-scope guard: no Planner heuristic change, no frontend, no fixed roles, no nested Team, and no new Team model appear in the implementation tasks.