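feat(coordinator): add default max tool iterations config for team nodes

Add the DEFAULT_TEAM_NODE_MAX_TOOL_ITERATIONS config option to control the maximum number of tool iterations for team nodes, and change the logic in LocalAgentRunner to use this default when the envelope does not specify a value.

fix(runtime): fix the team node run success check
Update the run success condition so that a finish_reason of "max_tool_iterations_finalized" is treated as a failed run, and add detection of raw tool-call output so that it is not mistaken for a successful completion.

feat(mcp): add team workflow MCP tool category support
Add a new local MCP tool category "team_workflow" and its corresponding tool creation, providing local tool support for team workflows.

refactor(engine): adjust the AgentLoop max tool iterations setting
Raise the default max_tool_iterations in AgentProfile from 30 to 100, and remove the duplicate parameter passing in the TaskExecutionPlanner constructor.

perf(mcp): optimize MCP connection management to avoid repeated connections
Add an mcp_connected flag to track the MCP connection state so that connect_all runs only once, improving performance and avoiding unnecessary reconnects.

refactor(skills): remove skill team template functionality
Remove the code related to skill team templates, including parsing, storage, and handling logic, simplifying the skill record structure and the loading flow.

feat(process): enhance the session process projector
Add handling for skill activation snapshot events, improve how team run completion messages are displayed, and add timestamp recording for skill activation events.

refactor(tasks): simplify team execution logic in the task attempt orchestrator
Remove the team execution code and handle all tasks uniformly as single-step execution, reducing the orchestrator's complexity and improving execution efficiency.

fix(evidence): fix requirement validation in node evidence evaluation
Update node evidence evaluation to skip deterministic validation of natural-language evidence requirements and run only machine-readable requirement validation, avoiding node failures caused by natural-language requirements.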
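The coordinator and runtime changes described in this message can be sketched roughly as follows. This is an illustration only, not the code from the commit: the default value, the helper names `resolve_max_tool_iterations`, `looks_like_raw_tool_call`, and `run_succeeded`, and the JSON shape used to recognize a raw tool call are all assumptions.

```python
from __future__ import annotations

import json

# Hypothetical value; the real default is defined in the project's configuration.
DEFAULT_TEAM_NODE_MAX_TOOL_ITERATIONS = 8


def resolve_max_tool_iterations(envelope_value: int | None) -> int:
    """Use the envelope's value when it is set, otherwise fall back to the default."""
    if envelope_value is None:
        return DEFAULT_TEAM_NODE_MAX_TOOL_ITERATIONS
    return max(0, int(envelope_value))


def looks_like_raw_tool_call(output: str) -> bool:
    """Assumed heuristic: the output is a JSON object with 'name' and 'arguments' keys."""
    try:
        payload = json.loads(output.strip())
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and "name" in payload and "arguments" in payload


def run_succeeded(finish_reason: str, output: str) -> bool:
    """Treat a forced finalization or a leaked raw tool call as a failed run."""
    if finish_reason == "max_tool_iterations_finalized":
        return False
    if looks_like_raw_tool_call(output):
        return False
    return True
```

Any further success conditions in the real runtime are intentionally left out of this sketch.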
1526
docs/superpowers/examples/agent-team-flow-logic-visualization.html
Normal file
File diff suppressed because it is too large
85
docs/superpowers/examples/install_mgm_skill_metadata.py
Normal file
@@ -0,0 +1,85 @@
from __future__ import annotations

import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def main() -> None:
    skill_name = "mgm-galaxy-financial-chart-report-safe"
    workspace = Path("/root/.beaver/workspace")
    skill_dir = workspace / "skills" / skill_name
    skill_md = skill_dir / "versions" / "v0001" / "SKILL.md"
    content = skill_md.read_text(encoding="utf-8")
    digest = "sha256:" + hashlib.sha256(content.encode("utf-8")).hexdigest()
    now = datetime.now(timezone.utc).isoformat()

    (skill_dir / "current.json").write_text(
        json.dumps({"current_version": "v0001"}, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    (skill_dir / "skill.json").write_text(
        json.dumps(
            {
                "name": skill_name,
                "display_name": "MGM/Galaxy Financial Chart Report Safe",
                "description": "Compare MGM China and Galaxy Entertainment using official financial sources, produce chart-ready Markdown, and avoid claiming generated chart image/file artifacts.",
                "created_at": now,
                "updated_at": now,
                "current_version": "v0001",
                "status": "active",
                "tags": ["finance", "research", "report", "chart-ready-data", "mgm", "galaxy"],
                "owners": ["steven"],
                "source_kind": "workspace",
                "lineage": [],
            },
            indent=2,
            ensure_ascii=False,
        )
        + "\n",
        encoding="utf-8",
    )
    (skill_dir / "versions" / "v0001" / "version.json").write_text(
        json.dumps(
            {
                "skill_name": skill_name,
                "version": "v0001",
                "content_hash": digest,
                "summary_hash": digest,
                "created_at": now,
                "created_by": "steven",
                "change_reason": "Add real Skill Team Template example for MGM/Galaxy finance report demo",
                "parent_version": None,
                "review_state": "published",
                "frontmatter": {
                    "name": skill_name,
                    "description": "Compare MGM China and Galaxy Entertainment using official financial sources, produce chart-ready Markdown, and avoid claiming generated chart image/file artifacts.",
                    "tools": ["web_search", "web_fetch"],
                },
                "summary": "MGM/Galaxy finance report skill with a task-only Beaver team template for official source collection, metric extraction, validation, and Markdown chart-ready reporting.",
                "tool_hints": ["web_search", "web_fetch"],
                "provenance": {"source_kind": "manual_demo", "target_instance": "steven"},
                "tree_hash": "",
            },
            indent=2,
            ensure_ascii=False,
        )
        + "\n",
        encoding="utf-8",
    )

    index_path = workspace / "skills" / "_index" / "published.json"
    try:
        payload = json.loads(index_path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        payload = {"items": []}
    items = [str(item) for item in payload.get("items", [])]
    if skill_name not in items:
        items.append(skill_name)
    index_path.write_text(json.dumps({"items": items}, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    print(f"installed metadata for {skill_name}: {digest}")


if __name__ == "__main__":
    main()
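The index update at the end of this script writes `published.json` without first creating `skills/_index`, so it would fail on a workspace where that directory does not exist yet. A minimal sketch of the same read, append, and write step that also creates the parent directory is shown below. The helper name `register_published_skill` is hypothetical and is not part of the commit.

```python
import json
from pathlib import Path


def register_published_skill(index_path: Path, skill_name: str) -> list[str]:
    """Add skill_name to the published index once and return the resulting list."""
    # Create skills/_index on first use so write_text cannot fail on a fresh workspace.
    index_path.parent.mkdir(parents=True, exist_ok=True)
    try:
        payload = json.loads(index_path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        payload = {"items": []}
    items = [str(item) for item in payload.get("items", [])]
    if skill_name not in items:
        items.append(skill_name)
    index_path.write_text(
        json.dumps({"items": items}, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return items
```

Calling the helper twice with the same name leaves the index unchanged, which mirrors the `if skill_name not in items` guard in the script.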
155
docs/superpowers/examples/mgm-galaxy-financial-chart-report-safe.SKILL.md
Normal file
@@ -0,0 +1,155 @@
---
name: mgm-galaxy-financial-chart-report-safe
description: Compare MGM China and Galaxy Entertainment using official financial sources, produce chart-ready Markdown, and avoid claiming generated chart image/file artifacts.
tools:
  - web_search
  - web_fetch
---

# MGM/Galaxy Financial Chart Report Safe

## Overview

Use this skill when the user asks for a finance comparison report for MGM China Holdings Limited and Galaxy Entertainment Group, especially when the requested output includes a table, chart-ready data, or a textual chart section.

The skill intentionally separates source collection, metric extraction, validation, and final reporting. It must not invent chart files or image artifacts. If the runtime does not expose a registered chart-rendering tool, the final output should be Markdown plus chart-ready data only.

```beaver-team-template
{
  "version": 1,
  "strategy": "dag",
  "nodes": [
    {
      "node_id": "collect_official_sources",
      "task": "Collect official MGM China Holdings and Galaxy Entertainment financial disclosure sources for the requested period. Prefer annual reports, interim reports, results announcements, investor relations pages, and exchange filings. Return source URLs with short notes about period coverage.",
      "use_skill": "web-operation",
      "skill_query": "official financial disclosure web research",
      "depends_on": [],
      "requested_tools": ["web_search", "web_fetch"],
      "required_evidence": ["tool_result", "url"],
      "evidence_contract": {
        "version": 1,
        "entities": ["MGM China Holdings", "Galaxy Entertainment Group"],
        "source_types": ["annual_report", "interim_report", "results_announcement", "investor_relations", "exchange_filing"],
        "minimum_sources_per_entity": 1
      },
      "validation_rules": [
        "Prefer official company, investor relations, HKEX, or stock exchange sources.",
        "Record the reporting period attached to each source.",
        "Do not use unsourced market commentary as primary evidence."
      ],
      "required_for_completion": true,
      "block_downstream_on_partial": true,
      "max_tool_iterations": 4,
      "constraints": [
        "Use only public pages.",
        "Do not require login or paid databases."
      ]
    },
    {
      "node_id": "extract_financial_metrics",
      "task": "Extract comparable financial metrics for MGM China Holdings and Galaxy Entertainment from the collected official sources. Include revenue or net revenue, adjusted EBITDA where available, net profit/loss where available, period, currency, unit, and source URL for each metric.",
      "skill_query": "financial metric extraction from official disclosures",
      "depends_on": ["collect_official_sources"],
      "requested_tools": ["web_fetch"],
      "required_evidence": ["output"],
      "evidence_contract": {
        "version": 1,
        "metrics": ["revenue", "adjusted_ebitda", "net_profit_or_loss"],
        "required_fields": ["company", "period", "metric", "value", "currency", "unit", "source_url"]
      },
      "validation_rules": [
        "Keep currencies and units explicit.",
        "Do not compare different reporting periods without labeling the mismatch.",
        "Mark unavailable metrics as unavailable instead of estimating them."
      ],
      "required_for_completion": true,
      "block_downstream_on_partial": true,
      "max_tool_iterations": 2,
      "constraints": [
        "Use upstream official sources before searching for alternatives."
      ]
    },
    {
      "node_id": "validate_metrics",
      "task": "Validate extracted metrics for source consistency, period alignment, currency/unit consistency, and obvious transcription errors. Produce a concise validation note and list any evidence gaps.",
      "skill_query": "finance metric validation",
      "depends_on": ["extract_financial_metrics"],
      "requested_tools": [],
      "required_evidence": ["output"],
      "evidence_contract": {
        "version": 1,
        "checks": ["source_consistency", "period_alignment", "currency_unit_consistency", "transcription_sanity"]
      },
      "validation_rules": [
        "Do not introduce new unsourced figures.",
        "If values are not comparable, explain why and preserve both values with labels."
      ],
      "required_for_completion": true,
      "block_downstream_on_partial": true,
      "max_tool_iterations": 0,
      "constraints": [
        "No tools in this validation node; use upstream evidence only."
      ]
    },
    {
      "node_id": "generate_chart_report",
      "task": "Generate the final Markdown comparison report. Include an executive summary, source-backed comparison table, chart-ready data table, optional Mermaid or text bar chart section, and caveats. Do not claim that a chart image, chart file, or saved artifact was generated.",
      "skill_query": "financial markdown report with chart-ready data",
      "depends_on": ["validate_metrics"],
      "requested_tools": [],
      "required_evidence": ["output"],
      "evidence_contract": {
        "version": 1,
        "outputs": ["comparison_table", "chart_ready_data", "markdown_report"],
        "forbidden_claims": ["generated_chart_image", "generated_chart_file", "saved_chart_artifact"]
      },
      "validation_rules": [
        "Every numeric claim must trace back to a source URL or be marked unavailable.",
        "Do not claim a generated image/file unless a registered chart renderer tool was actually used.",
        "Prefer Markdown tables and chart-ready data over unsupported rendering claims."
      ],
      "required_for_completion": true,
      "block_downstream_on_partial": false,
      "max_tool_iterations": 0,
      "constraints": [
        "No chart renderer is assumed.",
        "No file/image artifact claims."
      ]
    }
  ]
}
```
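The `depends_on` fields in the template above form a linear DAG. As a rough illustration of how such a graph can be ordered, here is a sketch using the standard library's `graphlib`. The function name `execution_order` is hypothetical, and this is not how Beaver's planner is necessarily implemented.

```python
from graphlib import TopologicalSorter

# depends_on edges copied from the four nodes of the template.
DEPENDS_ON = {
    "collect_official_sources": [],
    "extract_financial_metrics": ["collect_official_sources"],
    "validate_metrics": ["extract_financial_metrics"],
    "generate_chart_report": ["validate_metrics"],
}


def execution_order(depends_on: dict[str, list[str]]) -> list[str]:
    """Return node ids so that every node comes after its dependencies."""
    known = set(depends_on)
    unknown = {dep for deps in depends_on.values() for dep in deps} - known
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")
    # static_order() raises graphlib.CycleError if the graph is not a DAG.
    return list(TopologicalSorter(depends_on).static_order())
```

For this template the result is the same chain the nodes are listed in, from `collect_official_sources` to `generate_chart_report`.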

## When to Use

- The user asks to compare MGM China and Galaxy Entertainment financial performance.
- The user asks for a chart, chart-ready data, Markdown chart section, or board-style finance report.
- The task requires source-backed public financial data rather than generic market commentary.

## Required Tools

- `web_search`
- `web_fetch`

## Workflow

1. Collect official sources first: company investor relations pages, annual/interim reports, results announcements, and exchange filings.
2. Extract comparable metrics with period, currency, unit, and source URL.
3. Validate that metrics are comparable before drawing conclusions.
4. Produce a Markdown report with comparison table and chart-ready data.
5. If a real chart renderer tool is unavailable, say so implicitly by providing chart-ready data; do not claim an image or file was created.

## Validation

- Confirm each company has at least one official source.
- Confirm all numeric metrics carry period, currency, unit, and source URL.
- Confirm the final report does not contain claims such as “saved chart image”, “generated chart file”, or “attached chart artifact”.
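The last validation check can be approximated with a simple phrase scan. The sketch below is an assumption about how one might do it, not part of the skill: `find_forbidden_claims` is a hypothetical helper, and plain substring matching cannot tell a claim from a denial such as "no saved chart image was created".

```python
import re

# Phrases taken from the validation checklist above.
FORBIDDEN_CLAIMS = ("saved chart image", "generated chart file", "attached chart artifact")


def find_forbidden_claims(report: str) -> list[str]:
    """Return the forbidden phrases that occur in the report, ignoring case and line breaks."""
    normalized = re.sub(r"\s+", " ", report).lower()
    return [phrase for phrase in FORBIDDEN_CLAIMS if phrase in normalized]
```

An empty result means none of the listed phrases appear. A non-empty result is only a prompt for manual review, given the negation caveat.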

## Boundaries

- Do not use private, paid, or login-only sources.
- Do not fabricate unavailable figures.
- Do not use high-risk write, terminal, email, or external-send tools.
- Do not create nested teams or role-based agents.
- Do not claim chart rendering unless the runtime exposes and actually uses a registered chart-renderer tool.
257
docs/superpowers/examples/steven-mgm-galaxy-team-demo.md
Normal file
@@ -0,0 +1,257 @@
# Steven MGM/Galaxy Team Template Demo

## Target

Install `mgm-galaxy-financial-chart-report-safe` into Steven's Beaver workspace, then run one task that exercises:

```text
Planner
→ Skill Template selection
→ ExecutionGraph / ExecutionNode adaptation
→ Node Skill Binding
→ Team execution
→ Tool scope filtering
→ Evidence gate
→ Final synthesis complete/incomplete outcome
```

## Current environment status observed by Codex

The repository contains Steven's instance metadata:

```text
instance_id: steven
container_name: app-instance-steven
beaver_home: app-instance/runtime/instances/steven/beaver-home
workspace: app-instance/runtime/instances/steven/beaver-home/workspace
public_url: http://steven.172.19.0.245.nip.io:8088
```

Codex could not directly apply the skill to the live Steven instance in this session because:

```text
docker API: permission denied while connecting to /var/run/docker.sock
Steven workspace/skills parent dir: owned by nobody:nogroup and not writable by current user
local backend .venv: incomplete after uv environment rebuild; missing test/runtime dependencies
```

So this runbook is the exact artifact to apply from a shell with Docker or filesystem permission.

## Install Skill into Steven workspace

From repository root, run as a user that can write Steven's workspace:

```bash
SKILL_NAME=mgm-galaxy-financial-chart-report-safe
WORKSPACE=app-instance/runtime/instances/steven/beaver-home/workspace
SKILL_DIR="$WORKSPACE/skills/$SKILL_NAME"

mkdir -p "$SKILL_DIR/versions/v0001"
cp docs/superpowers/examples/mgm-galaxy-financial-chart-report-safe.SKILL.md \
  "$SKILL_DIR/versions/v0001/SKILL.md"

python3 - <<'PY'
import hashlib
import json
from pathlib import Path
from datetime import datetime, timezone

skill_name = "mgm-galaxy-financial-chart-report-safe"
workspace = Path("app-instance/runtime/instances/steven/beaver-home/workspace")
skill_dir = workspace / "skills" / skill_name
skill_md = skill_dir / "versions" / "v0001" / "SKILL.md"
content = skill_md.read_text(encoding="utf-8")
digest = "sha256:" + hashlib.sha256(content.encode("utf-8")).hexdigest()
now = datetime.now(timezone.utc).isoformat()

(skill_dir / "current.json").write_text(
    json.dumps({"current_version": "v0001"}, indent=2, ensure_ascii=False) + "\n",
    encoding="utf-8",
)

(skill_dir / "skill.json").write_text(
    json.dumps(
        {
            "name": skill_name,
            "display_name": "MGM/Galaxy Financial Chart Report Safe",
            "description": "Compare MGM China and Galaxy Entertainment using official financial sources, produce chart-ready Markdown, and avoid claiming generated chart image/file artifacts.",
            "created_at": now,
            "updated_at": now,
            "current_version": "v0001",
            "status": "active",
            "tags": ["finance", "research", "report", "chart-ready-data", "mgm", "galaxy"],
            "owners": ["steven"],
            "source_kind": "workspace",
            "lineage": [],
        },
        indent=2,
        ensure_ascii=False,
    )
    + "\n",
    encoding="utf-8",
)

(skill_dir / "versions" / "v0001" / "version.json").write_text(
    json.dumps(
        {
            "skill_name": skill_name,
            "version": "v0001",
            "content_hash": digest,
            "summary_hash": digest,
            "created_at": now,
            "created_by": "steven",
            "change_reason": "Add real Skill Team Template example for MGM/Galaxy finance report demo",
            "parent_version": None,
            "review_state": "published",
            "frontmatter": {
                "name": skill_name,
                "description": "Compare MGM China and Galaxy Entertainment using official financial sources, produce chart-ready Markdown, and avoid claiming generated chart image/file artifacts.",
                "tools": ["web_search", "web_fetch"],
            },
            "summary": "MGM/Galaxy finance report skill with a task-only Beaver team template for official source collection, metric extraction, validation, and Markdown chart-ready reporting.",
            "tool_hints": ["web_search", "web_fetch"],
            "provenance": {"source_kind": "manual_demo", "target_instance": "steven"},
            "tree_hash": "",
        },
        indent=2,
        ensure_ascii=False,
    )
    + "\n",
    encoding="utf-8",
)

index_path = workspace / "skills" / "_index" / "published.json"
index_path.parent.mkdir(parents=True, exist_ok=True)
try:
    payload = json.loads(index_path.read_text(encoding="utf-8"))
except FileNotFoundError:
    payload = {"items": []}
items = [str(item) for item in payload.get("items", [])]
if skill_name not in items:
    items.append(skill_name)
index_path.write_text(json.dumps({"items": items}, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
PY
```
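After installation it may be worth confirming that the hash recorded in `version.json` still matches the installed `SKILL.md`. The following is a minimal sketch that assumes the directory layout written by the install step above. `verify_content_hash` is a hypothetical helper, not a Beaver API.

```python
import hashlib
import json
from pathlib import Path


def verify_content_hash(skill_dir: Path, version: str = "v0001") -> bool:
    """Recompute the SKILL.md digest and compare it with version.json's content_hash."""
    version_dir = skill_dir / "versions" / version
    content = (version_dir / "SKILL.md").read_text(encoding="utf-8")
    expected = "sha256:" + hashlib.sha256(content.encode("utf-8")).hexdigest()
    recorded = json.loads((version_dir / "version.json").read_text(encoding="utf-8"))
    return recorded.get("content_hash") == expected
```

A `False` result would suggest that `SKILL.md` was edited after the metadata was written and that the install step should be rerun.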

## Restart or start Steven container

If the container already exists:

```bash
docker restart app-instance-steven
```

If it does not exist, use the existing instance metadata and project scripts rather than creating a new instance id.

## Demo task prompt

Send this as Steven's user task:

```text
Use the MGM/Galaxy finance report skill to compare MGM China Holdings and Galaxy Entertainment using official public financial disclosures. Produce a concise board-style Markdown report with source URLs, a comparison table, chart-ready data, and a text/Mermaid chart section. Do not claim a generated image or saved chart file.
```

## Expected planning shape

The planner should produce a team DAG with these task nodes:

```text
collect_official_sources
→ extract_financial_metrics
→ validate_metrics
→ generate_chart_report
```

Expected node constraints:

```text
collect_official_sources.allowed_tool_names = ["web_search", "web_fetch"]
extract_financial_metrics.allowed_tool_names = ["web_fetch"]
validate_metrics.allowed_tool_names = []
generate_chart_report.allowed_tool_names = []
```
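The node constraints above amount to filtering the tool list each node sees. A minimal sketch of that filtering follows, assuming tool definitions are dicts that carry their name either at the top level or under a `function` key. `tool_name` and `filter_tools` are hypothetical helpers, not Beaver functions.

```python
from typing import Any


def tool_name(tool: dict[str, Any]) -> str:
    """Read the name from either {'function': {'name': ...}} or {'name': ...}."""
    function = tool.get("function")
    if isinstance(function, dict):
        return str(function.get("name") or "")
    return str(tool.get("name") or "")


def filter_tools(tools: list[dict[str, Any]], allowed_tool_names: list[str]) -> list[dict[str, Any]]:
    """Keep only the tools whose name is in the node's allowed list."""
    allowed = set(allowed_tool_names)
    return [tool for tool in tools if tool_name(tool) in allowed]
```

With an empty `allowed_tool_names`, as for `validate_metrics` and `generate_chart_report`, the node receives no tools at all.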

The created workers should remain generic:

```text
node.agent.role = ""
node.agent.metadata.sub_agent_kind = "generic_skill_worker"
```

## Expected complete outcome

If source collection and extraction produce required evidence:

```text
Planner
→ TeamRunResult with required nodes completion_status=succeeded
→ task_outcome=complete
→ tool-free final synthesis
→ final Markdown report
```

The final output may include:

```text
comparison table
chart-ready data
Mermaid
Markdown chart section
text bar chart fallback
final textual report
```

It must not claim:

```text
generated chart image
generated chart file
saved chart artifact
```

## Expected incomplete outcome

If official-source evidence is missing or web tools fail:

```text
collect_official_sources.completion_status=partial
→ evidence_gaps populated
→ because block_downstream_on_partial=true, downstream nodes are blocked
→ task_outcome=incomplete
→ tool-free final synthesis still runs
→ final answer is prefixed with an incomplete notice
```

The final response should explain which required evidence was missing instead of fabricating metrics.
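The blocking behavior in the incomplete case can be sketched as a pass over the DAG in dependency order. This is an illustration under stated assumptions rather than Beaver's implementation: the `"blocked"` status label and the helpers `propagate_blocking` and `task_outcome` are hypothetical, and every node is treated as required for completion.

```python
from graphlib import TopologicalSorter


def propagate_blocking(
    depends_on: dict[str, list[str]],
    statuses: dict[str, str],
    block_on_partial: dict[str, bool],
) -> dict[str, str]:
    """Mark a node 'blocked' when an upstream node is blocked or failed its gate."""
    final: dict[str, str] = {}
    for node_id in TopologicalSorter(depends_on).static_order():
        gated = any(
            final[dep] == "blocked"
            or (final[dep] != "succeeded" and block_on_partial.get(dep, False))
            for dep in depends_on.get(node_id, [])
        )
        final[node_id] = "blocked" if gated else statuses.get(node_id, "succeeded")
    return final


def task_outcome(final: dict[str, str]) -> str:
    """Report 'complete' only when every node succeeded."""
    return "complete" if all(s == "succeeded" for s in final.values()) else "incomplete"
```

With `collect_official_sources` reported as `partial` and its `block_downstream_on_partial` flag set, the three downstream nodes end up blocked and the outcome is `incomplete`.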

## Verification queries

After running the task, inspect Steven's event log:

```bash
WORKSPACE=app-instance/runtime/instances/steven/beaver-home/workspace
tail -n 200 "$WORKSPACE/tasks/events.jsonl"
```

Look for:

```text
task_execution_planned
task_team_run_completed or task_team_run_failed
task_synthesis_completed
```

For `task_execution_planned`, verify:

```text
planner_adaptation.template_used = true
planner_adaptation.selected_template = mgm-galaxy-financial-chart-report-safe
node_ids include collect_official_sources/extract_financial_metrics/validate_metrics/generate_chart_report
```

For `task_synthesis_completed`, verify:

```text
task_outcome = complete | incomplete
incomplete_node_ids = [] for complete, otherwise populated
```
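Scanning the log by eye works for one run; for repeated runs a small script may be easier. The sketch below assumes each line of `events.jsonl` is a JSON object whose event name lives under an `event_type` key. That key name is a guess, since the record schema is not shown here, so it is exposed as a parameter. `seen_event_types` and `missing_event_groups` are hypothetical helpers.

```python
import json
from pathlib import Path

# Each group is satisfied when at least one of its event names was seen.
REQUIRED_GROUPS = [
    {"task_execution_planned"},
    {"task_team_run_completed", "task_team_run_failed"},
    {"task_synthesis_completed"},
]


def seen_event_types(events_path: Path, key: str = "event_type") -> set[str]:
    """Collect the event names found in a JSON Lines file, skipping unparsable lines."""
    seen: set[str] = set()
    for line in events_path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict) and key in record:
            seen.add(str(record[key]))
    return seen


def missing_event_groups(seen: set[str]) -> list[list[str]]:
    """Return the required groups for which no event was observed."""
    return [sorted(group) for group in REQUIRED_GROUPS if not (group & seen)]
```

An empty list from `missing_event_groups` means all three expected stages left at least one event in the log.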
316
docs/superpowers/examples/steven_team_demo_harness.py
Normal file
@ -0,0 +1,316 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import asyncio
|
||||
import json
|
||||
from dataclasses import asdict
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
from beaver.engine import AgentLoop, EngineLoader
|
||||
from beaver.engine.context import SkillContext
|
||||
from beaver.engine.providers.base import LLMProvider, LLMResponse, ToolCallRequest
|
||||
from beaver.engine.providers.factory import ProviderBundle, build_provider_runtime
|
||||
from beaver.services.team_service import TeamService
|
||||
from beaver.skills.catalog.loader import SkillsLoader
|
||||
from beaver.skills.catalog.utils import strip_frontmatter
|
||||
from beaver.skills.drafts import DraftService
|
||||
from beaver.skills.specs import SkillSpecStore
|
||||
from beaver.tasks.attempt_orchestrator import TaskAttemptOrchestrator
|
||||
from beaver.tasks.models import TaskRecord
|
||||
from beaver.tasks.planner import TaskExecutionPlanner
|
||||
from beaver.tasks.skill_resolver import TaskSkillResolver
|
||||
|
||||
|
||||
WORKSPACE = Path("/root/.beaver/workspace")
|
||||
SKILL_NAME = "mgm-galaxy-financial-chart-report-safe"
|
||||
|
||||
|
||||
def _text_from_messages(messages: list[dict[str, Any]]) -> str:
|
||||
return "\n".join(str(message.get("content") or "") for message in messages)
|
||||
|
||||
|
||||
def _tool_names(tools: list[dict[str, Any]] | None) -> list[str]:
|
||||
names: list[str] = []
|
||||
for tool in tools or []:
|
||||
if "function" in tool:
|
||||
names.append(str(tool["function"].get("name") or ""))
|
||||
else:
|
||||
names.append(str(tool.get("name") or ""))
|
||||
return [name for name in names if name]
|
||||
|
||||
|
||||
class DemoProvider(LLMProvider):
|
||||
def __init__(self, *, collect_uses_tool: bool) -> None:
|
||||
super().__init__()
|
||||
self.collect_uses_tool = collect_uses_tool
|
||||
self.calls: list[dict[str, Any]] = []
|
||||
|
||||
async def chat(
|
||||
self,
|
||||
messages: list[dict[str, Any]],
|
||||
tools: list[dict[str, Any]] | None = None,
|
||||
model: str | None = None,
|
||||
max_tokens: int | None = None,
|
||||
temperature: float = 0.0,
|
||||
thinking_enabled: bool | None = None,
|
||||
) -> LLMResponse:
|
||||
text = _text_from_messages(messages)
|
||||
names = _tool_names(tools)
|
||||
self.calls.append(
|
||||
{
|
||||
"tool_names": names,
|
||||
"has_tool_result": any(message.get("role") == "tool" for message in messages),
|
||||
"text_preview": text[:300],
|
||||
}
|
||||
)
|
||||
|
||||
if "You choose whether an internal Beaver Task attempt" in text:
|
||||
return LLMResponse(
|
||||
content=json.dumps(_planner_json(), ensure_ascii=False),
|
||||
provider_name="demo",
|
||||
model="demo-model",
|
||||
)
|
||||
|
||||
if "You select Beaver skills for a single run" in text:
|
||||
return LLMResponse(content="[]", provider_name="demo", model="demo-model")
|
||||
|
||||
if "team:generate_chart_report" in text:
|
||||
return LLMResponse(
|
||||
content=(
|
||||
"# MGM China vs Galaxy Entertainment Demo Report\n\n"
|
||||
"| Company | Metric | Value | Source |\n"
|
||||
"|---|---:|---:|---|\n"
|
||||
"| MGM China | Revenue | demo value | upstream source |\n"
|
||||
"| Galaxy Entertainment | Revenue | demo value | upstream source |\n\n"
|
||||
"Chart-ready data is provided as Markdown. No image or saved chart file was generated."
|
||||
),
|
||||
provider_name="demo",
|
||||
model="demo-model",
|
||||
)
|
||||
|
||||
if "team:validate_metrics" in text:
|
||||
return LLMResponse(
|
||||
content="Validation complete: periods and units are labeled; no generated chart artifact is claimed.",
|
||||
provider_name="demo",
|
||||
model="demo-model",
|
||||
)
|
||||
|
||||
if "team:extract_financial_metrics" in text:
|
||||
return LLMResponse(
|
||||
content=(
|
||||
"Extracted demo metric table: MGM China revenue: source-backed placeholder; "
|
||||
"Galaxy Entertainment revenue: source-backed placeholder. Currency, period, and source URL fields are labeled."
|
||||
),
|
||||
provider_name="demo",
|
||||
model="demo-model",
|
||||
)
|
||||
|
||||
if "team:collect_official_sources" in text:
|
||||
if self.collect_uses_tool and "web_fetch" in names and not any(message.get("role") == "tool" for message in messages):
|
||||
return LLMResponse(
|
||||
content=None,
|
||||
tool_calls=[
|
||||
ToolCallRequest(
|
||||
id="call_collect_fetch",
|
||||
name="web_fetch",
|
||||
arguments={
|
||||
"url": "https://www.bing.com/search?q=MGM+China+Galaxy+Entertainment+annual+report",
|
||||
"max_chars": 1000,
|
||||
},
|
||||
)
|
||||
],
|
||||
finish_reason="tool_calls",
|
||||
provider_name="demo",
|
||||
model="demo-model",
|
||||
)
|
||||
return LLMResponse(
|
||||
content=(
|
||||
"Collected official-source candidates for MGM China Holdings and Galaxy Entertainment. "
|
||||
"Demo evidence includes a successful web_fetch tool result with URL captured by Beaver."
|
||||
),
|
||||
provider_name="demo",
|
||||
model="demo-model",
|
||||
)
|
||||
|
||||
return LLMResponse(content="Demo final synthesis.", provider_name="demo", model="demo-model")
|
||||
|
||||
def get_default_model(self) -> str:
|
||||
return "demo-model"
|
||||
|
||||
|
||||
def _planner_json() -> dict[str, Any]:
|
||||
return {
|
||||
"mode": "team",
|
||||
"reason": "finance comparison benefits from staged source collection, extraction, validation, and reporting",
|
||||
"strategy": "dag",
|
||||
"nodes": [
|
||||
{
|
||||
"node_id": "collect_official_sources",
|
||||
"task": "Collect official MGM China Holdings and Galaxy Entertainment financial disclosure sources for the requested period. Prefer annual reports, interim reports, results announcements, investor relations pages, and exchange filings. Return source URLs with short notes about period coverage.",
|
||||
"use_skill": "web-operation",
|
||||
"skill_query": "official financial disclosure web research",
|
||||
"depends_on": [],
|
||||
"requested_tools": ["web_search", "web_fetch"],
|
||||
"required_evidence": ["tool_result", "url"],
|
||||
"evidence_contract": {"version": 1, "entities": ["MGM China Holdings", "Galaxy Entertainment Group"]},
|
||||
"required_for_completion": True,
|
||||
"block_downstream_on_partial": True,
|
||||
"max_tool_iterations": 2,
|
||||
},
|
||||
{
|
||||
"node_id": "extract_financial_metrics",
|
||||
"task": "Extract comparable financial metrics for MGM China Holdings and Galaxy Entertainment from the collected official sources. Include revenue or net revenue, adjusted EBITDA where available, net profit/loss where available, period, currency, unit, and source URL for each metric.",
|
||||
"use_skill": "web-operation",
|
||||
"skill_query": "financial metric extraction from official disclosures",
|
||||
"depends_on": ["collect_official_sources"],
|
||||
"requested_tools": ["web_fetch"],
|
||||
"required_evidence": ["output"],
|
||||
"evidence_contract": {"version": 1, "metrics": ["revenue", "adjusted_ebitda", "net_profit_or_loss"]},
|
||||
"required_for_completion": True,
|
||||
"block_downstream_on_partial": True,
|
||||
"max_tool_iterations": 1,
|
||||
},
|
||||
{
|
||||
"node_id": "validate_metrics",
|
||||
"task": "Validate extracted metrics for source consistency, period alignment, currency/unit consistency, and obvious transcription errors. Produce a concise validation note and list any evidence gaps.",
|
||||
"use_skill": "utility-tools",
|
||||
"skill_query": "finance metric validation",
|
||||
"depends_on": ["extract_financial_metrics"],
|
||||
"requested_tools": [],
|
||||
"required_evidence": ["output"],
|
||||
"evidence_contract": {"version": 1, "checks": ["source_consistency", "period_alignment"]},
|
||||
"required_for_completion": True,
|
||||
"block_downstream_on_partial": True,
|
||||
"max_tool_iterations": 0,
|
||||
},
|
||||
{
|
||||
"node_id": "generate_chart_report",
|
||||
"task": "Generate the final Markdown comparison report. Include an executive summary, source-backed comparison table, chart-ready data table, optional Mermaid or text bar chart section, and caveats. Do not claim that a chart image, chart file, or saved artifact was generated.",
|
||||
"use_skill": "utility-tools",
|
||||
"skill_query": "financial markdown report with chart-ready data",
|
||||
"depends_on": ["validate_metrics"],
|
||||
"requested_tools": [],
|
||||
"required_evidence": ["output"],
|
||||
"evidence_contract": {"version": 1, "outputs": ["comparison_table", "chart_ready_data", "markdown_report"]},
|
||||
"required_for_completion": True,
|
||||
"block_downstream_on_partial": False,
|
||||
"max_tool_iterations": 0,
|
||||
},
|
||||
],
|
||||
"adaptation": {"template_used": True},
|
||||
"final_synthesis_instruction": "Synthesize node outputs into a concise Markdown finance report.",
|
||||
}
|
||||
|
||||
|
||||
def _task() -> TaskRecord:
|
||||
return TaskRecord(
|
||||
task_id="demo-task-mgm-galaxy",
|
||||
session_id="web:demo-mgm-galaxy-harness",
|
||||
description="Compare MGM China and Galaxy Entertainment using official public financial disclosures.",
|
||||
goal="Compare MGM China and Galaxy Entertainment using official public financial disclosures.",
|
||||
constraints=[],
|
||||
priority=0,
|
||||
status="open",
|
||||
creator="demo",
|
||||
created_at="demo",
|
||||
updated_at="demo",
|
||||
)
|
||||
|
||||
|
||||
def _finance_skill_context(loader: SkillsLoader) -> SkillContext:
|
||||
record = loader.get_skill_record(SKILL_NAME)
|
||||
raw = loader.load_published_skill(SKILL_NAME)
|
||||
if record is None or raw is None:
|
||||
raise RuntimeError(f"missing published skill: {SKILL_NAME}")
|
||||
return SkillContext(
|
||||
name=record.name,
|
||||
version=record.version,
|
||||
content=strip_frontmatter(raw).strip(),
|
||||
content_hash=record.content_hash or "",
|
||||
activation_reason="demo_exact_skill",
|
||||
tool_hints=list(record.tool_hints),
|
||||
team_template=record.team_template,
|
||||
team_template_warnings=list(record.team_template_warnings),
|
||||
)
|
||||
|
||||
|
||||
async def _run_case(*, collect_uses_tool: bool) -> dict[str, Any]:
    loader = SkillsLoader(WORKSPACE)
    store = SkillSpecStore(WORKSPACE)
    runtime = build_provider_runtime(model="demo-model", provider_name="custom", api_key="demo", api_base="http://demo.invalid/v1")
    provider = DemoProvider(collect_uses_tool=collect_uses_tool)
    bundle = ProviderBundle(main_runtime=runtime, main_provider=provider)
    engine_loader = EngineLoader(workspace=WORKSPACE)
    loop = AgentLoop(loader=engine_loader)
    loaded = loop.boot()
    resolver = TaskSkillResolver(skills_loader=loader, draft_service=DraftService(store))
    planner = TaskExecutionPlanner(task_skill_resolver=resolver, tool_registry=loaded.tool_registry)
    task = _task()
    skill_context = _finance_skill_context(loader)
    plan = await planner.plan(
        task=task,
        user_message=task.description,
        attempt_index=1,
        provider_bundle=bundle,
        activated_skills=[skill_context],
        timeout_seconds=5.0,
    )
    team_result = None
    if plan.is_team:
        team_result = await TeamService(loop).run_team(
            plan.graph,
            parent_task_id=None,
            parent_session_id=task.session_id,
            provider_bundle_factory=lambda node: bundle,
            inherited_pinned_skill_contexts=[skill_context],
        )
    context, prefix, metadata = TaskAttemptOrchestrator._team_synthesis_outcome(plan, team_result, prompt_locale="en")
    return {
        "case": "complete" if collect_uses_tool else "incomplete",
        "plan_mode": plan.mode,
        "plan_reason": plan.reason,
        "planner_adaptation": plan.planner_adaptation,
        "node_ids": [node.node_id for node in plan.graph.nodes] if plan.graph else [],
        "node_tool_scopes": {node.node_id: node.allowed_tool_names for node in plan.graph.nodes} if plan.graph else {},
        "node_skill_bindings": [
            {
                "node_id": node.node_id,
                "pinned_skill_names": node.inherited_pinned_skills,
                "pinned_skill_contexts": [skill.name for skill in node.inherited_pinned_skill_contexts],
                "role": node.agent.role,
                "sub_agent_kind": node.agent.metadata.get("sub_agent_kind"),
                "exact_binding_used": node.agent.metadata.get("exact_binding_used"),
            }
            for node in (plan.graph.nodes if plan.graph else [])
        ],
        "team_success": team_result.success if team_result else None,
        "team_summary": team_result.summary if team_result else None,
        "team_run_ids": team_result.run_ids if team_result else [],
        "node_results": [
            {
                "node_id": result.node_id,
                "success": result.success,
                "completion_status": result.completion_status,
                "finish_reason": result.finish_reason,
                "evidence_gaps": result.evidence_gaps,
                "output_preview": result.output_text[:180],
            }
            for result in (team_result.node_results if team_result else [])
        ],
        "synthesis_metadata": metadata,
        "incomplete_prefix_present": bool(prefix),
        "outcome_context_preview": context[:600],
        "provider_calls": provider.calls,
    }
|
||||
|
||||
|
||||
async def main() -> None:
    results = [
        await _run_case(collect_uses_tool=True),
        await _run_case(collect_uses_tool=False),
    ]
    print(json.dumps(results, ensure_ascii=False, indent=2, default=str))


if __name__ == "__main__":
    asyncio.run(main())
|
||||
@ -0,0 +1,531 @@
|
||||
# Template-Guided Team Routing Implementation Plan
|
||||
|
||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
||||
|
||||
**Goal:** Let a root Main Agent choose Team execution on its first provider response whenever an activated Skill supplies a valid Team template, while preserving an intentional zero-extra-round single-agent path.
|
||||
|
||||
**Architecture:** Keep `ExecutionGraph`, `ExecutionNode`, `LocalAgentRunner`, and `run_agent_team` unchanged. Add a small Main-Agent routing state inside `AgentLoop`: it selects the first valid activated template, adds compact first-turn guidance, classifies the first provider response as `team` or `single`, persists a structured mode event, and prevents a later mid-run Team switch after single-agent work starts. Project that event into the existing Task process stream; no frontend work is included.
|
||||
|
||||
**Tech Stack:** Python 3.12, asyncio, dataclasses, pytest, existing `AgentLoop`, session event store, process projector, and Team tool runtime.
|
||||
|
||||
---
|
||||
|
||||
## File Structure
|
||||
|
||||
- `app-instance/backend/beaver/engine/loop.py`: primary-template selection, first-turn guidance, mode classification/lock, tool-call filtering, and persistent routing event.
|
||||
- `app-instance/backend/beaver/services/process_service.py`: project the routing event into the existing task process stream.
|
||||
- `app-instance/backend/tests/unit/test_agent_loop.py`: Main-Agent prompt, first-turn Team, first-turn Single, mixed-call, and no-template regression tests.
|
||||
- `app-instance/backend/tests/unit/test_process_projection.py`: routing-event projection test.
|
||||
|
||||
No changes to Planner, Team scheduler/runtime, ToolAssembler, ToolExecutor, evidence gate, final synthesis gate, frontend, or Skill learning are required.
|
||||
|
||||
### Task 1: Select a Primary Template and Make First-Turn Routing Explicit
|
||||
|
||||
**Files:**
|
||||
|
||||
- Modify: `app-instance/backend/beaver/engine/loop.py`
|
||||
- Modify: `app-instance/backend/tests/unit/test_agent_loop.py`
|
||||
|
||||
- [ ] **Step 1: Add a sequenced provider and a valid template fixture to the AgentLoop test module**
|
||||
|
||||
Add imports for `SkillContext` and `ToolCall`, then add a provider that captures the system prompt and returns supplied responses in sequence:
|
||||
|
||||
```python
class SequencedProvider(LLMProvider):
    def __init__(self, responses: list[LLMResponse]) -> None:
        super().__init__()
        self.responses = list(responses)
        self.calls: list[dict[str, Any]] = []

    async def chat(self, messages: list[dict], tools: list[dict] | None = None, **_: Any) -> LLMResponse:
        self.calls.append({"messages": messages, "tools": tools})
        return self.responses.pop(0)

    def get_default_model(self) -> str:
        return "stub-model"


def _team_template_skill(name: str = "finance-report") -> SkillContext:
    return SkillContext(
        name=name,
        content="# Finance report",
        team_template={
            "version": 1,
            "strategy": "dag",
            "nodes": [{"node_id": "collect", "task": "Collect official sources"}],
        },
    )
```
|
||||
|
||||
- [ ] **Step 2: Write failing first-turn guidance and deterministic-primary tests**
|
||||
|
||||
```python
def test_root_task_with_template_adds_first_turn_team_routing_guidance(tmp_path) -> None:
    provider = RecordingProvider()
    loop = AgentLoop(loader=EngineLoader(workspace=tmp_path))

    asyncio.run(loop.process_direct(
        "compare financial reports",
        session_id="session",
        task_id="task-1",
        task_mode=True,
        pinned_skill_contexts=[_team_template_skill(), _team_template_skill("ignored")],
        provider_bundle=_bundle(provider),
    ))

    system_content = "\n".join(
        str(message["content"])
        for message in provider.messages_by_call[0]
        if message["role"] == "system"
    )
    assert "choose one execution path in this first response" in system_content
    assert "run_agent_team" in system_content
    assert '"skill_name":"finance-report"' in system_content
    assert "ignored" not in system_content


def test_empty_template_nodes_do_not_enable_first_turn_team_routing(tmp_path) -> None:
    provider = RecordingProvider()
    loop = AgentLoop(loader=EngineLoader(workspace=tmp_path))
    empty = SkillContext(name="empty", content="# Empty", team_template={"nodes": []})

    asyncio.run(loop.process_direct(
        "single lookup",
        session_id="session",
        task_id="task-1",
        task_mode=True,
        pinned_skill_contexts=[empty],
        provider_bundle=_bundle(provider),
    ))

    assert "choose one execution path in this first response" not in provider.system_prompts[0]
```
|
||||
|
||||
Extend `RecordingProvider` to retain `messages_by_call` and `system_prompts`, instead of creating a second nearly-identical fixture.
|
||||
|
||||
- [ ] **Step 3: Run the focused tests to verify failure**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
cd app-instance/backend && uv run pytest tests/unit/test_agent_loop.py -q
|
||||
```
|
||||
|
||||
Expected: FAIL because no Main-Agent template selector or first-turn routing guidance exists.
|
||||
|
||||
- [ ] **Step 4: Add a private, immutable routing-selection value and selector in `loop.py`**
|
||||
|
||||
Place this near `AgentRunResult`:
|
||||
|
||||
```python
@dataclass(frozen=True, slots=True)
class _TeamTemplateRouting:
    skill_name: str
    template: dict[str, Any]
    ignored_skill_names: tuple[str, ...] = ()


def _select_main_agent_team_template(
    activated_skills: list[SkillContext],
) -> _TeamTemplateRouting | None:
    candidates = [
        skill
        for skill in activated_skills
        if isinstance(skill.team_template, dict)
        and isinstance(skill.team_template.get("nodes"), list)
        and bool(skill.team_template["nodes"])
    ]
    if not candidates:
        return None
    return _TeamTemplateRouting(
        skill_name=candidates[0].name,
        template=dict(candidates[0].team_template or {}),
        ignored_skill_names=tuple(skill.name for skill in candidates[1:]),
    )
```
|
||||
|
||||
This intentionally mirrors, but does not alter, `TaskExecutionPlanner._select_team_template()`: planner adaptation metadata and Main-Agent first-turn routing have different lifecycles. Do not move the helper into Planner or use Planner as a runtime dependency.
|
||||
|
||||
- [ ] **Step 5: Build compact guidance only when a root Task can actually invoke the Team tool**
|
||||
|
||||
Replace the static-only Team section with a helper that accepts the routing value:
|
||||
|
||||
```python
@staticmethod
def _team_template_routing_prompt(routing: _TeamTemplateRouting) -> str:
    template_payload = json.dumps(
        {"skill_name": routing.skill_name, "template": routing.template},
        ensure_ascii=False,
        separators=(",", ":"),
    )
    return (
        "# Task Agent Team Routing\n\n"
        "An active Skill provides this primary Team template:\n"
        f"{template_payload}\n\n"
        "Before beginning ordinary work, choose one execution path in this first response. "
        "For staged collection, extraction, validation, comparison, research, or reporting represented "
        "by this template, call `run_agent_team` now using task-only nodes derived from it. "
        "Choose single-agent execution only for a plainly one-step request, an explicit request not to "
        "delegate, or a template that does not fit the immediate request. Do not call ordinary tools "
        "before this choice. If choosing single-agent execution, call ordinary tools or answer normally "
        "without explaining the routing choice."
    )
```
|
||||
|
||||
In `_process_direct_impl()`, calculate the value after activated Skills are resolved. Pass it into `_extra_guidance_sections()` only when all are true:
|
||||
|
||||
```python
|
||||
is_root_task = task_mode and not parent_session_id and not str(source or "").startswith("team:")
|
||||
team_tool_available = any(spec.name == AGENT_TEAM_TOOL_NAME for spec in selected_tool_specs)
|
||||
routing_template = _select_main_agent_team_template(activated_skills)
|
||||
routing_enabled = is_root_task and team_tool_available and routing_template is not None
|
||||
```
|
||||
|
||||
Keep `TASK_AGENT_TEAM_CAPABILITY_PROMPT` for ordinary root Task capability exposure. Do not add guidance for empty/invalid templates, child Team nodes, non-Task runs, or when `run_agent_team` is absent.
|
||||
|
||||
- [ ] **Step 6: Run the focused tests to verify they pass**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
cd app-instance/backend && uv run pytest tests/unit/test_agent_loop.py -q
|
||||
```
|
||||
|
||||
Expected: PASS, including existing root-Team-tool visibility coverage.
|
||||
|
||||
### Task 2: Lock First-Turn Mode and Persist the Machine-Readable Decision
|
||||
|
||||
**Files:**
|
||||
|
||||
- Modify: `app-instance/backend/beaver/engine/loop.py`
|
||||
- Modify: `app-instance/backend/tests/unit/test_agent_loop.py`
|
||||
|
||||
- [ ] **Step 1: Write failing Team, Single, mixed-call, and legacy behavior tests**
|
||||
|
||||
Use `ToolCall` objects in a `SequencedProvider`; use the normal registered `run_agent_team` only with a `tool_executor_override` stub so the test checks AgentLoop routing without starting a real Team.
|
||||
|
||||
```python
def test_first_turn_agent_team_call_records_team_mode_and_executes_only_team(tmp_path) -> None:
    provider = SequencedProvider([
        LLMResponse(
            content="",
            tool_calls=[
                ToolCall(id="team", name="run_agent_team", arguments={"nodes": [{"node_id": "collect", "task": "Collect"}]}),
                ToolCall(id="search", name="web_search", arguments={"query": "must not run"}),
            ],
            provider_name="stub",
            model="stub-model",
        ),
        LLMResponse(content="done", provider_name="stub", model="stub-model"),
    ])
    executor = CapturingToolExecutor()
    loop = AgentLoop(loader=EngineLoader(workspace=tmp_path))

    asyncio.run(loop.process_direct(
        "compare finance reports",
        session_id="session",
        task_id="task-1",
        task_mode=True,
        pinned_skill_contexts=[_team_template_skill()],
        provider_bundle=_bundle(provider),
        tool_executor_override=executor,
    ))

    assert [call.name for call in executor.calls] == ["run_agent_team"]
    decision = _event_payload(loop, "session", "execution_mode_selected")
    # The event also carries `attempt_index` (see Step 3); exclude it from the exact comparison.
    assert {key: value for key, value in decision.items() if key != "attempt_index"} == {
        "task_id": "task-1",
        "execution_mode": "team",
        "routing_source": "main_agent_first_turn",
        "primary_template_skill": "finance-report",
        "ignored_template_skills": [],
    }


def test_first_turn_ordinary_tool_records_single_and_blocks_later_team_call(tmp_path) -> None:
    provider = SequencedProvider([
        LLMResponse(
            content="",
            tool_calls=[ToolCall(id="search", name="web_search", arguments={"query": "one step"})],
            provider_name="stub",
            model="stub-model",
        ),
        LLMResponse(
            content="",
            tool_calls=[ToolCall(id="team", name="run_agent_team", arguments={"nodes": [{"node_id": "late", "task": "Late"}]})],
            provider_name="stub",
            model="stub-model",
        ),
        LLMResponse(content="done", provider_name="stub", model="stub-model"),
    ])
    executor = CapturingToolExecutor()
    loop = AgentLoop(loader=EngineLoader(workspace=tmp_path))

    asyncio.run(loop.process_direct(
        "one-step lookup",
        session_id="session",
        task_id="task-1",
        task_mode=True,
        pinned_skill_contexts=[_team_template_skill()],
        provider_bundle=_bundle(provider),
        tool_executor_override=executor,
    ))

    assert [call.name for call in executor.calls] == ["web_search"]
    assert "run_agent_team" not in provider.tool_names_by_call[1]
    late_result = _tool_result_by_call_id(loop, "session", "team")
    assert late_result["error"] == "execution_mode_locked_single"
    assert _event_payload(loop, "session", "execution_mode_selected")["execution_mode"] == "single"
```
|
||||
|
||||
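Also assert that a root Task with no template keeps `run_agent_team` in every provider schema, preserving legacy behavior. The second test reads `provider.tool_names_by_call`, which the `SequencedProvider` from Task 1 does not record; extend it to capture the tool names offered on each call, and add any of the `CapturingToolExecutor`, `_event_payload`, and `_tool_result_by_call_id` helpers that the test module does not already define.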
|
||||
|
||||
- [ ] **Step 2: Run the test module to verify failure**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
cd app-instance/backend && uv run pytest tests/unit/test_agent_loop.py -q
|
||||
```
|
||||
|
||||
Expected: FAIL because AgentLoop has no decision event, no per-run mode state, and executes mixed/later Team calls normally.
|
||||
|
||||
- [ ] **Step 3: Add mode state and first-response classification immediately after the provider response**
|
||||
|
||||
Before the `while True` loop set:
|
||||
|
||||
```python
|
||||
routing_mode: str | None = None
|
||||
```
|
||||
|
||||
After `response = await provider.chat(**chat_kwargs)` and before serializing/appending the assistant message, classify only once when `routing_enabled` is true:
|
||||
|
||||
```python
if routing_enabled and routing_mode is None:
    # `or []` guards against a provider that reports "no tool calls" as None.
    tool_names = {self._tool_call_name(tool_call) for tool_call in (response.tool_calls or [])}
    routing_mode = "team" if AGENT_TEAM_TOOL_NAME in tool_names else "single"
    append_message(
        resolved_session_id,
        run_id=resolved_run_id,
        role="system",
        event_type="execution_mode_selected",
        event_payload={
            "task_id": task_id,
            "attempt_index": attempt_index,
            "execution_mode": routing_mode,
            "routing_source": "main_agent_first_turn",
            "primary_template_skill": routing_template.skill_name,
            "ignored_template_skills": list(routing_template.ignored_skill_names),
        },
        content=None,
        context_visible=False,
        source=source,
        title=title,
        model=final_model,
        user_id=user_id,
    )
```
|
||||
|
||||
Do not write this event for runs without `routing_enabled`. A no-tool first response selects `single` before the normal final-answer branch.
|
||||
|
||||
- [ ] **Step 4: Apply the no-mixed-mode and single-lock behavior at the call boundary**
|
||||
|
||||
Add two private helpers:
|
||||
|
||||
```python
@staticmethod
def _calls_for_execution_mode(tool_calls: list[Any], routing_mode: str | None) -> list[Any]:
    if routing_mode != "team":
        return list(tool_calls)
    return [call for call in tool_calls if AgentLoop._tool_call_name(call) == AGENT_TEAM_TOOL_NAME]


@staticmethod
def _team_locked_result(tool_call: Any) -> ToolResult:
    return ToolResult(
        success=False,
        content="Agent Team can only be selected in the first response of this Task run.",
        tool_name=AGENT_TEAM_TOOL_NAME,
        error="execution_mode_locked_single",
    )
```
|
||||
|
||||
Then use these rules in the loop:
|
||||
|
||||
1. If the first response selected `team`, serialize and execute only `run_agent_team`; ordinary calls from that response are not executed.
|
||||
2. If `routing_mode == "single"` and the current iteration is after the first response, remove `run_agent_team` from `chat_kwargs["tools"]` before calling the provider.
|
||||
3. If a later response nevertheless emits `run_agent_team`, do not call the executor. Add `_team_locked_result()` through the same `tool_result_recorded` and context-builder paths as ordinary tool failures.
|
||||
4. Preserve the normal concurrent-execution decision for the remaining executable calls.
|
||||
|
||||
Keep original tool schemas and ToolExecutor behavior unchanged for no-template runs. Do not alter `allowed_tool_names` behavior or use it as a source of tools.
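
The rules above can be sketched as a single partition step. This is one possible reading of them, not the `AgentLoop` implementation: `Call`, `partition_calls`, and the `is_first_response` flag are hypothetical names introduced only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

TEAM_TOOL = "run_agent_team"  # tool name as used throughout this plan


@dataclass(frozen=True)
class Call:
    """Hypothetical stand-in for a provider tool call."""
    id: str
    name: str


def partition_calls(
    calls: list[Call], mode: str | None, is_first_response: bool
) -> tuple[list[Call], list[Call]]:
    """Return (calls to execute, late Team calls to answer with a locked result)."""
    if mode == "team" and is_first_response:
        # Rule 1: only the Team call from a mixed first response runs.
        return [call for call in calls if call.name == TEAM_TOOL], []
    if mode == "single" and not is_first_response:
        # Rule 3: a late Team call is not executed; it gets the locked result instead.
        executable = [call for call in calls if call.name != TEAM_TOOL]
        locked = [call for call in calls if call.name == TEAM_TOOL]
        return executable, locked
    # No routing (no template) or nothing to filter: behave as before.
    return list(calls), []


mixed_first = [Call("team", TEAM_TOOL), Call("search", "web_search")]
print(partition_calls(mixed_first, "team", True))
```

Keeping the decision in one pure function would make the no-template path easy to verify: with `mode=None` every call passes through unchanged.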
|
||||
|
||||
- [ ] **Step 5: Run focused AgentLoop tests**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
cd app-instance/backend && uv run pytest tests/unit/test_agent_loop.py -q
|
||||
```
|
||||
|
||||
Expected: PASS. The tests verify that no extra provider call is made solely for mode selection, that mixed first-turn calls execute only the Team call, and that late Team calls are rejected once Single mode is locked.
|
||||
|
||||
### Task 3: Project Routing Decisions into the Task Process Stream
|
||||
|
||||
**Files:**
|
||||
|
||||
- Modify: `app-instance/backend/beaver/services/process_service.py`
|
||||
- Modify: `app-instance/backend/tests/unit/test_process_projection.py`
|
||||
|
||||
- [ ] **Step 1: Write a failing process-projection test**
|
||||
|
||||
```python
def test_process_projection_maps_main_agent_execution_mode_selection(tmp_path: Path) -> None:
    session = SessionManager(tmp_path)
    run_store = RunMemoryStore(tmp_path / "memory" / "runs")
    session.append_message(
        "web:test",
        run_id="main-run",
        role="system",
        event_type="execution_mode_selected",
        event_payload={
            "task_id": "task-1",
            "attempt_index": 1,
            "execution_mode": "team",
            "routing_source": "main_agent_first_turn",
            "primary_template_skill": "finance-report",
            "ignored_template_skills": ["secondary-template"],
        },
        context_visible=False,
    )

    projection = SessionProcessProjector(session, run_store).project("web:test")

    event = next(item for item in projection["events"] if item["kind"] == "execution_mode_selected")
    assert event["status"] == "done"
    assert event["metadata"]["execution_mode"] == "team"
    assert event["metadata"]["primary_template_skill"] == "finance-report"
    assert event["metadata"]["ignored_template_skills"] == ["secondary-template"]
```
|
||||
|
||||
- [ ] **Step 2: Run it to verify failure**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
cd app-instance/backend && uv run pytest tests/unit/test_process_projection.py -q
|
||||
```
|
||||
|
||||
Expected: FAIL with `StopIteration`, because the projector ignores `execution_mode_selected`.
|
||||
|
||||
- [ ] **Step 3: Add a narrow event branch in `SessionProcessProjector.project()`**
|
||||
|
||||
Place the branch after `skill_activation_snapshotted` and before Team-completion handling:
|
||||
|
||||
```python
elif record.event_type == "execution_mode_selected":
    run_id = record.run_id or root_run_id
    parent_run_id = root_run_id if run_id != root_run_id else None
    mode = str(payload.get("execution_mode") or "single")
    add_event(
        event_id=_event_id(record, "execution-mode"),
        run_id=str(run_id),
        parent_run_id=parent_run_id,
        kind="execution_mode_selected",
        actor_type="system",
        actor_id="main-agent-router",
        actor_name="Main Agent",
        text="Main Agent selected Team execution." if mode == "team" else "Main Agent selected single-agent execution.",
        created_at=created_at,
        status="done",
        metadata={
            **dict(payload),
            "task_id": task_id,
            "attempt_index": attempt_index,
            "timeline_type": "execution_mode",
        },
    )
```
|
||||
|
||||
Do not add frontend rendering in this task. The projected event is enough for the existing API/process payload and future UI work.
|
||||
|
||||
- [ ] **Step 4: Run focused projection tests**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
|
||||
cd app-instance/backend && uv run pytest tests/unit/test_process_projection.py -q
|
||||
```
|
||||
|
||||
Expected: PASS.
|
||||
|
||||
### Task 4: Regression Verification and Steven Docker Acceptance
|
||||
|
||||
**Files:**
|
||||
|
||||
- No new production files.
|
||||
- Modify only test fixtures/assertions from Tasks 1–3 if a compatibility issue is exposed.
|
||||
|
||||
- [ ] **Step 1: Run all directly affected unit tests**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
cd app-instance/backend && uv run pytest \
  tests/unit/test_agent_loop.py \
  tests/unit/test_process_projection.py \
  tests/unit/test_team_node_tool_policy.py \
  tests/unit/test_task_execution_planner.py \
  tests/unit/test_task_team_synthesis_outcome.py \
  -q
```
|
||||
|
||||
Expected: PASS. Do not change tests outside this feature to accommodate unrelated Python/TestClient cleanup behavior.
|
||||
|
||||
- [ ] **Step 2: Verify static quality for the scoped diff**
|
||||
|
||||
Run:
|
||||
|
||||
```bash
git diff --check -- \
  app-instance/backend/beaver/engine/loop.py \
  app-instance/backend/beaver/services/process_service.py \
  app-instance/backend/tests/unit/test_agent_loop.py \
  app-instance/backend/tests/unit/test_process_projection.py
```
|
||||
|
||||
Expected: no output and exit status 0.
|
||||
|
||||
- [ ] **Step 3: Deploy only after local tests pass and verify the real MGM/Galaxy route**
|
||||
|
||||
Run the established Steven deployment procedure:
|
||||
|
||||
```bash
docker cp app-instance/backend/beaver app-instance-steven:/opt/app/backend/
docker cp app-instance/backend/pyproject.toml app-instance-steven:/opt/app/backend/pyproject.toml
docker exec app-instance-steven sh -lc 'cd /opt/app/backend && uv pip install --system --no-deps -e .'
docker restart app-instance-steven
curl -fsS http://127.0.0.1:20000/api/ping
```
|
||||
|
||||
Create a fresh MGM/Galaxy finance-report Task and inspect its session/task process events. Acceptance requires this ordering:
|
||||
|
||||
```text
skill_activation_snapshotted
→ execution_mode_selected {execution_mode: team, primary_template_skill: mgm-galaxy-financial-chart-report-safe}
→ tool_call_started: run_agent_team
→ run_agent_team_debug: invoke_started
→ task_team_run_completed or task_team_run_failed
```
|
||||
|
||||
The first ordinary `web_search` must be emitted by a Team node, never by the root Main Agent. If the model intentionally selects Single for this known staged finance template, stop and inspect the captured first-turn system prompt/tool call before changing code.
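
When inspecting the exported events by hand becomes tedious, the ordering requirement can be checked with a small subsequence test. The sketch below is only an aid under stated assumptions: `appears_in_order` is a hypothetical helper, and the flattened event labels are illustrative rather than the stored event format. It checks ordering only, not which agent emitted the first `web_search`.

```python
from __future__ import annotations


def appears_in_order(observed: list[str], expected: list[set[str]]) -> bool:
    """True if each expected step matches some observed label, in order.

    Unrelated events may sit between the expected steps.
    """
    remaining = iter(observed)
    # `any` consumes the shared iterator, so each step must occur after the previous one.
    return all(any(label in allowed for label in remaining) for allowed in expected)


EXPECTED = [
    {"skill_activation_snapshotted"},
    {"execution_mode_selected"},
    {"tool_call_started:run_agent_team"},
    {"run_agent_team_debug:invoke_started"},
    {"task_team_run_completed", "task_team_run_failed"},
]

observed = [
    "skill_activation_snapshotted",
    "execution_mode_selected",
    "tool_call_started:run_agent_team",
    "run_agent_team_debug:invoke_started",
    "tool_call_started:web_search",
    "task_team_run_completed",
]
print(appears_in_order(observed, EXPECTED))
```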
|
||||
|
||||
- [ ] **Step 4: Report and stop**
|
||||
|
||||
Report modified files, focused test outputs, Docker health, real-task event ordering, `git diff --stat`, and remaining model-mediated routing risk. Do not stage or commit unless the user explicitly asks.
|
||||
|
||||
## Plan Self-Review
|
||||
|
||||
- Scope coverage: primary template selection, first-turn guidance, mode selection without extra LLM round/reason text, mode lock, raw event persistence, process projection, and real MGM/Galaxy verification are covered.
|
||||
- Compatibility: no-template runs keep existing Team-tool exposure; child Team nodes still cannot see the tool; graph/runtime/tool scope/evidence/synthesis behavior is untouched.
|
||||
- Out-of-scope guard: no Planner heuristic change, no frontend, no fixed roles, no nested Team, and no new Team model appear in the implementation tasks.
|
||||
|
||||
@ -0,0 +1,150 @@
|
||||
# Template-Guided Team Routing Design
|
||||
|
||||
## Goal
|
||||
|
||||
Make an activated Skill with a valid `beaver-team-template` a first-class execution option for the Main Agent. The Main Agent makes the execution-mode choice during its first normal model turn: it either calls `run_agent_team` or proceeds as a single agent. No new model round, natural-language decision reason, fixed-role agent, parallel Team model, or nested Team is introduced.
|
||||
|
||||
## Problem
|
||||
|
||||
The MGM/Galaxy financial-report task activated `mgm-galaxy-financial-chart-report-safe`, whose template describes source collection, metric extraction, validation, and report generation. The initial task planner nevertheless returned `planner_skipped_simple_task`, because its keyword prefilter did not recognize the request. The Main Agent had `run_agent_team` in its tool schema and the template in its Skill guidance, but the existing prompt only said that it "may" or should "prefer" the Team tool. It selected `web_search` directly and the run stopped at the web-search low-quality budget before any Team was created.
|
||||
|
||||
The issue is not missing tool registration or a broken Team runtime. The decision to use an active template is currently an unconstrained, implicit LLM preference.
|
||||
|
||||
## Scope
|
||||
|
||||
In scope:
|
||||
|
||||
- Main-Agent first-turn routing when one or more activated Skills provide a valid Team template.
|
||||
- A compact, structured representation of the selected primary template in Main-Agent guidance.
|
||||
- Explicit first-turn mode semantics inferred from normal execution:
|
||||
- calling `run_agent_team` selects Team mode;
|
||||
- calling another tool, or replying without a tool call, selects single-agent mode.
|
||||
- Structured lifecycle observability for the chosen mode without an LLM-authored reason.
|
||||
- Tests for template-present Team routing, single-agent opt-out, and legacy no-template behavior.
|
||||
|
||||
Out of scope:
|
||||
|
||||
- A separate mode-selection provider call or a `select_execution_mode` tool.
|
||||
- Requiring a natural-language explanation for single-agent execution.
|
||||
- Changing `ExecutionGraph`, `ExecutionNode`, `LocalAgentRunner`, scheduler/evidence/synthesis semantics, node Skill binding, or tool scopes.
|
||||
- Fixed-role agents, nested Teams, a parallel Team runtime, frontend work, or chart-renderer tools.
|
||||
- Changing the existing pre-execution `TaskExecutionPlanner` heuristic; this design makes Main-Agent routing reliable even when that planner returns `single`.
|
||||
|
||||
## Existing Runtime Boundary
|
||||
|
||||
```text
|
||||
TaskAttemptOrchestrator
|
||||
→ TaskExecutionPlanner.plan()
|
||||
→ Main Agent AgentLoop.process_direct()
|
||||
→ Skill activation + ToolAssembler
|
||||
→ provider first response
|
||||
→ selected tool execution
|
||||
```
|
||||
|
||||
Today, `AgentLoop` registers `run_agent_team` for a root Task run and adds `TASK_AGENT_TEAM_CAPABILITY_PROMPT`. `SkillContext.team_template` is already populated from a Skill's `beaver-team-template` block. Therefore the implementation belongs at the Main-Agent prompt/tool boundary, not in a new Team runtime.
|
||||
|
||||
## Design
|
||||
|
||||
### 1. Template Eligibility
|
||||
|
||||
The normal activated Skill list remains authoritative. The routing helper considers only Skills where `SkillContext.team_template` is a non-empty mapping with a non-empty `nodes` list.
|
||||
|
||||
At most one template is primary. Selection is deterministic and preserves activation order: the first eligible activated Skill is primary; later eligible templates are guidance only. The primary Skill name and ignored template Skill names are included in structured runtime metadata, not in an LLM decision response.
|
||||
|
||||
No template is synthesized from Skill prose. Invalid/missing templates retain existing single-agent behavior.
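
A minimal sketch of the eligibility and ordering rule, using plain dictionaries in place of `SkillContext` objects. `is_eligible`, `select_primary`, and the sample Skill names are illustrative assumptions, not part of the codebase.

```python
from __future__ import annotations

from typing import Any


def is_eligible(template: Any) -> bool:
    """A template qualifies only if it is a mapping with a non-empty `nodes` list."""
    return (
        isinstance(template, dict)
        and isinstance(template.get("nodes"), list)
        and bool(template["nodes"])
    )


def select_primary(skills: list[dict[str, Any]]) -> tuple[str | None, list[str]]:
    """Return (primary Skill name, other eligible Skill names) in activation order."""
    eligible = [skill["name"] for skill in skills if is_eligible(skill.get("team_template"))]
    if not eligible:
        return None, []
    return eligible[0], eligible[1:]


activated = [
    {"name": "notes", "team_template": None},
    {"name": "empty", "team_template": {"nodes": []}},
    {"name": "finance-report", "team_template": {"nodes": [{"node_id": "collect", "task": "Collect"}]}},
    {"name": "secondary", "team_template": {"nodes": [{"node_id": "a", "task": "A"}]}},
]
primary, ignored = select_primary(activated)
print(primary, ignored)
```

Here `finance-report` becomes primary because it is the first eligible Skill in activation order; `notes` and `empty` never qualify, and `secondary` is only recorded as ignored.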
|
||||
|
||||
### 2. First-Turn Routing Contract
|
||||
|
||||
When a root Task run has an eligible primary template and `run_agent_team` is exposed, the system guidance tells the Main Agent:
|
||||
|
||||
```text
|
||||
Before beginning ordinary work, choose one execution path in this first response.
|
||||
|
||||
- For staged collection, extraction, validation, comparison, research, or reporting represented by the active template, call run_agent_team now using task-only nodes derived from the primary template.
|
||||
- Choose single-agent execution only when the user's request is plainly a one-step request, explicitly asks not to delegate, or the template does not fit the immediate request.
|
||||
- Do not call ordinary tools before this choice.
|
||||
- If you choose single-agent execution, call ordinary tools or answer normally; do not explain the routing choice.
|
||||
```
|
||||
|
||||
The template is included as compact JSON in this guidance, along with its Skill name. This saves the model from having to reconstruct the graph from prose. The existing template and task-only schema restrictions remain authoritative: no `agent` or `role` fields, no nested Team.
|
||||
|
||||
This is a prompting constraint, not a second classifier or a forced Team decision. A model can intentionally opt out by beginning normal single-agent work.
|
||||
|
||||
### 3. Mode Lock and Fallback
|
||||
|
||||
The first provider response locks the root run's mode:
|
||||
|
||||
| First response | Mode | Subsequent behavior |
|
||||
| --- | --- | --- |
|
||||
| contains `run_agent_team` | `team` | Execute the Team tool and retain existing Team outcome/synthesis behavior. |
|
||||
| contains another tool call | `single` | Execute the ordinary tool call. Hide/reject later `run_agent_team` calls for this root run. |
|
||||
| contains no tool call | `single` | Return the answer normally. |
|
||||
| Team tool returns an error | `team` | Preserve its tool result and allow the Main Agent's normal post-tool response; do not silently start a second Team. |
|
||||
|
||||
The lock prevents a half-completed single-agent search from later creating a Team with overlapping evidence and ambiguous timeline ownership. A new Task attempt starts a fresh first-turn routing decision.
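
The table and lock can be read as a small classifier plus a tool filter. The sketch below assumes tool calls are already reduced to a list of tool names; `classify_first_response` and `visible_tools` are hypothetical helper names, not functions from the codebase.

```python
from typing import List, Optional, Sequence

TEAM_TOOL = "run_agent_team"


def classify_first_response(tool_call_names: Sequence[str]) -> str:
    """Lock the mode from the first provider response.

    Any `run_agent_team` call means `team`; any other tool call, or no
    tool call at all, means `single`.
    """
    return "team" if TEAM_TOOL in tool_call_names else "single"


def visible_tools(all_tools: Sequence[str], locked_mode: Optional[str]) -> List[str]:
    """Hide the Team tool once the root run is locked to `single`."""
    if locked_mode == "single":
        return [name for name in all_tools if name != TEAM_TOOL]
    # Not yet locked, or locked to `team`: the tool list is left as-is.
    return list(all_tools)
```

A fresh Task attempt would start again with `locked_mode=None`, matching the rule that each attempt gets a new first-turn routing decision.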
|
||||
|
||||
The existing pre-execution planner may still create a Team before Main Agent execution. If it does, no Main-Agent route decision is needed: its mode is already `team`. The new behavior applies to the current `single` pre-plan path, where the Main Agent otherwise owns execution.
|
||||
|
||||
### 4. Observability
|
||||
|
||||
Persist a machine-readable event immediately after the first provider response is classified:
|
||||
|
||||
```json
|
||||
{
|
||||
"execution_mode": "team",
|
||||
"routing_source": "main_agent_first_turn",
|
||||
"primary_template_skill": "mgm-galaxy-financial-chart-report-safe",
|
||||
"ignored_template_skills": []
|
||||
}
|
||||
```
|
||||
|
||||
For a no-template run, no new event is necessary. For a template run that selects single mode, the same event is written with `execution_mode: "single"`. There is deliberately no `reason` field and no user-visible decision text.
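
A minimal sketch of how such an event might be assembled is shown below. The helper name `build_routing_event` is hypothetical, and this sketch assumes `routing_source` is always `main_agent_first_turn` on this path, as in the example above.

```python
from typing import Any, Dict, List, Optional

VALID_MODES = ("team", "single")


def build_routing_event(
    execution_mode: str,
    primary_template_skill: Optional[str],
    ignored_template_skills: List[str],
) -> Optional[Dict[str, Any]]:
    """Return the routing event payload, or None for a no-template run."""
    if primary_template_skill is None:
        # No eligible template: no new event is written.
        return None
    if execution_mode not in VALID_MODES:
        raise ValueError(f"unknown execution mode: {execution_mode!r}")
    # Deliberately no `reason` field and no user-visible decision text.
    return {
        "execution_mode": execution_mode,
        "routing_source": "main_agent_first_turn",
        "primary_template_skill": primary_template_skill,
        "ignored_template_skills": list(ignored_template_skills),
    }
```

Returning `None` for the no-template case keeps the "nothing to persist" decision in one place, so callers do not need their own template check.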
|
||||
|
||||
The event lets future task detail/process projections distinguish “Team was not available” from “Team was available and the Main Agent selected single execution” without inflating token usage.
|
||||
|
||||
### 5. Failure Semantics
|
||||
|
||||
- Template parse/selection failure: retain existing tool selection and single-agent execution.
|
||||
- `run_agent_team` unavailable: do not claim Team routing; retain normal single-agent execution.
|
||||
- Invalid Team arguments: current Team tool error path remains intact; it is visible as a tool result.
|
||||
- First response tries both `run_agent_team` and ordinary tools: reject/defer ordinary tools for that turn and execute only Team, because Team mode owns the execution plan. This is a defensive runtime rule to preserve the no-mixed-mode invariant.
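
The mixed-call rule in the last bullet could be enforced with a partition step like the one below. This is a sketch under assumptions: tool calls are modeled as plain dicts with a `name` key, and `partition_first_turn_calls` is a hypothetical helper. Whether set-aside calls are rejected or deferred is left to the caller, and handling of several Team calls in one response is out of scope here.

```python
from typing import Any, Dict, List, Tuple

TEAM_TOOL = "run_agent_team"

ToolCall = Dict[str, Any]


def partition_first_turn_calls(
    tool_calls: List[ToolCall],
) -> Tuple[List[ToolCall], List[ToolCall]]:
    """Split first-turn calls into (to_execute, set_aside).

    If the response mixes `run_agent_team` with ordinary tools, only the
    Team call is executed so that no mixed-mode run can occur.
    """
    team_calls = [call for call in tool_calls if call.get("name") == TEAM_TOOL]
    if not team_calls:
        # Pure single-agent turn: everything runs as usual.
        return list(tool_calls), []
    ordinary = [call for call in tool_calls if call.get("name") != TEAM_TOOL]
    return team_calls, ordinary
```

Returning the set-aside calls, rather than dropping them, leaves room to surface them as rejected tool results if the runtime needs to answer every call.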
|
||||
|
||||
## Test Strategy
|
||||
|
||||
Unit tests cover the smallest boundaries:
|
||||
|
||||
1. Eligible activated template produces the first-turn route guidance containing the primary template and task-only Team instruction.
|
||||
2. A first response containing `run_agent_team` records `team` mode and still follows the current Team tool call path.
|
||||
3. A first response containing `web_search` records `single` mode; a later `run_agent_team` call is not exposed/executed in that run.
|
||||
4. Supporting valid templates does not change the tool list or behavior of ordinary no-template runs.
|
||||
5. Multiple templates select the first activated template and report the others as ignored metadata.
|
||||
6. Existing agent-loop and Task/Team regression tests stay green.
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
For a request that activates the MGM/Galaxy Skill:
|
||||
|
||||
```text
|
||||
Skill activation
|
||||
→ primary template is available to Main Agent first turn
|
||||
→ Main Agent calls run_agent_team before web_search
|
||||
→ Team graph and child nodes are created
|
||||
```
|
||||
|
||||
For a plainly one-step request with an activated but non-fitting template:
|
||||
|
||||
```text
|
||||
Skill activation
|
||||
→ Main Agent begins normal execution directly
|
||||
→ execution_mode=single is recorded
|
||||
→ no additional model round and no reason text are generated
|
||||
```
|
||||
|
||||
All existing semantics for generic workers, node tool scopes, Skill-based tool assembly, evidence gates, final synthesis, and no nested Teams remain unchanged.
|
||||
|
||||
## Risks
|
||||
|
||||
- This relies on the Main Agent following a stronger first-turn instruction. That is materially more reliable than `may`/`prefer` wording, but it is still model-mediated rather than a hard classifier.
|
||||
- A template can be activated for a broad Skill while not fitting a narrow user follow-up. The explicit single-agent route is retained for that case.
|
||||
- Hiding `run_agent_team` after first-turn single selection changes a root run's available tool list over iterations. The event and tests must make that state transition explicit.
|
||||
- Existing pre-planner keyword routing remains a separate heuristic. It can still choose a Team early for known complex tasks; it is no longer the sole mechanism for template-driven Team execution.
|
||||
1259
docs/superpowers/specs/2026-06-26-local-mcp-team-workflows-design.md
Normal file
File diff suppressed because it is too large