feat(tasks): add skill-templated task graph execution
@@ -20,6 +20,10 @@
 - `beaver/engine/loop.py`, `tools/runtime/executor.py`, `coordinator/local.py`: node allowlist and budget enforcement.
 - `beaver/tasks/evidence.py`, `coordinator/execution/scheduler.py`, `tasks/attempt_orchestrator.py`: evidence completion and incomplete synthesis gate.

+## Execution Reporting Rule
+
+Do not commit automatically. After every task, stop and report the modified-file list, exact test command and result, `git diff --stat` summary, and remaining risks. Commit only when the user explicitly asks.
+
 ### Task 1: Parse and Propagate Optional Skill Templates

 **Files:**
@@ -91,9 +95,9 @@ Run: `cd app-instance/backend && uv run pytest tests/unit/test_skill_team_templa

 Expected: PASS.

-- [ ] **Step 5: Commit**
+- [ ] **Step 5: Stop and report; do not commit**

-Run: `git add app-instance/backend/beaver/skills/catalog/utils.py app-instance/backend/beaver/skills/catalog/loader.py app-instance/backend/beaver/engine/context/builder.py app-instance/backend/beaver/skills/assembler/task_assembler.py app-instance/backend/tests/unit/test_skill_team_template.py && git commit -m "feat(skills): parse optional task graph templates"`
+Report the modified files, parser/assembler test result, `git diff --stat`, and any template compatibility risk. Do not commit unless explicitly asked.

 ### Task 2: Extend Existing Graph Contracts

@@ -107,9 +111,11 @@ Run: `git add app-instance/backend/beaver/skills/catalog/utils.py app-instance/b
 ```python
 def test_execution_node_contracts_default_for_existing_callers() -> None:
     node = ExecutionNode("collect", "Collect", AgentDescriptor(name="collect"))
-    assert node.allowed_tool_names == []
+    assert node.allowed_tool_names is None
     assert node.required_evidence == []
+    assert node.evidence_contract == {}
     assert node.required_for_completion is True
+    assert node.block_downstream_on_partial is False


 def test_graph_rejects_depth_above_configured_limit() -> None:
@@ -136,14 +142,16 @@ Expected: FAIL because fields and `max_depth` do not exist.
 ```python
 input_contract: dict[str, Any] = field(default_factory=dict)
 output_contract: dict[str, Any] = field(default_factory=dict)
-allowed_tool_names: list[str] = field(default_factory=list)
+allowed_tool_names: list[str] | None = None
 required_evidence: list[str] = field(default_factory=list)
+evidence_contract: dict[str, Any] = field(default_factory=dict)
 validation_rules: list[str] = field(default_factory=list)
 required_for_completion: bool = True
+block_downstream_on_partial: bool = False
 max_tool_iterations: int | None = None
 ```

-Add the runtime-relevant values to `DelegationEnvelope`. Add `completion_status="succeeded"` and `evidence_gaps=[]` to `NodeRunResult`. Extend `ExecutionGraph.validate(max_depth: int | None = None)` to calculate longest dependency chain with its existing DFS and raise only when an explicit limit is exceeded.
+Use `allowed_tool_names: list[str] | None = None`, not a default empty list. `None` means no node-level scope and keeps legacy behavior; `[]` explicitly disables tools; a populated list is the node allowlist. Add runtime-relevant values to `DelegationEnvelope`. Add `completion_status="succeeded"` and `evidence_gaps=[]` to `NodeRunResult`. Extend `ExecutionGraph.validate(max_depth: int | None = None)` to calculate longest dependency chain with its existing DFS and raise only when an explicit limit is exceeded.

 - [ ] **Step 4: Run the coordinator regression test**

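The depth rule in the Task 2 implementation note (compute the longest dependency chain, raise only when an explicit limit is exceeded) can be sketched as follows. This is a minimal illustration, not the actual `ExecutionGraph.validate` code: the `depends_on` mapping, both function names, the error messages, and the choice to count depth in nodes rather than edges are all assumptions made for the example.

```python
from __future__ import annotations


def longest_dependency_chain(depends_on: dict[str, list[str]]) -> int:
    """Return the number of nodes on the longest dependency chain (hypothetical helper)."""
    depth_cache: dict[str, int] = {}
    visiting: set[str] = set()

    def depth(node_id: str) -> int:
        if node_id in depth_cache:
            return depth_cache[node_id]
        if node_id in visiting:
            # A node reached again while still on the DFS stack means a cycle.
            raise ValueError(f"dependency cycle at node: {node_id}")
        visiting.add(node_id)
        upstream = depends_on.get(node_id, [])
        # Assumption: a node with no dependencies has depth 1.
        result = 1 + max((depth(dep) for dep in upstream), default=0)
        visiting.discard(node_id)
        depth_cache[node_id] = result
        return result

    return max((depth(node_id) for node_id in depends_on), default=0)


def validate_depth(depends_on: dict[str, list[str]], max_depth: int | None = None) -> None:
    """Raise only when an explicit limit is given and exceeded."""
    if max_depth is None:
        # No explicit limit: existing callers keep their current behavior.
        return
    actual = longest_dependency_chain(depends_on)
    if actual > max_depth:
        raise ValueError(f"graph depth {actual} exceeds max_depth {max_depth}")
```

Under the node-counting assumption, a `collect -> extract -> validate -> report` chain has depth 4, so it passes `max_depth=4` and fails `max_depth=3`.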
@@ -151,9 +159,9 @@ Run: `cd app-instance/backend && uv run pytest tests/unit/test_agent_team_v1.py

 Expected: PASS.

-- [ ] **Step 5: Commit**
+- [ ] **Step 5: Stop and report; do not commit**

-Run: `git add app-instance/backend/beaver/coordinator/models.py app-instance/backend/tests/unit/test_agent_team_v1.py && git commit -m "feat(team): add optional node contracts"`
+Report the modified files, coordinator test result, `git diff --stat`, and compatibility risk for existing direct graph callers. Do not commit unless explicitly asked.

 ### Task 3: Adapt Templates Into Generic Task Graphs

@@ -185,6 +193,15 @@ def test_unknown_tool_is_removed_and_warned() -> None:
     )
     assert plan.graph.nodes[0].allowed_tool_names == ["web_search"]
     assert "unknown tool removed: not_real" in plan.planner_adaptation["warnings"]
+
+
+def test_high_risk_tool_is_removed_without_failing_low_risk_plan() -> None:
+    plan = TaskExecutionPlanner(tool_registry=_registry()).from_json(
+        '{"mode":"team","strategy":"sequence","nodes":[{"node_id":"collect","task":"Collect",'
+        '"requested_tools":["web_search","terminal"]}]}'
+    )
+    assert plan.graph.nodes[0].allowed_tool_names == ["web_search"]
+    assert "requires_high_risk_review: terminal" in plan.planner_adaptation["warnings"]
 ```

 - [ ] **Step 2: Run it to verify failure**
@@ -197,9 +214,9 @@ Expected: FAIL because planner has no template context, registry policy, or adap

 Add `tool_registry: ToolRegistry | None` to `TaskExecutionPlanner`. Change `plan()` to receive `activated_skills: list[SkillContext]`, select at most one valid template, and include it in `_prompt`. Add `planner_adaptation: dict[str, Any] = field(default_factory=dict)` to `TaskExecutionPlan` and `to_event_payload()`.

-Accept only `node_id`, `task`, `depends_on`, `input_contract`, `output_contract`, `requested_tools`, `required_evidence`, `validation_rules`, `required_for_completion`, `max_tool_iterations`, and `constraints`. Reject `agent` and `role`; construct `AgentDescriptor(name=node_id, role="", system_prompt="", metadata={"sub_agent_kind": "generic_skill_worker", ...})` internally.
+Accept only `node_id`, `task`, `depends_on`, `input_contract`, `output_contract`, `requested_tools`, `required_evidence`, `evidence_contract`, `validation_rules`, `required_for_completion`, `block_downstream_on_partial`, `max_tool_iterations`, and `constraints`. Reject `agent` and `role`; construct `AgentDescriptor(name=node_id, role="", system_prompt="", metadata={"sub_agent_kind": "generic_skill_worker", ...})` internally.

-Resolve requested names through registry plus conservative read-only policy. Write allowed names to `ExecutionNode.allowed_tool_names`; write unknown/high-risk removals into adaptation warnings. Validate node count, dependencies, cycles, and `graph.validate(max_depth=4)`. If first provider output is invalid, make exactly one `tools=None` repair request containing validation errors; if it is still invalid, return `TaskExecutionPlan.single("planner_fallback_single", fallback_error=...)`.
+Resolve requested names through registry plus a conservative interim name-based risk policy. Treat `terminal`, `execute_command`, `write_file`, `delete_file`, `external_send`, and `send_email` as high-risk until stable `ToolSpec.metadata` risk fields exist. Write allowed names to `ExecutionNode.allowed_tool_names`; remove unknown/high-risk names and record warnings. Unknown tools never fail the whole plan; high-risk tools add `requires_high_risk_review` and are never auto-approved. Validate node count, dependencies, cycles, and `graph.validate(max_depth=4)`. If first provider output is invalid, make exactly one `tools=None` repair request containing validation errors; if it is still invalid, return `TaskExecutionPlan.single("planner_fallback_single", fallback_error=...)`.

 Update `TaskAttemptOrchestrator` to pass `preselected_skills`, and `EngineLoader` to construct planner with its registry.

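The interim name-based risk policy in the Task 3 note can be sketched as follows. The high-risk names and the two warning formats come from the plan text and its tests. The function name `resolve_requested_tools`, the plain `set` standing in for the tool registry, and the choice to check the high-risk set before the registry are assumptions for illustration only.

```python
from __future__ import annotations

# Interim high-risk names listed in the plan; to be replaced once
# ToolSpec.metadata carries stable risk fields.
HIGH_RISK_TOOL_NAMES = frozenset(
    {"terminal", "execute_command", "write_file", "delete_file", "external_send", "send_email"}
)


def resolve_requested_tools(
    requested: list[str], registered: set[str]
) -> tuple[list[str], list[str]]:
    """Return (allowed names, adaptation warnings) for one node (hypothetical helper)."""
    allowed: list[str] = []
    warnings: list[str] = []
    for name in requested:
        if name in HIGH_RISK_TOOL_NAMES:
            # High-risk tools are removed and flagged, never auto-approved.
            warnings.append(f"requires_high_risk_review: {name}")
        elif name not in registered:
            # Unknown tools are dropped with a warning; the plan itself survives.
            warnings.append(f"unknown tool removed: {name}")
        elif name not in allowed:
            allowed.append(name)
    return allowed, warnings
```

With this ordering, a node that requests `web_search`, `terminal`, and `not_real` keeps only `web_search` and records one warning for each removed name.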
@@ -209,9 +226,9 @@ Run: `cd app-instance/backend && uv run pytest tests/unit/test_task_execution_pl

 Expected: PASS.

-- [ ] **Step 5: Commit**
+- [ ] **Step 5: Stop and report; do not commit**

-Run: `git add app-instance/backend/beaver/tasks/planner.py app-instance/backend/beaver/tasks/attempt_orchestrator.py app-instance/backend/beaver/engine/loader.py app-instance/backend/tests/unit/test_task_execution_planner.py && git commit -m "feat(tasks): adapt skill templates into task graphs"`
+Report the modified files, planner/task-mode test result, `git diff --stat`, and any risk-policy false-positive risk. Do not commit unless explicitly asked.

 ### Task 4: Enforce Node Tool Allowlists

@@ -231,6 +248,13 @@ def test_team_node_exposes_only_allowed_tool_schema() -> None:
     assert _tool_names(provider.calls[0]["tools"]) == ["web_search"]


+def test_none_tool_scope_preserves_legacy_selection_and_empty_scope_disables_all() -> None:
+    asyncio.run(loop.process_direct("collect", allowed_tool_names=None))
+    assert _tool_names(provider.calls[0]["tools"])
+    asyncio.run(loop.process_direct("collect", allowed_tool_names=[]))
+    assert _tool_names(provider.calls[1]["tools"]) == []
+
+
 def test_executor_rejects_registered_tool_outside_node_allowlist() -> None:
     context = ToolContext(metadata={"allowed_tool_names": ["web_search"]})
     result = asyncio.run(executor.execute("write_file", {"path": "x", "content": "x"}, context=context))
@@ -262,9 +286,9 @@ Run: `cd app-instance/backend && uv run pytest tests/unit/test_team_node_tool_po

 Expected: PASS.

-- [ ] **Step 5: Commit**
+- [ ] **Step 5: Stop and report; do not commit**

-Run: `git add app-instance/backend/beaver/engine/loop.py app-instance/backend/beaver/tools/runtime/executor.py app-instance/backend/beaver/coordinator/local.py app-instance/backend/tests/unit/test_agent_loop.py app-instance/backend/tests/unit/test_team_node_tool_policy.py && git commit -m "feat(team): enforce node tool scopes"`
+Report the modified files, focused/loop test result, `git diff --stat`, and risk that a legacy caller accidentally passes `[]`. Do not commit unless explicitly asked.

 ### Task 5: Gate Node Success on Required Evidence

@@ -286,8 +310,19 @@ def test_node_without_required_tool_result_is_partial() -> None:
     assert result.evidence_gaps == ["missing required evidence: tool_result"]


-def test_dag_blocks_dependency_of_partial_required_node() -> None:
+def test_node_without_evidence_requirement_keeps_legacy_success() -> None:
+    result = asyncio.run(runner.run(_envelope(required_evidence=[])))
+    assert result.success is True
+    assert result.completion_status == "succeeded"
+
+
+def test_dag_allows_partial_evidence_by_default() -> None:
     outcome = asyncio.run(scheduler.run(_graph_with_partial_collect_node(), parent_task_id=None, parent_session_id="s"))
+    assert outcome.node_results[1].completion_status == "succeeded"
+
+
+def test_dag_blocks_partial_node_only_when_node_requests_it() -> None:
+    outcome = asyncio.run(scheduler.run(_graph_with_blocking_partial_collect_node(), parent_task_id=None, parent_session_id="s"))
     assert outcome.node_results[1].finish_reason == "blocked"
 ```

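The scheduling rule these tests exercise can be sketched as a single decision function. This is an illustration under stated assumptions, not the scheduler's actual code: `dependency_decision` is a hypothetical name, and reading `block_downstream_on_partial` from the partial upstream node (rather than from the dependent node) is inferred from the field name and the fixture names.

```python
from __future__ import annotations


def dependency_decision(upstream_status: str, block_downstream_on_partial: bool = False) -> str:
    """Return "run" or "blocked" for a node with one upstream result (hypothetical helper)."""
    if upstream_status in ("failed", "blocked"):
        # Failed and blocked upstream results always block dependents.
        return "blocked"
    if upstream_status == "partial" and block_downstream_on_partial:
        # Partial results block only when the node opts in.
        return "blocked"
    # Succeeded results, and partial results by default, are passed onward.
    return "run"
```

A dependent that runs after a `partial` upstream can still finish as `succeeded` itself. The final task outcome is a separate check and remains `incomplete` when the partial node is required for completion.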
@@ -299,7 +334,7 @@ Expected: FAIL because evidence requirements do not affect node success.

 - [ ] **Step 3: Implement deterministic evidence checks**

-Add `evaluate_node_evidence(evidence, required_evidence, output_text) -> list[str]`. `tool_result` requires a successful tool result, `url` a tool result URL, and `output` non-empty output; any other requirement produces `unsupported evidence requirement: <name>`. After `LocalAgentRunner` builds `RunEvidence`, set `completion_status="partial"`, `success=False`, and gaps when required evidence is absent. Scheduler-created error/blocked results set status to `failed`/`blocked` while retaining partial evidence.
+Add `evaluate_node_evidence(evidence, required_evidence, output_text) -> list[str]`. `required_evidence` is a coarse v1 gate: `tool_result` requires a successful tool result, `url` a tool result URL, and `output` non-empty output; any other requirement produces `unsupported evidence requirement: <name>`. Do not interpret `evidence_contract` in v1. After `LocalAgentRunner` builds `RunEvidence`, set `completion_status="partial"`, `success=False`, and gaps only when the node actually declares `required_evidence`. Leave existing no-requirement node success behavior unchanged. Scheduler always blocks `failed`/`blocked`; it passes partial output/evidence onward unless `block_downstream_on_partial=True`.

 - [ ] **Step 4: Run coordinator and evidence regression tests**

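A minimal sketch of the coarse evidence gate described in Step 3 follows. The function signature and the two gap-message formats are taken from the plan. `ToolResultStub` and `EvidenceStub` are hypothetical stand-ins for the real `RunEvidence` structure, whose fields are not shown in this diff, and the handling of whitespace-only output is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToolResultStub:
    """Hypothetical stand-in for one recorded tool result."""

    success: bool
    url: str | None = None


@dataclass
class EvidenceStub:
    """Hypothetical stand-in for RunEvidence; the real fields may differ."""

    tool_results: list[ToolResultStub] = field(default_factory=list)


def evaluate_node_evidence(
    evidence: EvidenceStub, required_evidence: list[str], output_text: str
) -> list[str]:
    """Return evidence gaps; an empty list means every declared requirement is met."""
    gaps: list[str] = []
    for name in required_evidence:
        if name == "tool_result":
            satisfied = any(result.success for result in evidence.tool_results)
        elif name == "url":
            satisfied = any(result.url for result in evidence.tool_results)
        elif name == "output":
            # Assumption: whitespace-only output counts as empty.
            satisfied = bool(output_text.strip())
        else:
            gaps.append(f"unsupported evidence requirement: {name}")
            continue
        if not satisfied:
            gaps.append(f"missing required evidence: {name}")
    return gaps
```

Because the loop body never runs for an empty `required_evidence`, a node that declares no requirements always gets an empty gap list, which matches the legacy-success behavior the plan asks to preserve.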
@@ -307,9 +342,9 @@ Run: `cd app-instance/backend && uv run pytest tests/unit/test_agent_team_v1.py

 Expected: PASS.

-- [ ] **Step 5: Commit**
+- [ ] **Step 5: Stop and report; do not commit**

-Run: `git add app-instance/backend/beaver/tasks/evidence.py app-instance/backend/beaver/coordinator/local.py app-instance/backend/beaver/coordinator/execution/scheduler.py app-instance/backend/tests/unit/test_agent_team_v1.py app-instance/backend/tests/unit/test_task_evidence.py && git commit -m "feat(team): require declared node evidence"`
+Report the modified files, coordinator/evidence test result, `git diff --stat`, and the known coarse-evidence limitation. Do not commit unless explicitly asked.

 ### Task 6: Gate Final Synthesis and Verify Finance Planning

@@ -350,7 +385,7 @@ Expected: FAIL because outcome gate is absent.

 Add `_team_synthesis_outcome(plan, result) -> tuple[str, str]`. Every `required_for_completion=True` node whose `completion_status` is not `succeeded` is incomplete. Context includes node id, status, error, and evidence gaps. Keep Team synthesis at `include_tools=False` and `max_tool_iterations=0`; prefix final output only when the incomplete notice is missing. Write `task_outcome` and `incomplete_node_ids` to `task_synthesis_completed`.

-Add `_finance_plan_json()` fixture with four task-oriented nodes and dependencies `collect -> extract -> validate -> report`. Only source/extraction nodes request `web_search`/`web_fetch`; report node uses upstream evidence and produces Markdown/table/chart data, never an unregistered chart renderer. Assert no node is named `researcher`, `writer`, or `reviewer`.
+Add `_finance_plan_json()` fixture with four task-oriented nodes and dependencies `collect -> extract -> validate -> report`. The report node explicitly uses `allowed_tool_names=[]`; source/extraction nodes request only `web_search`/`web_fetch`. Assert no node is named `researcher`, `writer`, or `reviewer`. The report node may emit a comparison table, chart-ready data, Mermaid chart, Markdown chart section, or text-bar-chart fallback. It must not claim an image/file chart artifact unless a registered chart-renderer tool exists and passes policy.

 - [ ] **Step 4: Run complete backend unit suite**

@@ -358,12 +393,12 @@ Run: `cd app-instance/backend && uv run pytest tests/unit -q`

 Expected: PASS. Fix only compatibility defects in this plan; do not change router, persistent agent registry, frontend, nested-team behavior, or Skill-learning eval semantics.

-- [ ] **Step 5: Commit**
+- [ ] **Step 5: Stop and report; do not commit**

-Run: `git add app-instance/backend/beaver/tasks/attempt_orchestrator.py app-instance/backend/tests/unit/test_task_mode_feedback.py app-instance/backend/tests/unit/test_task_team_synthesis_outcome.py app-instance/backend/tests/unit/test_task_execution_planner.py app-instance/backend/tests/unit/test_task_skill_resolver.py && git commit -m "test(team): cover skill-templated finance planning"`
+Report the modified files, complete unit-suite result, `git diff --stat`, and all remaining boundaries. Do not commit unless explicitly asked.

 ## Plan Self-Review

-- Coverage: parser compatibility, existing graph contracts, template adaptation/repair, tool enforcement, evidence completion, deterministic synthesis, and finance acceptance all have explicit tasks.
+- Coverage: parser compatibility, one-primary-template adaptation/repair, `None`/`[]`/allowlist scope semantics, interim high-risk filtering, partial propagation, coarse evidence completion, deterministic synthesis, and finance acceptance all have explicit tasks.
 - Exclusions: no fixed role Agents, parallel Team model, nested graph execution, chart renderer, high-risk approval UI, frontend work, or Skill-eval redesign appears in the implementation scope.
-- Compatibility: all new graph fields are defaults-only; `None` tool scope preserves single-agent behavior, while `[]` gives a Team node no tools.
+- Compatibility: all new graph fields are defaults-only; `allowed_tool_names=None` preserves legacy behavior, `[]` explicitly disables tools, and evidence gating activates only when `required_evidence` is declared.

@@ -45,7 +45,7 @@ Out of scope:
 - a high-risk approval UI or new approval API;
 - chart-image rendering.

-The current runtime registers `web_search` and `web_fetch` but no chart renderer. The finance acceptance case therefore produces evidence-backed comparison data and a textual/Markdown report, not a fabricated chart artifact.
+The current runtime registers `web_search` and `web_fetch` but no chart renderer. The finance acceptance case may produce an evidence-backed comparison table, chart-ready data, Mermaid chart, Markdown chart section, text-bar-chart fallback, and final textual report. It must not claim that an image/file chart artifact was generated unless a registered chart-renderer tool exists and passes runtime safety policy.

 ## Data Model Evolution

@@ -54,16 +54,20 @@ The current runtime registers `web_search` and `web_fetch` but no chart renderer
 ```python
 input_contract: dict[str, object] = field(default_factory=dict)
 output_contract: dict[str, object] = field(default_factory=dict)
-allowed_tool_names: list[str] = field(default_factory=list)
+allowed_tool_names: list[str] | None = None
 required_evidence: list[str] = field(default_factory=list)
+evidence_contract: dict[str, Any] = field(default_factory=dict)
 validation_rules: list[str] = field(default_factory=list)
 required_for_completion: bool = True
+block_downstream_on_partial: bool = False
 max_tool_iterations: int | None = None
 ```

-Existing callers retain their behavior because empty lists and `None` impose no new node requirement.
+`allowed_tool_names` has three non-overlapping meanings: `None` means node-level tool scope is disabled and retains legacy tool selection; `[]` explicitly prohibits every tool for this node; a populated list permits only those registered, policy-allowed tools. Existing callers retain behavior because the default is `None`.

-`NodeRunResult` remains the node-output container. It gains `completion_status` (`succeeded`, `partial`, `failed`, or `blocked`) and `evidence_gaps`. `success` remains for scheduler compatibility and is true only for `succeeded`. A completed run with missing required evidence is therefore `partial`, and downstream dependencies block exactly as they do for failed nodes.
+`NodeRunResult` remains the node-output container. It gains `completion_status` (`succeeded`, `partial`, `failed`, or `blocked`) and `evidence_gaps`. `success` remains a compatibility field. Nodes without `required_evidence` retain the current `finish_reason == "stop"` success behavior. For a node that declares evidence requirements, a completed run with missing required evidence becomes `partial` and has `success=False`.

+`failed` and `blocked` always block dependent nodes. `partial` does not imply successful completion, but its output and evidence remain consumable by downstream nodes unless `block_downstream_on_partial=True`. Any required-for-completion node that is partial still forces the final task outcome to `incomplete`.
+
 `TaskExecutionPlan` gains a planner-adaptation payload rather than a duplicate graph object. The payload records template source/version, whether it was used, added/removed/merged node ids, removed tool names, warnings, and fallback reason. It is written into the existing `task_execution_planned` event.

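The three `allowed_tool_names` meanings in this hunk, together with the second allowlist check that the design assigns to `ToolExecutor`, can be sketched as two small functions. Both names are hypothetical, and using the full registered list as the `None` result is a simplification: in the real runtime, `None` defers to whatever legacy tool selection already does.

```python
from __future__ import annotations


def visible_tool_names(registered: list[str], allowed_tool_names: list[str] | None) -> list[str]:
    """Tool names whose schemas would be exposed to the provider (hypothetical helper)."""
    if allowed_tool_names is None:
        # No node-level scope: stand-in for the legacy tool selection.
        return list(registered)
    allowed = set(allowed_tool_names)
    # An empty list yields no tools; a populated list filters the registry.
    return [name for name in registered if name in allowed]


def is_call_permitted(tool_name: str, allowed_tool_names: list[str] | None) -> bool:
    """Second check at execution time, for calls to registered but unexposed tools."""
    return allowed_tool_names is None or tool_name in allowed_tool_names
```

The distinction that matters is between `None` and `[]`: a truthiness test such as `if not allowed_tool_names` would treat them the same way and silently restore every tool for a node that was meant to have none.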
@@ -100,7 +104,7 @@ The template is an LLM input, not an executable workflow. It supplies candidate

 ## Planner Design

-`TaskAttemptOrchestrator` passes activated `SkillContext` objects to the planner rather than only truncated summaries. The planner chooses at most one applicable template for the first implementation; multiple activated Skills remain ordinary guidance. This avoids composing incompatible templates before there is evidence for a composition model.
+`TaskAttemptOrchestrator` passes activated `SkillContext` objects to the planner rather than only truncated summaries. v1 supports one primary applicable Skill Team Template; other activated Skills remain ordinary guidance. Template composition, sub-skill guidance composition, and multi-Skill planning are explicitly deferred rather than prohibited long-term.

 Planner output uses a task-only JSON schema. It contains `mode`, `reason`, `strategy`, `nodes`, `final_synthesis_instruction`, and `adaptation`. Nodes contain task, dependencies, contracts, requested tools, evidence requirements, validation rules, and completion importance. `agent` and `role` are not accepted as planner schema fields; the adapter creates the existing empty-role `AgentDescriptor` itself.

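One way to enforce the task-only node schema is a key allowlist, sketched below. The accepted key names are the ones listed in the implementation plan. The function name, the error strings, and treating `node_id` and `task` as mandatory are assumptions for illustration.

```python
from __future__ import annotations

# Node keys accepted by the planner schema, as listed in the implementation plan.
ACCEPTED_NODE_KEYS = frozenset(
    {
        "node_id",
        "task",
        "depends_on",
        "input_contract",
        "output_contract",
        "requested_tools",
        "required_evidence",
        "evidence_contract",
        "validation_rules",
        "required_for_completion",
        "block_downstream_on_partial",
        "max_tool_iterations",
        "constraints",
    }
)


def node_schema_errors(node: dict[str, object]) -> list[str]:
    """Return validation errors for one planner node (hypothetical helper)."""
    # Anything outside the allowlist is rejected, which covers `agent` and `role`.
    errors = [
        f"unexpected node field: {key}" for key in sorted(node) if key not in ACCEPTED_NODE_KEYS
    ]
    # Assumption: node_id and task must be present and non-empty.
    for required in ("node_id", "task"):
        if not node.get(required):
            errors.append(f"missing node field: {required}")
    return errors
```

Errors collected this way could be the validation errors carried by the single repair request the plan describes, though the exact payload of that request is not specified here.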
@@ -125,7 +129,7 @@ template/node requested names
 ∩ node runtime policy
 ```

-Skill hints are suggestions, not authority. The current code has no populated task-time user/workspace permission model, so v1 must not claim that it enforces one. It uses a conservative node runtime policy:
+Skill hints are suggestions, not authority. The current code has no populated task-time user/workspace permission model, so v1 must not claim that it enforces one. v1 uses a conservative interim tool-risk policy, not a complete task-time permission system. Until `ToolSpec.metadata` has stable fields such as `risk_level`, `mutating`, `external_side_effect`, `requires_approval`, and `readonly`, the interim policy uses a conservative name-based high-risk set such as `terminal`, `execute_command`, `write_file`, `delete_file`, `external_send`, and `send_email`.

 - unknown names are removed and reported as planner warnings;
 - read-only tools may remain available when the node requests them;
@@ -134,25 +138,27 @@ Skill hints are suggestions, not authority. The current code has no populated ta

 Provider schemas are filtered to the allowlist, and `ToolExecutor` performs a second allowlist check through `ToolContext.metadata`. This prevents a model-originated call to a registered but unexposed tool from executing.

-A real high-risk approval flow requires a task lifecycle state and UI/API confirmation. It is deferred; v1 blocks and explains rather than auto-approving.
+A real high-risk runtime approval flow requires a task lifecycle state and UI/API confirmation. It is out of scope; v1 removes high-risk names, records `requires_high_risk_review`, and explains the limitation rather than auto-approving.

 ## Runtime and Evidence Semantics

 `DelegationEnvelope` receives node contracts, allowed tools, evidence requirements, and per-node tool budget. `LocalAgentRunner` passes the allowed tools and budget into the current `AgentLoop`, builds existing `RunEvidence`, and classifies completion.

-Evidence requirements have deterministic meanings in v1:
+`required_evidence` in v1 is a coarse node-level completion gate, not a field-level evidence contract. It can show that a node produced at least one URL or tool result; it cannot prove that every required company, reporting period, metric, and source is present. `evidence_contract: dict[str, Any]` is reserved for a later field-level contract and is not interpreted in v1.
+
+The coarse requirements have deterministic meanings in v1:

 - `tool_result`: at least one successful tool result;
 - `url`: at least one tool result with a URL;
 - `output`: non-empty node output;
 - any other declared value: explicit evidence gap.

-The scheduler keeps sequence/parallel/DAG semantics. Dependencies only receive succeeded upstream results. It does not retry, recursively expand Skills, or create another Team graph.
+The scheduler keeps sequence/parallel/DAG semantics. Dependencies never run after an upstream `failed` or `blocked` result. A `partial` upstream result is passed onward as partial evidence by default; a node can opt into blocking it with `block_downstream_on_partial=True`. The scheduler does not retry, recursively expand Skills, or create another Team graph.

 Before final synthesis, `TaskAttemptOrchestrator` derives a task outcome:

 - `complete`: every required-for-completion node succeeded;
-- `incomplete`: any required node is partial, failed, or blocked;
+- `incomplete`: any required node is partial, failed, or blocked, even if downstream synthesis produced a useful partial report;
 - `single`: no Team graph ran.

 Team synthesis continues to run with no tools. For `incomplete`, the synthesis context lists completed work, node failures, evidence gaps, and the deterministic task outcome. The returned user-facing answer is prefixed with an incomplete notice if the model omits it, so runtime—not prompt compliance alone—prevents a false completion claim.
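The outcome derivation and the runtime-enforced notice described above can be sketched as follows. This illustrates the rule and is not the orchestrator's actual `_team_synthesis_outcome`: `NodeOutcome`, both function names, and the `[Incomplete]` notice text are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

INCOMPLETE_NOTICE = "[Incomplete]"  # hypothetical notice text


@dataclass
class NodeOutcome:
    """Hypothetical projection of the NodeRunResult fields used by the gate."""

    node_id: str
    completion_status: str
    required_for_completion: bool = True


def derive_task_outcome(nodes: list[NodeOutcome] | None) -> tuple[str, list[str]]:
    """Return (task outcome, incomplete node ids); None means no Team graph ran."""
    if nodes is None:
        return "single", []
    incomplete = [
        node.node_id
        for node in nodes
        if node.required_for_completion and node.completion_status != "succeeded"
    ]
    return ("incomplete" if incomplete else "complete"), incomplete


def ensure_incomplete_notice(outcome: str, answer: str) -> str:
    """Prefix the notice only when the outcome is incomplete and the notice is missing."""
    if outcome != "incomplete" or INCOMPLETE_NOTICE in answer:
        return answer
    return f"{INCOMPLETE_NOTICE} {answer}"
```

Because the prefix is applied in code after synthesis returns, an incomplete task cannot be reported as complete even when the model ignores the instruction to say so.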
@@ -166,7 +172,7 @@ Existing task events receive the adaptation report, resolved tools, policy remov
 Compatibility guarantees:

 - Skills without templates activate and execute unchanged.
-- Existing direct `ExecutionGraph` callers work because new fields have defaults.
+- Existing direct `ExecutionGraph` callers work because new fields have compatibility defaults; specifically, `allowed_tool_names=None` does not enable node-level scope and empty `required_evidence` does not enable evidence gating.
 - Single-agent runs do not receive node tool policies or outcome prefixes.
 - Existing external registry descriptors are not removed; planner-created Team nodes stay generic and role-empty.
 - `TaskSkillResolver` remains the per-node published-Skill/ephemeral-guidance fallback.