feat(engine): 添加运行时上下文支持并重构工具迭代限制

添加 RuntimeContext 类用于捕获模型运行时的日期时间信息,
包括UTC时间、本地时间和时区信息,并在系统提示中显示这些信息。

同时增加最大上下文消息数和工具迭代次数的配置选项,
将验证服务从引擎加载器中移除,并更新相关的数据结构和接口。

BREAKING CHANGE: 移除了验证服务,相关字段被替换为证据状态和接受状态。

- 添加 RuntimeContext 类和相关渲染方法
- 增加 max_context_messages 和 max_tool_iterations 配置
- 移除 ValidationService 相关代码
- 更新消息记录中的验证状态字段
- 添加原始工具调用检测和回退处理
This commit is contained in:
2026-05-26 11:18:35 +08:00
parent 16347caf5e
commit 6e9e74d1ee
57 changed files with 5710 additions and 1582 deletions

View File

@ -0,0 +1,954 @@
<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Beaver Backend Module Blueprint</title>
<style>
:root {
--c-bg: #f8fafc;
--c-canvas: #ffffff;
--c-border: #cbd5e1;
--c-border-strong: #94a3b8;
--c-text-main: #0f172a;
--c-text-sub: #64748b;
--c-text-soft: #475569;
--c-accent: #111827;
--c-risk: #b91c1c;
--c-ok: #166534;
--font-ui: Inter, Helvetica, Arial, sans-serif;
--font-mono: "JetBrains Mono", Consolas, "Liberation Mono", monospace;
}
* {
box-sizing: border-box;
}
body {
margin: 0;
min-height: 100vh;
background: var(--c-bg);
color: var(--c-text-main);
font-family: var(--font-ui);
line-height: 1.55;
}
a {
color: inherit;
text-decoration: underline;
text-decoration-thickness: 1px;
text-underline-offset: 2px;
}
.page {
width: min(1500px, 100%);
margin: 0 auto;
padding: 32px;
}
.diagram-canvas {
background: var(--c-canvas);
border: 1px solid var(--c-border);
padding: 32px;
}
.diagram-header {
display: grid;
grid-template-columns: minmax(0, 1fr) auto;
gap: 24px;
align-items: start;
border-bottom: 1px solid var(--c-border);
padding-bottom: 18px;
margin-bottom: 24px;
}
.diagram-title {
margin: 0 0 6px;
font-size: 24px;
font-weight: 700;
letter-spacing: 0;
}
.diagram-subtitle,
.meta-line,
.kicker {
font-family: var(--font-mono);
font-size: 11px;
color: var(--c-text-sub);
text-transform: uppercase;
letter-spacing: 0.05em;
}
.meta-box {
border: 1px solid var(--c-border);
padding: 10px 12px;
min-width: 280px;
font-family: var(--font-mono);
font-size: 12px;
color: var(--c-text-soft);
}
.summary {
display: grid;
grid-template-columns: 1.15fr 0.85fr;
gap: 16px;
margin-bottom: 18px;
}
.panel,
.module,
.flow-box,
.note,
.table-wrap {
border: 1px solid var(--c-border);
background: var(--c-canvas);
}
.panel {
padding: 16px;
}
.panel h2,
.section h2 {
margin: 0 0 10px;
font-size: 17px;
letter-spacing: 0;
}
.panel p,
.module p,
.note p {
margin: 0;
color: var(--c-text-soft);
font-size: 13px;
}
.badge-row {
display: flex;
flex-wrap: wrap;
gap: 6px;
margin-top: 12px;
}
.badge {
display: inline-block;
border: 1px solid var(--c-border);
padding: 2px 6px;
font-family: var(--font-mono);
font-size: 10px;
color: var(--c-text-sub);
white-space: nowrap;
}
.badge-solid {
border-color: var(--c-accent);
background: var(--c-accent);
color: var(--c-canvas);
}
.section {
margin-top: 24px;
border-top: 1px solid var(--c-border);
padding-top: 24px;
}
.section-head {
display: grid;
grid-template-columns: minmax(0, 1fr) auto;
gap: 16px;
align-items: end;
margin-bottom: 14px;
}
.section-head p {
margin: 4px 0 0;
max-width: 980px;
color: var(--c-text-sub);
font-size: 13px;
}
.module-grid {
display: grid;
grid-template-columns: repeat(3, minmax(0, 1fr));
gap: 12px;
}
.module {
padding: 14px;
display: flex;
flex-direction: column;
gap: 10px;
min-height: 260px;
}
.module h3 {
margin: 0;
font-size: 15px;
letter-spacing: 0;
}
.module-label {
font-family: var(--font-mono);
font-size: 11px;
color: var(--c-text-sub);
}
.file-list,
.bullets,
.checks {
margin: 0;
padding-left: 18px;
color: var(--c-text-soft);
font-size: 13px;
}
.file-list {
font-family: var(--font-mono);
font-size: 11px;
line-height: 1.55;
}
.flow {
display: grid;
grid-template-columns: repeat(7, minmax(0, 1fr));
gap: 10px;
align-items: stretch;
}
.flow-box {
min-height: 118px;
padding: 12px;
position: relative;
}
.flow-box::after {
content: "";
position: absolute;
top: 50%;
right: -10px;
width: 10px;
border-top: 1px solid var(--c-border-strong);
}
.flow-box:last-child::after {
display: none;
}
.flow-box h3 {
margin: 0 0 8px;
font-size: 13px;
}
.flow-box p {
margin: 0;
color: var(--c-text-soft);
font-size: 12px;
}
.matrix {
display: grid;
grid-template-columns: 280px minmax(0, 1fr);
border-top: 1px solid var(--c-border);
border-left: 1px solid var(--c-border);
}
.matrix > div {
border-right: 1px solid var(--c-border);
border-bottom: 1px solid var(--c-border);
padding: 10px 12px;
font-size: 13px;
}
.matrix .key {
font-family: var(--font-mono);
color: var(--c-text-main);
background: #f8fafc;
}
.table-wrap {
overflow-x: auto;
}
table {
width: 100%;
min-width: 980px;
border-collapse: collapse;
font-size: 13px;
}
th,
td {
border-bottom: 1px solid var(--c-border);
border-right: 1px solid var(--c-border);
padding: 10px 12px;
vertical-align: top;
text-align: left;
}
th {
font-family: var(--font-mono);
font-size: 11px;
text-transform: uppercase;
letter-spacing: 0.05em;
color: var(--c-text-sub);
background: #f8fafc;
}
tr:last-child td {
border-bottom: 0;
}
th:last-child,
td:last-child {
border-right: 0;
}
code,
.mono {
font-family: var(--font-mono);
font-size: 0.92em;
color: var(--c-text-main);
}
.risk {
border-color: #fecaca;
}
.risk h3,
.risk .module-label {
color: var(--c-risk);
}
.ok {
color: var(--c-ok);
font-weight: 600;
}
.cols-2 {
display: grid;
grid-template-columns: repeat(2, minmax(0, 1fr));
gap: 12px;
}
.cols-4 {
display: grid;
grid-template-columns: repeat(4, minmax(0, 1fr));
gap: 12px;
}
.note {
padding: 14px;
}
@media (max-width: 1180px) {
.module-grid,
.summary,
.cols-4 {
grid-template-columns: 1fr 1fr;
}
.flow {
grid-template-columns: 1fr 1fr;
}
.flow-box::after {
display: none;
}
}
@media (max-width: 760px) {
.page {
padding: 12px;
}
.diagram-canvas {
padding: 16px;
}
.diagram-header,
.summary,
.section-head,
.module-grid,
.cols-2,
.cols-4,
.flow,
.matrix {
grid-template-columns: 1fr;
}
.meta-box {
min-width: 0;
}
}
</style>
</head>
<body>
<main class="page">
<article class="diagram-canvas">
<header class="diagram-header">
<div>
<div class="diagram-title">Beaver Backend Module Blueprint</div>
<div class="diagram-subtitle">Flat Engineering Blueprint / app-instance/backend / 2026-05-25</div>
</div>
<div class="meta-box">
SOURCE: <span class="mono">app-instance/backend</span><br>
STYLE: <span class="mono">projcet_review/blueprinter.md</span><br>
SCOPE: <span class="mono">backend code + tests + architecture docs</span><br>
MULTI-PAGE: <a href="backend_blueprint/index.html">backend_blueprint/index.html</a>
</div>
</header>
<section class="summary">
<div class="panel">
<h2>项目是干嘛的</h2>
<p>
Beaver 后端是一个面向用户任务的 agent runtime。它接收来自 Web、WebSocket、CLI、Gateway、Cron 或 MCP 的请求,
用 Main Agent 判断这是不是一个需要跟踪的 Task简单问题直接回复复杂任务进入 Task mode。Task mode 会规划单 agent
或 team 执行,运行统一的 <code>AgentLoop</code>,选择技能和工具,调用模型,记录事实证据,并等待用户接受、修改或放弃。
只有用户接受后的 Task evidence 才会沉淀为可学习的 skill 候选。
</p>
<div class="badge-row">
<span class="badge-solid badge">UNIFIED ENGINE</span>
<span class="badge">TASK MODE</span>
<span class="badge">TEAM COORDINATOR</span>
<span class="badge">SKILL LEARNING</span>
<span class="badge">MCP TOOLS</span>
<span class="badge">SCHEDULED TASKS</span>
</div>
</div>
<div class="panel">
<h2>最关键的架构判断</h2>
<p>
主 agent、team node、sub-agent 都不各自实现一套 runtime它们最后都回到同一个 <code>beaver.engine.AgentLoop</code>
因此后续修改时要优先确认:入口层是不是薄的,服务层是不是只编排,真正 tool loop / prompt / provider / session 逻辑是不是仍在 engine 内收口。
</p>
<div class="badge-row">
<span class="badge">interfaces -> services</span>
<span class="badge">services -> engine</span>
<span class="badge">engine -> skills/tools/memory</span>
</div>
</div>
</section>
<section class="section">
<div class="section-head">
<div>
<h2>主执行流</h2>
<p>这是后端最重要的一条路径,后续逐模块修改文档应该先对齐这条链路。</p>
</div>
<div class="kicker">CHAT / TASK / ACCEPTANCE / LEARNING</div>
</div>
<div class="flow">
<div class="flow-box">
<h3>1. 入口接收</h3>
<p><code>/api/chat</code>、WebSocket、CLI、Gateway 或 Cron 把用户消息转给 <code>AgentService</code></p>
</div>
<div class="flow-box">
<h3>2. 意图路由</h3>
<p><code>MainAgentRouter</code> 结合 active task 和近期会话,判断 simple / new_task / continue / revise / close / abandon。</p>
</div>
<div class="flow-box">
<h3>3. Task 建模</h3>
<p><code>TaskService</code> 写入 <code>tasks.json</code><code>events.jsonl</code>,维护 open/running/awaiting_acceptance/closed 状态。</p>
</div>
<div class="flow-box">
<h3>4. 执行规划</h3>
<p><code>TaskExecutionPlanner</code> 让辅助模型选择 single 或 team并为 team 生成 sequence / parallel / DAG 节点。</p>
</div>
<div class="flow-box">
<h3>5. 统一运行</h3>
<p><code>AgentLoop</code> 冻结 memory选 skill选 tool构建 prompt调用 provider执行 tool loop。</p>
</div>
<div class="flow-box">
<h3>6. 事实证据</h3>
<p><code>EvidenceBuilder</code> 汇总 run/team/tool 证据。Evidence 只记录事实,不判断、不打分、不 gate。</p>
</div>
<div class="flow-box">
<h3>7. 验收学习</h3>
<p>用户接受 Task 后生成 accepted task evidence 和 learning candidatesworker 可生成 draft但不会自动 approve/publish。</p>
</div>
</div>
</section>
<section class="section">
<div class="section-head">
<div>
<h2>模块总览</h2>
<p>每个模块下面都写明责任、逻辑、具体怎么做,以及关键文件。</p>
</div>
<div class="kicker">MODULE RESPONSIBILITY MAP</div>
</div>
<div class="module-grid">
<section class="module">
<div class="module-label">foundation</div>
<h3>底层配置、事件和通用模型</h3>
<p>负责加载实例级配置、定义 provider/MCP/AuthZ/backend identity schema、提供 message bus 和 cron 数据模型。</p>
<ul class="bullets">
<li>配置来源优先级:<code>BEAVER_CONFIG_PATH</code><code>BEAVER_HOME/config.json</code>、workspace 下 <code>.beaver/config.json</code></li>
<li><code>BeaverConfig.resolve_provider_target()</code> 从默认模型、显式 provider 和已配置凭据推导运行目标。</li>
<li><code>MessageBus</code> 用 async queue 承接 gateway inbound/outbound。</li>
<li><code>CronSchedule/CronJob/CronRunRecord</code> 是定时任务持久化模型。</li>
</ul>
<ul class="file-list">
<li>beaver/foundation/config/schema.py</li>
<li>beaver/foundation/config/loader.py</li>
<li>beaver/foundation/events/message_bus.py</li>
<li>beaver/foundation/models/cron.py</li>
<li>beaver/foundation/embedding.py</li>
</ul>
</section>
<section class="module">
<div class="module-label">interfaces</div>
<h3>薄入口层</h3>
<p>负责把 HTTP、WebSocket、CLI、Gateway、MCP server 的输入转换成服务层调用,不应保存核心执行逻辑。</p>
<ul class="bullets">
<li>Web app lifespan 启动 <code>AgentService</code> running mode、<code>CronService</code> 和可选 skill learning worker。</li>
<li><code>/api/chat</code><code>/ws/{session_id}</code> 都委托给 <code>_run_web_direct()</code> / <code>AgentService</code></li>
<li>文件 API 分两类:聊天附件 <code>workspace/files/&lt;id&gt;</code> 与 workspace 浏览/上传/预览。</li>
<li>MCP interface 暴露 memory/tools serverGateway 用 <code>MessageBus</code> 桥接渠道。</li>
</ul>
<ul class="file-list">
<li>beaver/interfaces/web/app.py</li>
<li>beaver/interfaces/web/files.py</li>
<li>beaver/interfaces/cli/main.py</li>
<li>beaver/interfaces/gateway/main.py</li>
<li>beaver/interfaces/mcp/*.py</li>
<li>beaver/interfaces/channels/*.py</li>
</ul>
</section>
<section class="module">
<div class="module-label">services</div>
<h3>应用服务编排层</h3>
<p>负责把入口请求转成系统内部流程agent 运行、task mode、cron、team、memory、skill hub、process projection。</p>
<ul class="bullets">
<li><code>AgentService</code> 是主入口,区分 direct mode 和 running mode。</li>
<li><code>_process_with_main_agent()</code> 先做意图分类,再决定是否进入 Task。</li>
<li><code>_run_task_mode()</code> 管理 task planning、team 执行、主 agent synthesis、evidence 记录和用户验收状态。</li>
<li><code>CronService</code> 负责持久化定时任务、计算下一次运行、记录 history。</li>
<li><code>SessionProcessProjector</code> 把隐藏 task/team 事件投影给前端过程视图。</li>
</ul>
<ul class="file-list">
<li>beaver/services/agent_service.py</li>
<li>beaver/services/team_service.py</li>
<li>beaver/services/cron_service.py</li>
<li>beaver/services/process_service.py</li>
<li>beaver/services/skillhub_service.py</li>
</ul>
</section>
<section class="module">
<div class="module-label">engine</div>
<h3>统一 agent 运行内核</h3>
<p>这是主 agent 和 delegated agent 共用的核心。它装配 runtime构建上下文选择技能和工具驱动 provider/tool loop并记录所有运行事件。</p>
<ul class="bullets">
<li><code>EngineLoader</code> 装配 session、memory、run store、skill store、tool registry、MCP manager、task/evidence 服务。</li>
<li><code>AgentLoop.process_direct()</code> 是单次运行主链running mode 下只能通过 queue <code>submit_direct()</code></li>
<li>每个 run 独立捕获 frozen memory snapshot避免 parallel team runs 共享快照互相污染。</li>
<li>运行时写入 <code>run_started</code>、skill activation、tool selection、LLM request、tool result、run completed/failed 等事件。</li>
</ul>
<ul class="file-list">
<li>beaver/engine/loader.py</li>
<li>beaver/engine/loop.py</li>
<li>beaver/engine/context/builder.py</li>
<li>beaver/engine/providers/*.py</li>
<li>beaver/engine/session/*.py</li>
</ul>
</section>
<section class="module">
<div class="module-label">providers</div>
<h3>模型 provider 抽象与选路</h3>
<p>把不同模型网关统一成 <code>LLMProvider.chat()</code>,返回统一 <code>LLMResponse</code><code>ToolCallRequest</code></p>
<ul class="bullets">
<li><code>ProviderRuntime</code> 描述解析后的 provider、model、api mode、凭据、headers、routing。</li>
<li><code>ProviderBundle</code> 同时包含 main、fallback、auxiliary、embedding runtime。</li>
<li><code>FallbackProviderChain</code> 在主 provider 返回 error 或抛异常时按单次调用切到 fallback。</li>
<li>实现包含 LiteLLM、Anthropic、OpenAI Codex API、OpenAI-compatible custom。</li>
</ul>
<ul class="file-list">
<li>beaver/engine/providers/base.py</li>
<li>beaver/engine/providers/runtime.py</li>
<li>beaver/engine/providers/factory.py</li>
<li>beaver/engine/providers/registry.py</li>
<li>beaver/engine/providers/litellm.py</li>
<li>beaver/engine/providers/anthropic.py</li>
<li>beaver/engine/providers/codex.py</li>
</ul>
</section>
<section class="module">
<div class="module-label">tasks</div>
<h3>内部 Task、事实证据和用户验收</h3>
<p>负责把“需要执行和跟踪”的用户请求变成可持久化、可重试、可验收的 Task。</p>
<ul class="bullets">
<li><code>MainAgentRouter</code> 使用 LLM JSON 决策区分 simple/task/continue/revise/close/abandon。</li>
<li><code>TaskExecutionPlanner</code> 让辅助模型选择 single 或 team并限制 team 节点最多 6 个。</li>
<li><code>TaskSkillResolver</code> 为 team node 匹配 published skill没有匹配时生成 one-run ephemeral guidance。</li>
<li><code>EvidenceBuilder</code> 只记录事实证据Task 是否完成只由用户验收决定。</li>
</ul>
<ul class="file-list">
<li>beaver/tasks/models.py</li>
<li>beaver/tasks/service.py</li>
<li>beaver/tasks/router.py</li>
<li>beaver/tasks/planner.py</li>
<li>beaver/tasks/skill_resolver.py</li>
<li>beaver/tasks/evidence.py</li>
<li>beaver/tasks/store.py</li>
</ul>
</section>
<section class="module">
<div class="module-label">coordinator</div>
<h3>多 agent / team 编排</h3>
<p>负责把 team execution graph 转成多个 delegated runs。v1 真正实现的是 sequence、parallel、DAG其它 strategy 目前保留但未实现。</p>
<ul class="bullets">
<li><code>ExecutionGraph.validate()</code> 校验节点唯一、依赖存在、无环,以及 strategy 是否已实现。</li>
<li><code>TeamGraphScheduler</code> 按策略运行节点,失败依赖会把后续节点标记 blocked。</li>
<li><code>LocalAgentRunner</code> 为每个节点生成 child session并仍调用同一个 <code>AgentLoop</code></li>
<li>Agent registry 和 LocalSubagentStore 支持管理 specialist/subagent但当前 Task 主链主要走 generic skill worker。</li>
</ul>
<ul class="file-list">
<li>beaver/coordinator/models.py</li>
<li>beaver/coordinator/execution/scheduler.py</li>
<li>beaver/coordinator/local.py</li>
<li>beaver/coordinator/registry/*.py</li>
<li>beaver/coordinator/subagents.py</li>
</ul>
</section>
<section class="module">
<div class="module-label">tools</div>
<h3>工具契约、选择和执行</h3>
<p>负责把内建工具和 MCP 工具统一暴露为 provider function schema并在 tool loop 里执行模型返回的调用。</p>
<ul class="bullets">
<li><code>ToolSpec</code> 是工具元数据和 schema 的事实来源,可导出 MCP descriptor 和 provider schema。</li>
<li><code>ToolAssembler</code> 按 always tools、skill tool hints、embedding retrieval 选择本轮工具。</li>
<li><code>ToolExecutor</code> 兼容 <code>ToolCallRequest</code> 和 OpenAI 风格 dict解析参数并调用 registry。</li>
<li>内建工具覆盖 memory、session search、filesystem、web fetch/search、terminal/process/code、cron、skill admin、delegation utility。</li>
</ul>
<ul class="file-list">
<li>beaver/tools/base.py</li>
<li>beaver/tools/registry/tool_registry.py</li>
<li>beaver/tools/runtime/executor.py</li>
<li>beaver/tools/assembler/task_assembler.py</li>
<li>beaver/tools/builtins/*.py</li>
<li>beaver/tools/mcp/wrapper.py</li>
</ul>
</section>
<section class="module">
<div class="module-label">skills</div>
<h3>技能目录、选择、生命周期和学习</h3>
<p>负责发现、选择、注入、版本化、审核、发布和自动学习 Beaver skills。</p>
<ul class="bullets">
<li><code>SkillsLoader</code> 读取 workspace published skills、plugin/extra dirs、builtin skills解析 frontmatter 和工具提示。</li>
<li><code>SkillAssembler</code> 用 embedding 召回候选,再用 LLM 做 shortlist/final 选择,并返回 <code>SkillContext</code></li>
<li><code>SkillSpecStore</code> 管理 <code>skill.json</code><code>current.json</code>、versions、drafts、reviews。</li>
<li><code>SkillLearningPipelineService</code> 协调 candidate -> draft -> safety/eval -> review -> approve -> publish。</li>
</ul>
<ul class="file-list">
<li>beaver/skills/catalog/*.py</li>
<li>beaver/skills/assembler/*.py</li>
<li>beaver/skills/specs/*.py</li>
<li>beaver/skills/drafts/service.py</li>
<li>beaver/skills/reviews/service.py</li>
<li>beaver/skills/publisher/service.py</li>
<li>beaver/skills/learning/*.py</li>
</ul>
</section>
<section class="module">
<div class="module-label">memory</div>
<h3>会话、长期记忆、运行记忆和学习状态</h3>
<p>负责保存对话事件、长期记忆、run receipt、skill effect、skill learning candidates 和安全/eval 报告。</p>
<ul class="bullets">
<li>会话存 SQLite包含 <code>sessions</code><code>messages</code> 和 FTS5 <code>messages_fts</code></li>
<li>长期记忆只有 <code>MEMORY.md</code><code>USER.md</code> 两个桶,写入前扫描 prompt injection / secret exfiltration 风险。</li>
<li>run memory 用 JSONL 保存 <code>RunRecord</code><code>SkillEffectRecord</code></li>
<li>skill learning store 维护候选状态、performance snapshot、safety report、eval report。</li>
</ul>
<ul class="file-list">
<li>beaver/engine/session/*.py</li>
<li>beaver/memory/curated/*.py</li>
<li>beaver/memory/runs/*.py</li>
<li>beaver/memory/skills/*.py</li>
<li>beaver/memory/search/transcript_store.py</li>
</ul>
</section>
<section class="module">
<div class="module-label">integrations</div>
<h3>外部系统与协议集成</h3>
<p>负责连接 AuthZ、MCP 和 Outlook。WhatsApp、A2A、providers 目录当前主要是占位。</p>
<ul class="bullets">
<li><code>MCPConnectionManager</code> 支持 stdio 和 streamable HTTP MCP server并把远端 tools 注册成 <code>mcp_{server}_{tool}</code></li>
<li>远端 MCP 可用 AuthZ backend token 模式,通过 backend identity 换取 bearer token。</li>
<li>Outlook integration 通过 AuthZ 或直接凭据连接,维护 workspace meta提供 status/overview/messages/events/detail。</li>
<li><code>AuthzClient</code> 负责用户/backend 注册、权限查询、token 签发。</li>
</ul>
<ul class="file-list">
<li>beaver/integrations/mcp/connection.py</li>
<li>beaver/integrations/authz/client.py</li>
<li>beaver/integrations/outlook/__init__.py</li>
<li>beaver/integrations/a2a/__init__.py</li>
<li>beaver/integrations/whatsapp/__init__.py</li>
</ul>
</section>
<section class="module risk">
<div class="module-label">permissions</div>
<h3>权限与治理层</h3>
<p>目录已经存在但当前基本是空骨架。实际权限约束主要散落在具体工具、workspace path 校验、memory safety 和 skill draft safety 中。</p>
<ul class="bullets">
<li><code>permissions/guards</code><code>policies</code><code>profiles</code> 只有 docstring。</li>
<li><code>ToolsConfig.restrict_to_workspace</code> 已在配置 schema 里存在,但需要逐工具核对是否真正执行。</li>
<li>后续如果要做能力治理应把工具执行、MCP sensitive 标记、provider/terminal/file 操作统一接入这里。</li>
</ul>
<ul class="file-list">
<li>beaver/permissions/__init__.py</li>
<li>beaver/permissions/guards/__init__.py</li>
<li>beaver/permissions/policies/__init__.py</li>
<li>beaver/permissions/profiles/__init__.py</li>
</ul>
</section>
</div>
</section>
<section class="section">
<div class="section-head">
<div>
<h2>核心数据落点</h2>
<p>这些文件/数据库是运行后最重要的事实来源。后续核对行为是否符合预期时,优先看这里。</p>
</div>
<div class="kicker">PERSISTENCE MAP</div>
</div>
<div class="table-wrap">
<table>
<thead>
<tr>
<th>数据</th>
<th>位置</th>
<th>写入者</th>
<th>用途</th>
</tr>
</thead>
<tbody>
<tr>
<td>Session / transcript event stream</td>
<td><code>&lt;workspace&gt;/sessions/state.db</code></td>
<td><code>SessionManager</code> / <code>AgentLoop</code></td>
<td>保存可见对话、隐藏 system snapshots、tool calls/results、run lifecycle、usage、FTS 搜索。</td>
</tr>
<tr>
<td>Task records</td>
<td><code>&lt;workspace&gt;/tasks/tasks.json</code></td>
<td><code>TaskService</code></td>
<td>保存 task goal/status/run_ids/skill_names/acceptance history。</td>
</tr>
<tr>
<td>Task events</td>
<td><code>&lt;workspace&gt;/tasks/events.jsonl</code></td>
<td><code>TaskService</code></td>
<td>保存 created/run_started/run_completed/evidence_recorded/accepted/revised/closed/abandoned。</td>
</tr>
<tr>
<td>Curated memory</td>
<td><code>&lt;workspace&gt;/memory/curated/MEMORY.md</code>, <code>USER.md</code></td>
<td><code>MemoryTool</code> / <code>MemoryStore</code></td>
<td>长期注入 prompt 的稳定事实;每个 run 冻结快照。</td>
</tr>
<tr>
<td>Run receipts / skill effects</td>
<td><code>&lt;workspace&gt;/memory/runs/*.jsonl</code></td>
<td><code>AgentLoop</code> / <code>AgentService</code> 用户验收入口</td>
<td>skill learning 的原始执行证据、用户验收事件和 final accepted run 标记。</td>
</tr>
<tr>
<td>Skills lifecycle</td>
<td><code>&lt;workspace&gt;/skills/&lt;name&gt;/...</code></td>
<td><code>SkillSpecStore</code> / draft/review/publisher services</td>
<td>published versions、drafts、reviews、current version、supporting files。</td>
</tr>
<tr>
<td>Skill learning state</td>
<td><code>&lt;workspace&gt;/memory/skills/...</code></td>
<td><code>SkillLearningStore</code></td>
<td>候选、performance snapshot、safety report、eval report。</td>
</tr>
<tr>
<td>Cron jobs and runs</td>
<td><code>&lt;workspace&gt;/cron/jobs.json</code></td>
<td><code>CronService</code></td>
<td>定时任务配置、next_run、history、notification/task linkage。</td>
</tr>
<tr>
<td>Agent registry / subagents</td>
<td><code>&lt;workspace&gt;/agents/registry.json</code>, <code>*_agent/AGENTS.json</code></td>
<td><code>AgentRegistry</code> / <code>LocalSubagentStore</code></td>
<td>管理 builtin/workspace/learned agents 和本地 sub-agent workspace。</td>
</tr>
</tbody>
</table>
</div>
</section>
<section class="section">
<div class="section-head">
<div>
<h2>关键流程拆解</h2>
<p>这些流程是后续逐模块修改时最容易产生偏差的地方。</p>
</div>
<div class="kicker">CONTROL FLOWS</div>
</div>
<div class="cols-2">
<div class="panel">
<h2>Simple chat</h2>
<ul class="checks">
<li>入口调用 <code>AgentService._process_with_main_agent()</code></li>
<li><code>MainAgentRouter</code> 返回非 task。</li>
<li>关闭 skill assembly 和 tools<code>include_skill_assembly=False</code><code>include_tools=False</code></li>
<li>仍通过 <code>AgentLoop</code> 写 session/run 事件,但不创建 Task。</li>
</ul>
</div>
<div class="panel">
<h2>Task mode single</h2>
<ul class="checks">
<li>创建或复用 open task。</li>
<li>planner 返回 single主 agent 直接运行。</li>
<li>运行后构建 <code>TaskEvidencePacket</code></li>
<li>运行后状态变 <code>awaiting_acceptance</code>;用户 accept/revise/abandon 决定关闭、修订或放弃。</li>
</ul>
</div>
<div class="panel">
<h2>Task mode team</h2>
<ul class="checks">
<li>planner 生成 <code>ExecutionGraph</code></li>
<li><code>TaskSkillResolver</code> 给节点绑定 published skill 或 ephemeral guidance。</li>
<li><code>TeamService</code> 运行节点,节点仍调用 <code>AgentLoop</code></li>
<li>主 agent synthesis 使用 team evidence通常关闭工具调用避免重复执行子 agent 已做的事情。</li>
</ul>
</div>
<div class="panel">
<h2>Skill learning</h2>
<ul class="checks">
<li>每个 run 记录 activated skill receipt 和 effect。</li>
<li>用户 accept task 后才生成候选;证据包含整个 task 的所有 runs并标记 final_accepted_run_id。</li>
<li>worker 只生成 draft、做 safety/eval不自动 approve/publish。</li>
<li>publish 必须有 approved review、passing safety、没有失败 evalhigh risk 还需要显式确认。</li>
</ul>
</div>
</div>
</section>
<section class="section">
<div class="section-head">
<div>
<h2>后续核对问题清单</h2>
<p>这些问题适合配合 brainstorming / grill-me 逐模块核对想法和现有项目是否一致。</p>
</div>
<div class="kicker">REVIEW PROMPTS</div>
</div>
<div class="matrix">
<div class="key">产品目标</div>
<div>这个后端当前更像“任务型 agent runtime”不是普通聊天后端。你想保留 Task runtime 的 Plan -> Run -> Evidence -> User Acceptance 主体验,还是把它降级成可选高级模式?</div>
<div class="key">主入口边界</div>
<div><code>interfaces/web/app.py</code> 已经超过 3000 行,包含 auth、files、skills、cron、chat 等。后续是否要拆 route 模块,还是先保持单文件以降低迁移风险?</div>
<div class="key">Task 自动化程度</div>
<div>现在 Main Agent 会自动 Task 化复杂请求。你是否接受模型分类误差?是否需要用户显式确认创建 Task</div>
<div class="key">Team 执行策略</div>
<div>当前真正实现 sequence / parallel / DAG其它策略只是保留枚举。是否要支持更多 coordinator还是坚持 v1 只做三种稳定策略?</div>
<div class="key">Agent registry 角色</div>
<div>registry/search/target resolver 已存在,但 Task 主线主要绑定技能而不是 specialist agent。你希望 team node 优先找 specialist agent还是继续 generic skill worker</div>
<div class="key">权限治理</div>
<div>permissions 目录目前是骨架。terminal、filesystem、web、MCP、Outlook 等能力是否需要统一 policy gate</div>
<div class="key">Skill 学习闭环</div>
<div>候选生成应依赖 task accepted。你希望只从用户接受的 task evidence 学习,还是允许人工从 abandoned/revised 历史中手动创建候选?</div>
<div class="key">外部集成</div>
<div>Outlook/AuthZ/MCP 已经比较具体A2A/WhatsApp 仍是占位。后续应该优先补协议,还是先收紧已有集成的权限和错误处理?</div>
</div>
</section>
<section class="section">
<div class="section-head">
<div>
<h2>代码观察与风险点</h2>
<p>这些不是修改建议的最终结论,只是阅读代码后值得后续逐项核对的偏差点。</p>
</div>
<div class="kicker">OPEN RISKS</div>
</div>
<div class="cols-2">
<div class="note risk">
<h3>定时 Task 路径存在明显变量错误</h3>
<p>
<code>AgentService.run_scheduled_task()</code> 末尾更新 assistant event payload 时引用了 <code>job.id</code><code>run.scheduled_run_id</code><code>job.name</code>
但该函数参数只有 <code>cron_job_id</code><code>cron_job_name</code><code>scheduled_run_id</code>。这条路径如果执行到这里会触发 <code>NameError</code>
</p>
</div>
<div class="note risk">
<h3>权限层还没有真正成为执行闸门</h3>
<p>
<code>permissions</code> 目录为空骨架,实际保护分散在工具实现和路径校验里。若后续开放 terminal、filesystem、MCP sensitive tools需要统一执行前 policy。
</p>
</div>
<div class="note risk">
<h3>Web auth 是本地单用户风格</h3>
<p>
本地 auth 文件以 username/password 字段读写,使用 token/handoff code 做前端会话。若目标是多用户或公网后端需要重新评估密码存储、token 生命周期和权限边界。
</p>
</div>
<div class="note risk">
<h3>Skill eval 目前偏轻量启发式</h3>
<p>
<code>SkillDraftEvaluator</code> 基于历史 accepted task evidence 和草稿长度/内容做 bounded report不是真正 replay。它只属于 skill draft 治理,不属于 Task runtime。
</p>
</div>
<div class="note risk">
<h3>接口层过大</h3>
<p>
<code>interfaces/web/app.py</code> 同时承载 app factory、lifespan、auth、provider config、sessions、files、agents、MCP、Outlook、skills、cron、chat、helper functions。
后续修改时容易产生跨功能回归。
</p>
</div>
<div class="note">
<h3>已经落地的稳定点</h3>
<p>
<span class="ok">可依赖:</span>统一 <code>AgentLoop</code>、session event stream、Task evidence/acceptance 状态、team graph v1、skill lifecycle gates、MCP wrapper、workspace path containment。
</p>
</div>
</div>
</section>
<section class="section">
<div class="section-head">
<div>
<h2>测试覆盖信号</h2>
<p>单元测试覆盖了当前后端多数关键行为,可作为后续修改文档的回归索引。</p>
</div>
<div class="kicker">TEST INDEX</div>
</div>
<div class="cols-4">
<div class="panel">
<h2>Task / acceptance</h2>
<p><code>test_task_mode_feedback.py</code>, <code>test_task_evidence.py</code>, <code>test_task_execution_planner.py</code>, <code>test_task_skill_resolver.py</code></p>
</div>
<div class="panel">
<h2>Engine / providers</h2>
<p><code>test_websocket_chat.py</code>, <code>test_main_agent_router.py</code>, <code>test_litellm_thinking_mode.py</code>, <code>test_imports.py</code></p>
</div>
<div class="panel">
<h2>Team / process</h2>
<p><code>test_agent_team_v1.py</code>, <code>test_agent_registry_resolver.py</code>, <code>test_process_projection.py</code></p>
</div>
<div class="panel">
<h2>Skills / tools / web</h2>
<p><code>test_phase5_skills_runtime.py</code>, <code>test_skill_learning_*.py</code>, <code>test_tool_assembler.py</code>, <code>test_web_files_api.py</code></p>
</div>
</div>
</section>
</article>
</main>
</body>
</html>