beaver_project/projcet_review/backend_module_blueprint.html
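steven_li 6e9e74d1ee feat(engine): add runtime context support and refactor tool iteration limits
Add a RuntimeContext class that captures date and time information at model runtime,
including UTC time, local time, and time zone, and show this information in the system prompt.

Also add configuration options for the maximum number of context messages and tool iterations,
remove the validation service from the engine loader, and update the related data structures and interfaces.

BREAKING CHANGE: the validation service has been removed; the related fields are replaced by evidence status and acceptance status.

- Add the RuntimeContext class and related rendering methods
- Add the max_context_messages and max_tool_iterations configuration options
- Remove ValidationService-related code
- Update the validation status fields in message records
- Add raw tool call detection and fallback handling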
2026-05-26 11:18:35 +08:00

955 lines
39 KiB
HTML

<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Beaver Backend Module Blueprint</title>
<style>
:root {
--c-bg: #f8fafc;
--c-canvas: #ffffff;
--c-border: #cbd5e1;
--c-border-strong: #94a3b8;
--c-text-main: #0f172a;
--c-text-sub: #64748b;
--c-text-soft: #475569;
--c-accent: #111827;
--c-risk: #b91c1c;
--c-ok: #166534;
--font-ui: Inter, Helvetica, Arial, sans-serif;
--font-mono: "JetBrains Mono", Consolas, "Liberation Mono", monospace;
}
* {
box-sizing: border-box;
}
body {
margin: 0;
min-height: 100vh;
background: var(--c-bg);
color: var(--c-text-main);
font-family: var(--font-ui);
line-height: 1.55;
}
a {
color: inherit;
text-decoration: underline;
text-decoration-thickness: 1px;
text-underline-offset: 2px;
}
.page {
width: min(1500px, 100%);
margin: 0 auto;
padding: 32px;
}
.diagram-canvas {
background: var(--c-canvas);
border: 1px solid var(--c-border);
padding: 32px;
}
.diagram-header {
display: grid;
grid-template-columns: minmax(0, 1fr) auto;
gap: 24px;
align-items: start;
border-bottom: 1px solid var(--c-border);
padding-bottom: 18px;
margin-bottom: 24px;
}
.diagram-title {
margin: 0 0 6px;
font-size: 24px;
font-weight: 700;
letter-spacing: 0;
}
.diagram-subtitle,
.meta-line,
.kicker {
font-family: var(--font-mono);
font-size: 11px;
color: var(--c-text-sub);
text-transform: uppercase;
letter-spacing: 0.05em;
}
.meta-box {
border: 1px solid var(--c-border);
padding: 10px 12px;
min-width: 280px;
font-family: var(--font-mono);
font-size: 12px;
color: var(--c-text-soft);
}
.summary {
display: grid;
grid-template-columns: 1.15fr 0.85fr;
gap: 16px;
margin-bottom: 18px;
}
.panel,
.module,
.flow-box,
.note,
.table-wrap {
border: 1px solid var(--c-border);
background: var(--c-canvas);
}
.panel {
padding: 16px;
}
.panel h2,
.section h2 {
margin: 0 0 10px;
font-size: 17px;
letter-spacing: 0;
}
.panel p,
.module p,
.note p {
margin: 0;
color: var(--c-text-soft);
font-size: 13px;
}
.badge-row {
display: flex;
flex-wrap: wrap;
gap: 6px;
margin-top: 12px;
}
.badge {
display: inline-block;
border: 1px solid var(--c-border);
padding: 2px 6px;
font-family: var(--font-mono);
font-size: 10px;
color: var(--c-text-sub);
white-space: nowrap;
}
.badge-solid {
border-color: var(--c-accent);
background: var(--c-accent);
color: var(--c-canvas);
}
.section {
margin-top: 24px;
border-top: 1px solid var(--c-border);
padding-top: 24px;
}
.section-head {
display: grid;
grid-template-columns: minmax(0, 1fr) auto;
gap: 16px;
align-items: end;
margin-bottom: 14px;
}
.section-head p {
margin: 4px 0 0;
max-width: 980px;
color: var(--c-text-sub);
font-size: 13px;
}
.module-grid {
display: grid;
grid-template-columns: repeat(3, minmax(0, 1fr));
gap: 12px;
}
.module {
padding: 14px;
display: flex;
flex-direction: column;
gap: 10px;
min-height: 260px;
}
.module h3 {
margin: 0;
font-size: 15px;
letter-spacing: 0;
}
.module-label {
font-family: var(--font-mono);
font-size: 11px;
color: var(--c-text-sub);
}
.file-list,
.bullets,
.checks {
margin: 0;
padding-left: 18px;
color: var(--c-text-soft);
font-size: 13px;
}
.file-list {
font-family: var(--font-mono);
font-size: 11px;
line-height: 1.55;
}
.flow {
display: grid;
grid-template-columns: repeat(7, minmax(0, 1fr));
gap: 10px;
align-items: stretch;
}
.flow-box {
min-height: 118px;
padding: 12px;
position: relative;
}
.flow-box::after {
content: "";
position: absolute;
top: 50%;
right: -10px;
width: 10px;
border-top: 1px solid var(--c-border-strong);
}
.flow-box:last-child::after {
display: none;
}
.flow-box h3 {
margin: 0 0 8px;
font-size: 13px;
}
.flow-box p {
margin: 0;
color: var(--c-text-soft);
font-size: 12px;
}
.matrix {
display: grid;
grid-template-columns: 280px minmax(0, 1fr);
border-top: 1px solid var(--c-border);
border-left: 1px solid var(--c-border);
}
.matrix > div {
border-right: 1px solid var(--c-border);
border-bottom: 1px solid var(--c-border);
padding: 10px 12px;
font-size: 13px;
}
.matrix .key {
font-family: var(--font-mono);
color: var(--c-text-main);
background: #f8fafc;
}
.table-wrap {
overflow-x: auto;
}
table {
width: 100%;
min-width: 980px;
border-collapse: collapse;
font-size: 13px;
}
th,
td {
border-bottom: 1px solid var(--c-border);
border-right: 1px solid var(--c-border);
padding: 10px 12px;
vertical-align: top;
text-align: left;
}
th {
font-family: var(--font-mono);
font-size: 11px;
text-transform: uppercase;
letter-spacing: 0.05em;
color: var(--c-text-sub);
background: #f8fafc;
}
tr:last-child td {
border-bottom: 0;
}
th:last-child,
td:last-child {
border-right: 0;
}
code,
.mono {
font-family: var(--font-mono);
font-size: 0.92em;
color: var(--c-text-main);
}
.risk {
border-color: #fecaca;
}
.risk h3,
.risk .module-label {
color: var(--c-risk);
}
.ok {
color: var(--c-ok);
font-weight: 600;
}
.cols-2 {
display: grid;
grid-template-columns: repeat(2, minmax(0, 1fr));
gap: 12px;
}
.cols-4 {
display: grid;
grid-template-columns: repeat(4, minmax(0, 1fr));
gap: 12px;
}
.note {
padding: 14px;
}
@media (max-width: 1180px) {
.module-grid,
.summary,
.cols-4 {
grid-template-columns: 1fr 1fr;
}
.flow {
grid-template-columns: 1fr 1fr;
}
.flow-box::after {
display: none;
}
}
@media (max-width: 760px) {
.page {
padding: 12px;
}
.diagram-canvas {
padding: 16px;
}
.diagram-header,
.summary,
.section-head,
.module-grid,
.cols-2,
.cols-4,
.flow,
.matrix {
grid-template-columns: 1fr;
}
.meta-box {
min-width: 0;
}
}
</style>
</head>
<body>
<main class="page">
<article class="diagram-canvas">
<header class="diagram-header">
<div>
<div class="diagram-title">Beaver Backend Module Blueprint</div>
<div class="diagram-subtitle">Flat Engineering Blueprint / app-instance/backend / 2026-05-25</div>
</div>
<div class="meta-box">
SOURCE: <span class="mono">app-instance/backend</span><br>
STYLE: <span class="mono">projcet_review/blueprinter.md</span><br>
SCOPE: <span class="mono">backend code + tests + architecture docs</span><br>
MULTI-PAGE: <a href="backend_blueprint/index.html">backend_blueprint/index.html</a>
</div>
</header>
<section class="summary">
<div class="panel">
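<h2>What the project does</h2>
<p>
The Beaver backend is an agent runtime for user tasks. It receives requests from Web, WebSocket, CLI, Gateway, Cron, or MCP,
and uses the Main Agent to decide whether a request is a Task that needs tracking: simple questions get a direct reply, while complex tasks enter Task mode. Task mode plans single-agent
or team execution, runs the unified <code>AgentLoop</code>, selects skills and tools, calls the model, records factual evidence, and waits for the user to accept, revise, or abandon the result.
Only Task evidence that the user has accepted is retained as a candidate for a learnable skill.
</p>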
<div class="badge-row">
<span class="badge-solid badge">UNIFIED ENGINE</span>
<span class="badge">TASK MODE</span>
<span class="badge">TEAM COORDINATOR</span>
<span class="badge">SKILL LEARNING</span>
<span class="badge">MCP TOOLS</span>
<span class="badge">SCHEDULED TASKS</span>
</div>
</div>
<div class="panel">
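<h2>The most important architectural judgment</h2>
<p>
The main agent, team nodes, and sub-agents do not each implement their own runtime; they all end up in the same <code>beaver.engine.AgentLoop</code>.
So before any later change, first confirm that the entry layer is still thin, that the service layer only orchestrates, and that the real tool loop / prompt / provider / session logic is still consolidated inside the engine.
</p>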
<div class="badge-row">
<span class="badge">interfaces -> services</span>
<span class="badge">services -> engine</span>
<span class="badge">engine -> skills/tools/memory</span>
</div>
</div>
</section>
<section class="section">
<div class="section-head">
<div>
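<h2>Main execution flow</h2>
<p>This is the most important path in the backend; the follow-up module-by-module change documents should align with this chain first.</p>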
</div>
<div class="kicker">CHAT / TASK / ACCEPTANCE / LEARNING</div>
</div>
<div class="flow">
<div class="flow-box">
<h3>1. Entry intake</h3>
<p><code>/api/chat</code>, WebSocket, CLI, Gateway, or Cron passes the user message to <code>AgentService</code>.</p>
</div>
<div class="flow-box">
<h3>2. Intent routing</h3>
<p><code>MainAgentRouter</code> combines the active task with the recent conversation to decide among simple / new_task / continue / revise / close / abandon.</p>
</div>
<div class="flow-box">
<h3>3. Task modeling</h3>
<p><code>TaskService</code> writes <code>tasks.json</code> and <code>events.jsonl</code>, and maintains the open/running/awaiting_acceptance/closed states.</p>
</div>
<div class="flow-box">
<h3>4. Execution planning</h3>
<p><code>TaskExecutionPlanner</code> has the auxiliary model choose single or team, and generates sequence / parallel / DAG nodes for a team.</p>
</div>
<div class="flow-box">
<h3>5. Unified run</h3>
<p><code>AgentLoop</code> freezes memory, selects skills, selects tools, builds the prompt, calls the provider, and runs the tool loop.</p>
</div>
<div class="flow-box">
<h3>6. Factual evidence</h3>
<p><code>EvidenceBuilder</code> aggregates run/team/tool evidence. Evidence only records facts; it does not judge, score, or gate.</p>
</div>
<div class="flow-box">
<h3>7. Acceptance and learning</h3>
<p>After the user accepts a Task, accepted task evidence and learning candidates are generated; the worker may generate a draft but never auto-approves or auto-publishes.</p>
</div>
</div>
</section>
<section class="section">
<div class="section-head">
<div>
<h2>Module overview</h2>
<p>Each module below lists its responsibility, its logic, how it works in practice, and its key files.</p>
</div>
<div class="kicker">MODULE RESPONSIBILITY MAP</div>
</div>
<div class="module-grid">
<section class="module">
<div class="module-label">foundation</div>
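<h3>Low-level configuration, events, and shared models</h3>
<p>Loads instance-level configuration, defines the provider/MCP/AuthZ/backend identity schemas, and provides the message bus and cron data models.</p>
<ul class="bullets">
<li>Configuration source priority: <code>BEAVER_CONFIG_PATH</code>, <code>BEAVER_HOME/config.json</code>, and <code>.beaver/config.json</code> under the workspace.</li>
<li><code>BeaverConfig.resolve_provider_target()</code> derives the run target from the default model, the explicit provider, and the configured credentials.</li>
<li><code>MessageBus</code> uses async queues to carry gateway inbound/outbound messages.</li>
<li><code>CronSchedule/CronJob/CronRunRecord</code> are the persistence models for scheduled tasks.</li>
</ul>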
<ul class="file-list">
<li>beaver/foundation/config/schema.py</li>
<li>beaver/foundation/config/loader.py</li>
<li>beaver/foundation/events/message_bus.py</li>
<li>beaver/foundation/models/cron.py</li>
<li>beaver/foundation/embedding.py</li>
</ul>
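<p>The configuration source priority above can be sketched as follows. This is a minimal illustration, not the loader's actual code: the helper name <code>resolve_config_path</code> is hypothetical, and it assumes that the three sources are listed in descending priority and that a source is skipped when it is unset or its file does not exist.</p>

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_config_path(workspace: Path, env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the first config file that exists, checked in the listed order.

    Hypothetical helper: the real loader in beaver/foundation/config/loader.py
    may differ in both naming and fallback behavior.
    """
    env = os.environ if env is None else env
    candidates = []
    # 1. An explicit file path from BEAVER_CONFIG_PATH.
    if env.get("BEAVER_CONFIG_PATH"):
        candidates.append(Path(env["BEAVER_CONFIG_PATH"]))
    # 2. config.json inside the BEAVER_HOME directory.
    if env.get("BEAVER_HOME"):
        candidates.append(Path(env["BEAVER_HOME"]) / "config.json")
    # 3. .beaver/config.json under the workspace.
    candidates.append(workspace / ".beaver" / "config.json")
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

<p>Passing <code>env</code> explicitly keeps the sketch testable without touching the process environment.</p>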
</section>
<section class="module">
<div class="module-label">interfaces</div>
<h3>Thin entry layer</h3>
<p>Converts input from HTTP, WebSocket, CLI, Gateway, and the MCP server into service-layer calls; it should not hold core execution logic.</p>
<ul class="bullets">
<li>The web app lifespan starts <code>AgentService</code> in running mode, <code>CronService</code>, and the optional skill learning worker.</li>
<li><code>/api/chat</code> and <code>/ws/{session_id}</code> both delegate to <code>_run_web_direct()</code> / <code>AgentService</code>.</li>
<li>The file API has two categories: chat attachments under <code>workspace/files/&lt;id&gt;</code>, and workspace browsing/upload/preview.</li>
<li>The MCP interface exposes the memory/tools servers; the Gateway bridges channels through <code>MessageBus</code>.</li>
</ul>
<ul class="file-list">
<li>beaver/interfaces/web/app.py</li>
<li>beaver/interfaces/web/files.py</li>
<li>beaver/interfaces/cli/main.py</li>
<li>beaver/interfaces/gateway/main.py</li>
<li>beaver/interfaces/mcp/*.py</li>
<li>beaver/interfaces/channels/*.py</li>
</ul>
</section>
<section class="module">
<div class="module-label">services</div>
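<h3>Application service orchestration layer</h3>
<p>Turns entry requests into internal system flows: agent runs, task mode, cron, team, memory, skill hub, and process projection.</p>
<ul class="bullets">
<li><code>AgentService</code> is the main entry point and distinguishes direct mode from running mode.</li>
<li><code>_process_with_main_agent()</code> classifies intent first, then decides whether to enter a Task.</li>
<li><code>_run_task_mode()</code> manages task planning, team execution, main-agent synthesis, evidence recording, and the user acceptance state.</li>
<li><code>CronService</code> persists scheduled tasks, computes the next run, and records history.</li>
<li><code>SessionProcessProjector</code> projects hidden task/team events onto the frontend process view.</li>
</ul>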
<ul class="file-list">
<li>beaver/services/agent_service.py</li>
<li>beaver/services/team_service.py</li>
<li>beaver/services/cron_service.py</li>
<li>beaver/services/process_service.py</li>
<li>beaver/services/skillhub_service.py</li>
</ul>
</section>
<section class="module">
<div class="module-label">engine</div>
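<h3>Unified agent run kernel</h3>
<p>This is the core shared by the main agent and delegated agents. It assembles the runtime, builds context, selects skills and tools, drives the provider/tool loop, and records all run events.</p>
<ul class="bullets">
<li><code>EngineLoader</code> assembles the session, memory, run store, skill store, tool registry, MCP manager, and task/evidence services.</li>
<li><code>AgentLoop.process_direct()</code> is the main chain for a single run; in running mode it can only be reached through the queue via <code>submit_direct()</code>.</li>
<li>Each run captures its own frozen memory snapshot, so parallel team runs do not share a snapshot and contaminate one another.</li>
<li>At run time it writes events such as <code>run_started</code>, skill activation, tool selection, LLM request, tool result, and run completed/failed.</li>
</ul>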
<ul class="file-list">
<li>beaver/engine/loader.py</li>
<li>beaver/engine/loop.py</li>
<li>beaver/engine/context/builder.py</li>
<li>beaver/engine/providers/*.py</li>
<li>beaver/engine/session/*.py</li>
</ul>
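<p>The per-run frozen memory snapshot can be illustrated with a small sketch. The class and function names here (<code>CuratedMemorySketch</code>, <code>start_run</code>) are hypothetical stand-ins, and the copy-then-wrap approach is only one plausible way to keep runs isolated; the real engine may do this differently.</p>

```python
from types import MappingProxyType
from typing import Dict, Mapping


class CuratedMemorySketch:
    """Two-bucket store standing in for the curated memory files."""

    BUCKETS = ("MEMORY.md", "USER.md")

    def __init__(self) -> None:
        self._buckets: Dict[str, str] = {name: "" for name in self.BUCKETS}

    def write(self, bucket: str, text: str) -> None:
        if bucket not in self._buckets:
            raise ValueError(f"unknown bucket: {bucket}")
        self._buckets[bucket] = text

    def freeze(self) -> Mapping[str, str]:
        # Copy first, then wrap read-only: later writes to the store cannot
        # leak into a snapshot that a run has already captured.
        return MappingProxyType(dict(self._buckets))


def start_run(store: CuratedMemorySketch) -> Mapping[str, str]:
    """Each run captures its own snapshot instead of sharing one."""
    return store.freeze()
```

<p>Two runs started at different times see different contents, and neither can mutate what the other sees.</p>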
</section>
<section class="module">
<div class="module-label">providers</div>
<h3>Model provider abstraction and routing</h3>
<p>Unifies different model gateways behind <code>LLMProvider.chat()</code>, which returns a unified <code>LLMResponse</code> and <code>ToolCallRequest</code>.</p>
<ul class="bullets">
<li><code>ProviderRuntime</code> describes the resolved provider, model, API mode, credentials, headers, and routing.</li>
<li><code>ProviderBundle</code> holds the main, fallback, auxiliary, and embedding runtimes together.</li>
<li><code>FallbackProviderChain</code> switches to the fallback for that single call when the main provider returns an error or raises an exception.</li>
<li>Implementations include LiteLLM, Anthropic, the OpenAI Codex API, and OpenAI-compatible custom providers.</li>
</ul>
<ul class="file-list">
<li>beaver/engine/providers/base.py</li>
<li>beaver/engine/providers/runtime.py</li>
<li>beaver/engine/providers/factory.py</li>
<li>beaver/engine/providers/registry.py</li>
<li>beaver/engine/providers/litellm.py</li>
<li>beaver/engine/providers/anthropic.py</li>
<li>beaver/engine/providers/codex.py</li>
</ul>
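<p>The per-call fallback behavior described for <code>FallbackProviderChain</code> can be sketched roughly as below. Everything in the block is a simplified stand-in: the reduced <code>LLMResponse</code>, the callable-based provider type, and the class name <code>FallbackChainSketch</code> are assumptions made for illustration, and the reading that every new call starts again from the main provider is an interpretation of "per single call".</p>

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class LLMResponse:
    """Simplified stand-in for the unified response type."""
    content: str = ""
    error: Optional[str] = None


# A provider is modeled as a plain callable from prompt to response.
Provider = Callable[[str], LLMResponse]


class FallbackChainSketch:
    """Try the main provider first, then each fallback, for one call only."""

    def __init__(self, main: Provider, fallbacks: List[Provider]):
        self.providers = [main, *fallbacks]

    def chat(self, prompt: str) -> LLMResponse:
        last = LLMResponse(error="no provider configured")
        # Every call starts again from the main provider: a failure does not
        # permanently demote it.
        for provider in self.providers:
            try:
                response = provider(prompt)
            except Exception as exc:  # a raised exception counts as a failure
                last = LLMResponse(error=str(exc))
                continue
            if response.error is None:
                return response
            last = response  # an error response also triggers the fallback
        return last
```

<p>If every provider fails, the sketch returns the last error response rather than raising, so the caller always receives a uniform result type.</p>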
</section>
<section class="module">
<div class="module-label">tasks</div>
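<h3>Internal Tasks, factual evidence, and user acceptance</h3>
<p>Turns user requests that need to be executed and tracked into Tasks that can be persisted, retried, and accepted.</p>
<ul class="bullets">
<li><code>MainAgentRouter</code> uses an LLM JSON decision to distinguish simple/task/continue/revise/close/abandon.</li>
<li><code>TaskExecutionPlanner</code> has the auxiliary model choose single or team, and caps a team at 6 nodes.</li>
<li><code>TaskSkillResolver</code> matches a published skill to each team node, and generates one-run ephemeral guidance when nothing matches.</li>
<li><code>EvidenceBuilder</code> only records factual evidence; whether a Task is complete is decided solely by user acceptance.</li>
</ul>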
<ul class="file-list">
<li>beaver/tasks/models.py</li>
<li>beaver/tasks/service.py</li>
<li>beaver/tasks/router.py</li>
<li>beaver/tasks/planner.py</li>
<li>beaver/tasks/skill_resolver.py</li>
<li>beaver/tasks/evidence.py</li>
<li>beaver/tasks/store.py</li>
</ul>
</section>
<section class="module">
<div class="module-label">coordinator</div>
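<h3>Multi-agent / team orchestration</h3>
<p>Turns the team execution graph into multiple delegated runs. v1 actually implements sequence, parallel, and DAG; the other strategies are reserved but not yet implemented.</p>
<ul class="bullets">
<li><code>ExecutionGraph.validate()</code> checks that nodes are unique, dependencies exist, the graph is acyclic, and the strategy is implemented.</li>
<li><code>TeamGraphScheduler</code> runs nodes according to the strategy; a failed dependency marks downstream nodes as blocked.</li>
<li><code>LocalAgentRunner</code> creates a child session for each node and still calls the same <code>AgentLoop</code>.</li>
<li>The agent registry and LocalSubagentStore support managing specialists/subagents, but the current Task main chain mostly uses the generic skill worker.</li>
</ul>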
<ul class="file-list">
<li>beaver/coordinator/models.py</li>
<li>beaver/coordinator/execution/scheduler.py</li>
<li>beaver/coordinator/local.py</li>
<li>beaver/coordinator/registry/*.py</li>
<li>beaver/coordinator/subagents.py</li>
</ul>
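<p>The four checks attributed to <code>ExecutionGraph.validate()</code> can be sketched as a standalone function. This is an illustration under stated assumptions, not the project's implementation: the <code>Node</code> shape, the strategy strings, and the choice of Kahn's algorithm for cycle detection are all hypothetical.</p>

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed string names for the three strategies that v1 implements.
IMPLEMENTED_STRATEGIES = {"sequence", "parallel", "dag"}


@dataclass
class Node:
    node_id: str
    depends_on: List[str] = field(default_factory=list)


def validate_graph(strategy: str, nodes: List[Node]) -> List[str]:
    """Return a list of problems; an empty list means the graph is valid."""
    problems: List[str] = []
    if strategy not in IMPLEMENTED_STRATEGIES:
        problems.append(f"strategy not implemented: {strategy}")
    ids = [n.node_id for n in nodes]
    duplicates = sorted({i for i in ids if ids.count(i) > 1})
    if duplicates:
        problems.append(f"duplicate node ids: {duplicates}")
    known = set(ids)
    for n in nodes:
        for dep in n.depends_on:
            if dep not in known:
                problems.append(f"{n.node_id} depends on unknown node {dep}")
    if problems:
        return problems  # cycle detection below assumes a well-formed graph
    # Kahn's algorithm: repeatedly remove nodes with no unmet dependencies.
    remaining: Dict[str, set] = {n.node_id: set(n.depends_on) for n in nodes}
    while remaining:
        ready = [i for i, deps in remaining.items() if not deps]
        if not ready:
            problems.append(f"cycle among: {sorted(remaining)}")
            break
        for i in ready:
            del remaining[i]
        for deps in remaining.values():
            deps.difference_update(ready)
    return problems
```

<p>Returning a list of problems instead of raising on the first one makes it easier to report every defect of a planner-generated graph at once.</p>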
</section>
<section class="module">
<div class="module-label">tools</div>
<h3>Tool contracts, selection, and execution</h3>
<p>Exposes built-in tools and MCP tools uniformly as provider function schemas, and executes the calls the model returns inside the tool loop.</p>
<ul class="bullets">
<li><code>ToolSpec</code> is the source of truth for tool metadata and schema, and can export an MCP descriptor and a provider schema.</li>
<li><code>ToolAssembler</code> selects the tools for the current turn from always tools, skill tool hints, and embedding retrieval.</li>
<li><code>ToolExecutor</code> accepts both <code>ToolCallRequest</code> and OpenAI-style dicts, parses the arguments, and calls the registry.</li>
<li>Built-in tools cover memory, session search, filesystem, web fetch/search, terminal/process/code, cron, skill admin, and delegation utilities.</li>
</ul>
<ul class="file-list">
<li>beaver/tools/base.py</li>
<li>beaver/tools/registry/tool_registry.py</li>
<li>beaver/tools/runtime/executor.py</li>
<li>beaver/tools/assembler/task_assembler.py</li>
<li>beaver/tools/builtins/*.py</li>
<li>beaver/tools/mcp/wrapper.py</li>
</ul>
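<p>How an executor might accept both call shapes can be sketched as below. The reduced <code>ToolCallRequest</code>, the helper names <code>normalize_call</code> and <code>execute</code>, and the plain-dict registry are assumptions for illustration; only the OpenAI-style layout (a <code>function</code> object whose <code>arguments</code> is a JSON string) follows the public chat completions format.</p>

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Union


@dataclass
class ToolCallRequest:
    """Simplified stand-in; the real class may carry more fields."""
    id: str
    name: str
    arguments: Dict[str, Any] = field(default_factory=dict)


def normalize_call(call: Union[ToolCallRequest, Dict[str, Any]]) -> ToolCallRequest:
    """Accept either a ToolCallRequest or an OpenAI-style tool call dict."""
    if isinstance(call, ToolCallRequest):
        return call
    function = call.get("function", {})
    raw_args = function.get("arguments", "{}")
    # OpenAI-style calls carry arguments as a JSON string; tolerate a dict too.
    args = json.loads(raw_args or "{}") if isinstance(raw_args, str) else dict(raw_args)
    return ToolCallRequest(id=call.get("id", ""), name=function.get("name", ""), arguments=args)


def execute(call: Union[ToolCallRequest, Dict[str, Any]], registry: Dict[str, Callable[..., Any]]) -> Any:
    """Look the tool up in a plain dict registry and call it with the parsed arguments."""
    request = normalize_call(call)
    if request.name not in registry:
        return {"error": f"unknown tool: {request.name}"}
    return registry[request.name](**request.arguments)
```

<p>Normalizing to one internal type first keeps the lookup and invocation path identical for both input shapes.</p>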
</section>
<section class="module">
<div class="module-label">skills</div>
<h3>Skill catalog, selection, lifecycle, and learning</h3>
<p>Discovers, selects, injects, versions, reviews, publishes, and automatically learns Beaver skills.</p>
<ul class="bullets">
<li><code>SkillsLoader</code> reads workspace published skills, plugin/extra dirs, and builtin skills, and parses frontmatter and tool hints.</li>
<li><code>SkillAssembler</code> recalls candidates with embeddings, then uses the LLM for shortlist/final selection, and returns a <code>SkillContext</code>.</li>
<li><code>SkillSpecStore</code> manages <code>skill.json</code>, <code>current.json</code>, versions, drafts, and reviews.</li>
<li><code>SkillLearningPipelineService</code> coordinates candidate -> draft -> safety/eval -> review -> approve -> publish.</li>
</ul>
<ul class="file-list">
<li>beaver/skills/catalog/*.py</li>
<li>beaver/skills/assembler/*.py</li>
<li>beaver/skills/specs/*.py</li>
<li>beaver/skills/drafts/service.py</li>
<li>beaver/skills/reviews/service.py</li>
<li>beaver/skills/publisher/service.py</li>
<li>beaver/skills/learning/*.py</li>
</ul>
</section>
<section class="module">
<div class="module-label">memory</div>
<h3>Sessions, long-term memory, run memory, and learning state</h3>
<p>Stores conversation events, long-term memory, run receipts, skill effects, skill learning candidates, and safety/eval reports.</p>
<ul class="bullets">
<li>Sessions are stored in SQLite, including <code>sessions</code>, <code>messages</code>, and FTS5 <code>messages_fts</code>.</li>
<li>Long-term memory has only two buckets, <code>MEMORY.md</code> and <code>USER.md</code>; writes are scanned beforehand for prompt injection / secret exfiltration risks.</li>
<li>Run memory stores <code>RunRecord</code> and <code>SkillEffectRecord</code> as JSONL.</li>
<li>The skill learning store maintains candidate state, performance snapshots, safety reports, and eval reports.</li>
</ul>
<ul class="file-list">
<li>beaver/engine/session/*.py</li>
<li>beaver/memory/curated/*.py</li>
<li>beaver/memory/runs/*.py</li>
<li>beaver/memory/skills/*.py</li>
<li>beaver/memory/search/transcript_store.py</li>
</ul>
</section>
<section class="module">
<div class="module-label">integrations</div>
<h3>External system and protocol integrations</h3>
<p>Connects AuthZ, MCP, and Outlook. The WhatsApp, A2A, and providers directories are currently mostly placeholders.</p>
<ul class="bullets">
<li><code>MCPConnectionManager</code> supports stdio and streamable HTTP MCP servers, and registers remote tools as <code>mcp_{server}_{tool}</code>.</li>
<li>Remote MCP can use the AuthZ backend token mode, exchanging the backend identity for a bearer token.</li>
<li>The Outlook integration connects through AuthZ or direct credentials, maintains workspace meta, and provides status/overview/messages/events/detail.</li>
<li><code>AuthzClient</code> handles user/backend registration, permission queries, and token issuance.</li>
</ul>
<ul class="file-list">
<li>beaver/integrations/mcp/connection.py</li>
<li>beaver/integrations/authz/client.py</li>
<li>beaver/integrations/outlook/__init__.py</li>
<li>beaver/integrations/a2a/__init__.py</li>
<li>beaver/integrations/whatsapp/__init__.py</li>
</ul>
</section>
<section class="module risk">
<div class="module-label">permissions</div>
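<h3>Permissions and governance layer</h3>
<p>The directory already exists but is currently essentially an empty skeleton. The actual permission constraints are scattered across individual tools, workspace path validation, memory safety, and skill draft safety.</p>
<ul class="bullets">
<li><code>permissions/guards</code>, <code>policies</code>, and <code>profiles</code> contain only docstrings.</li>
<li><code>ToolsConfig.restrict_to_workspace</code> already exists in the config schema, but each tool needs to be checked to confirm it is actually enforced.</li>
<li>If capability governance is added later, tool execution, the MCP sensitive flag, and provider/terminal/file operations should all be routed through this layer.</li>
</ul>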
<ul class="file-list">
<li>beaver/permissions/__init__.py</li>
<li>beaver/permissions/guards/__init__.py</li>
<li>beaver/permissions/policies/__init__.py</li>
<li>beaver/permissions/profiles/__init__.py</li>
</ul>
</section>
</div>
</section>
<section class="section">
<div class="section-head">
<div>
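<h2>Core data locations</h2>
<p>These files and databases are the most important sources of truth after a run. When checking whether behavior matches expectations, look here first.</p>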
</div>
<div class="kicker">PERSISTENCE MAP</div>
</div>
<div class="table-wrap">
<table>
<thead>
<tr>
<th>Data</th>
<th>Location</th>
<th>Writer</th>
<th>Purpose</th>
</tr>
</thead>
<tbody>
<tr>
<td>Session / transcript event stream</td>
<td><code>&lt;workspace&gt;/sessions/state.db</code></td>
<td><code>SessionManager</code> / <code>AgentLoop</code></td>
<td>Stores the visible conversation, hidden system snapshots, tool calls/results, run lifecycle, usage, and FTS search.</td>
</tr>
<tr>
<td>Task records</td>
<td><code>&lt;workspace&gt;/tasks/tasks.json</code></td>
<td><code>TaskService</code></td>
<td>Stores task goal/status/run_ids/skill_names/acceptance history.</td>
</tr>
<tr>
<td>Task events</td>
<td><code>&lt;workspace&gt;/tasks/events.jsonl</code></td>
<td><code>TaskService</code></td>
<td>Stores created/run_started/run_completed/evidence_recorded/accepted/revised/closed/abandoned.</td>
</tr>
<tr>
<td>Curated memory</td>
<td><code>&lt;workspace&gt;/memory/curated/MEMORY.md</code>, <code>USER.md</code></td>
<td><code>MemoryTool</code> / <code>MemoryStore</code></td>
<td>Stable facts injected into the prompt long term; a snapshot is frozen for each run.</td>
</tr>
<tr>
<td>Run receipts / skill effects</td>
<td><code>&lt;workspace&gt;/memory/runs/*.jsonl</code></td>
<td><code>AgentLoop</code> / <code>AgentService</code> user acceptance entry point</td>
<td>Raw execution evidence for skill learning, user acceptance events, and the final accepted run marker.</td>
</tr>
<tr>
<td>Skills lifecycle</td>
<td><code>&lt;workspace&gt;/skills/&lt;name&gt;/...</code></td>
<td><code>SkillSpecStore</code> / draft/review/publisher services</td>
<td>Published versions, drafts, reviews, the current version, and supporting files.</td>
</tr>
<tr>
<td>Skill learning state</td>
<td><code>&lt;workspace&gt;/memory/skills/...</code></td>
<td><code>SkillLearningStore</code></td>
<td>Candidates, performance snapshots, safety reports, and eval reports.</td>
</tr>
<tr>
<td>Cron jobs and runs</td>
<td><code>&lt;workspace&gt;/cron/jobs.json</code></td>
<td><code>CronService</code></td>
<td>Scheduled task configuration, next_run, history, and notification/task linkage.</td>
</tr>
<tr>
<td>Agent registry / subagents</td>
<td><code>&lt;workspace&gt;/agents/registry.json</code>, <code>*_agent/AGENTS.json</code></td>
<td><code>AgentRegistry</code> / <code>LocalSubagentStore</code></td>
<td>Manages builtin/workspace/learned agents and local sub-agent workspaces.</td>
</tr>
</tbody>
</table>
</div>
</section>
<section class="section">
<div class="section-head">
<div>
<h2>Key flow breakdown</h2>
<p>These flows are where deviations most easily creep in during later module-by-module changes.</p>
</div>
<div class="kicker">CONTROL FLOWS</div>
</div>
<div class="cols-2">
<div class="panel">
<h2>Simple chat</h2>
<ul class="checks">
<li>The entry point calls <code>AgentService._process_with_main_agent()</code>.</li>
<li><code>MainAgentRouter</code> returns non-task.</li>
<li>Skill assembly and tools are turned off: <code>include_skill_assembly=False</code> and <code>include_tools=False</code>.</li>
<li>Session/run events are still written through <code>AgentLoop</code>, but no Task is created.</li>
</ul>
</div>
<div class="panel">
<h2>Task mode single</h2>
<ul class="checks">
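<li>Create or reuse an open task.</li>
<li>The planner returns single, and the main agent runs directly.</li>
<li>After the run, a <code>TaskEvidencePacket</code> is built.</li>
<li>After the run, the status becomes <code>awaiting_acceptance</code>; the user's accept/revise/abandon decides whether the Task is closed, revised, or abandoned.</li>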
</ul>
</div>
<div class="panel">
<h2>Task mode team</h2>
<ul class="checks">
<li>The planner generates an <code>ExecutionGraph</code>.</li>
<li><code>TaskSkillResolver</code> binds a published skill or ephemeral guidance to each node.</li>
<li><code>TeamService</code> runs the nodes, and each node still calls <code>AgentLoop</code>.</li>
<li>Main-agent synthesis uses the team evidence and usually disables tool calls, to avoid repeating work the sub-agents have already done.</li>
</ul>
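<p>When the team graph runs, the coordinator module notes that a failed dependency marks downstream nodes as blocked. A minimal sketch of that propagation follows. It runs nodes one at a time for clarity and does not model parallel execution; the function name <code>run_graph</code> and the status strings are hypothetical, and the real <code>TeamGraphScheduler</code> may behave differently.</p>

```python
from typing import Callable, Dict, List


def run_graph(deps: Dict[str, List[str]], run_node: Callable[[str], bool]) -> Dict[str, str]:
    """Run nodes in dependency order and return a status per node.

    `deps` maps a node id to the ids it depends on and is assumed acyclic.
    `run_node` returns True on success. Status names are illustrative.
    """
    status: Dict[str, str] = {}
    pending = dict(deps)
    while pending:
        progressed = False
        for node_id, needs in list(pending.items()):
            if any(d not in status for d in needs):
                continue  # some dependency has not been decided yet
            if any(status[d] != "completed" for d in needs):
                status[node_id] = "blocked"  # never run a node with a failed or blocked dependency
            else:
                status[node_id] = "completed" if run_node(node_id) else "failed"
            del pending[node_id]
            progressed = True
        if not progressed:
            raise ValueError("cycle or unknown dependency in graph")
    return status
```

<p>Blocked status is transitive in this sketch: a node whose dependency was itself blocked is also blocked, so one failure prunes the whole downstream subgraph while independent branches still run.</p>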
</div>
<div class="panel">
<h2>Skill learning</h2>
<ul class="checks">
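<li>Each run records the activated skill receipt and effect.</li>
<li>Candidates are generated only after the user accepts a task; the evidence includes all runs of the whole task and marks final_accepted_run_id.</li>
<li>The worker only generates drafts and runs safety/eval; it does not auto-approve or auto-publish.</li>
<li>Publishing requires an approved review, passing safety, and no failed eval; high risk additionally requires explicit confirmation.</li>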
</ul>
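<p>The publish conditions listed above can be expressed as a single gate function. This is a sketch only: <code>PublishInputs</code>, its field names, and <code>publish_blockers</code> are hypothetical and do not reflect the real publisher service's schema.</p>

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PublishInputs:
    """Illustrative inputs; field names are assumptions, not the real schema."""
    review_approved: bool
    safety_passed: bool
    failed_evals: int
    high_risk: bool = False
    explicitly_confirmed: bool = False


def publish_blockers(inputs: PublishInputs) -> List[str]:
    """Return the reasons a draft cannot be published; empty means it can."""
    blockers: List[str] = []
    if not inputs.review_approved:
        blockers.append("review is not approved")
    if not inputs.safety_passed:
        blockers.append("safety check did not pass")
    if inputs.failed_evals > 0:
        blockers.append("at least one eval failed")
    if inputs.high_risk and not inputs.explicitly_confirmed:
        blockers.append("high risk requires explicit confirmation")
    return blockers
```

<p>Collecting every blocker, rather than stopping at the first, lets a reviewer see everything that still stands between a draft and publication.</p>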
</div>
</div>
</section>
<section class="section">
<div class="section-head">
<div>
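<h2>Follow-up review question list</h2>
<p>These questions are suited to brainstorming / grill-me sessions that check, module by module, whether the ideas match the existing project.</p>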
</div>
<div class="kicker">REVIEW PROMPTS</div>
</div>
<div class="matrix">
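<div class="key">Product goal</div>
<div>This backend currently looks more like a task-oriented agent runtime than an ordinary chat backend. Do you want to keep the Task runtime's Plan -> Run -> Evidence -> User Acceptance flow as the main experience, or demote it to an optional advanced mode?</div>
<div class="key">Main entry boundary</div>
<div><code>interfaces/web/app.py</code> already exceeds 3000 lines and contains auth, files, skills, cron, chat, and more. Should it later be split into route modules, or kept as a single file for now to lower migration risk?</div>
<div class="key">Degree of Task automation</div>
<div>The Main Agent currently turns complex requests into Tasks automatically. Do you accept model classification error? Should creating a Task require explicit user confirmation?</div>
<div class="key">Team execution strategy</div>
<div>Only sequence / parallel / DAG are actually implemented; the other strategies are just reserved enum values. Should more coordinators be supported, or should v1 stick to three stable strategies?</div>
<div class="key">Role of the agent registry</div>
<div>The registry/search/target resolver already exist, but the Task main line mostly binds skills rather than specialist agents. Do you want team nodes to look for a specialist agent first, or to keep using the generic skill worker?</div>
<div class="key">Permission governance</div>
<div>The permissions directory is currently a skeleton. Do capabilities such as terminal, filesystem, web, MCP, and Outlook need a unified policy gate?</div>
<div class="key">Skill learning loop</div>
<div>Candidate generation should depend on task accepted. Do you want to learn only from task evidence the user has accepted, or also allow candidates to be created manually from abandoned/revised history?</div>
<div class="key">External integrations</div>
<div>Outlook/AuthZ/MCP are already fairly concrete; A2A/WhatsApp are still placeholders. Should the next step be to fill in protocols first, or to tighten permissions and error handling in the existing integrations?</div>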
</div>
</section>
<section class="section">
<div class="section-head">
<div>
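<h2>Code observations and risk points</h2>
<p>These are not final conclusions or change recommendations, only deviations noticed while reading the code that are worth checking one by one later.</p>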
</div>
<div class="kicker">OPEN RISKS</div>
</div>
<div class="cols-2">
<div class="note risk">
<h3>The scheduled Task path has an obvious variable error</h3>
<p>
<code>AgentService.run_scheduled_task()</code> references <code>job.id</code>, <code>run.scheduled_run_id</code>, and <code>job.name</code> when it updates the assistant event payload at the end,
but the function's only parameters are <code>cron_job_id</code>, <code>cron_job_name</code>, and <code>scheduled_run_id</code>. If execution reaches this point, this path raises <code>NameError</code>.
</p>
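<p>A reduced reproduction of this class of bug, together with one possible fix, might look like the sketch below. Both function bodies and the payload keys are hypothetical, not copied from <code>AgentService</code>, and the sketch assumes no <code>job</code> or <code>run</code> name is defined in any enclosing scope.</p>

```python
def run_scheduled_task_sketch(cron_job_id: str, cron_job_name: str, scheduled_run_id: str) -> dict:
    """Hypothetical reduction of the reported bug, not the real function body."""
    # `job` and `run` are neither parameters nor locals nor globals here,
    # so evaluating `job.id` raises NameError at call time, not at import time.
    return {"cron_job_id": job.id, "scheduled_run_id": run.scheduled_run_id, "cron_job_name": job.name}


def run_scheduled_task_fixed(cron_job_id: str, cron_job_name: str, scheduled_run_id: str) -> dict:
    """Same payload built from the parameters the function actually receives."""
    return {
        "cron_job_id": cron_job_id,
        "scheduled_run_id": scheduled_run_id,
        "cron_job_name": cron_job_name,
    }
```

<p>Because Python resolves names only when the line executes, a bug like this passes import and surfaces only when the scheduled path actually runs, which is why a test that exercises that path is the most direct guard.</p>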
</div>
<div class="note risk">
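<h3>The permission layer is not yet a real execution gate</h3>
<p>
The <code>permissions</code> directory is an empty skeleton, and the actual protection is scattered across tool implementations and path validation. If terminal, filesystem, or MCP sensitive tools are opened up later, a unified pre-execution policy is needed.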
</p>
</div>
<div class="note risk">
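<h3>Web auth is local single-user style</h3>
<p>
The local auth file is read and written with username/password fields, and a token/handoff code provides the frontend session. If the goal is a multi-user or public-facing backend, password storage, token lifetime, and permission boundaries need to be re-evaluated.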
</p>
</div>
<div class="note risk">
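<h3>Skill eval is currently a lightweight heuristic</h3>
<p>
<code>SkillDraftEvaluator</code> produces a bounded report from historical accepted task evidence and the draft's length/content; it is not a true replay. It belongs only to skill draft governance, not to the Task runtime.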
</p>
</div>
<div class="note risk">
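<h3>The interface layer is too large</h3>
<p>
<code>interfaces/web/app.py</code> simultaneously hosts the app factory, lifespan, auth, provider config, sessions, files, agents, MCP, Outlook, skills, cron, chat, and helper functions.
Later changes can easily cause cross-feature regressions.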
</p>
</div>
<div class="note">
<h3>Stable points already in place</h3>
<p>
<span class="ok">Dependable:</span> the unified <code>AgentLoop</code>, session event stream, Task evidence/acceptance state, team graph v1, skill lifecycle gates, MCP wrapper, and workspace path containment.
</p>
</div>
</div>
</section>
<section class="section">
<div class="section-head">
<div>
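<h2>Test coverage signals</h2>
<p>Unit tests cover most of the key behaviors of the current backend and can serve as a regression index for the follow-up change documents.</p>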
</div>
<div class="kicker">TEST INDEX</div>
</div>
<div class="cols-4">
<div class="panel">
<h2>Task / acceptance</h2>
<p><code>test_task_mode_feedback.py</code>, <code>test_task_evidence.py</code>, <code>test_task_execution_planner.py</code>, <code>test_task_skill_resolver.py</code></p>
</div>
<div class="panel">
<h2>Engine / providers</h2>
<p><code>test_websocket_chat.py</code>, <code>test_main_agent_router.py</code>, <code>test_litellm_thinking_mode.py</code>, <code>test_imports.py</code></p>
</div>
<div class="panel">
<h2>Team / process</h2>
<p><code>test_agent_team_v1.py</code>, <code>test_agent_registry_resolver.py</code>, <code>test_process_projection.py</code></p>
</div>
<div class="panel">
<h2>Skills / tools / web</h2>
<p><code>test_phase5_skills_runtime.py</code>, <code>test_skill_learning_*.py</code>, <code>test_tool_assembler.py</code>, <code>test_web_files_api.py</code></p>
</div>
</div>
</section>
</article>
</main>
</body>
</html>