beaver_project/app-instance/backend/flow.md

# Beaver Backend Flow

这份文档只记录两件事：

1. 我们**为什么这么实现**
2. 当前代码里**真实已经实现了什么**

它不是蓝图，也不是未来设计草稿。以后只要主链、装配逻辑、运行时边界发生变化，就必须同步更新它。

---

## 1. 参考项目各自借什么

当前 Beaver 的实现思路，主要借了三个参考项目，但借的点是分开的。

### 1.1 `OpenHarness`

借的是**模块边界和 Harness 形态**：

1. `Harness / Runtime` 应该和 Web、Gateway、产品接入分开
2. `skills / memory / tools / session / orchestration` 都属于平台层
3. 运行时最好是可装配的，而不是所有逻辑都塞进一个大 agent 类

所以 Beaver 现在一直在做的事情，是把：

- `EngineLoader`
- `AgentLoop`
- `ContextBuilder`
- `Session`
- `Tools`
- `Skills`

收成一个清晰的运行内核。

### 1.2 `hermes-agent`

借的是**memory、skills、session 的运行时风格**：

1. memory 用 curated CRUD + frozen snapshot
2. `session_search` 查历史细节，不把所有历史都塞进 memory
3. skills 用：
   - 显式 skill loading path
   - 激活后的 skill 正文显式注入

所以 Beaver 现在这些点都明显受 Hermes 影响：

1. `MemoryService` + frozen snapshot
2. `session_search`
3. `skill_view`
4. activated skill messages

### 1.3 `swarms`

借的是**后面多智能体 orchestration 的方向**：

1. team orchestration
2. swarm strategy
3. multi-agent execution backend

但要注意：它现在**还不是当前主链的核心**。
当前我们主要先把单 agent runtime 打稳，多智能体还没正式接回主链。

---

## 2. 当前我们到底做到哪了

当前已经不是“搭骨架”阶段了，而是：

**最小单 agent runtime 已经跑通。**

现在已经完成的核心段落是：

1. `4.1 session`
2. `4.2 provider`
3. `4.3 context`
4. `4.4 tools framework + 最小内建工具`
5. `4.5 最小主链`
6. `5.1 memory 最小接入`
7. `5.2 skills 最小接入`
8. `6.1 session-first / event-source 第一阶段`
9. `6.2 runtime lifecycle 最小骨架`
10. `6.2.1 Web / Gateway 最小接主链`
11. app-instance Docker 镜像切到新 `beaver` 后端

更准确地说，当前 Beaver 已经有：

1. 一个可运行的 `AgentService -> AgentLoop` 主链
2. 一个外部化的 Session 子系统
3. 一个可工作的 tool loop 框架
4. Hermes 风格的 memory / skills 接入
5. LLM-driven 的 `SkillAssembler`
6. embedding-driven 的 `ToolAssembler`
7. MCP-style 本地工具描述
8. skill frontmatter `tools` 会影响本轮工具选择
9. `start()/submit_direct()/stop()/shutdown()/close()` 最小 lifecycle
10. FastAPI `/api/ping` + `/api/chat`
11. Gateway `MessageBus -> AgentService -> MessageBus` 最小桥接
12. Docker app-instance 使用 `/root/.beaver/config.json` 和 `/root/.beaver/workspace`

已经实测通过：

1. Docker image build
2. container `/api/ping`
3. `/api/chat` 调用 `qwen-plus`
4. Session SQLite 事件写入
5. 宿主机 `curl` 直连 app-instance

但还没有：

1. shell / web 等高风险或外部访问工具
2. 完整 tool permission gates
3. Web / Gateway 的 realtime streaming
4. bus retry / routing / persistence
5. delegation / swarm / team runtime
6. MCP 全量工具接回 runtime
7. checkpoint / rewind / fork / crash-resume
8. skill selector 的 embedding / LLM 选择细节还没有写入 Session event stream
9. 前端完整 auth / sessions / skills / files / ws 兼容新 Beaver API

---

## 3. 当前真实主链

当前主入口已经不是 CLI 逻辑，而是：

```python
service = AgentService()
await service.process_direct("你好")
```

上面是 direct/debug path。宿主层进入运行模式后，正式入口是：

```python
service = AgentService()
await service.start()
result = await service.submit_direct("你好")
await service.stop()
service.close()
```

宿主层现在也已经开始接到这条 lifecycle 上：

```python
app = create_app()        # FastAPI lifespan 内部托管 AgentService.start()/shutdown()
await run_gateway()       # Gateway 常驻进程托管 AgentService.start()/shutdown()
```

模型与 provider 配置现在从 backend sandbox config 统一读取，而不是从前端或 channel
请求里传密钥。Docker 单实例部署时，配置路径优先级是：

1. `BEAVER_CONFIG_PATH`
2. `NANOBOT_CONFIG_PATH`
3. `BEAVER_HOME/config.json`
4. `NANOBOT_HOME/config.json`
5. `<workspace>/.beaver/config.json`

当前 app-instance 会把每个用户实例自己的数据目录挂到 `/root/.beaver`，所以
Beaver 会默认读取：

```text
/root/.beaver/config.json
```

这份配置跟随单个 sandbox 容器/数据卷，不放在前端，也不放在宿主机全局目录。
Web / Gateway / Channel 只传 `message/session_id/user_id` 等业务输入。

app-instance 镜像当前也已经切到新 Beaver 后端：

```text
entrypoint.sh
├─ 启动 python -m uvicorn beaver.interfaces.web.app:create_app --factory
├─ 使用 /root/.beaver/config.json
└─ 使用 /root/.beaver/workspace
```

旧的 `nanobot web`、`backend/nanobot`、`backend/bridge`、vendored `swarms` 不再进入新镜像。

这套 lifecycle 当前明确是：

1. `start()` 进入一个 `AgentLoop` 实例的运行模式
2. 运行模式下，外部任务只能走 `submit_direct()`
3. 运行模式下，不允许再直接调用 `process_direct()`
4. `stop()` 是 **instance-scoped**
   - 只针对当前这个 `AgentLoop` 实例
   - 不是 session-scoped
   - 也不是 platform-scoped
5. `stop()` 调用后会拒绝新任务，已入队任务正常收尾
6. `stop()` / `shutdown()` 支持 graceful timeout；必要时可 force cancel
7. `close()` 只能在该实例已停止后调用

### 3.1 Web / Gateway 当前怎么接

这一层现在已经不是纯占位了，而是最小宿主层：

1. `beaver/interfaces/web/app.py`
   - FastAPI lifespan 启动时：
     - 创建或接收 `AgentService`
     - 如果 app 自己创建 service，则 `await service.start()`
   - Web 接口现在有最小正式 schema：
     - `WebChatRequest`
     - `WebChatResponse`
     - `WebStatusResponse`
   - `/api/chat` 请求：
     - 用结构化 request schema 校验输入
     - `await service.submit_direct(...)`
     - 把常见 runtime / config 错误收成 HTTP 错误
     - 外部注入但尚未进入 running mode 的 service，会返回 `503`
   - `/api/ping`：
     - 返回 `status/running/mode`
     - 不会为了 health check 额外 boot runtime
   - app 关闭时：
     - 如果 app 自己创建 service，则 `await service.shutdown(timeout_seconds=5.0, force=True)`
   - app 自己接管 lifecycle 时：
     - 若 `start()` 失败，会立即 `close()` 做 startup cleanup

2. `beaver/interfaces/gateway/main.py`
   - `run_gateway()` 启动时：
     - 如果 gateway 自己创建 service，则 `await service.start()`
   - 持有最小 `MessageBus`
   - 可选接收 `ChannelManager` / channel adapters
   - `ChannelManager` 和 `channels` 参数二选一：
     - 传 `ChannelManager`：外部提前配置好 channel
     - 传 `channels`：gateway 内部创建 `ChannelManager` 并注册这些 channel
   - inbound 流向：
     - channel adapter 发布 `InboundMessage`
     - `MessageBus.inbound`
     - gateway bridge 常驻消费
     - `await service.handle_inbound_message(...)`
   - outbound 流向：
     - `AgentService` 内部完成 `InboundMessage -> OutboundMessage` 映射
     - gateway bridge 写回 `MessageBus.outbound`
     - 如果启用了 `ChannelManager`，则分发给对应 channel adapter
     - 未启用 `ChannelManager` 时，保留直接消费 `bus.outbound` 的最小测试能力
   - 同时等待 `stop_event`
   - 退出时：
     - 先尝试 `await service.shutdown(timeout_seconds=5.0, force=True)`
     - 再等待 bridge 协程收尾；必要时取消 bridge
     - 再等待 outbound dispatch 协程收尾；必要时取消 dispatch
   - 如果 gateway 自己接管 lifecycle 且 `start()` 失败：
     - 会立即 `close()` 做 startup cleanup
   - 未处理完的 inbound：
     - 不再静默丢下
     - 会被冲刷成结构化 outbound error

3. `beaver/foundation/events/message_bus.py`
   - 已有最小：
     - `MessageBus`
     - `InboundMessage`
     - `OutboundMessage`
   - 当前只做双队列桥接：
     - `inbound`
     - `outbound`
   - 还没有 broker / topic routing / retry / persistence

4. `beaver/interfaces/channels/*`
   - 已有最小 channel adapter 层：
     - `ChannelAdapter`
     - `ChannelManager`
     - `MemoryChannelAdapter`
   - 当前 channel 职责很窄：
     - 把外部输入发布成 `InboundMessage`
     - 接收并投递 `OutboundMessage`
   - `MemoryChannelAdapter` 只用于本地测试和内嵌接入，不是正式消息 broker

所以现在已经明确：

1. Web / Gateway 属于宿主层
2. 它们不直接 new `AgentLoop` 或绕过运行模式
3. 它们复用：
   - `start()`
   - `submit_direct()`
   - `stop()`
   - `shutdown()`
4. ownership 语义：
   - 自己创建的 `AgentService`：自己负责 lifecycle
   - 外部注入的 `AgentService`：默认不自动 start/shutdown，除非显式要求接管
5. gateway 已经从“只会常驻等待”推进到“最小消息桥接层”
   - external inbound message
   - channel adapter
   - `MessageBus.inbound`
   - `service.handle_inbound_message(...)`
   - `MessageBus.outbound`
   - channel adapter outbound delivery

### 3.2 总体链路

当前代码里的主链可以概括成：

```text
AgentService
  -> AgentLoop
    -> Session
    -> Memory
    -> SkillAssembler
    -> ToolAssembler
    -> ContextBuilder
    -> Provider
    -> ToolExecutor
    -> Session writeback
```

### 3.3 详细顺序

```text
用户输入 task
│
├─ AgentService.create_loop()
│  ├─ 创建 AgentLoop(profile, loader)
│  └─ loop.boot()
│
├─ AgentLoop.boot()
│  └─ EngineLoader.load()
│     ├─ SessionManager
│     ├─ MemoryStore
│     ├─ MemoryService
│     ├─ ToolRegistry
│     ├─ ToolAssembler
│     ├─ ToolExecutor
│     ├─ SkillsLoader
│     ├─ SkillAssembler
│     └─ ContextBuilder
│
├─ AgentLoop.process_direct(task)
│  │
│  ├─ 生成 `session_id` / `run_id`
│  │
│  ├─ memory_service.reload_for_new_run()
│  │  └─ 建立本轮 frozen memory snapshot
│  │
│  ├─ sessions.ensure_session(session_id)
│  ├─ sessions.append_message(event_type="run_started", hidden)
│  │
│  ├─ make_provider_bundle()
│  │  ├─ main provider
│  │  ├─ fallback provider
│  │  ├─ auxiliary provider 可用于 skill 选择
│  │  └─ embedding runtime 提供 embeddings 的 model/api_key/api_base
│  │     说明：它是独立配置线，只支持 OpenAI-compatible embeddings endpoint
│  │
│  ├─ skill_assembler.assemble(task_description=task, provider=selector_provider, embedding_runtime=..., ...)
│  │  ├─ 读取全量可用 skill 候选摘要
│  │  ├─ 用 `text-embedding-v4` 对全量候选做相似度召回
│  │  ├─ 把召回结果交给 LLM 做最终选择
│  │  └─ 返回 activated_skills
│  │
│  ├─ ContextBuilder.build_skill_activation_messages(...)
│  ├─ 如果 activated_skills 非空：
│  │  └─ sessions.append_message(event_type="skill_activation_snapshotted", hidden)
│  │
│  ├─ tool_assembler.assemble(task_description=task, activated_skills=..., ...)
│  │  ├─ always tools
│  │  │  ├─ memory
│  │  │  ├─ session_search
│  │  │  └─ skill_view
│  │  ├─ 读取 activated skill 的 frontmatter `tools`
│  │  ├─ 用 `text-embedding-v4` 对工具描述做相似度召回
│  │  ├─ 返回本轮选中的 ToolSpec
│  │  └─ ToolSpec 同时可导出 MCP descriptor 与 provider schema
│  │
│  ├─ sessions.append_message(event_type="tool_selection_snapshotted", hidden)
│  │
│  ├─ ContextBuilder.build_messages()
│  │  ├─ system prompt 包含：
│  │  │  ├─ base system prompt
│  │  │  ├─ session metadata
│  │  │  ├─ execution context
│  │  │  └─ frozen memory snapshot
│  │  ├─ messages 里显式插入 activated skill messages
│  │  ├─ 再拼 visible history
│  │  └─ 最后追加当前 user input
│  │
│  ├─ sessions.update_system_prompt()
│  ├─ sessions.append_message(event_type="system_prompt_snapshotted", hidden)
│  ├─ sessions.append_message(event_type="user_message_added")
│  │
│  ├─ 进入最小 tool loop
│  │  ├─ provider.chat(messages, tools=schemas)
│  │  ├─ sessions.update_usage()
│  │  ├─ sessions.append_message(event_type="assistant_message_added")
│  │  ├─ ContextBuilder.add_assistant_message(...)
│  │  ├─ 如果没有 tool calls：
│  │  │  └─ 结束
│  │  └─ 如果有 tool calls：
│  │     ├─ ToolExecutor.execute_tool_call(...)
│  │     ├─ sessions.append_message(event_type="tool_result_recorded")
│  │     ├─ ContextBuilder.add_tool_result(...)
│  │     └─ 再回 provider.chat(...)
│  │
│  ├─ 成功结束：
│  │  └─ sessions.append_message(event_type="run_completed", hidden)
│  │
│  ├─ 异常结束：
│  │  ├─ 补 assistant error message
│  │  └─ sessions.append_message(event_type="run_failed", hidden)
│  │
│  └─ return AgentRunResult
│     ├─ session_id
│     ├─ run_id
│     ├─ output_text
│     ├─ finish_reason
│     ├─ tool_iterations
│     ├─ provider_name
│     ├─ model
│     └─ usage
```

---

## 4. 当前模块边界

### 4.1 `EngineLoader`

职责：装配运行时依赖。

当前已经装配：

1. `SessionManager`
2. `MemoryStore`
3. `MemoryService`
4. `ToolRegistry`
5. `ToolAssembler`
6. `ToolExecutor`
7. `SkillsLoader`
8. `SkillAssembler`
9. `ContextBuilder`

### 4.2 `AgentLoop`

职责：执行单次 run。

当前已经负责：

1. direct run 主链
2. provider 调用
3. 最小 tool loop
4. session 事件写回
5. usage 汇总

当前还没负责：

1. 更复杂的 message bus mode
2. 多 worker / 并发调度
3. provider/client 级 async shutdown hooks
4. multi-agent orchestration

### 4.3 `Session`

职责：外部化的运行事实存储。

当前实现重点：

1. `sessions` 表
   - projection / summary row
2. `messages` 表
   - 当前主事件流
3. `run_id`
   - 把同一个 session 里的多次 run 切开

当前主要读取接口：

1. `get_event_records(session_id)`
   - 整个 session 的完整事件流
2. `get_run_event_records(session_id, run_id)`
   - 某一次 run 的事件片段
3. `list_run_ids(session_id)`
   - 发现当前 session 中有哪些 run
4. `get_visible_history(session_id)`
   - 给 ContextBuilder 用的可见历史切片
5. `session_search`
   - 只检索可见 transcript
   - 不把 hidden prompt / skill snapshot 当成搜索候选

当前关键 hidden 事件：

1. `run_started`
2. `skill_activation_snapshotted`
3. `tool_selection_snapshotted`
4. `system_prompt_snapshotted`
5. `run_completed`
6. `run_failed`

### 4.4 `Memory`

职责：durable facts，不是 transcript。

当前实现重点：

1. curated CRUD
2. frozen snapshot
3. 每次新 run 开始时刷新 snapshot
4. 当前 run 中途写 memory 不反向污染本轮 prompt

### 4.5 `Skills`

职责：外置 skill 装配与按需查看。

当前实现重点：

1. `SkillsLoader`
   - 扫描 `workspace/skills/*/SKILL.md`
   - 扫描 builtin skills
2. `SkillAssembler`
   - 输入 task description + 候选 skill 摘要
   - 先用 embedding 做语义召回
   - 再调一次 LLM 直接选择 skills
   - 没有匹配时返回空 skills
3. `skill_view`
   - 显式加载 skill 正文或支持文件
4. activated skills
   - 按 Hermes 风格作为显式消息注入

当前 skill 语义已经定成：

1. **run-scoped**
   - skill 激活只对当前 run 生效
2. **不是 session-scoped**
   - 不默认跨 run 持久化为 session 状态
3. **explicit loading path**
    - `skill_view`
4. **no-match means no skill injection**
   - 如果 assembler 没选出 skill
   - 当前 run 不拼接 skill messages
   - 也不会写 `skill_activation_snapshotted`

### 4.6 `Tools`

当前内建工具：

1. `echo`
2. `memory`
3. `skill_view`
4. `session_search`
5. `list_directory`
6. `read_file`
7. `search_files`

当前工具基础设施：

1. `ToolSpec`
   - 以 MCP-style descriptor 作为本地统一描述
   - 可导出 `to_mcp_descriptor()`
   - 可导出 OpenAI-compatible `to_provider_schema()`
2. `ObjectBackedTool`
3. `ToolRegistry`
4. `ToolExecutor`
5. `ToolAssembler`

当前工具选择语义：

1. 工具选择是 **run-scoped**
2. `memory` / `session_search` / `skill_view` / 只读 filesystem tools 是 always tools
3. activated skill 的 frontmatter 可声明：

```yaml
---
tools:
  - terminal
  - read_file
---
```

4. `ToolAssembler` 会合并：
   - always tools
   - activated skill 显式声明的 tools
   - task description embedding top10 tools
5. 当前只信任 frontmatter / metadata 里的显式 tools，不从 skill 正文里猜工具名
6. 如果 skill 声明了未注册工具，当前会忽略，不阻断 run

当前 filesystem tools 的边界：

1. `list_directory` 只能列当前 `ToolContext.workspace` 内的目录
2. `read_file` 只能读 workspace 内 UTF-8 文本文件
3. `search_files` 只能搜索 workspace 内文件名和 UTF-8 文本内容
4. 绝对路径如果解析后不在 workspace 内，会拒绝
5. workspace 内指向外部的符号链接，读取 / 搜索时会拒绝
6. 二进制文件会拒绝读取，并在搜索时跳过

当前还没有默认注册：

1. shell / exec tools
2. web search / web fetch tools
3. MCP tools
4. spawn / team tools

### 4.7 `Providers`

当前已经实现：

1. provider registry
2. runtime resolution
3. main provider
4. fallback provider
5. auxiliary provider
6. embedding runtime 配置线

当前状态：

1. fallback 已经是“每次调用都先 main，再 fallback”
2. auxiliary provider 已经可用于 skill 选择
3. embedding runtime 当前用于 SkillAssembler 的候选召回
4. embedding runtime 当前也用于 ToolAssembler 的工具召回
5. auxiliary provider 还没有进入主对话 tool loop

---

## 5. 当前最重要的设计决定

这几条是现在已经定下来的，不应该再反复漂：

### 5.1 `Session-first`

当前 Beaver 明确在往这个方向走：

1. 运行事实优先写回 Session
2. Session 是 replay / audit / resume 的基础
3. prompt 不是状态源，Session 才是

### 5.2 `Harness != Product Interface`

当前主入口已经是：

- `AgentService`
- `AgentLoop`

而不是 CLI 本身。
CLI、Web、Gateway 后面都应该只是接口层。

### 5.3 `Skill selection` 外置

已经不再让 `AgentLoop` 自己“决定该选哪个 skill”，而是：

```text
task description
  -> SkillAssembler
    -> AgentLoop
```

### 5.4 `Skills` 采用 Hermes 风格

不是：

- skill 正文长期塞进 system prompt
- summary 让模型自己猜怎么展开

而是：

1. activated skill messages
2. `skill_view`

### 5.5 `Tools` 采用 MCP-style 描述

当前本地工具不再只是一段 OpenAI function schema，而是先收敛成：

```text
ToolSpec
├─ name
├─ description
├─ input_schema
├─ toolset
└─ always_available
```

其中 `name/description/input_schema` 可直接导出 MCP-style descriptor：

```json
{
  "name": "memory",
  "description": "...",
  "inputSchema": {}
}
```

provider 需要的 OpenAI-compatible schema 由 `ToolSpec.to_provider_schema()` 转换出来。

---

## 6. 对照施工指南，我们现在处于哪一步

这部分严格对齐 `施工指南.md` 的第 6 阶段编号，不再自行改号。

### 6.1 第一步：Session 升级为事件源模型

当前状态：**基本完成第一阶段目标，但还不是完整 event-source 系统。**

已经具备：

1. `messages` 表已经承担主事件流语义
2. 每次 run 都有独立 `run_id`
3. `AgentLoop.process_direct()` 已按事件阶段写回 Session
4. 已有：
   - `get_event_records(session_id)`
   - `get_run_event_records(session_id, run_id)`
   - `list_run_ids(session_id)`
   - `get_visible_history(session_id)`
5. `session_search` 只检索可见 transcript，不把 hidden snapshots 当搜索候选

当前还没做：

1. `checkpoint`
2. `rewind`
3. `fork session`
4. `crash-resume protocol`

所以更准确地说：

1. `6.1` 的“Session-first / event-source 第一阶段”已经落地
2. 但更完整的 event-source 能力还没有做完

### 6.2 第二步：runtime 生命周期协议补齐

当前状态：**最小 lifecycle 骨架已经完成。**

已完成：

1. `EngineLoadResult.close()`
2. `AgentLoop.close()`
3. `AgentService.close()`
4. `AgentService.shutdown()`
5. `AgentLoop.run()`
6. `AgentLoop.stop()`
7. `AgentLoop.submit_direct()`
8. `AgentService.start()`
9. `AgentService.stop()`
10. `AgentService.submit_direct()`

还没做：

1. 统一 shutdown hooks
2. 更完整的 provider/client 资源释放协议
3. 多 worker / bus / 调度策略

### 6.2.1 Web / Gateway 现在如何接这套 lifecycle

当前状态：**最小宿主层接入已经完成。**

已经完成：

1. Web 通过 FastAPI lifespan 托管 `AgentService.start()/shutdown()`
2. Web 请求只走 `AgentService.submit_direct()`
3. Gateway 已有最小 `MessageBus -> AgentService.handle_inbound_message() -> MessageBus` 桥接
4. Gateway 已支持可选 `ChannelManager`，把 outbound 分发回 channel adapter

当前 app-instance Docker 已完成：

1. Dockerfile 只安装 `backend/beaver`
2. entrypoint 启动 `beaver.interfaces.web.app:create_app`
3. 每个实例挂载 `/root/.beaver`
4. 配置读取 `/root/.beaver/config.json`
5. workspace 使用 `/root/.beaver/workspace`
6. 宿主 `curl /api/chat` 已实测通过

这一小步还没做：

1. realtime streaming
2. retry / broker persistence
3. 外部真实 channel adapter 全量接入

### 6.3 第三步：回填 bus 模式

当前状态：**只完成了前置地基，还没有按施工指南真正收口。**

已经具备的前置件：

1. `MessageBus`
2. `InboundMessage`
3. `OutboundMessage`
4. `AgentService.handle_inbound_message()`
5. Gateway bridge 常驻消费 inbound 并写回 outbound
6. `AgentLoop.run()` 已有最小运行循环

但严格按 `施工指南.md` 来看，`6.3` 还没有正式完成，因为现在还缺：

1. 把 bus mode 明确成 runtime 的正式运行形态之一
2. 明确 `run()` 如何稳定消费 inbound message
3. 明确 bus mode 与 direct mode / queue mode 的职责边界
4. 明确停机、取消、冲刷 pending inbound 时的统一语义
5. 再决定后续是否需要更复杂的 worker / retry / routing

也就是说：

1. 现在不是“还没 bus”
2. 而是“已经把 bus 协议映射收口到 `AgentService`，但还没按施工指南把它扩成完整 bus runtime 模式”

### 6.4 单 agent lifecycle 如何扩展到 team

当前状态：**关系已经定死，但实现还没开始。**

当前已经明确：

1. team 不会共享一个大 `AgentLoop` 跑所有成员
2. 每个 team member 都应有自己独立的 `AgentService / AgentLoop`
3. team coordinator 在上层调度多个 member 实例
4. 因此当前这套 `start()/submit_direct()/stop()/close()` 首先是 member-level lifecycle

当前还没开始的部分：

1. delegation
2. team runtime
3. swarms orchestration backend
4. group discussion / workflow orchestration

---

## 7. 对照 `change.md`，哪些长期目标还没开始

`change.md` 讲的是总蓝图，不是当前施工编号。下面这些仍然是长期目标，还没有正式进入当前阶段实现：

1. skills 生命周期系统
   - `SkillDraft`
   - `SkillVersion`
   - review / publish / rollback
2. Hermes-style learning loop
   - 智能体定期整理 / 提示记忆
   - 复杂任务完成后可自主创建技能
   - 技能在使用过程中自我提升
   - FTS5 + LLM 摘要的跨会话回忆增强
   - Honcho 风格辩证用户建模
3. swarms 作为正式 backend 接回平台
4. delegation / subagent / team orchestration

当前只完成了这些基础入口：

1. curated memory CRUD
2. session_search
3. skill loader / skill_view
4. skill assembler
5. tool assembler

### 7.1 权限与治理

还没做：

1. 完整 permission gates
2. tool policy
3. MCP 工具治理

已完成的最小边界：

1. 只读 filesystem tools 强制限制在 `ToolContext.workspace`
2. 路径解析使用真实路径，防止相对路径、绝对路径、符号链接逃逸
3. 当前还没有 shell / write / network 工具，因此还没进入高风险授权阶段

### 7.2 前端兼容

当前只做了最小 chat response 兼容：

1. 前端 `sendMessage()` 已兼容 Beaver 的 `output_text`

还没做：

1. `/api/auth/*`
2. `/api/sessions`
3. `/api/status` 完整页面数据
4. `/api/skills`
5. `/api/files`
6. `/ws`
7. 浏览器端免登录或新 auth 接入策略

---

## 8. 下一步从哪开始最合理

如果严格按 `施工指南.md` 的施工顺序继续，下一步应是：

1. 完成 `6.3 回填 bus 模式`
   - 明确 bus mode 的正式运行语义
   - 让 `AgentLoop.run()` 与 `MessageBus` 的关系稳定收口
   - 把 inbound / outbound 结果结构定稳
2. 然后再进入 `6.4`
   - 先把 team lifecycle 关系写成更可实现的 coordinator 约束
3. 再进入第 7 阶段
   - delegation
   - local subagent
4. 再进入第 8 阶段
   - team / swarms backend

如果按 `change.md` 的长期方向看，后面还要补：

1. skills 生命周期
2. Hermes-style learning loop
3. 更完整的 memory / governance / frontend

一句话总结：

**当前 Beaver 已经完成到“单 agent runtime + memory/skills + lifecycle + Web/Gateway 最小接入”，按施工指南的编号，下一步应是 `6.3 回填 bus 模式`。**

---

## 8. 文档维护要求

以后只要发生以下任一变动，必须同步更新本文件：

1. `EngineLoader` 装配项变化
2. `AgentLoop` 主链变化
3. `Session` 事件流结构变化
4. `Memory` 接入方式变化
5. `Skills` 装配方式变化
6. `Tools` 默认集合变化
7. Web / Gateway / multi-agent 真正接入主链