feat(app-instance): integrate Beaver backend and update config management

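Integrate the new Beaver backend service into the app instance, replacing the previous nanobot implementation. Main changes:

- Add Beaver-related paths and configuration variables to the Dockerfile and environment config
- Move the working directory structure from .nanobot to .beaver
- Implement the Beaver engine loader, with config file loading and tool assembly
- Add built-in tools such as ListDirectoryTool, ReadFileTool, and SearchFilesTool
- Update the message handling flow to support channel adapters and gateway mode
- Refactor the skill system to support explicit tool hints and embedding-based retrieval
- Improve error handling and lifecycle management

This change lets the app instance use the unified Beaver backend to manage the AI agent runtime.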
2026-04-27 17:37:40 +08:00
parent 36882a7d7b
commit 5ba5c7e4c1
47 changed files with 2821 additions and 462 deletions
--- a/app-instance/backend/flow.md
+++ b/app-instance/backend/flow.md
@ -73,27 +73,49 @@
 1. `4.1 session`
 2. `4.2 provider`
 3. `4.3 context`
-4. `4.4 tools`
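+4. `4.4 tools framework + minimal built-in tools`
 5. `4.5 minimal main chain`
 6. `5.1 memory minimal integration`
 7. `5.2 skills minimal integration`
 8. `6.1 session-first / event-source phase 1`
+9. `6.2 runtime lifecycle minimal skeleton`
+10. `6.2.1 Web / Gateway minimal hookup to the main chain`
+11. app-instance Docker image switched to the new `beaver` backend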

 More precisely, Beaver currently has:

 1. A runnable `AgentService -> AgentLoop` main chain
 2. An externalized Session subsystem
-3. A working tool loop
+3. A working tool loop framework
 4. Hermes-style memory / skills integration
 5. An LLM-driven `SkillAssembler`
+6. An embedding-driven `ToolAssembler`
+7. MCP-style local tool descriptors
+8. Skill frontmatter `tools` affects tool selection for the current run
+9. A minimal `start()/submit_direct()/stop()/shutdown()/close()` lifecycle
+10. FastAPI `/api/ping` + `/api/chat`
+11. A minimal Gateway `MessageBus -> AgentService -> MessageBus` bridge
+12. The Docker app-instance uses `/root/.beaver/config.json` and `/root/.beaver/workspace`
+
+Verified in practice:
+
+1. Docker image build
+2. container `/api/ping`
+3. `/api/chat` calling `qwen-plus`
+4. Session SQLite event writes
+5. Host-machine `curl` directly against the app-instance

 Not in place yet:

-1. More complete shutdown hooks
-2. Full bus / channels / realtime integration for Web / Gateway
-3. delegation / swarm / team runtime
-4. Permission system
-5. Full MCP tool set wired back into the runtime
+1. High-risk or external-access tools such as shell / web
+2. Complete tool permission gates
+3. Realtime streaming for Web / Gateway
+4. bus retry / routing / persistence
+5. delegation / swarm / team runtime
+6. Full MCP tool set wired back into the runtime
+7. checkpoint / rewind / fork / crash-resume
+8. The skill selector's embedding / LLM selection details are not yet written to the Session event stream
+9. Full frontend auth / sessions / skills / files / ws compatibility with the new Beaver API

 ---

@ -106,7 +128,7 @@ service = AgentService()
 await service.process_direct("Hello")
 ```

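-Meanwhile, the minimal run loop for stage 6 is already in place:
+The above is the direct/debug path. Once the host layer enters run mode, the official entry points are: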

 ```python
 service = AgentService()
@ -123,6 +145,36 @@ app = create_app()        # FastAPI lifespan manages AgentService.start()/s
 await run_gateway()       # long-running Gateway process manages AgentService.start()/shutdown()
 ```

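+Model and provider configuration is now read uniformly from the backend sandbox config, rather than
+having keys passed in through frontend or channel requests. For a single-instance Docker deployment, the config path priority is: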
+
+1. `BEAVER_CONFIG_PATH`
+2. `NANOBOT_CONFIG_PATH`
+3. `BEAVER_HOME/config.json`
+4. `NANOBOT_HOME/config.json`
+5. `<workspace>/.beaver/config.json`
+
+The app-instance currently mounts each user instance's own data directory at `/root/.beaver`, so
+Beaver reads by default:
+
+```text
+/root/.beaver/config.json
+```
+
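+This config belongs to a single sandbox container/data volume. It is kept neither in the frontend nor in a global directory on the host.
+Web / Gateway / Channel only pass business inputs such as `message/session_id/user_id`.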
+
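The priority list above can be sketched as a small resolver. This is a minimal sketch under assumptions, not Beaver's loader. The helper name `resolve_config_path` is hypothetical. The sketch assumes that an explicitly set `*_CONFIG_PATH` variable wins outright. It also assumes that the `*_HOME` candidates are used only when the file actually exists, with the workspace path as the final fallback.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_config_path(workspace: Path, env: Optional[Mapping[str, str]] = None) -> Path:
    """Hypothetical resolver following the documented priority order.

    Assumption: an explicit *_CONFIG_PATH wins whenever it is set, while the
    *_HOME candidates are only used if their config.json exists on disk.
    """
    env = os.environ if env is None else env

    # 1-2. Explicit config paths: Beaver first, then the legacy nanobot variable.
    for var in ("BEAVER_CONFIG_PATH", "NANOBOT_CONFIG_PATH"):
        if env.get(var):
            return Path(env[var])

    # 3-4. Home directories, checked for an existing config.json.
    for var in ("BEAVER_HOME", "NANOBOT_HOME"):
        if env.get(var):
            candidate = Path(env[var]) / "config.json"
            if candidate.is_file():
                return candidate

    # 5. Workspace-local fallback (returned even if it does not exist yet).
    return workspace / ".beaver" / "config.json"
```

Under these assumptions, a container that sets `BEAVER_HOME=/root/.beaver` would resolve to `/root/.beaver/config.json` once that file exists on the mounted volume.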
+The app-instance image has also been switched to the new Beaver backend:
+
+```text
+entrypoint.sh
+├─ starts python -m uvicorn beaver.interfaces.web.app:create_app --factory
+├─ uses /root/.beaver/config.json
+└─ uses /root/.beaver/workspace
+```
+
+The old `nanobot web`, `backend/nanobot`, `backend/bridge`, and vendored `swarms` no longer go into the new image.
+
 This lifecycle is currently defined as:

 1. `start()` puts an `AgentLoop` instance into run mode
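@ -165,15 +217,27 @@ await run_gateway()       # long-running Gateway process manages AgentService.start()/shut
   - When `run_gateway()` starts:
     - if the gateway created the service itself, it calls `await service.start()`
   - Holds a minimal `MessageBus`
-   - Continuously consumes `bus.inbound`
-   - Calls `await service.submit_direct(...)`
-   - Writes the result back to `bus.outbound`
+   - Optionally accepts a `ChannelManager` / channel adapters
+   - The `ChannelManager` and `channels` parameters are mutually exclusive:
+     - pass `ChannelManager`: channels are configured externally in advance
+     - pass `channels`: the gateway creates a `ChannelManager` internally and registers these channels
+   - Inbound flow:
+     - a channel adapter publishes an `InboundMessage`
+     - `MessageBus.inbound`
+     - the gateway bridge consumes it continuously
+     - `await service.handle_inbound_message(...)`
+   - Outbound flow:
+     - `AgentService` performs the `InboundMessage -> OutboundMessage` mapping internally
+     - the gateway bridge writes the result back to `MessageBus.outbound`
+     - if a `ChannelManager` is enabled, the message is dispatched to the matching channel adapter
+     - without a `ChannelManager`, `bus.outbound` can still be consumed directly for minimal testing
   - Concurrently waits on `stop_event`
   - On exit:
     - first tries `await service.shutdown(timeout_seconds=5.0, force=True)`
     - then waits for the bridge coroutine to finish; cancels the bridge if necessary
-    - if the gateway owns the lifecycle and `start()` fails:
-      - it immediately calls `close()` for startup cleanup
+     - then waits for the outbound dispatch coroutine to finish; cancels dispatch if necessary
+   - If the gateway owns the lifecycle and `start()` fails:
+     - it immediately calls `close()` for startup cleanup
   - Unprocessed inbound messages:
     - are no longer silently dropped
     - are flushed out as structured outbound errors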
@ -188,6 +252,16 @@ await run_gateway()       # long-running Gateway process manages AgentService.start()/shut
     - `outbound`
   - No broker / topic routing / retry / persistence yet

+4. `beaver/interfaces/channels/*`
+   - A minimal channel adapter layer already exists:
+     - `ChannelAdapter`
+     - `ChannelManager`
+     - `MemoryChannelAdapter`
+   - Channel responsibilities are currently narrow:
+     - publish external input as `InboundMessage`
+     - receive and deliver `OutboundMessage`
+   - `MemoryChannelAdapter` is only for local testing and embedded use; it is not a production message broker
+
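The inbound and outbound flows above can be sketched with plain `asyncio` queues. This is a minimal sketch, not the Beaver code. The following are hypothetical:

- the message fields (`channel`, `session_id`, `text`, `error`)
- the `EchoService` stand-in for `AgentService`
- the helpers `bridge`, `dispatch_outbound` and `flush_pending`

Only the class names `MessageBus`, `InboundMessage`, `OutboundMessage`, `ChannelManager` and `MemoryChannelAdapter`, and the method `handle_inbound_message`, come from this document.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class InboundMessage:
    channel: str          # hypothetical: name of the adapter that produced it
    session_id: str
    text: str


@dataclass
class OutboundMessage:
    channel: str
    session_id: str
    text: str
    error: Optional[str] = None   # set when the message could not be processed


class MessageBus:  # one inbound and one outbound queue; no routing, retry, or persistence
    def __init__(self) -> None:
        self.inbound: "asyncio.Queue[InboundMessage]" = asyncio.Queue()
        self.outbound: "asyncio.Queue[OutboundMessage]" = asyncio.Queue()


class MemoryChannelAdapter:  # in-process adapter for tests, not a broker
    def __init__(self, name: str, bus: MessageBus) -> None:
        self.name, self.bus = name, bus
        self.delivered: List[OutboundMessage] = []

    async def publish(self, session_id: str, text: str) -> None:
        await self.bus.inbound.put(InboundMessage(self.name, session_id, text))

    async def deliver(self, message: OutboundMessage) -> None:
        self.delivered.append(message)


class ChannelManager:  # routes outbound messages to the adapter with the same channel name
    def __init__(self) -> None:
        self.channels: Dict[str, MemoryChannelAdapter] = {}

    def register(self, adapter: MemoryChannelAdapter) -> None:
        self.channels[adapter.name] = adapter

    async def dispatch(self, message: OutboundMessage) -> None:
        adapter = self.channels.get(message.channel)
        if adapter is not None:
            await adapter.deliver(message)


class EchoService:  # stand-in for AgentService: InboundMessage -> OutboundMessage
    async def handle_inbound_message(self, msg: InboundMessage) -> OutboundMessage:
        return OutboundMessage(msg.channel, msg.session_id, f"echo: {msg.text}")


async def bridge(bus: MessageBus, service: EchoService) -> None:
    while True:  # long-running: inbound -> service -> outbound
        msg = await bus.inbound.get()
        await bus.outbound.put(await service.handle_inbound_message(msg))


async def dispatch_outbound(bus: MessageBus, manager: ChannelManager) -> None:
    while True:
        await manager.dispatch(await bus.outbound.get())


async def flush_pending(bus: MessageBus) -> None:
    # After the bridge stops, leftover inbound becomes a structured error outbound.
    while not bus.inbound.empty():
        msg = bus.inbound.get_nowait()
        error = OutboundMessage(msg.channel, msg.session_id, "", error="gateway stopped")
        await bus.outbound.put(error)


async def run_gateway_demo() -> List[OutboundMessage]:
    bus, manager = MessageBus(), ChannelManager()
    web = MemoryChannelAdapter("web", bus)
    manager.register(web)
    bridge_task = asyncio.create_task(bridge(bus, EchoService()))
    dispatch_task = asyncio.create_task(dispatch_outbound(bus, manager))
    await web.publish("s1", "hi")
    while len(web.delivered) < 1:    # yield until the reply has been delivered
        await asyncio.sleep(0)
    bridge_task.cancel()             # stop consuming inbound first
    await web.publish("s1", "late")  # arrives after the bridge stopped
    await flush_pending(bus)
    while len(web.delivered) < 2:
        await asyncio.sleep(0)
    dispatch_task.cancel()
    await asyncio.gather(bridge_task, dispatch_task, return_exceptions=True)
    return web.delivered
```

The demo stops the bridge before flushing. A message that arrives late is therefore answered with a structured error instead of being dropped. A real gateway would also wait on `stop_event` and call `service.shutdown(...)`; this sketch leaves that out.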
 So the following is now settled:

 1. Web / Gateway belong to the host layer
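@ -202,9 +276,11 @@ await run_gateway()       # long-running Gateway process manages AgentService.start()/shut
   - An externally injected `AgentService` is not started or shut down automatically by default, unless the host is explicitly asked to take over its lifecycle
 5. The gateway has moved from "only waits in the background" to "a minimal message bridge layer"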
   - external inbound message
+   - channel adapter
   - `MessageBus.inbound`
-   - `service.submit_direct(...)`
+   - `service.handle_inbound_message(...)`
   - `MessageBus.outbound`
+   - channel adapter outbound delivery
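The ownership rule in item 4 can be sketched as an async context manager. A service the host creates is started and shut down by the host. An injected service is left alone unless ownership is requested explicitly.

`hosted_service`, `owns_lifecycle` and `FakeService` are hypothetical names. The `start()`, `shutdown(timeout_seconds=..., force=...)` and `close()` calls mirror the methods named in this document. Closing on a failed `start()` follows the Gateway startup-cleanup rule.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, List, Optional, Tuple


class FakeService:
    """Stand-in for AgentService that only records lifecycle calls."""

    def __init__(self, fail_on_start: bool = False) -> None:
        self.calls: List[str] = []
        self.fail_on_start = fail_on_start

    async def start(self) -> None:
        self.calls.append("start")
        if self.fail_on_start:
            raise RuntimeError("start failed")

    async def shutdown(self, timeout_seconds: float = 5.0, force: bool = True) -> None:
        self.calls.append("shutdown")

    async def close(self) -> None:
        self.calls.append("close")


@asynccontextmanager
async def hosted_service(
    service: Optional[FakeService] = None, owns_lifecycle: bool = False
) -> AsyncIterator[FakeService]:
    # A service the host creates itself is always owned; an injected one only on request.
    owned = service is None or owns_lifecycle
    if service is None:
        service = FakeService()
    if owned:
        try:
            await service.start()
        except Exception:
            await service.close()  # startup cleanup, then re-raise
            raise
    try:
        yield service
    finally:
        if owned:
            await service.shutdown(timeout_seconds=5.0, force=True)


async def demo() -> Tuple[List[str], List[str], List[str]]:
    injected, taken = FakeService(), FakeService()
    broken = FakeService(fail_on_start=True)
    async with hosted_service(injected):
        pass  # the host must not touch an injected service
    async with hosted_service(taken, owns_lifecycle=True):
        pass
    try:
        async with hosted_service(broken, owns_lifecycle=True):
            pass
    except RuntimeError:
        pass
    return injected.calls, taken.calls, broken.calls


if __name__ == "__main__":
    print(asyncio.run(demo()))
```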

 ### 3.2 Overall pipeline

@ -216,6 +292,7 @@ AgentService
    -> Session
    -> Memory
    -> SkillAssembler
+    -> ToolAssembler
    -> ContextBuilder
    -> Provider
    -> ToolExecutor
@ -237,6 +314,7 @@ AgentService
 │     ├─ MemoryStore
 │     ├─ MemoryService
 │     ├─ ToolRegistry
+│     ├─ ToolAssembler
 │     ├─ ToolExecutor
 │     ├─ SkillsLoader
 │     ├─ SkillAssembler
@ -269,6 +347,18 @@ AgentService
 │  ├─ if activated_skills is non-empty:
 │  │  └─ sessions.append_message(event_type="skill_activation_snapshotted", hidden)
 │  │
+│  ├─ tool_assembler.assemble(task_description=task, activated_skills=..., ...)
+│  │  ├─ always tools
+│  │  │  ├─ memory
+│  │  │  ├─ session_search
+│  │  │  └─ skill_view
+│  │  ├─ read the frontmatter `tools` of activated skills
+│  │  ├─ similarity recall over tool descriptions with `text-embedding-v4`
+│  │  ├─ return the ToolSpecs selected for this run
+│  │  └─ each ToolSpec can export both an MCP descriptor and a provider schema
+│  │
+│  ├─ sessions.append_message(event_type="tool_selection_snapshotted", hidden)
+│  │
 │  ├─ ContextBuilder.build_messages()
 │  │  ├─ system prompt contains:
 │  │  │  ├─ base system prompt
@ -328,10 +418,11 @@ AgentService
 2. `MemoryStore`
 3. `MemoryService`
 4. `ToolRegistry`
-5. `ToolExecutor`
-6. `SkillsLoader`
-7. `SkillAssembler`
-8. `ContextBuilder`
+5. `ToolAssembler`
+6. `ToolExecutor`
+7. `SkillsLoader`
+8. `SkillAssembler`
+9. `ContextBuilder`

 ### 4.2 `AgentLoop`

@ -349,7 +440,7 @@ AgentService

 1. A more complex message bus mode
 2. Multiple workers / concurrent scheduling
-3. A more complete runtime lifecycle
+3. Provider/client-level async shutdown hooks
 4. multi-agent orchestration

 ### 4.3 `Session`
@ -383,9 +474,10 @@ AgentService

 1. `run_started`
 2. `skill_activation_snapshotted`
-3. `system_prompt_snapshotted`
-4. `run_completed`
-5. `run_failed`
+3. `tool_selection_snapshotted`
+4. `system_prompt_snapshotted`
+5. `run_completed`
+6. `run_failed`

 ### 4.4 `Memory`

@ -438,13 +530,57 @@ AgentService
 2. `memory`
 3. `skill_view`
 4. `session_search`
+5. `list_directory`
+6. `read_file`
+7. `search_files`

 Current tool infrastructure:

 1. `ToolSpec`
+   - uses an MCP-style descriptor as the unified local description
+   - exports `to_mcp_descriptor()`
+   - exports an OpenAI-compatible schema via `to_provider_schema()`
 2. `ObjectBackedTool`
 3. `ToolRegistry`
 4. `ToolExecutor`
+5. `ToolAssembler`
+
+Current tool selection semantics:
+
+1. Tool selection is **run-scoped**
+2. `memory` / `session_search` / `skill_view` / the read-only filesystem tools are always tools
+3. An activated skill's frontmatter can declare:
+
+```yaml
+---
+tools:
+  - terminal
+  - read_file
+---
+```
+
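+4. `ToolAssembler` merges:
+   - always tools
+   - tools explicitly declared by activated skills
+   - the top-10 tools ranked by task-description embedding similarity
+5. Only explicit tools in frontmatter / metadata are trusted; tool names are never guessed from the skill body
+6. A tool that a skill declares but that is not registered is currently ignored and does not block the run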
+
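The merge order in item 4, together with items 5 and 6, can be sketched as follows. This is an illustration under these assumptions:

- `parse_frontmatter_tools` and `assemble_tools` are hypothetical names.
- The parser only understands a `tools:` block list, not full YAML.
- `embed` is a caller-supplied function standing in for `text-embedding-v4`.
- The top-k ranking runs only over tools not already selected.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def parse_frontmatter_tools(skill_md: str) -> List[str]:
    """Read the `tools:` list from a leading `---` block; the skill body is never scanned."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    tools: List[str] = []
    in_tools = False
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        if line.startswith("tools:"):
            in_tools = True
        elif in_tools and line.strip().startswith("- "):
            tools.append(line.strip()[2:].strip())
        elif line and not line.startswith(" "):
            in_tools = False  # another top-level key begins
    return tools


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assemble_tools(
    task: str,
    registry: Dict[str, str],        # registered tool name -> description
    always: List[str],
    skill_docs: List[str],
    embed: Callable[[str], Vector],  # stand-in for the embedding runtime
    top_k: int = 10,
) -> List[str]:
    selected = [name for name in always if name in registry]
    for doc in skill_docs:
        for name in parse_frontmatter_tools(doc):
            # Declared but unregistered tools are skipped instead of failing the run.
            if name in registry and name not in selected:
                selected.append(name)
    query = embed(task)
    ranked = sorted(
        (name for name in registry if name not in selected),
        key=lambda name: cosine(query, embed(registry[name])),
        reverse=True,
    )
    return selected + ranked[:top_k]
```

With a toy keyword-counting `embed`, a skill that declares `terminal` and `read_file` contributes only `read_file` when `terminal` is not registered. This matches the current default of not registering shell tools.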
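+Current boundaries of the filesystem tools:
+
+1. `list_directory` can only list directories inside the current `ToolContext.workspace`
+2. `read_file` can only read UTF-8 text files inside the workspace
+3. `search_files` can only search file names and UTF-8 text content inside the workspace
+4. An absolute path that resolves outside the workspace is rejected
+5. A symlink inside the workspace that points outside it is rejected on read / search
+6. Binary files are refused for reading and skipped during search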
+
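The boundary rules above can be sketched with `pathlib`. Resolving the real path first reduces `..` segments, absolute paths and symlinks that point outside the workspace to a single containment check.

`resolve_in_workspace`, `read_text_file` and `WorkspaceError` are hypothetical names. Treating a NUL byte as the binary marker is an assumed heuristic; Beaver may detect binary files differently.

```python
from pathlib import Path


class WorkspaceError(ValueError):
    """Raised when a path escapes the workspace or is not readable UTF-8 text."""


def resolve_in_workspace(workspace: Path, user_path: str) -> Path:
    root = workspace.resolve()
    candidate = Path(user_path)
    if not candidate.is_absolute():
        candidate = root / candidate
    # resolve() follows symlinks and collapses "..", so every escape shows up here.
    real = candidate.resolve()
    if real != root and root not in real.parents:
        raise WorkspaceError(f"path escapes workspace: {user_path}")
    return real


def read_text_file(workspace: Path, user_path: str) -> str:
    path = resolve_in_workspace(workspace, user_path)
    data = path.read_bytes()
    if b"\x00" in data:  # assumed binary heuristic
        raise WorkspaceError(f"binary file refused: {user_path}")
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise WorkspaceError(f"not UTF-8 text: {user_path}") from exc
```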
+Not yet registered by default:
+
+1. shell / exec tools
+2. web search / web fetch tools
+3. MCP tools
+4. spawn / team tools

 ### 4.7 `Providers`

@ -454,12 +590,16 @@ AgentService
 2. runtime resolution
 3. main provider
 4. fallback provider
+5. auxiliary provider
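+6. Embedding runtime configuration

 Current status:

 1. Fallback already works as "every call tries main first, then fallback"
 2. The auxiliary provider can already be used for skill selection
-3. The auxiliary provider has not yet entered the main conversation tool loop
+3. The embedding runtime is currently used for SkillAssembler candidate recall
+4. The embedding runtime is also used for ToolAssembler tool recall
+5. The auxiliary provider has not yet entered the main conversation tool loop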

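Item 1 (each call tries the main provider first, then the fallback) can be sketched as a small wrapper. The callable provider interface, `ProviderError` and `call_with_fallback` are assumptions for illustration. The document does not say which errors trigger fallback, so this sketch falls back on any `ProviderError` raised by the main provider.

```python
from typing import Callable, List, Optional, Tuple

Provider = Callable[[str], str]  # hypothetical interface: prompt in, text out


class ProviderError(RuntimeError):
    """Hypothetical error a provider raises on a failed call."""


def call_with_fallback(prompt: str, main: Provider, fallback: Optional[Provider]) -> str:
    # No sticky failover: every call starts with the main provider again.
    try:
        return main(prompt)
    except ProviderError:
        if fallback is None:
            raise
        return fallback(prompt)


def demo() -> Tuple[List[str], List[str]]:
    calls: List[str] = []

    def flaky_main(prompt: str) -> str:
        calls.append("main")
        if len(calls) == 1:  # only the very first call fails
            raise ProviderError("timeout")
        return "main:" + prompt

    def backup(prompt: str) -> str:
        calls.append("fallback")
        return "fallback:" + prompt

    results = [call_with_fallback(p, flaky_main, backup) for p in ("a", "b")]
    return results, calls
```

Because no failure state is remembered between calls, a main provider that recovers is used again on the very next call.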
 ---

@ -507,26 +647,81 @@ task description
 1. activated skill messages
 2. `skill_view`

+### 5.5 `Tools` use MCP-style descriptions
+
+Local tools are no longer just an OpenAI function schema; they are first normalized into:
+
+```text
+ToolSpec
+├─ name
+├─ description
+├─ input_schema
+├─ toolset
+└─ always_available
+```
+
+Of these fields, `name/description/input_schema` export directly to an MCP-style descriptor:
+
+```json
+{
+  "name": "memory",
+  "description": "...",
+  "inputSchema": {}
+}
+```
+
+The OpenAI-compatible schema that providers need is produced by `ToolSpec.to_provider_schema()`.
+
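Here is a minimal sketch of the `ToolSpec` shape above and its two exports. It assumes a frozen dataclass with exactly the five listed fields.

- The MCP export uses the `name` / `description` / `inputSchema` keys shown in the JSON example.
- The provider export uses the OpenAI chat-completions tool shape, `{"type": "function", "function": {..., "parameters": ...}}`.

The default values, including the `toolset` name, are guesses rather than Beaver's code.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class ToolSpec:
    name: str
    description: str
    input_schema: Dict[str, Any] = field(
        default_factory=lambda: {"type": "object", "properties": {}}
    )
    toolset: str = "core"  # hypothetical default grouping
    always_available: bool = False

    def to_mcp_descriptor(self) -> Dict[str, Any]:
        # MCP uses camelCase "inputSchema"; toolset/always_available stay local.
        return {
            "name": self.name,
            "description": self.description,
            "inputSchema": self.input_schema,
        }

    def to_provider_schema(self) -> Dict[str, Any]:
        # OpenAI-compatible tool definition: the same JSON Schema as "parameters".
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.input_schema,
            },
        }
```

Keeping the JSON Schema in one field means both exports share a single source of truth, and only the envelope differs.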
 ---

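-## 6. What is not done yet
+## 6. Where we stand against the construction guide

-This part is the focus of the next round of construction.
+This section strictly follows the stage 6 numbering of `施工指南.md` (the construction guide) and no longer renumbers on its own.

-### 6.1 Runtime lifecycle
+### 6.1 Step 1: upgrade Session to an event-source model

-First step done:
+Current status: **the phase 1 goals are essentially met, but this is not yet a full event-source system.**
+
+Already in place:
+
+1. The `messages` table already carries the main event-stream semantics
+2. Every run has its own `run_id`
+3. `AgentLoop.process_direct()` writes back to Session stage by stage
+4. Available:
+   - `get_event_records(session_id)`
+   - `get_run_event_records(session_id, run_id)`
+   - `list_run_ids(session_id)`
+   - `get_visible_history(session_id)`
+5. `session_search` only searches the visible transcript; hidden snapshots are never search candidates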
+
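The event-stream behaviour above can be sketched with `sqlite3`. The sketch uses one `messages` table keyed by `session_id` and `run_id`, with a `hidden` flag. That flag keeps snapshots such as `tool_selection_snapshotted` out of the visible history and out of search.

The following are all assumptions:

- the column names
- the `SessionStore` class
- the `user_message` / `assistant_message` event names in `demo()`
- the `LIKE`-based search

To stay short, this `get_run_event_records` returns only event types.

```python
import sqlite3
from typing import List, Tuple


class SessionStore:
    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE messages (id INTEGER PRIMARY KEY AUTOINCREMENT, session_id TEXT,"
            " run_id TEXT, event_type TEXT, role TEXT, content TEXT, hidden INTEGER)"
        )

    def append_message(self, session_id: str, run_id: str, event_type: str,
                       role: str = "", content: str = "", hidden: bool = False) -> None:
        self.db.execute(
            "INSERT INTO messages (session_id, run_id, event_type, role, content, hidden)"
            " VALUES (?, ?, ?, ?, ?, ?)",
            (session_id, run_id, event_type, role, content, int(hidden)),
        )

    def list_run_ids(self, session_id: str) -> List[str]:
        rows = self.db.execute(
            "SELECT run_id FROM messages WHERE session_id = ?"
            " GROUP BY run_id ORDER BY MIN(id)",
            (session_id,),
        )
        return [r[0] for r in rows]

    def get_run_event_records(self, session_id: str, run_id: str) -> List[str]:
        rows = self.db.execute(
            "SELECT event_type FROM messages WHERE session_id = ? AND run_id = ? ORDER BY id",
            (session_id, run_id),
        )
        return [r[0] for r in rows]

    def get_visible_history(self, session_id: str) -> List[Tuple[str, str]]:
        # Transcript rows have a role; hidden snapshots and lifecycle markers are skipped.
        rows = self.db.execute(
            "SELECT role, content FROM messages"
            " WHERE session_id = ? AND hidden = 0 AND role != '' ORDER BY id",
            (session_id,),
        )
        return list(rows)

    def session_search(self, session_id: str, term: str) -> List[str]:
        # Only visible rows are candidates, so hidden snapshots never match.
        rows = self.db.execute(
            "SELECT content FROM messages"
            " WHERE session_id = ? AND hidden = 0 AND content LIKE ? ORDER BY id",
            (session_id, f"%{term}%"),
        )
        return [r[0] for r in rows]


def demo() -> SessionStore:
    store = SessionStore()
    store.append_message("s1", "r1", "run_started")
    store.append_message("s1", "r1", "user_message", "user", "list the docs folder")
    store.append_message("s1", "r1", "tool_selection_snapshotted", content="docs tools", hidden=True)
    store.append_message("s1", "r1", "assistant_message", "assistant", "docs has 3 files")
    store.append_message("s1", "r1", "run_completed")
    store.append_message("s1", "r2", "run_started")
    return store
```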
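+Not done yet:
+
+1. `checkpoint`
+2. `rewind`
+3. `fork session`
+4. `crash-resume protocol`
+
+So, more precisely:
+
+1. The "Session-first / event-source phase 1" of `6.1` has landed
+2. The more complete event-source capabilities are not finished yet
+
+### 6.2 Step 2: complete the runtime lifecycle protocol
+
+Current status: **the minimal lifecycle skeleton is complete.**
+
+Done: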

 1. `EngineLoadResult.close()`
 2. `AgentLoop.close()`
 3. `AgentService.close()`
 4. `AgentService.shutdown()`
-
-Minimal version of the second step done:
-
-1. `AgentLoop.run()`
-2. `AgentLoop.stop()`
-3. `AgentLoop.submit_direct()`
+5. `AgentLoop.run()`
+6. `AgentLoop.stop()`
+7. `AgentLoop.submit_direct()`
+8. `AgentService.start()`
+9. `AgentService.stop()`
+10. `AgentService.submit_direct()`
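The run-mode part of this lifecycle can be sketched as a queue plus futures:

- `start()` launches `run()`.
- `submit_direct()` enqueues a job and awaits its result.
- `stop()` enqueues a sentinel after any already-accepted work.
- `shutdown()` waits with a timeout and cancels if `force` is set.

`MiniLoop`, the `None` sentinel and the rejection of new submissions once stopping are assumptions for illustration. The real `AgentLoop` also writes Session events and calls the provider.

```python
import asyncio
import contextlib
from typing import Optional, Tuple

Job = Optional[Tuple[str, "asyncio.Future[str]"]]


class MiniLoop:
    """Hypothetical stand-in for AgentLoop's run mode."""

    def __init__(self) -> None:
        self._queue: "asyncio.Queue[Job]" = asyncio.Queue()
        self._stopping = False
        self._runner: Optional["asyncio.Task[None]"] = None

    async def process_direct(self, text: str) -> str:
        # Placeholder for one full agent turn (context, provider, tools, Session writes).
        return f"reply to {text}"

    async def run(self) -> None:
        while True:
            job = await self._queue.get()
            if job is None:  # stop sentinel: every earlier job has been served
                return
            text, future = job
            try:
                result = await self.process_direct(text)
            except Exception as exc:
                if not future.done():
                    future.set_exception(exc)
            else:
                if not future.done():  # the caller may have cancelled meanwhile
                    future.set_result(result)

    def start(self) -> None:
        self._runner = asyncio.get_running_loop().create_task(self.run())

    async def submit_direct(self, text: str) -> str:
        if self._stopping:
            raise RuntimeError("loop is stopping")
        future: "asyncio.Future[str]" = asyncio.get_running_loop().create_future()
        await self._queue.put((text, future))
        return await future

    async def stop(self) -> None:
        self._stopping = True
        await self._queue.put(None)

    async def shutdown(self, timeout_seconds: float = 5.0, force: bool = True) -> None:
        await self.stop()
        if self._runner is None:
            return
        try:
            # shield() keeps wait_for from cancelling the runner on timeout.
            await asyncio.wait_for(asyncio.shield(self._runner), timeout_seconds)
        except asyncio.TimeoutError:
            if not force:
                raise
            self._runner.cancel()
            with contextlib.suppress(asyncio.CancelledError):
                await self._runner


async def demo() -> Tuple[str, str]:
    agent = MiniLoop()
    agent.start()
    first = await agent.submit_direct("hi")
    await agent.shutdown(timeout_seconds=1.0)
    try:
        await agent.submit_direct("late")
        late = "accepted"
    except RuntimeError as exc:
        late = str(exc)
    return first, late
```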

 Not done yet:

@ -534,66 +729,160 @@ task description
 2. A more complete provider/client resource release protocol
 3. Multiple workers / bus / scheduling policy

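-### 6.2 Wire Web / Gateway into the main chain
+### 6.2.1 How Web / Gateway now hook into this lifecycle

-The main chain already runs, but has not formally become:
+Current status: **minimal host-layer integration is complete.**

-1. Web actually calling `AgentService.process_direct()`
-2. Gateway actually calling `AgentService.process_direct()`
+Done:

-### 6.3 More complete event-source capabilities for Session
+1. Web manages `AgentService.start()/shutdown()` through the FastAPI lifespan
+2. Web requests go only through `AgentService.submit_direct()`
+3. Gateway has a minimal `MessageBus -> AgentService.handle_inbound_message() -> MessageBus` bridge
+4. Gateway supports an optional `ChannelManager` that dispatches outbound messages back to channel adapters

-Not done yet:
+Done for the app-instance Docker image: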

-1. checkpoint
-2. rewind
-3. fork session
-4. crash-resume protocol
+1. The Dockerfile installs only `backend/beaver`
+2. The entrypoint starts `beaver.interfaces.web.app:create_app`
+3. Each instance mounts `/root/.beaver`
+4. Config is read from `/root/.beaver/config.json`
+5. The workspace is `/root/.beaver/workspace`
+6. Host-side `curl /api/chat` verified in practice

-### 6.4 Multi-agent / swarms
+Not done yet in this sub-step:

-Not yet formally wired back into the main chain:
+1. realtime streaming
+2. retry / broker persistence
+3. Full integration of real external channel adapters

-1. delegation
-2. team runtime
-3. swarms orchestration backend
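+### 6.3 Step 3: backfill bus mode

-But the lifecycle relationship has already been settled:
+Current status: **only the groundwork is done; it has not yet been properly closed out per the construction guide.**
+
+Prerequisites already in place:
+
+1. `MessageBus`
+2. `InboundMessage`
+3. `OutboundMessage`
+4. `AgentService.handle_inbound_message()`
+5. The Gateway bridge continuously consumes inbound and writes back outbound
+6. `AgentLoop.run()` already has a minimal run loop
+
+But strictly by `施工指南.md`, `6.3` is not formally complete, because these are still missing:
+
+1. Establish bus mode as one of the runtime's official run modes
+2. Define how `run()` reliably consumes inbound messages
+3. Define the responsibility boundaries between bus mode and direct mode / queue mode
+4. Define unified semantics for shutdown, cancellation, and flushing pending inbound
+5. Then decide whether more complex worker / retry / routing is needed
+
+In other words:
+
+1. It is not that "there is no bus yet"
+2. Rather, "the bus protocol mapping has been consolidated into `AgentService`, but it has not yet been expanded into a full bus runtime mode per the construction guide"
+
+### 6.4 How the single-agent lifecycle extends to teams
+
+Current status: **the relationship is fixed, but implementation has not started.**
+
+Already settled:

 1. A team will not share one big `AgentLoop` running all members
 2. Each team member should have its own independent `AgentService / AgentLoop`
 3. A team coordinator schedules multiple member instances from the layer above
 4. So the current `start()/submit_direct()/stop()/close()` set is first and foremost a member-level lifecycle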
+
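The member-level relationship above can be sketched as a coordinator that owns several independent services and never shares one loop between them. `TeamCoordinator`, `MemberService` and addressing members by name are hypothetical. According to this document, delegation and scheduling policy have not been started, so the sketch only covers lifecycle and isolation.

```python
import asyncio
from typing import Dict, List


class MemberService:
    """Stand-in for one member's own AgentService / AgentLoop."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.started = False
        self.log: List[str] = []  # per-member state, never shared

    async def start(self) -> None:
        self.started = True

    async def submit_direct(self, text: str) -> str:
        if not self.started:
            raise RuntimeError(f"{self.name} is not started")
        self.log.append(text)
        return f"{self.name}: {text}"

    async def shutdown(self) -> None:
        self.started = False


class TeamCoordinator:
    """Sits above the members and drives each member's own lifecycle."""

    def __init__(self, names: List[str]) -> None:
        self.members: Dict[str, MemberService] = {n: MemberService(n) for n in names}

    async def start(self) -> None:
        await asyncio.gather(*(m.start() for m in self.members.values()))

    async def submit_to(self, name: str, text: str) -> str:
        return await self.members[name].submit_direct(text)

    async def shutdown(self) -> None:
        await asyncio.gather(*(m.shutdown() for m in self.members.values()))


async def demo() -> List[str]:
    team = TeamCoordinator(["planner", "coder"])
    await team.start()
    replies = [
        await team.submit_to("planner", "split the task"),
        await team.submit_to("coder", "write the parser"),
    ]
    await team.shutdown()
    return replies
```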
+Not started yet:
+
+1. delegation
 2. team runtime
-3. swarms backend
+3. swarms orchestration backend
 4. group discussion / workflow orchestration

-### 6.5 Permissions and governance
-
-Not done yet:
-
-1. permission gates
-2. tool policy
-3. MCP tool governance
-
 ---

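-## 7. Where it makes most sense to start next
+## 7. Against `change.md`: which long-term goals have not started

-If construction continues now, the most sensible order is:
+`change.md` describes the overall blueprint, not the current construction numbering. The following remain long-term goals and have not formally entered implementation in the current stage:

-1. First lock in `flow.md` as the current baseline
-2. Then continue stage 6:
-   - runtime lifecycle
-   - `boot / close / run / stop`
-3. Then hook up:
-   - Web / Gateway
-4. Only at the end:
-   - multi-agent / swarms
+1. Skills lifecycle system
+   - `SkillDraft`
+   - `SkillVersion`
+   - review / publish / rollback
+2. Hermes-style learning loop
+   - The agent periodically curates its memory and issues memory nudges
+   - Autonomously creates skills after completing complex tasks
+   - Skills improve themselves through use
+   - Cross-session recall enhanced with FTS5 + LLM summarization
+   - Honcho-style dialectical user modeling
+3. swarms wired back into the platform as an official backend
+4. delegation / subagent / team orchestration
+
+Only these basic entry points are done so far: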
+
+1. curated memory CRUD
+2. session_search
+3. skill loader / skill_view
+4. skill assembler
+5. tool assembler
+
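+### 7.1 Permissions and governance
+
+Not done yet:
+
+1. Complete permission gates
+2. tool policy
+3. MCP tool governance
+
+Minimal boundaries already in place:
+
+1. Read-only filesystem tools are strictly confined to `ToolContext.workspace`
+2. Path resolution uses real paths, preventing escape via relative paths, absolute paths, or symlinks
+3. There are no shell / write / network tools yet, so the high-risk authorization stage has not been reached
+
+### 7.2 Frontend compatibility
+
+Only minimal chat response compatibility is done:
+
+1. The frontend `sendMessage()` now accepts Beaver's `output_text`
+
+Not done yet:
+
+1. `/api/auth/*`
+2. `/api/sessions`
+3. Full page data for `/api/status`
+4. `/api/skills`
+5. `/api/files`
+6. `/ws`
+7. Login-free browser access or a new auth integration strategy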
+
+---
+
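+## 8. Where it makes most sense to start next
+
+If we continue strictly in the order of `施工指南.md`, the next steps are:
+
+1. Complete `6.3 backfill bus mode`
+   - define the official run semantics of bus mode
+   - settle the relationship between `AgentLoop.run()` and `MessageBus`
+   - stabilize the inbound / outbound result structures
+2. Then move on to `6.4`
+   - first turn the team lifecycle relationship into more implementable coordinator constraints
+3. Then stage 7
+   - delegation
+   - local subagent
+4. Then stage 8
+   - team / swarms backend
+
+Following the long-term direction of `change.md`, these still need to be added later:
+
+1. Skills lifecycle
+2. Hermes-style learning loop
+3. More complete memory / governance / frontend

 Summary in one sentence:

-**Beaver already has a runnable single-agent runtime; the next step is not to keep piling on local capabilities but to upgrade it into a standard harness with a complete lifecycle.**
+**Beaver has now reached "single-agent runtime + memory/skills + lifecycle + minimal Web/Gateway integration"; by the construction guide's numbering, the next step is `6.3 backfill bus mode`.**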

 ---