```

feat(llm): 添加 Hermes Gateway LLM 设计文档 ```
2026-06-01 16:05:15 +08:00
parent 33a9845566
commit 826db8ec2e
3 changed files with 2534 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@ -21,6 +21,7 @@ sessions/
 **/.ruff_cache/
 **/.mypy_cache/
 **/.cache/
 **/.codegraph/
 **/.venv/
 **/dist/
 **/build/
--- a/2026-06-01-hermes-gateway-llm-design.md
+++ b/2026-06-01-hermes-gateway-llm-design.md
@ -0,0 +1,177 @@
 # Hermes Gateway LLM Design
 Date: 2026-06-01
 ## Goal
 Replace the OpenAI-compatible LLM call path in `custom/custom_agent.py` with a LiveKit LLM
 adapter that talks to NousResearch Hermes Agent through the OpenClaw gateway protocol.
 The integration must keep the existing custom agent behavior:
 - Chinese room-locator and general assistant instructions
 - Emotion prefix parsing with `<emotion=...>`
 - Memory recall for room-locator queries
 - Optional vision-frame attachment
 - LiveKit ASR, TTS, VAD, turn handling, metrics, and interruption behavior
 The Hermes session strategy is `per_room`: one LiveKit room should map to one Hermes gateway
 session for the lifetime of that room.
 ## Non-Goals
 - Do not replace LiveKit `AgentSession`, ASR, TTS, VAD, or room I/O.
 - Do not move room-locator classification into Hermes Agent.
 - Do not implement Hermes-side tools in the first pass.
 - Do not require an OpenAI-compatible proxy in front of the gateway.
 ## Recommended Architecture
 Add a new custom LiveKit LLM implementation in `custom/hermes_gateway.py`.
 The adapter will implement the LiveKit `llm.LLM` interface and return a custom `LLMStream`.
 The stream will own a single gateway request/response cycle while the LLM object owns the
 per-room gateway session state.
 `custom/custom_agent.py` will continue to call `selected_llm.chat(...)` through
 `_run_selected_llm()`. That preserves the existing `llm_node()` pipeline and keeps Hermes
 behind the same abstraction as OpenAI-compatible models.
 ## Components
 ### HermesGatewayLLM
 Responsibilities:
 - Store gateway configuration: URL, auth token, agent identifier, request timeout, and reconnect
  policy.
 - Lazily create one Hermes gateway session per LiveKit room.
 - Expose `model` as the configured Hermes agent/model identifier.
 - Expose `provider` as `hermes-gateway`.
 - Create `HermesGatewayLLMStream` from `chat(...)`.
 - Close any persistent WebSocket/session resources in `aclose()`.
 ### HermesGatewayLLMStream
 Responsibilities:
 - Serialize LiveKit `ChatContext` into the gateway request payload.
 - Send the latest turn to the per-room Hermes session.
 - Consume gateway events until the turn completes or fails.
 - Yield LiveKit `llm.ChatChunk` objects for assistant text deltas.
 - Surface recoverable connection failures through the normal LiveKit LLM error path.
 ### custom_agent.py Wiring
 Add env-driven provider selection:
 - `CUSTOM_LLM_PROVIDER=openai` keeps the current behavior.
 - `CUSTOM_LLM_PROVIDER=hermes_gateway` constructs `HermesGatewayLLM`.
 New Hermes-specific env vars:
 - `CUSTOM_HERMES_GATEWAY_URL`
 - `CUSTOM_HERMES_API_KEY`
 - `CUSTOM_HERMES_AGENT_ID`
 - `CUSTOM_HERMES_SESSION_MODE=per_room`
 - `CUSTOM_HERMES_REQUEST_TIMEOUT`
 - `CUSTOM_HERMES_VERIFY_SSL`
 When `CUSTOM_LLM_PROVIDER=hermes_gateway`, `base_llm`, `text_llm`, and `vision_llm` should all
 point at the same Hermes adapter. Separate Hermes text/vision agent IDs are out of scope for this
 design.
 ## Data Flow
 1. User speaks or sends text.
 2. Existing LiveKit/STT flow updates `ChatContext`.
 3. `CustomAgent.llm_node()` selects `general` or `room_locator`.
 4. Existing code injects the appropriate instructions and emotion-prefix requirement.
 5. Existing code optionally augments the latest user message with memory context.
 6. Existing code optionally attaches a fresh vision frame.
 7. `_run_selected_llm()` calls `HermesGatewayLLM.chat(...)`.
 8. The Hermes adapter sends the request to the per-room gateway session.
 9. Gateway text events are converted to `llm.ChatChunk` deltas.
 10. Existing emotion observation and TTS stripping continue unchanged.
 ## ChatContext Serialization
 Text messages should be serialized first.
 Supported LiveKit content:
 - `str`: send as normal message content.
 - instruction/config updates: preserve the final active instructions as the leading instruction
  message in the gateway payload. If the deployed gateway only accepts user/assistant messages,
  prepend the instruction text to the latest user message before sending.
 - image content: attempt to send through the gateway image/multimodal field. If the deployed
  Hermes gateway rejects or ignores image content, log a warning and fall back to text-only
  generation for that turn.
 Function tool calls should not be sent in the first implementation. If tool messages appear, log
 that they were omitted.
 ## per_room Session Lifecycle
 The adapter should derive a stable room key from the active LiveKit session or job context. If a
 room name/SID is not available, fall back to one adapter-local session.
 For each room key:
 1. Open or reuse a gateway connection.
 2. Send the gateway `connect` handshake if needed.
 3. Create a Hermes session once.
 4. Reuse that Hermes session for all future turns from the same room.
 5. Close the gateway connection when the LiveKit LLM is closed.
 This lets Hermes maintain its own conversational state while LiveKit still keeps the visible
 conversation history.
 ## Gateway Event Mapping
 Map streaming text events to LiveKit chunks:
 - Gateway assistant text delta -> `llm.ChatChunk(delta=llm.ChoiceDelta(content=delta))`
 - Gateway final assistant message -> emit any remaining text not already streamed
 - Gateway usage metadata -> `llm.CompletionUsage` when token counts are available
 - Gateway tool/action events -> log at debug/info level in the first implementation
 - Gateway error event -> raise a LiveKit `APIError` or `APIConnectionError`
 - Gateway completion event -> finish the async iterator
 The implementation should make the event parser tolerant of protocol field-name differences by
 isolating event normalization in one helper function. Unknown event types should be logged and
 ignored unless they indicate failure.
 ## Error Handling
 - Missing Hermes env vars should fail fast at startup when provider is `hermes_gateway`.
 - Gateway connect/session-create failures should raise connection errors.
 - A failed request should not discard the per-room session unless the gateway reports that the
  session is invalid or closed.
 - If the gateway connection closes mid-turn, reconnect once and retry only if no assistant text
  has been yielded yet.
 - If assistant text has already been yielded, fail the turn instead of replaying partial output.
 ## Testing
 Add focused tests around the adapter:
 - Serializes simple system/user/assistant chat context.
 - Creates one gateway session and reuses it across two turns for the same room.
 - Converts text deltas into `llm.ChatChunk` content.
 - Handles final full-message events without duplicate text.
 - Raises on gateway error events.
 - Logs and skips unsupported image/tool content.
 Add a small wiring test or import-level test for `CUSTOM_LLM_PROVIDER=hermes_gateway` if the
 custom module is testable without external services.
 ## Rollout
 1. Implement the adapter behind `CUSTOM_LLM_PROVIDER=hermes_gateway`.
 2. Keep `openai` as the default provider.
 3. Run unit tests for the adapter and a syntax/type smoke check on `custom/custom_agent.py`.
 4. Test manually with a local gateway using `python custom/custom_agent.py console` or the
   existing LiveKit development mode.
 5. If vision payloads are unsupported by the deployed gateway, document that the first Hermes
   rollout is text-only for vision turns.
--- a/docs/superpowers/plans/2026-06-01-channel-runtime-v1.md
+++ b/docs/superpowers/plans/2026-06-01-channel-runtime-v1.md