```

feat(llm): 添加 Hermes Gateway LLM 设计文档 ```
2026-06-01 16:05:15 +08:00
parent 33a9845566
commit 826db8ec2e
3 changed files with 2534 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@ -21,6 +21,7 @@ sessions/
 **/.ruff_cache/
 **/.mypy_cache/
 **/.cache/
+**/.codegraph/
 **/.venv/
 **/dist/
 **/build/
--- a/2026-06-01-hermes-gateway-llm-design.md
+++ b/2026-06-01-hermes-gateway-llm-design.md
@ -0,0 +1,177 @@
+# Hermes Gateway LLM Design
+
+Date: 2026-06-01
+
+## Goal
+
+Replace the OpenAI-compatible LLM call path in `custom/custom_agent.py` with a LiveKit LLM
+adapter that talks to NousResearch Hermes Agent through the OpenClaw gateway protocol.
+
+The integration must keep the existing custom agent behavior:
+
+- Chinese room-locator and general assistant instructions
+- Emotion prefix parsing with `<emotion=...>`
+- Memory recall for room-locator queries
+- Optional vision-frame attachment
+- LiveKit ASR, TTS, VAD, turn handling, metrics, and interruption behavior
+
+The Hermes session strategy is `per_room`: one LiveKit room should map to one Hermes gateway
+session for the lifetime of that room.
+
+## Non-Goals
+
+- Do not replace LiveKit `AgentSession`, ASR, TTS, VAD, or room I/O.
+- Do not move room-locator classification into Hermes Agent.
+- Do not implement Hermes-side tools in the first pass.
+- Do not require an OpenAI-compatible proxy in front of the gateway.
+
+## Recommended Architecture
+
+Add a new custom LiveKit LLM implementation in `custom/hermes_gateway.py`.
+
+The adapter will implement the LiveKit `llm.LLM` interface and return a custom `LLMStream`.
+The stream will own a single gateway request/response cycle while the LLM object owns the
+per-room gateway session state.
+
+`custom/custom_agent.py` will continue to call `selected_llm.chat(...)` through
+`_run_selected_llm()`. That preserves the existing `llm_node()` pipeline and keeps Hermes
+behind the same abstraction as OpenAI-compatible models.
+
+## Components
+
+### HermesGatewayLLM
+
+Responsibilities:
+
+- Store gateway configuration: URL, auth token, agent identifier, request timeout, and reconnect
+  policy.
+- Lazily create one Hermes gateway session per LiveKit room.
+- Expose `model` as the configured Hermes agent/model identifier.
+- Expose `provider` as `hermes-gateway`.
+- Create `HermesGatewayLLMStream` from `chat(...)`.
+- Close any persistent WebSocket/session resources in `aclose()`.
+
+### HermesGatewayLLMStream
+
+Responsibilities:
+
+- Serialize LiveKit `ChatContext` into the gateway request payload.
+- Send the latest turn to the per-room Hermes session.
+- Consume gateway events until the turn completes or fails.
+- Yield LiveKit `llm.ChatChunk` objects for assistant text deltas.
+- Surface recoverable connection failures through the normal LiveKit LLM error path.
+
+### custom_agent.py Wiring
+
+Add env-driven provider selection:
+
+- `CUSTOM_LLM_PROVIDER=openai` keeps the current behavior.
+- `CUSTOM_LLM_PROVIDER=hermes_gateway` constructs `HermesGatewayLLM`.
+
+New Hermes-specific env vars:
+
+- `CUSTOM_HERMES_GATEWAY_URL`
+- `CUSTOM_HERMES_API_KEY`
+- `CUSTOM_HERMES_AGENT_ID`
+- `CUSTOM_HERMES_SESSION_MODE=per_room`
+- `CUSTOM_HERMES_REQUEST_TIMEOUT`
+- `CUSTOM_HERMES_VERIFY_SSL`
+
+When `CUSTOM_LLM_PROVIDER=hermes_gateway`, `base_llm`, `text_llm`, and `vision_llm` should all
+point at the same Hermes adapter. Separate Hermes text/vision agent IDs are out of scope for this
+design.
+
+## Data Flow
+
+1. User speaks or sends text.
+2. Existing LiveKit/STT flow updates `ChatContext`.
+3. `CustomAgent.llm_node()` selects `general` or `room_locator`.
+4. Existing code injects the appropriate instructions and emotion-prefix requirement.
+5. Existing code optionally augments the latest user message with memory context.
+6. Existing code optionally attaches a fresh vision frame.
+7. `_run_selected_llm()` calls `HermesGatewayLLM.chat(...)`.
+8. The Hermes adapter sends the request to the per-room gateway session.
+9. Gateway text events are converted to `llm.ChatChunk` deltas.
+10. Existing emotion observation and TTS stripping continue unchanged.
+
+## ChatContext Serialization
+
+Text messages should be serialized first.
+
+Supported LiveKit content:
+
+- `str`: send as normal message content.
+- instruction/config updates: preserve the final active instructions as the leading instruction
+  message in the gateway payload. If the deployed gateway only accepts user/assistant messages,
+  prepend the instruction text to the latest user message before sending.
+- image content: attempt to send through the gateway image/multimodal field. If the deployed
+  Hermes gateway rejects or ignores image content, log a warning and fall back to text-only
+  generation for that turn.
+
+Function tool calls should not be sent in the first implementation. If tool messages appear, log
+that they were omitted.
+
+## per_room Session Lifecycle
+
+The adapter should derive a stable room key from the active LiveKit session or job context. If a
+room name/SID is not available, fall back to one adapter-local session.
+
+For each room key:
+
+1. Open or reuse a gateway connection.
+2. Send the gateway `connect` handshake if needed.
+3. Create a Hermes session once.
+4. Reuse that Hermes session for all future turns from the same room.
+5. Close the gateway connection when the LiveKit LLM is closed.
+
+This lets Hermes maintain its own conversational state while LiveKit still keeps the visible
+conversation history.
+
+## Gateway Event Mapping
+
+Map streaming text events to LiveKit chunks:
+
+- Gateway assistant text delta -> `llm.ChatChunk(delta=llm.ChoiceDelta(content=delta))`
+- Gateway final assistant message -> emit any remaining text not already streamed
+- Gateway usage metadata -> `llm.CompletionUsage` when token counts are available
+- Gateway tool/action events -> log at debug/info level in the first implementation
+- Gateway error event -> raise a LiveKit `APIError` or `APIConnectionError`
+- Gateway completion event -> finish the async iterator
+
+The implementation should make the event parser tolerant of protocol field-name differences by
+isolating event normalization in one helper function. Unknown event types should be logged and
+ignored unless they indicate failure.
+
+## Error Handling
+
+- Missing Hermes env vars should fail fast at startup when provider is `hermes_gateway`.
+- Gateway connect/session-create failures should raise connection errors.
+- A failed request should not discard the per-room session unless the gateway reports that the
+  session is invalid or closed.
+- If the gateway connection closes mid-turn, reconnect once and retry only if no assistant text
+  has been yielded yet.
+- If assistant text has already been yielded, fail the turn instead of replaying partial output.
+
+## Testing
+
+Add focused tests around the adapter:
+
+- Serializes simple system/user/assistant chat context.
+- Creates one gateway session and reuses it across two turns for the same room.
+- Converts text deltas into `llm.ChatChunk` content.
+- Handles final full-message events without duplicate text.
+- Raises on gateway error events.
+- Logs and skips unsupported image/tool content.
+
+Add a small wiring test or import-level test for `CUSTOM_LLM_PROVIDER=hermes_gateway` if the
+custom module is testable without external services.
+
+## Rollout
+
+1. Implement the adapter behind `CUSTOM_LLM_PROVIDER=hermes_gateway`.
+2. Keep `openai` as the default provider.
+3. Run unit tests for the adapter and a syntax/type smoke check on `custom/custom_agent.py`.
+4. Test manually with a local gateway using `python custom/custom_agent.py console` or the
+   existing LiveKit development mode.
+5. If vision payloads are unsupported by the deployed gateway, document that the first Hermes
+   rollout is text-only for vision turns.
--- a/docs/superpowers/plans/2026-06-01-channel-runtime-v1.md
+++ b/docs/superpowers/plans/2026-06-01-channel-runtime-v1.md