feat: implement channel runtime connectors

2026-06-03 16:22:44 +08:00
parent ee972441f5
commit c3d84b904a
105 changed files with 15621 additions and 322 deletions
--- a/2026-06-01-hermes-gateway-llm-design.md
+++ b/2026-06-01-hermes-gateway-llm-design.md
@@ -1,177 +1,458 @@
-# Hermes Gateway LLM Design
+# Beaver Terminal WebSocket Integration Guide

 Date: 2026-06-01

+Audience: the terminal-side Codex agent that will modify the small terminal's firmware or app code.
+
 ## Goal

-Replace the OpenAI-compatible LLM call path in `custom/custom_agent.py` with a LiveKit LLM
-adapter that talks to NousResearch Hermes Agent through the OpenClaw gateway protocol.
+Connect the small terminal device to Beaver through a text-only WebSocket channel.

-The integration must keep the existing custom agent behavior:
+The first acceptance target is simple:

- Chinese room-locator and general assistant instructions
- Emotion prefix parsing with `<emotion=...>`
- Memory recall for room-locator queries
- Optional vision-frame attachment
- LiveKit ASR, TTS, VAD, turn handling, metrics, and interruption behavior
+1. The terminal opens a WebSocket connection to Beaver.
+2. The terminal sends a `connect` frame with a stable `peer_id`.
+3. The terminal sends one text `message` frame.
+4. The terminal receives an `ack`.
+5. The terminal receives the final assistant text response from Beaver.
+6. The terminal can reconnect with the same `peer_id` and keep the same Beaver session.

-The Hermes session strategy is `per_room`: one LiveKit room should map to one Hermes gateway
-session for the lifetime of that room.
+This document replaces the earlier Hermes LiveKit LLM adapter design for the terminal-side work. Do not implement a LiveKit LLM adapter from this document.

 ## Non-Goals

- Do not replace LiveKit `AgentSession`, ASR, TTS, VAD, or room I/O.
- Do not move room-locator classification into Hermes Agent.
- Do not implement Hermes-side tools in the first pass.
- Do not require an OpenAI-compatible proxy in front of the gateway.
+- Do not implement audio streaming.
+- Do not implement camera, screen, image, or multimodal frames.
+- Do not implement token streaming.
+- Do not implement terminal-side tools.
+- Do not implement AuthZ, device registration, OAuth, or pairing in the first pass.
+- Do not call Beaver REST chat endpoints or the existing Web UI `/ws/{session_id}` endpoint.
+- Do not build an OpenAI-compatible proxy.
+- Do not implement Hermes Agent or LiveKit changes on the terminal side.

-## Recommended Architecture
+## Beaver Endpoint

-Add a new custom LiveKit LLM implementation in `custom/hermes_gateway.py`.
+The terminal connects to:

-The adapter will implement the LiveKit `llm.LLM` interface and return a custom `LLMStream`.
-The stream will own a single gateway request/response cycle while the LLM object owns the
-per-room gateway session state.
+```text
+ws://<beaver-host>/api/channels/<channel_id>/ws
+```

-`custom/custom_agent.py` will continue to call `selected_llm.chat(...)` through
-`_run_selected_llm()`. That preserves the existing `llm_node()` pipeline and keeps Hermes
-behind the same abstraction as OpenAI-compatible models.
+For local development through the nginx port of the Beaver app instance:

-## Components
+```text
+ws://127.0.0.1:8080/api/channels/terminal-dev/ws
+```

-### HermesGatewayLLM
+For direct backend development without nginx:

-Responsibilities:
+```text
+ws://127.0.0.1:18080/api/channels/terminal-dev/ws
+```

- Store gateway configuration: URL, auth token, agent identifier, request timeout, and reconnect
-  policy.
- Lazily create one Hermes gateway session per LiveKit room.
- Expose `model` as the configured Hermes agent/model identifier.
- Expose `provider` as `hermes-gateway`.
- Create `HermesGatewayLLMStream` from `chat(...)`.
- Close any persistent WebSocket/session resources in `aclose()`.
+Use `wss://` when Beaver is deployed behind TLS.

-### HermesGatewayLLMStream
+The expected first channel id is:

-Responsibilities:
+```text
+terminal-dev
+```

- Serialize LiveKit `ChatContext` into the gateway request payload.
- Send the latest turn to the per-room Hermes session.
- Consume gateway events until the turn completes or fails.
- Yield LiveKit `llm.ChatChunk` objects for assistant text deltas.
- Surface recoverable connection failures through the normal LiveKit LLM error path.
+The terminal implementation should make the URL configurable, for example:

-### custom_agent.py Wiring
+```text
+BEAVER_WS_URL=ws://127.0.0.1:8080/api/channels/terminal-dev/ws
+TERMINAL_PEER_ID=device-001
+TERMINAL_DEVICE_NAME=desk-terminal
+```
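+
+The following is a minimal Python sketch of loading these settings. It assumes the terminal runtime exposes environment variables. `TerminalConfig` and `load_config` are hypothetical names. The defaults reuse the development values above, and the scheme check is an extra client-side safeguard, not a Beaver requirement.
+
+```python
+import os
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class TerminalConfig:
+    ws_url: str
+    peer_id: str
+    device_name: str
+
+
+def load_config(env=None) -> TerminalConfig:
+    """Read terminal settings from environment variables (hypothetical helper).
+
+    Defaults mirror the local-development values; a real device should set
+    TERMINAL_PEER_ID explicitly so it stays stable across reboots.
+    """
+    env = os.environ if env is None else env
+    cfg = TerminalConfig(
+        ws_url=env.get("BEAVER_WS_URL", "ws://127.0.0.1:8080/api/channels/terminal-dev/ws"),
+        peer_id=env.get("TERMINAL_PEER_ID", "device-001"),
+        device_name=env.get("TERMINAL_DEVICE_NAME", "desk-terminal"),
+    )
+    # Fail fast on a misconfigured URL instead of retrying forever.
+    if not cfg.ws_url.startswith(("ws://", "wss://")):
+        raise ValueError(f"BEAVER_WS_URL must start with ws:// or wss://, got {cfg.ws_url!r}")
+    return cfg
+```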

-Add env-driven provider selection:
+## Protocol Overview

- `CUSTOM_LLM_PROVIDER=openai` keeps the current behavior.
- `CUSTOM_LLM_PROVIDER=hermes_gateway` constructs `HermesGatewayLLM`.
+The transport is JSON over WebSocket.

-New Hermes-specific env vars:
+All frames are UTF-8 JSON objects. The terminal should ignore fields it does not recognize. Beaver also ignores unknown fields, but it rejects a frame whose `type` is invalid or unsupported.
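+
+Here is a sketch of tolerant inbound decoding in Python. `decode_frame` and `KNOWN_INBOUND_TYPES` are hypothetical names. Skipping a frame with an unrecognized `type`, rather than failing, is an assumed client-side choice. Unknown extra fields are kept in the dict but never required.
+
+```python
+import json
+
+# Inbound frame types described in this guide (Beaver to terminal).
+KNOWN_INBOUND_TYPES = {"connected", "ack", "message", "pong", "error"}
+
+
+def decode_frame(raw):
+    """Parse one inbound text frame into a dict, or return None to skip it."""
+    try:
+        if isinstance(raw, (bytes, bytearray)):
+            raw = raw.decode("utf-8")
+        frame = json.loads(raw)
+    except (UnicodeDecodeError, json.JSONDecodeError):
+        return None  # not UTF-8 JSON: log and skip
+    if not isinstance(frame, dict) or frame.get("type") not in KNOWN_INBOUND_TYPES:
+        return None  # not an object, or a type this client does not handle
+    return frame  # extra, unknown fields are carried along but ignored
+```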

- `CUSTOM_HERMES_GATEWAY_URL`
- `CUSTOM_HERMES_API_KEY`
- `CUSTOM_HERMES_AGENT_ID`
- `CUSTOM_HERMES_SESSION_MODE=per_room`
- `CUSTOM_HERMES_REQUEST_TIMEOUT`
- `CUSTOM_HERMES_VERIFY_SSL`
+The protocol is request/reply oriented in this phase. Beaver sends only final assistant messages, not token deltas.

-When `CUSTOM_LLM_PROVIDER=hermes_gateway`, `base_llm`, `text_llm`, and `vision_llm` should all
-point at the same Hermes adapter. Separate Hermes text/vision agent IDs are out of scope for this
-design.
+Required frame flow:

-## Data Flow
+```text
+terminal -> Beaver: connect
+Beaver -> terminal: connected
+terminal -> Beaver: message
+Beaver -> terminal: ack
+Beaver -> terminal: message
+```

-1. User speaks or sends text.
-2. Existing LiveKit/STT flow updates `ChatContext`.
-3. `CustomAgent.llm_node()` selects `general` or `room_locator`.
-4. Existing code injects the appropriate instructions and emotion-prefix requirement.
-5. Existing code optionally augments the latest user message with memory context.
-6. Existing code optionally attaches a fresh vision frame.
-7. `_run_selected_llm()` calls `HermesGatewayLLM.chat(...)`.
-8. The Hermes adapter sends the request to the per-room gateway session.
-9. Gateway text events are converted to `llm.ChatChunk` deltas.
-10. Existing emotion observation and TTS stripping continue unchanged.
+Optional heartbeat:

-## ChatContext Serialization
+```text
+terminal -> Beaver: ping
+Beaver -> terminal: pong
+```

-Text messages should be serialized first.
+## Connect Frame

-Supported LiveKit content:
+The terminal must send `connect` immediately after the WebSocket opens.

- `str`: send as normal message content.
- instruction/config updates: preserve the final active instructions as the leading instruction
-  message in the gateway payload. If the deployed gateway only accepts user/assistant messages,
-  prepend the instruction text to the latest user message before sending.
- image content: attempt to send through the gateway image/multimodal field. If the deployed
-  Hermes gateway rejects or ignores image content, log a warning and fall back to text-only
-  generation for that turn.
+Terminal to Beaver:

-Function tool calls should not be sent in the first implementation. If tool messages appear, log
-that they were omitted.
+```json
+{
+  "type": "connect",
+  "peer_id": "device-001",
+  "device_name": "desk-terminal",
+  "capabilities": ["text"]
+}
+```

-## per_room Session Lifecycle
+Required fields:

-The adapter should derive a stable room key from the active LiveKit session or job context. If a
-room name/SID is not available, fall back to one adapter-local session.
+- `type`: must be `"connect"`.
+- `peer_id`: stable terminal identity. Reuse this value across reconnects.

-For each room key:
+Recommended fields:

-1. Open or reuse a gateway connection.
-2. Send the gateway `connect` handshake if needed.
-3. Create a Hermes session once.
-4. Reuse that Hermes session for all future turns from the same room.
-5. Close the gateway connection when the LiveKit LLM is closed.
+- `device_name`: human-readable terminal name.
+- `capabilities`: include `"text"`.

-This lets Hermes maintain its own conversational state while LiveKit still keeps the visible
-conversation history.
+Optional fields:

-## Gateway Event Mapping
+- `thread_id`: optional sub-session key. Omit it for the first pass.
+- `user_id`: optional user identity. Omit it unless the terminal already has a stable user id.
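+
+The frame can be built with a small helper. This Python sketch uses a hypothetical `build_connect_frame`. It always advertises the `text` capability and leaves out any optional field that is unset, in line with the advice to omit `thread_id` and `user_id` in the first pass.
+
+```python
+def build_connect_frame(peer_id, device_name=None, thread_id=None, user_id=None):
+    """Build the `connect` frame, omitting optional fields that are unset."""
+    if not peer_id:
+        raise ValueError("peer_id is required")
+    frame = {"type": "connect", "peer_id": peer_id, "capabilities": ["text"]}
+    optional = {"device_name": device_name, "thread_id": thread_id, "user_id": user_id}
+    # Only include optional fields the terminal actually has values for.
+    frame.update({key: value for key, value in optional.items() if value is not None})
+    return frame
+```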

-Map streaming text events to LiveKit chunks:
+Beaver to terminal:

- Gateway assistant text delta -> `llm.ChatChunk(delta=llm.ChoiceDelta(content=delta))`
- Gateway final assistant message -> emit any remaining text not already streamed
- Gateway usage metadata -> `llm.CompletionUsage` when token counts are available
- Gateway tool/action events -> log at debug/info level in the first implementation
- Gateway error event -> raise a LiveKit `APIError` or `APIConnectionError`
- Gateway completion event -> finish the async iterator
+```json
+{
+  "type": "connected",
+  "channel_id": "terminal-dev",
+  "session_id": "terminal-dev:local:device-001"
+}
+```

-The implementation should make the event parser tolerant of protocol field-name differences by
-isolating event normalization in one helper function. Unknown event types should be logged and
-ignored unless they indicate failure.
+The terminal should store `session_id` for logging and diagnostics. It does not need to send `session_id` back in message frames.
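+
+Below is a sketch of checking the handshake reply. It assumes the terminal treats anything other than `connected`, including an `error` frame, as a failed connect and then goes back to reconnecting. `expect_connected` is a hypothetical name. The code treats `session_id` as an opaque string and does not parse its format.
+
+```python
+def expect_connected(frame):
+    """Validate the handshake reply and return its session_id for logging."""
+    if frame.get("type") == "error":
+        raise ConnectionError(frame.get("error", "connect rejected"))
+    if frame.get("type") != "connected":
+        raise ConnectionError(f"expected 'connected', got {frame.get('type')!r}")
+    session_id = frame.get("session_id")
+    if not isinstance(session_id, str) or not session_id:
+        raise ConnectionError("connected frame has no session_id")
+    return session_id  # opaque: log it, never parse it or send it back
+```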

-## Error Handling
+## Message Frame

- Missing Hermes env vars should fail fast at startup when provider is `hermes_gateway`.
- Gateway connect/session-create failures should raise connection errors.
- A failed request should not discard the per-room session unless the gateway reports that the
-  session is invalid or closed.
- If the gateway connection closes mid-turn, reconnect once and retry only if no assistant text
-  has been yielded yet.
- If assistant text has already been yielded, fail the turn instead of replaying partial output.
+Terminal to Beaver:

-## Testing
+```json
+{
+  "type": "message",
+  "message_id": "device-001-000001",
+  "text": "hello"
+}
+```

-Add focused tests around the adapter:
+Required fields:

- Serializes simple system/user/assistant chat context.
- Creates one gateway session and reuses it across two turns for the same room.
- Converts text deltas into `llm.ChatChunk` content.
- Handles final full-message events without duplicate text.
- Raises on gateway error events.
- Logs and skips unsupported image/tool content.
+- `type`: must be `"message"`.
+- `message_id`: unique id for this user message.
+- `text`: non-empty user text.

-Add a small wiring test or import-level test for `CUSTOM_LLM_PROVIDER=hermes_gateway` if the
-custom module is testable without external services.
+Recommended `message_id` format:

-## Rollout
+```text
+<peer_id>-<monotonic-counter>
+```

-1. Implement the adapter behind `CUSTOM_LLM_PROVIDER=hermes_gateway`.
-2. Keep `openai` as the default provider.
-3. Run unit tests for the adapter and a syntax/type smoke check on `custom/custom_agent.py`.
-4. Test manually with a local gateway using `python custom/custom_agent.py console` or the
-   existing LiveKit development mode.
-5. If vision payloads are unsupported by the deployed gateway, document that the first Hermes
-   rollout is text-only for vision turns.
+Example:
+
+```text
+device-001-000001
+device-001-000002
+```
+
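+Persist the counter if practical. If it resets after a reboot, the terminal will reuse earlier ids and Beaver will treat the new messages as duplicates. If persistence is unavailable, generate a UUID or timestamp-based id instead. Conversely, resending a frame with the same `message_id`, for example after a reconnect, tells Beaver to treat it as a duplicate.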
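+
+One way to implement the persisted counter is shown in this Python sketch. `MessageIdGenerator` and the counter file path are hypothetical. The UUID fallback is used whenever the file cannot be read or written.
+
+```python
+import uuid
+from pathlib import Path
+
+
+class MessageIdGenerator:
+    """Issue `<peer_id>-<counter>` ids, persisting the counter to a file.
+
+    Falls back to UUID-based ids if the counter file is unusable, so a
+    reboot never reuses an earlier message_id.
+    """
+
+    def __init__(self, peer_id, counter_path):
+        self.peer_id = peer_id
+        self.path = Path(counter_path)
+        self.counter = 0
+        self.persistent = True
+        try:
+            self.counter = int(self.path.read_text().strip() or 0)
+        except FileNotFoundError:
+            pass  # first run: start from zero
+        except (OSError, ValueError):
+            self.persistent = False  # unreadable or corrupt counter file
+
+    def next_id(self):
+        if self.persistent:
+            self.counter += 1
+            try:
+                self.path.write_text(str(self.counter))  # save before sending
+                return f"{self.peer_id}-{self.counter:06d}"
+            except OSError:
+                self.persistent = False
+        return f"{self.peer_id}-{uuid.uuid4().hex}"
+```
+
+Because the counter is saved before the frame is sent, a crash between saving and sending only skips an id and never reuses one.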
+
+Optional fields:
+
+- `thread_id`: use only when the terminal intentionally wants a separate Beaver session.
+- `user_id`: use only when the terminal has a stable user id.
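+
+Here is a sketch of building the user `message` frame with client-side checks that mirror the first-pass `message_id is required` and `text is required` errors. `build_message_frame` is a hypothetical name. Rejecting whitespace-only text is an assumption, because the protocol only says `text` must be non-empty.
+
+```python
+def build_message_frame(message_id, text, thread_id=None, user_id=None):
+    """Build a user `message` frame, rejecting input Beaver would refuse."""
+    if not message_id:
+        raise ValueError("message_id is required")
+    if not isinstance(text, str) or not text.strip():
+        raise ValueError("text is required")
+    frame = {"type": "message", "message_id": message_id, "text": text}
+    if thread_id is not None:
+        frame["thread_id"] = thread_id  # only for a deliberately separate session
+    if user_id is not None:
+        frame["user_id"] = user_id
+    return frame
+```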
+
+## Ack Frame
+
+Beaver sends an ack after accepting or deduplicating the inbound message.
+
+Accepted:
+
+```json
+{
+  "type": "ack",
+  "message_id": "device-001-000001",
+  "session_id": "terminal-dev:local:device-001",
+  "accepted": true
+}
+```
+
+Duplicate still processing:
+
+```json
+{
+  "type": "ack",
+  "message_id": "device-001-000001",
+  "session_id": "terminal-dev:local:device-001",
+  "accepted": false,
+  "duplicate": true,
+  "pending": true
+}
+```
+
+Duplicate already completed:
+
+```json
+{
+  "type": "ack",
+  "message_id": "device-001-000001",
+  "session_id": "terminal-dev:local:device-001",
+  "accepted": false,
+  "duplicate": true,
+  "pending": false,
+  "reply": "cached assistant reply"
+}
+```
+
+Terminal behavior:
+
+- If `accepted` is true, wait for the assistant `message`.
+- If `duplicate` and `reply` is present, display the cached reply.
+- If `duplicate` and `pending` is true, keep waiting on the socket.
+- If `error` is present, display or log the error.
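+
+The ack rules above can be written as a small classifier. In this Python sketch, `handle_ack` and its `(action, value)` return convention are hypothetical. Treating an ack that matches none of the documented shapes as an error is an assumed defensive choice.
+
+```python
+def handle_ack(frame, message_id):
+    """Map an ack frame to the terminal's next step for one turn.
+
+    Returns ("wait", None), ("reply", cached_text), ("error", message),
+    or ("ignore", None) when the frame is not an ack for this message.
+    """
+    if frame.get("type") != "ack" or frame.get("message_id") != message_id:
+        return ("ignore", None)
+    if frame.get("error"):
+        return ("error", frame["error"])
+    if frame.get("accepted"):
+        return ("wait", None)  # the assistant `message` arrives later
+    if frame.get("duplicate"):
+        if "reply" in frame:
+            return ("reply", frame["reply"])  # completed duplicate: cached reply
+        if frame.get("pending"):
+            return ("wait", None)  # duplicate still processing
+    return ("error", f"unexpected ack: {frame!r}")
+```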
+
+## Assistant Message Frame
+
+Beaver to terminal:
+
+```json
+{
+  "type": "message",
+  "role": "assistant",
+  "message_id": "device-001-000001",
+  "run_id": "run-id",
+  "text": "assistant reply",
+  "finish_reason": "stop"
+}
+```
+
+Fields:
+
+- `type`: `"message"`.
+- `role`: `"assistant"`.
+- `message_id`: the user message id this response belongs to.
+- `run_id`: Beaver run id for diagnostics.
+- `text`: final assistant response.
+- `finish_reason`: usually `"stop"`, or `"error"` when the run failed.
+
+Terminal behavior:
+
+- Render or speak `text`.
+- Treat `finish_reason == "error"` as a failed turn.
+- Do not expect token-level streaming in this phase.
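+
+A matching sketch for the assistant reply follows. It assumes a failed run should surface as an exception that the caller can turn into a retry prompt. `TurnFailed` and `handle_assistant_message` are hypothetical names.
+
+```python
+class TurnFailed(Exception):
+    """Raised when Beaver reports that a run failed."""
+
+
+def handle_assistant_message(frame, message_id):
+    """Return the reply text if the frame answers `message_id`, else None."""
+    if frame.get("type") != "message" or frame.get("role") != "assistant":
+        return None
+    if frame.get("message_id") != message_id:
+        return None  # reply for another turn: log it and keep waiting
+    if frame.get("finish_reason") == "error":
+        raise TurnFailed(f"run {frame.get('run_id')} failed")
+    return frame.get("text", "")
+```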
+
+## Ping And Pong
+
+Terminal to Beaver:
+
+```json
+{"type": "ping"}
+```
+
+Beaver to terminal:
+
+```json
+{"type": "pong"}
+```
+
+Recommended heartbeat interval:
+
+```text
+30 seconds
+```
+
+If no `pong` or other frame arrives within a timeout (for example, two heartbeat intervals), close the socket and reconnect.
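+
+The following Python sketch handles the liveness bookkeeping. `Heartbeat` is a hypothetical class. The 60-second default timeout (two intervals) is an assumption, and the injectable `clock` exists only to make the logic testable. Any inbound frame counts as proof of life, not just `pong`.
+
+```python
+import time
+
+
+class Heartbeat:
+    """Track when to send {"type": "ping"} and when to give up on the socket."""
+
+    def __init__(self, interval=30.0, timeout=60.0, clock=time.monotonic):
+        self.interval = interval  # recommended ping interval
+        self.timeout = timeout  # assumed silence limit before reconnecting
+        self.clock = clock
+        now = clock()
+        self.last_received = now
+        self.last_ping = now
+
+    def on_frame(self):
+        # Any inbound frame, not only pong, shows the connection is alive.
+        self.last_received = self.clock()
+
+    def should_ping(self):
+        return self.clock() - self.last_ping >= self.interval
+
+    def mark_ping_sent(self):
+        self.last_ping = self.clock()
+
+    def is_dead(self):
+        return self.clock() - self.last_received > self.timeout
+```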
+
+## Error Frame
+
+Beaver may send:
+
+```json
+{
+  "type": "error",
+  "error": "human readable error"
+}
+```
+
+Terminal behavior:
+
+- Log the error.
+- Keep the connection open unless the WebSocket closes.
+- If the error is for a user message, allow the user to retry with a new `message_id`.
+
+Common first-pass errors:
+
+- `connect` is required before `message`.
+- `peer_id` is required.
+- `message_id` is required.
+- `text` is required.
+- Unsupported websocket frame type.
+
+## Terminal State Machine
+
+Implement the terminal client as a small state machine.
+
+```text
+DISCONNECTED
+  -> connect websocket
+CONNECTING
+  -> websocket open, send connect frame
+WAIT_CONNECTED
+  -> receive connected
+READY
+  -> send message frame
+WAIT_ACK
+  -> receive ack
+WAIT_REPLY
+  -> receive assistant message
+READY
+```
+
+On WebSocket close or network failure, transition to `DISCONNECTED` and reconnect with backoff.
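+
+The diagram can be encoded as a transition table. This Python sketch uses hypothetical event names. The extra `ack_with_reply` transition is an assumption covering a duplicate ack that already carries the cached reply, which lets the terminal skip `WAIT_REPLY`. Heartbeat frames such as `pong` are not transitions and are handled outside the table.
+
+```python
+from enum import Enum, auto
+
+
+class State(Enum):
+    DISCONNECTED = auto()
+    CONNECTING = auto()
+    WAIT_CONNECTED = auto()
+    READY = auto()
+    WAIT_ACK = auto()
+    WAIT_REPLY = auto()
+
+
+# (state, event) -> next state, following the diagram above.
+TRANSITIONS = {
+    (State.DISCONNECTED, "dial"): State.CONNECTING,
+    (State.CONNECTING, "socket_open"): State.WAIT_CONNECTED,
+    (State.WAIT_CONNECTED, "connected"): State.READY,
+    (State.READY, "send_message"): State.WAIT_ACK,
+    (State.WAIT_ACK, "ack"): State.WAIT_REPLY,
+    (State.WAIT_ACK, "ack_with_reply"): State.READY,  # assumed shortcut
+    (State.WAIT_REPLY, "assistant_message"): State.READY,
+}
+
+
+def step(state, event):
+    """Advance the client state; any close or network failure disconnects."""
+    if event in ("socket_closed", "network_error"):
+        return State.DISCONNECTED
+    try:
+        return TRANSITIONS[(state, event)]
+    except KeyError:
+        raise ValueError(f"event {event!r} not allowed in {state.name}") from None
+```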
+
+Recommended reconnect policy:
+
+- Start at 1 second.
+- Double up to 30 seconds.
+- Reset backoff after a successful `connected` frame.
+
+On reconnect, use the same `peer_id`.
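+
+Here is the reconnect policy as a small Python sketch. `Backoff` is a hypothetical name.
+
+```python
+class Backoff:
+    """Exponential reconnect delay: start at 1 s, double, cap at 30 s."""
+
+    def __init__(self, initial=1.0, maximum=30.0):
+        self.initial = initial
+        self.maximum = maximum
+        self.current = initial
+
+    def next_delay(self):
+        delay = self.current
+        self.current = min(self.current * 2, self.maximum)
+        return delay
+
+    def reset(self):
+        # Call after a successful `connected` frame, not merely on socket open.
+        self.current = self.initial
+```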
+
+## Terminal Implementation Requirements
+
+The terminal-side code should provide:
+
+- A configurable Beaver WebSocket URL.
+- A stable `peer_id`.
+- A configurable `device_name`.
+- A monotonic or otherwise unique `message_id` generator.
+- JSON encoding and decoding.
+- Connect frame on socket open.
+- Ping/pong heartbeat.
+- Reconnect with backoff.
+- A queue or guard so only one user text turn is in flight at a time for the first pass.
+- Logging for `session_id`, `message_id`, `run_id`, and errors.
+
+The terminal-side code does not need:
+
+- Multi-room session logic.
+- Hermes session management.
+- LiveKit `AgentSession`.
+- Audio chunking.
+- Tool calls.
+- OAuth or token refresh.
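+
+For the one-turn-in-flight rule, an `asyncio.Lock` is one simple guard, assuming an asyncio-based Python client. `TurnGuard` is a hypothetical name. Later turns wait on the lock in arrival order, which acts as the queue. `busy()` lets the UI reject new input instead, if preferred.
+
+```python
+import asyncio
+
+
+class TurnGuard:
+    """Allow only one user text turn in flight at a time (first-pass rule)."""
+
+    def __init__(self):
+        self._lock = asyncio.Lock()
+
+    def busy(self):
+        return self._lock.locked()
+
+    async def run_turn(self, send_and_wait, text):
+        # Later turns queue on the lock instead of interleaving on the socket.
+        async with self._lock:
+            return await send_and_wait(text)
+```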
+
+## Example Client Pseudocode
+
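+```python
+import asyncio
+
+import aiohttp  # one possible client library; use the terminal runtime's own
+
+# Placeholders for terminal-side code (hypothetical): load_or_create_peer_id,
+# load_counter, save_counter, read_user_text, display, BEAVER_WS_URL, DEVICE_NAME.
+peer_id = load_or_create_peer_id()
+counter = load_counter()
+
+
+class TurnFailed(Exception):
+    """A single turn failed; the WebSocket itself stays usable."""
+
+
+async def run_terminal_client():
+    backoff = 1.0
+    while True:
+        try:
+            async with aiohttp.ClientSession() as http:
+                async with http.ws_connect(BEAVER_WS_URL) as ws:
+                    await ws.send_json({
+                        "type": "connect",
+                        "peer_id": peer_id,
+                        "device_name": DEVICE_NAME,
+                        "capabilities": ["text"],
+                    })
+
+                    connected = await ws.receive_json()
+                    if connected.get("type") != "connected":
+                        raise ConnectionError(f"handshake failed: {connected}")
+                    print("session_id", connected["session_id"])
+                    backoff = 1.0  # reset only after a successful `connected`
+
+                    while True:
+                        text = await read_user_text()
+                        try:
+                            display(await send_user_text(ws, text))
+                        except TurnFailed as exc:
+                            print("turn failed", exc)  # keep the socket open
+        except Exception as exc:  # socket closed, network error, bad handshake
+            print("websocket disconnected", exc)
+            await asyncio.sleep(backoff)
+            backoff = min(backoff * 2, 30.0)
+
+
+async def send_user_text(ws, text):
+    global counter
+    counter += 1
+    save_counter(counter)  # persist first so a reboot never reuses an id
+    message_id = f"{peer_id}-{counter:06d}"
+
+    await ws.send_json({
+        "type": "message",
+        "message_id": message_id,
+        "text": text,
+    })
+
+    while True:
+        frame = await ws.receive_json()
+        kind = frame.get("type")
+        if kind == "error":
+            raise TurnFailed(frame.get("error", "unknown error"))
+        if frame.get("message_id") != message_id:
+            continue  # pong, or a frame for some other turn
+        if kind == "ack":
+            if frame.get("error"):
+                raise TurnFailed(frame["error"])
+            if frame.get("duplicate") and "reply" in frame:
+                return frame["reply"]  # cached reply for a completed duplicate
+            continue  # accepted, or duplicate still pending: keep waiting
+        if kind == "message" and frame.get("role") == "assistant":
+            if frame.get("finish_reason") == "error":
+                raise TurnFailed(f"run {frame.get('run_id')} failed")
+            return frame.get("text", "")
+```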
+
+Adapt the pseudocode to the terminal runtime language and WebSocket library.
+
+## Manual Test With websocat
+
+If `websocat` is available, a developer can manually test the protocol:
+
+```bash
+websocat ws://127.0.0.1:8080/api/channels/terminal-dev/ws
+```
+
+Then paste:
+
+```json
+{"type":"connect","peer_id":"device-001","device_name":"desk-terminal","capabilities":["text"]}
+```
+
+Expected response:
+
+```json
+{"type":"connected","channel_id":"terminal-dev","session_id":"terminal-dev:local:device-001"}
+```
+
+Then paste:
+
+```json
+{"type":"message","message_id":"device-001-000001","text":"hello"}
+```
+
+Expected responses:
+
+```json
+{"type":"ack","message_id":"device-001-000001","session_id":"terminal-dev:local:device-001","accepted":true}
+```
+
+Then, after Beaver finishes the run:
+
+```json
+{"type":"message","role":"assistant","message_id":"device-001-000001","run_id":"...","text":"...","finish_reason":"stop"}
+```
+
+## Acceptance Checklist For Terminal-Side Codex
+
+- The terminal opens the configured Beaver WebSocket URL.
+- The terminal sends `connect` immediately after open.
+- The terminal receives and logs `connected.session_id`.
+- The terminal sends text using a unique `message_id`.
+- The terminal receives `ack`.
+- The terminal receives and displays assistant `message.text`.
+- The terminal handles `ping`/`pong`.
+- The terminal reconnects with the same `peer_id`.
+- The terminal does not use REST chat or `/ws/{session_id}`.
+- The terminal implementation remains text-only for the first pass.
+
+When this checklist passes against Beaver, the first-stage device integration is accepted from the terminal side.