feat: implement channel runtime connectors

# Beaver Terminal WebSocket Integration Guide

Date: 2026-06-01

Audience: the small-terminal-side Codex agent that will modify terminal firmware or terminal app code.

## Goal

Connect the small terminal device to Beaver through a text-only WebSocket channel.

The first acceptance target is simple:

1. The terminal opens a WebSocket connection to Beaver.
2. The terminal sends a `connect` frame with a stable `peer_id`.
3. The terminal sends one text `message` frame.
4. The terminal receives an `ack`.
5. The terminal receives the final assistant text response from Beaver.
6. The terminal can reconnect with the same `peer_id` and keep the same Beaver session.

This document replaces the earlier Hermes LiveKit LLM adapter design for the terminal-side work. Do not implement a LiveKit LLM adapter from this document.

## Non-Goals

- Do not implement audio streaming.
- Do not implement camera, screen, image, or multimodal frames.
- Do not implement token streaming.
- Do not implement terminal-side tools.
- Do not implement AuthZ, device registration, OAuth, or pairing in the first pass.
- Do not call Beaver REST chat endpoints or the existing Web UI `/ws/{session_id}` endpoint.
- Do not build an OpenAI-compatible proxy.
- Do not implement Hermes Agent or LiveKit changes on the terminal side.

## Beaver Endpoint

The terminal connects to:

```text
ws://<beaver-host>/api/channels/<channel_id>/ws
```

For local development through the nginx port of the Beaver app instance:

```text
ws://127.0.0.1:8080/api/channels/terminal-dev/ws
```

For direct backend development without nginx:

```text
ws://127.0.0.1:18080/api/channels/terminal-dev/ws
```

Use `wss://` when Beaver is deployed behind TLS.
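
The endpoint follows a fixed path template. A terminal that stores the host and the channel id separately can therefore build the URL in one place. This is a minimal sketch. It assumes the host string already includes any port and that a boolean records whether TLS terminates in front of Beaver. `build_channel_ws_url` is a hypothetical helper, not a Beaver API.

```python
from urllib.parse import quote


def build_channel_ws_url(host: str, channel_id: str, tls: bool = False) -> str:
    """Build ws(s)://<host>/api/channels/<channel_id>/ws.

    `host` may carry a port, e.g. "127.0.0.1:8080". The channel id is
    percent-encoded so an unexpected character cannot change the path.
    """
    scheme = "wss" if tls else "ws"
    return f"{scheme}://{host}/api/channels/{quote(channel_id, safe='')}/ws"


print(build_channel_ws_url("127.0.0.1:8080", "terminal-dev"))
# -> ws://127.0.0.1:8080/api/channels/terminal-dev/ws
print(build_channel_ws_url("beaver.example.com", "terminal-dev", tls=True))
# -> wss://beaver.example.com/api/channels/terminal-dev/ws
```

Percent-encoding the channel id is a defensive choice. An id like `terminal-dev` passes through unchanged.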

The expected first channel id is:

```text
terminal-dev
```

The terminal implementation should make the URL configurable, for example:

```text
BEAVER_WS_URL=ws://127.0.0.1:8080/api/channels/terminal-dev/ws
TERMINAL_PEER_ID=device-001
TERMINAL_DEVICE_NAME=desk-terminal
```
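
The three settings can be loaded once at startup. This sketch assumes the variable names above. Several choices in it are illustrative, not Beaver requirements: the fallback URL, failing fast when `TERMINAL_PEER_ID` is missing, and defaulting `device_name` to the peer id. `TerminalConfig` and `load_config` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumed default: the local nginx endpoint from the Beaver Endpoint section.
DEFAULT_WS_URL = "ws://127.0.0.1:8080/api/channels/terminal-dev/ws"


@dataclass(frozen=True)
class TerminalConfig:
    ws_url: str
    peer_id: str
    device_name: str


def load_config(env: Optional[Mapping[str, str]] = None) -> TerminalConfig:
    """Read terminal settings from environment variables (or a given mapping)."""
    env = os.environ if env is None else env
    peer_id = env.get("TERMINAL_PEER_ID", "").strip()
    if not peer_id:
        # A random per-boot id would give the terminal a new Beaver session each time.
        raise ValueError("TERMINAL_PEER_ID must be set to a stable device id")
    ws_url = env.get("BEAVER_WS_URL", DEFAULT_WS_URL)
    if not ws_url.startswith(("ws://", "wss://")):
        raise ValueError(f"BEAVER_WS_URL must use ws:// or wss://, got {ws_url!r}")
    return TerminalConfig(
        ws_url=ws_url,
        peer_id=peer_id,
        device_name=env.get("TERMINAL_DEVICE_NAME", peer_id),
    )
```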

## Protocol Overview

The transport is JSON over WebSocket.

All frames are UTF-8 JSON objects. The terminal should ignore unknown fields. Beaver will ignore unknown fields unless the frame type is invalid.

The protocol is request/reply oriented in this phase. Beaver sends only final assistant messages, not token deltas.
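
These rules fit in one decoding helper that every inbound frame passes through. This sketch assumes the WebSocket library hands over raw text or bytes. `decode_frame`, `is_known`, and `KNOWN_INBOUND_TYPES` are illustrative names. The type set comes from the frames this guide defines.

```python
import json
from typing import Any, Dict, Union

# Frame types Beaver sends to the terminal in this phase.
KNOWN_INBOUND_TYPES = {"connected", "ack", "message", "pong", "error"}


def decode_frame(raw: Union[str, bytes]) -> Dict[str, Any]:
    """Parse one inbound frame: a UTF-8 JSON object with a string `type`.

    Extra fields are kept but never required, so fields Beaver adds later do
    not break the terminal. Raises ValueError for malformed frames.
    """
    text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
    frame = json.loads(text)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(frame, dict) or not isinstance(frame.get("type"), str):
        raise ValueError(f"not a protocol frame: {text[:80]!r}")
    return frame


def is_known(frame: Dict[str, Any]) -> bool:
    """The caller logs and skips unknown types instead of treating them as fatal."""
    return frame["type"] in KNOWN_INBOUND_TYPES
```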

Required frame flow:

```text
terminal -> Beaver: connect
Beaver -> terminal: connected
terminal -> Beaver: message
Beaver -> terminal: ack
Beaver -> terminal: message
```

Optional heartbeat:

```text
terminal -> Beaver: ping
Beaver -> terminal: pong
```
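
For offline tests of the terminal client, a tiny in-memory stand-in can mimic the flow above. This is a hypothetical test double, not Beaver's implementation. It echoes the user text instead of running an agent and copies the `session_id` shape from the examples in this guide. It reuses the first-pass error strings listed under Error Frame and does not model duplicate handling.

```python
import itertools
from typing import Any, Dict, List, Optional


class FakeBeaver:
    """In-memory stand-in for the Beaver channel endpoint, for offline tests only.

    connect -> connected, message -> ack then an assistant message, ping -> pong.
    Text is echoed back; repeated message ids are not deduplicated.
    """

    def __init__(self, channel_id: str = "terminal-dev") -> None:
        self.channel_id = channel_id
        self.session_id: Optional[str] = None
        self._run_ids = itertools.count(1)

    def handle(self, frame: Dict[str, Any]) -> List[Dict[str, Any]]:
        """Return the frames Beaver would send back for one inbound frame."""
        kind = frame.get("type")
        if kind == "ping":
            return [{"type": "pong"}]
        if kind == "connect":
            if not frame.get("peer_id"):
                return [self._error("peer_id is required")]
            # Session id shape copied from the examples in this guide.
            self.session_id = f"{self.channel_id}:local:{frame['peer_id']}"
            return [{"type": "connected", "channel_id": self.channel_id,
                     "session_id": self.session_id}]
        if kind == "message":
            if self.session_id is None:
                return [self._error("connect is required before message")]
            if not frame.get("message_id"):
                return [self._error("message_id is required")]
            if not frame.get("text"):
                return [self._error("text is required")]
            message_id = frame["message_id"]
            return [
                {"type": "ack", "message_id": message_id,
                 "session_id": self.session_id, "accepted": True},
                {"type": "message", "role": "assistant", "message_id": message_id,
                 "run_id": f"run-{next(self._run_ids)}",
                 "text": f"echo: {frame['text']}", "finish_reason": "stop"},
            ]
        return [self._error("Unsupported websocket frame type.")]

    @staticmethod
    def _error(text: str) -> Dict[str, Any]:
        return {"type": "error", "error": text}
```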

## Connect Frame

The terminal must send `connect` immediately after the WebSocket opens.

Terminal to Beaver:

```json
{
  "type": "connect",
  "peer_id": "device-001",
  "device_name": "desk-terminal",
  "capabilities": ["text"]
}
```

Required fields:

- `type`: must be `"connect"`.
- `peer_id`: stable terminal identity. Reuse this value across reconnects.

Recommended fields:

- `device_name`: human-readable terminal name.
- `capabilities`: include `"text"`.

Optional fields:

- `thread_id`: optional sub-session key. Omit it for the first pass.
- `user_id`: optional user identity. Omit it unless the terminal already has a stable user id.
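
This sketch builds the frame and sends only the fields the terminal actually has. It assumes the first pass leaves `thread_id` and `user_id` unset. `build_connect_frame` is a hypothetical helper name.

```python
from typing import Any, Dict, Optional


def build_connect_frame(
    peer_id: str,
    device_name: Optional[str] = None,
    thread_id: Optional[str] = None,
    user_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the `connect` frame, omitting optional fields that were not provided."""
    if not peer_id:
        raise ValueError("peer_id is required")
    frame: Dict[str, Any] = {"type": "connect", "peer_id": peer_id, "capabilities": ["text"]}
    if device_name:
        frame["device_name"] = device_name
    # First pass: callers normally leave thread_id and user_id as None.
    if thread_id:
        frame["thread_id"] = thread_id
    if user_id:
        frame["user_id"] = user_id
    return frame
```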

Beaver to terminal:

```json
{
  "type": "connected",
  "channel_id": "terminal-dev",
  "session_id": "terminal-dev:local:device-001"
}
```

The terminal should store `session_id` for logging and diagnostics. It does not need to send `session_id` back in message frames.

## Message Frame

Terminal to Beaver:

```json
{
  "type": "message",
  "message_id": "m-001",
  "text": "hello"
}
```

Required fields:

- `type`: must be `"message"`.
- `message_id`: unique id for this user message.
- `text`: non-empty user text.
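
A matching sketch for the message frame checks both required values before anything reaches the socket. That way the terminal never provokes Beaver's "`message_id` is required" or "`text` is required" errors itself. Treating whitespace-only text as empty is an assumption. `build_message_frame` is a hypothetical name.

```python
from typing import Any, Dict, Optional


def build_message_frame(message_id: str, text: str,
                        thread_id: Optional[str] = None) -> Dict[str, Any]:
    """Build a user `message` frame, rejecting inputs Beaver would answer with an error."""
    if not message_id:
        raise ValueError("message_id is required")
    if not text.strip():  # assumption: whitespace-only counts as empty
        raise ValueError("text must be non-empty")
    frame: Dict[str, Any] = {"type": "message", "message_id": message_id, "text": text}
    if thread_id:  # only when a separate Beaver session is intended
        frame["thread_id"] = thread_id
    return frame
```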

Recommended `message_id` format:

```text
<peer_id>-<monotonic-counter>
```

Example:

```text
device-001-000001
device-001-000002
```

The terminal should persist the counter if practical. If persistence is unavailable, generate a UUID or timestamp-based id. Reusing the same `message_id` tells Beaver to treat the frame as a duplicate.

Optional fields:

- `thread_id`: use only when the terminal intentionally wants a separate Beaver session.
- `user_id`: use only when the terminal has a stable user id.
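
This sketch implements the counter with the fallback described above. It assumes a writable file path is available on devices that can persist state. The class name is hypothetical, and the six-digit padding follows the examples. Switching to UUID ids when the state file is unreadable is an illustrative choice, made so the counter never restarts at zero and collides with old ids.

```python
import uuid
from pathlib import Path
from typing import Optional


class MessageIdGenerator:
    """Produce `<peer_id>-<counter>` ids, persisting the counter when possible.

    Without a usable state file, ids fall back to `<peer_id>-<uuid4 hex>`,
    which stay unique across restarts.
    """

    def __init__(self, peer_id: str, state_file: Optional[Path] = None) -> None:
        self.peer_id = peer_id
        self.state_file = state_file
        self.counter = self._load()

    def _load(self) -> int:
        if self.state_file is None:
            return 0
        try:
            return int(self.state_file.read_text().strip())
        except FileNotFoundError:
            return 0  # first run on this device
        except (OSError, ValueError):
            self.state_file = None  # unreadable state: use UUIDs rather than risk reuse
            return 0

    def next_id(self) -> str:
        if self.state_file is None:
            return f"{self.peer_id}-{uuid.uuid4().hex}"
        self.counter += 1
        try:
            # Save before the id is used so a restart never hands it out again.
            self.state_file.write_text(str(self.counter))
        except OSError:
            return f"{self.peer_id}-{uuid.uuid4().hex}"
        return f"{self.peer_id}-{self.counter:06d}"
```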

## Ack Frame

Beaver sends an ack after accepting or deduplicating the inbound message.

Accepted:

```json
{
  "type": "ack",
  "message_id": "device-001-000001",
  "session_id": "terminal-dev:local:device-001",
  "accepted": true
}
```

Duplicate still processing:

```json
{
  "type": "ack",
  "message_id": "device-001-000001",
  "session_id": "terminal-dev:local:device-001",
  "accepted": false,
  "duplicate": true,
  "pending": true
}
```

Duplicate already completed:

```json
{
  "type": "ack",
  "message_id": "device-001-000001",
  "session_id": "terminal-dev:local:device-001",
  "accepted": false,
  "duplicate": true,
  "pending": false,
  "reply": "cached assistant reply"
}
```

Terminal behavior:

- If `accepted` is true, wait for the assistant `message`.
- If `duplicate` is true and `reply` is present, display the cached reply.
- If `duplicate` and `pending` are both true, keep waiting on the socket.
- If `error` is present, display or log the error.
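
The ack rules translate into a small classifier. This is a sketch, and `AckOutcome` and `classify_ack` are hypothetical names. Checking `error` before anything else is an assumed precedence. The guide does not define a non-accepted ack that is neither pending nor carries a reply, so the sketch treats that case as an error.

```python
from enum import Enum
from typing import Any, Dict


class AckOutcome(Enum):
    WAIT_FOR_REPLY = "wait_for_reply"  # accepted: the assistant message will follow
    CACHED_REPLY = "cached_reply"      # duplicate already completed: use frame["reply"]
    KEEP_WAITING = "keep_waiting"      # duplicate still processing
    ERROR = "error"                    # Beaver reported an error, or the ack is unusable


def classify_ack(frame: Dict[str, Any]) -> AckOutcome:
    """Map an `ack` frame to the terminal behavior listed above."""
    if frame.get("error"):
        return AckOutcome.ERROR
    if frame.get("accepted"):
        return AckOutcome.WAIT_FOR_REPLY
    if frame.get("duplicate"):
        if frame.get("reply"):
            return AckOutcome.CACHED_REPLY
        if frame.get("pending"):
            return AckOutcome.KEEP_WAITING
    # Not accepted and not a recognizable duplicate: treat the send as failed.
    return AckOutcome.ERROR
```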

## Assistant Message Frame

Beaver to terminal:

```json
{
  "type": "message",
  "role": "assistant",
  "message_id": "device-001-000001",
  "run_id": "run-id",
  "text": "assistant reply",
  "finish_reason": "stop"
}
```

Fields:

- `type`: `"message"`.
- `role`: `"assistant"`.
- `message_id`: the user message id this response belongs to.
- `run_id`: Beaver run id for diagnostics.
- `text`: final assistant response.
- `finish_reason`: usually `"stop"`, or `"error"` when the run failed.

Terminal behavior:

- Render or speak `text`.
- Treat `finish_reason == "error"` as a failed turn.
- Do not expect token-level streaming in this phase.
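
This sketch matches an inbound frame to the pending turn. It assumes the terminal tracks the `message_id` it is waiting on. `TurnResult` and `reply_for` are illustrative names. Treating a missing `finish_reason` as success is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class TurnResult:
    ok: bool               # False when finish_reason == "error"
    text: str
    run_id: Optional[str]  # kept for diagnostics logging


def reply_for(frame: Dict[str, Any], message_id: str) -> Optional[TurnResult]:
    """Return the turn result if `frame` is the assistant reply to `message_id`, else None."""
    if frame.get("type") != "message" or frame.get("role") != "assistant":
        return None
    if frame.get("message_id") != message_id:
        return None  # a reply to some other turn: keep waiting
    return TurnResult(
        ok=frame.get("finish_reason") != "error",
        text=frame.get("text", ""),
        run_id=frame.get("run_id"),
    )
```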

## Ping And Pong

Terminal to Beaver:

```json
{"type": "ping"}
```

Beaver to terminal:

```json
{"type": "pong"}
```

Recommended heartbeat interval:

```text
30 seconds
```

If no pong or other frame is received after a reasonable timeout, reconnect.
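
The heartbeat needs only two timestamps. This sketch assumes any inbound frame proves the link is alive. It uses 90 seconds (three missed intervals) as the stale threshold; that value is an assumption standing in for the unspecified timeout. `HeartbeatMonitor` is a hypothetical name, and the clock is injectable so the logic can be tested without sleeping.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Decide when to send `ping` and when the link should be treated as dead."""

    def __init__(self, interval: float = 30.0, timeout: float = 90.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval  # from this guide
        self.timeout = timeout    # assumed value for "a reasonable timeout"
        self.clock = clock
        now = clock()
        self.last_received = now
        self.last_ping = now

    def frame_received(self) -> None:
        """Call for every inbound frame, not only `pong`."""
        self.last_received = self.clock()

    def should_ping(self) -> bool:
        return self.clock() - self.last_ping >= self.interval

    def ping_sent(self) -> None:
        self.last_ping = self.clock()

    def is_stale(self) -> bool:
        return self.clock() - self.last_received >= self.timeout
```

A background task can poll it about once a second. When `should_ping()` is true, it sends `{"type": "ping"}` and calls `ping_sent()`. When `is_stale()` is true, it closes the socket so the reconnect path takes over.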

## Error Frame

Beaver may send:

```json
{
  "type": "error",
  "error": "human readable error"
}
```

Terminal behavior:

- Log the error.
- Keep the connection open unless the WebSocket closes.
- If the error is for a user message, allow the user to retry with a new `message_id`.

Common first-pass errors:

- `connect` is required before `message`.
- `peer_id` is required.
- `message_id` is required.
- `text` is required.
- Unsupported websocket frame type.

## Terminal State Machine

Implement the terminal client as a small state machine.

```text
DISCONNECTED
  -> connect websocket
CONNECTING
  -> websocket open, send connect frame
WAIT_CONNECTED
  -> receive connected
READY
  -> send message frame
WAIT_ACK
  -> receive ack
WAIT_REPLY
  -> receive assistant message
READY
```

On WebSocket close or network failure, transition to `DISCONNECTED` and reconnect with backoff.
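
The diagram can be encoded as a transition table, so illegal events surface as errors instead of silent hangs. Event names such as `connect_websocket` are hypothetical labels for the arrows. The diagram does not draw the `cached_reply` edge from `WAIT_ACK` back to `READY`. It follows from the duplicate-ack rule in the Ack Frame section.

```python
from enum import Enum, auto


class State(Enum):
    DISCONNECTED = auto()
    CONNECTING = auto()
    WAIT_CONNECTED = auto()
    READY = auto()
    WAIT_ACK = auto()
    WAIT_REPLY = auto()


# (state, event) -> next state, transcribed from the diagram above.
TRANSITIONS = {
    (State.DISCONNECTED, "connect_websocket"): State.CONNECTING,
    (State.CONNECTING, "websocket_open"): State.WAIT_CONNECTED,  # send `connect` here
    (State.WAIT_CONNECTED, "connected"): State.READY,  # also reset reconnect backoff
    (State.READY, "send_message"): State.WAIT_ACK,
    (State.WAIT_ACK, "ack"): State.WAIT_REPLY,
    (State.WAIT_ACK, "cached_reply"): State.READY,  # duplicate ack carrying `reply`
    (State.WAIT_REPLY, "assistant_message"): State.READY,
}

FAILURE_EVENTS = {"websocket_closed", "network_failure"}


def next_state(state: State, event: str) -> State:
    """Return the next state; a close or network failure always leads to DISCONNECTED."""
    if event in FAILURE_EVENTS:
        return State.DISCONNECTED
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in state {state.name}") from None
```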

Recommended reconnect policy:

- Start at 1 second.
- Double the delay after each failed attempt, up to 30 seconds.
- Reset backoff after a successful `connected` frame.

On reconnect, use the same `peer_id`.
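
This sketch expresses the policy above as a small helper with hypothetical names. The caller sleeps for `next_delay()` after each failed attempt and calls `reset()` once a `connected` frame arrives.

```python
class Backoff:
    """Reconnect delays: start at 1 s, double up to 30 s, reset after `connected`."""

    def __init__(self, initial: float = 1.0, maximum: float = 30.0) -> None:
        self.initial = initial
        self.maximum = maximum
        self._next = initial

    def next_delay(self) -> float:
        """Return the delay to sleep now and double the following one (capped)."""
        delay = self._next
        self._next = min(self._next * 2, self.maximum)
        return delay

    def reset(self) -> None:
        self._next = self.initial


b = Backoff()
print([b.next_delay() for _ in range(7)])
# -> [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```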

## Terminal Implementation Requirements

The terminal-side code should provide:

- A configurable Beaver WebSocket URL.
- A stable `peer_id`.
- A configurable `device_name`.
- A monotonic or otherwise unique `message_id` generator.
- JSON encoding and decoding.
- Connect frame on socket open.
- Ping/pong heartbeat.
- Reconnect with backoff.
- A queue or guard so only one user text turn is in flight at a time for the first pass.
- Logging for `session_id`, `message_id`, `run_id`, and errors.
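
For the single-turn requirement, an `asyncio.Lock` is enough in an asyncio-based client. Callers that arrive while a turn is in flight wait on the lock instead of interleaving frames on the socket. This sketch assumes an asyncio runtime, and `TurnGuard` is a hypothetical name. Firmware without asyncio would use its own mutex or a one-slot queue.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class TurnGuard:
    """Allow at most one user text turn in flight; later callers wait on the lock."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()

    async def run(self, turn: Callable[[], Awaitable[T]]) -> T:
        async with self._lock:
            return await turn()
```

Usage would look like `await guard.run(lambda: send_user_text(ws, text))`, with `send_user_text` being the helper from the example client.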

The terminal-side code does not need:

- Multi-room session logic.
- Hermes session management.
- LiveKit `AgentSession`.
- Audio chunking.
- Tool calls.
- OAuth or token refresh.

## Example Client Sketch

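```python
import asyncio
import logging
import os
from pathlib import Path
from typing import Awaitable, Callable, Optional

import aiohttp  # third-party; any client with JSON send/receive works the same way

BEAVER_WS_URL = os.environ.get(
    "BEAVER_WS_URL", "ws://127.0.0.1:8080/api/channels/terminal-dev/ws"
)
PEER_ID = os.environ.get("TERMINAL_PEER_ID", "device-001")
DEVICE_NAME = os.environ.get("TERMINAL_DEVICE_NAME", "desk-terminal")
COUNTER_PATH = Path("message_counter.txt")  # illustrative persistence location

log = logging.getLogger("terminal")


def load_counter() -> int:
    try:
        return int(COUNTER_PATH.read_text())
    except (FileNotFoundError, ValueError):
        return 0


def save_counter(value: int) -> None:
    COUNTER_PATH.write_text(str(value))


counter = load_counter()


async def send_user_text(ws: aiohttp.ClientWebSocketResponse, text: str) -> Optional[str]:
    """Send one user turn and wait for its reply; None means the turn failed."""
    global counter
    counter += 1
    save_counter(counter)  # persist first so a restart never reuses this id
    message_id = f"{PEER_ID}-{counter:06d}"
    await ws.send_json({"type": "message", "message_id": message_id, "text": text})

    while True:
        frame = await ws.receive_json()
        kind = frame.get("type")
        if kind == "ack" and frame.get("message_id") == message_id:
            if frame.get("error"):
                log.warning("ack error for %s: %s", message_id, frame["error"])
                return None
            if frame.get("reply"):  # duplicate that already completed
                return frame["reply"]
            continue  # accepted, or duplicate still pending: keep waiting
        if kind == "message" and frame.get("role") == "assistant":
            if frame.get("message_id") == message_id:
                log.info("message_id=%s run_id=%s", message_id, frame.get("run_id"))
                if frame.get("finish_reason") == "error":
                    return None
                return frame.get("text", "")
        elif kind == "error":
            # Error frames leave the socket open; the user can retry with a new id.
            log.warning("beaver error: %s", frame.get("error", "unknown error"))
            return None
        # pong, replies to other ids, and unknown frame types are ignored


async def run_terminal_client(read_user_text: Callable[[], Awaitable[str]]) -> None:
    """Connect, send `connect`, then serve user turns; reconnect with backoff."""
    backoff = 1.0
    async with aiohttp.ClientSession() as http:
        while True:
            try:
                async with http.ws_connect(BEAVER_WS_URL) as ws:
                    await ws.send_json({
                        "type": "connect",
                        "peer_id": PEER_ID,
                        "device_name": DEVICE_NAME,
                        "capabilities": ["text"],
                    })
                    connected = await ws.receive_json()
                    if connected.get("type") != "connected":
                        raise RuntimeError(f"unexpected handshake frame: {connected}")
                    backoff = 1.0  # reset only after a successful `connected`
                    log.info("session_id=%s", connected.get("session_id"))

                    while True:
                        reply = await send_user_text(ws, await read_user_text())
                        if reply is not None:
                            print(reply)  # render or speak the reply
            except Exception as exc:  # closed socket, network failure, bad frame
                log.warning("websocket disconnected: %s", exc)
            await asyncio.sleep(backoff)
            backoff = min(backoff * 2, 30.0)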
```

Adapt the example to the terminal runtime language and WebSocket library.

## Manual Test With websocat

If `websocat` is available, a developer can manually test the protocol:

```bash
websocat ws://127.0.0.1:8080/api/channels/terminal-dev/ws
```

Then paste:

```json
{"type":"connect","peer_id":"device-001","device_name":"desk-terminal","capabilities":["text"]}
```

Expected response:

```json
{"type":"connected","channel_id":"terminal-dev","session_id":"terminal-dev:local:device-001"}
```

Then paste:

```json
{"type":"message","message_id":"device-001-000001","text":"hello"}
```

Expected responses:

```json
{"type":"ack","message_id":"device-001-000001","session_id":"terminal-dev:local:device-001","accepted":true}
```

Then, after Beaver finishes the run:

```json
{"type":"message","role":"assistant","message_id":"device-001-000001","run_id":"...","text":"...","finish_reason":"stop"}
```

## Acceptance Checklist For Terminal-Side Codex

- The terminal opens the configured Beaver WebSocket URL.
- The terminal sends `connect` immediately after open.
- The terminal receives and logs `connected.session_id`.
- The terminal sends text using a unique `message_id`.
- The terminal receives `ack`.
- The terminal receives and displays assistant `message.text`.
- The terminal handles `ping`/`pong`.
- The terminal reconnects with the same `peer_id`.
- The terminal does not use REST chat or `/ws/{session_id}`.
- The terminal implementation remains text-only for the first pass.

When this checklist passes against Beaver, the first-stage device integration is accepted from the terminal side.