# Beaver Terminal WebSocket Integration Guide
Date: 2026-06-01
Audience: the small-terminal-side Codex agent that will modify terminal firmware or terminal app code.
## Goal
Connect the small terminal device to Beaver through a text-only WebSocket channel.
The first acceptance target is simple:
1. The terminal opens a WebSocket connection to Beaver.
2. The terminal sends a `connect` frame with a stable `peer_id`.
3. The terminal sends one text `message` frame.
4. The terminal receives an `ack`.
5. The terminal receives the final assistant text response from Beaver.
6. The terminal can reconnect with the same `peer_id` and keep the same Beaver session.
This document replaces the earlier Hermes LiveKit LLM adapter design for the terminal-side work. Do not implement a LiveKit LLM adapter from this document.
## Non-Goals
- Do not implement audio streaming.
- Do not implement camera, screen, image, or multimodal frames.
- Do not implement token streaming.
- Do not implement terminal-side tools.
- Do not implement AuthZ, device registration, OAuth, or pairing in the first pass.
- Do not call Beaver REST chat endpoints or the existing Web UI `/ws/{session_id}` endpoint.
- Do not build an OpenAI-compatible proxy.
- Do not implement Hermes Agent or LiveKit changes on the terminal side.
## Beaver Endpoint
The terminal connects to:
```text
ws://<beaver-host>/api/channels/<channel_id>/ws
```
For local development through the Beaver app instance nginx port:
```text
ws://127.0.0.1:8080/api/channels/terminal-dev/ws
```
For direct backend development without nginx:
```text
ws://127.0.0.1:18080/api/channels/terminal-dev/ws
```
Use `wss://` when Beaver is deployed behind TLS.
The expected first channel id is:
```text
terminal-dev
```
The terminal implementation should make the URL configurable, for example:
```text
BEAVER_WS_URL=ws://127.0.0.1:8080/api/channels/terminal-dev/ws
TERMINAL_PEER_ID=device-001
TERMINAL_DEVICE_NAME=desk-terminal
```
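A minimal sketch of loading these settings, assuming they arrive as environment variables. `TerminalConfig`, `load_config`, and the fallback device name `"terminal"` are hypothetical and not part of Beaver.

```python
import os
from dataclasses import dataclass

DEFAULT_WS_URL = "ws://127.0.0.1:8080/api/channels/terminal-dev/ws"


@dataclass(frozen=True)
class TerminalConfig:
    ws_url: str
    peer_id: str
    device_name: str


def load_config(env=None) -> TerminalConfig:
    """Read terminal settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    ws_url = env.get("BEAVER_WS_URL", DEFAULT_WS_URL)
    if not ws_url.startswith(("ws://", "wss://")):
        raise ValueError(f"BEAVER_WS_URL must start with ws:// or wss://, got {ws_url!r}")
    peer_id = env.get("TERMINAL_PEER_ID", "").strip()
    if not peer_id:
        # peer_id must be stable across reconnects, so it is never auto-generated here.
        raise ValueError("TERMINAL_PEER_ID is required")
    # The fallback device name is an assumption made for this sketch.
    device_name = env.get("TERMINAL_DEVICE_NAME", "terminal")
    return TerminalConfig(ws_url, peer_id, device_name)
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test on a development machine.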
## Protocol Overview
The transport is JSON over WebSocket.
Every frame is a UTF-8 JSON object. The terminal should ignore unknown fields. Beaver also ignores unknown fields, but it answers a frame whose `type` is unsupported with an error frame.
The protocol is request/reply oriented in this phase. Beaver sends only final assistant messages, not token deltas.
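The framing rules above could be handled along these lines. This is a sketch under the assumption that the WebSocket library hands the terminal one text payload per frame; `decode_frame` and `encode_frame` are hypothetical helper names.

```python
import json


def decode_frame(raw: str) -> dict:
    """Parse one inbound frame; raise ValueError if it is not a typed JSON object."""
    try:
        frame = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"frame is not valid JSON: {exc}") from exc
    if not isinstance(frame, dict):
        raise ValueError("frame must be a JSON object")
    if not isinstance(frame.get("type"), str):
        raise ValueError("frame must carry a string 'type' field")
    # Unknown fields stay in the dict; callers read only the keys they know.
    return frame


def encode_frame(frame: dict) -> str:
    """Serialize one outbound frame as compact JSON text."""
    return json.dumps(frame, ensure_ascii=False, separators=(",", ":"))
```

Reading fields with `frame.get(...)` rather than unpacking the whole object is what makes unknown fields harmless.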
Required frame flow:
```text
terminal -> Beaver: connect
Beaver -> terminal: connected
terminal -> Beaver: message
Beaver -> terminal: ack
Beaver -> terminal: message
```
Optional heartbeat:
```text
terminal -> Beaver: ping
Beaver -> terminal: pong
```
## Connect Frame
The terminal must send `connect` immediately after the WebSocket opens.
Terminal to Beaver:
```json
{
  "type": "connect",
  "peer_id": "device-001",
  "device_name": "desk-terminal",
  "capabilities": ["text"]
}
```
Required fields:
- `type`: must be `"connect"`.
- `peer_id`: stable terminal identity. Reuse this value across reconnects.
Recommended fields:
- `device_name`: human-readable terminal name.
- `capabilities`: include `"text"`.
Optional fields:
- `thread_id`: optional sub-session key. Omit it for the first pass.
- `user_id`: optional user identity. Omit it unless the terminal already has a stable user id.
Beaver to terminal:
```json
{
  "type": "connected",
  "channel_id": "terminal-dev",
  "session_id": "terminal-dev:local:device-001"
}
```
The terminal should store `session_id` for logging and diagnostics. It does not need to send `session_id` back in message frames.
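As an illustration of the handshake fields described above, the following sketch builds the `connect` frame and extracts `session_id` from the reply. The function names are hypothetical, and raising `ValueError` on an unexpected reply is an assumption about how the terminal chooses to fail.

```python
def build_connect_frame(peer_id, device_name=None, thread_id=None, user_id=None) -> dict:
    """Build the connect frame; optional fields are omitted when not provided."""
    if not peer_id:
        raise ValueError("peer_id is required")
    frame = {"type": "connect", "peer_id": peer_id, "capabilities": ["text"]}
    if device_name:
        frame["device_name"] = device_name
    if thread_id:
        frame["thread_id"] = thread_id
    if user_id:
        frame["user_id"] = user_id
    return frame


def parse_connected(frame: dict) -> str:
    """Return session_id from a connected frame, for logging and diagnostics."""
    if frame.get("type") != "connected":
        raise ValueError(f"expected 'connected', got {frame.get('type')!r}")
    session_id = frame.get("session_id")
    if not session_id:
        raise ValueError("connected frame has no session_id")
    return session_id
```

For the first pass, calling `build_connect_frame(peer_id, device_name)` without `thread_id` or `user_id` matches the recommendation to omit both.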
## Message Frame
Terminal to Beaver:
```json
{
  "type": "message",
  "message_id": "m-001",
  "text": "hello"
}
```
Required fields:
- `type`: must be `"message"`.
- `message_id`: unique id for this user message.
- `text`: non-empty user text.
Recommended `message_id` format:
```text
<peer_id>-<monotonic-counter>
```
Example:
```text
device-001-000001
device-001-000002
```
The terminal should persist the counter if practical. If persistence is unavailable, generate a UUID or timestamp-based id. Reusing the same `message_id` tells Beaver to treat the frame as a duplicate.
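One way to follow this recommendation is sketched below, assuming the terminal can write a small file. `MessageIdGenerator` and the counter file path are hypothetical; on a device without a filesystem the counter would live in whatever persistent storage is available.

```python
import uuid
from pathlib import Path


class MessageIdGenerator:
    """Produce <peer_id>-<counter> ids, or UUID-based ids when nothing can be persisted."""

    def __init__(self, peer_id: str, counter_path=None):
        self.peer_id = peer_id
        self.counter_path = Path(counter_path) if counter_path else None

    def _load(self) -> int:
        try:
            return int(self.counter_path.read_text().strip())
        except (OSError, ValueError):
            return 0  # missing or corrupt counter file: start over

    def next_id(self) -> str:
        if self.counter_path is None:
            # No persistence available: fall back to a random unique id.
            return f"{self.peer_id}-{uuid.uuid4().hex}"
        value = self._load() + 1
        # Persist before returning so a restart never hands out the same id twice.
        self.counter_path.write_text(str(value))
        return f"{self.peer_id}-{value:06d}"
```

Writing the counter before the id is used trades a possible gap in the sequence for the guarantee that an id is never reused after a crash.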
Optional fields:
- `thread_id`: use only when the terminal intentionally wants a separate Beaver session.
- `user_id`: use only when the terminal has a stable user id.
## Ack Frame
Beaver sends an ack after accepting or deduplicating the inbound message.
Accepted:
```json
{
  "type": "ack",
  "message_id": "device-001-000001",
  "session_id": "terminal-dev:local:device-001",
  "accepted": true
}
```
Duplicate still processing:
```json
{
  "type": "ack",
  "message_id": "device-001-000001",
  "session_id": "terminal-dev:local:device-001",
  "accepted": false,
  "duplicate": true,
  "pending": true
}
```
Duplicate already completed:
```json
{
"type": "ack",
"message_id": "device-001-000001",
"session_id": "terminal-dev:local:device-001",
"accepted": false,
"duplicate": true,
"pending": false,
"reply": "cached assistant reply"
}
```
Terminal behavior:
- If `accepted` is true, wait for the assistant `message`.
- If `duplicate` is true and `reply` is present, display the cached reply.
- If `duplicate` is true and `pending` is true, keep waiting on the socket for the assistant `message`.
- If `error` is present, display or log the error.
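The ack rules above can be sketched as a single decision function. `classify_ack` and its action names are hypothetical, and treating an ack that matches none of the documented cases as an error is an assumption.

```python
def classify_ack(frame: dict) -> tuple:
    """Map an ack frame to (action, payload).

    Actions: "wait" (keep reading the socket), "show_reply" (payload is the
    cached reply), "error" (payload is a message to display or log).
    """
    if frame.get("error"):
        return ("error", str(frame["error"]))
    if frame.get("accepted"):
        return ("wait", None)
    if frame.get("duplicate"):
        if frame.get("reply") is not None:
            return ("show_reply", frame["reply"])
        if frame.get("pending"):
            return ("wait", None)
    # Not covered by the documented cases; surfaced as an error in this sketch.
    return ("error", "ack was neither accepted nor a recognized duplicate")
```

Checking `error` first means a failed ack is never mistaken for an accepted one, whatever other fields it carries.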
## Assistant Message Frame
Beaver to terminal:
```json
{
  "type": "message",
  "role": "assistant",
  "message_id": "device-001-000001",
  "run_id": "run-id",
  "text": "assistant reply",
  "finish_reason": "stop"
}
```
Fields:
- `type`: `"message"`.
- `role`: `"assistant"`.
- `message_id`: the user message id this response belongs to.
- `run_id`: Beaver run id for diagnostics.
- `text`: final assistant response.
- `finish_reason`: usually `"stop"`, or `"error"` when the run failed.
Terminal behavior:
- Render or speak `text`.
- Treat `finish_reason == "error"` as a failed turn.
- Do not expect token-level streaming in this phase.
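A small sketch of the assistant-frame handling described above. `handle_assistant_message` is a hypothetical name, and returning `None` for frames that belong to another turn is an assumption about how the caller wants to skip them.

```python
def handle_assistant_message(frame: dict, expected_message_id: str):
    """Return (ok, text) for the awaited turn, or None if the frame is not for it."""
    if frame.get("type") != "message" or frame.get("role") != "assistant":
        return None
    if frame.get("message_id") != expected_message_id:
        return None  # reply to some other turn; the caller keeps waiting
    ok = frame.get("finish_reason") != "error"
    return (ok, frame.get("text", ""))
```

When `ok` is false the turn failed, and the terminal can show `text` as the failure message instead of rendering it as a normal reply.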
## Ping And Pong
Terminal to Beaver:
```json
{"type": "ping"}
```
Beaver to terminal:
```json
{"type": "pong"}
```
Recommended heartbeat interval:
```text
30 seconds
```
If neither a `pong` nor any other frame arrives within the terminal's configured timeout, close the socket and reconnect.
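The heartbeat bookkeeping can be kept separate from the socket code, as in this sketch. `HeartbeatMonitor` is hypothetical, and the 60-second timeout (two heartbeat intervals) is an assumed value, not one specified by Beaver.

```python
import time


class HeartbeatMonitor:
    """Track when to send a ping and when the connection should be considered dead."""

    def __init__(self, interval=30.0, timeout=60.0, clock=time.monotonic):
        self.interval = interval  # seconds between pings
        self.timeout = timeout    # assumed value: silence longer than this means reconnect
        self._clock = clock
        now = clock()
        self._last_ping = now
        self._last_frame = now

    def ping_due(self) -> bool:
        return self._clock() - self._last_ping >= self.interval

    def mark_ping_sent(self) -> None:
        self._last_ping = self._clock()

    def mark_frame_received(self) -> None:
        # Any inbound frame counts as proof of life, not only pong.
        self._last_frame = self._clock()

    def connection_stale(self) -> bool:
        return self._clock() - self._last_frame > self.timeout
```

The client loop would call `mark_frame_received()` for every inbound frame, send `{"type": "ping"}` when `ping_due()` is true, and close the socket when `connection_stale()` is true.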
## Error Frame
Beaver may send:
```json
{
  "type": "error",
  "error": "human readable error"
}
```
Terminal behavior:
- Log the error.
- Keep using the connection. An error frame alone is not a reason to reconnect; reconnect only if the WebSocket itself closes.
- If the error is for a user message, allow the user to retry with a new `message_id`.
Common first-pass errors:
- `connect` is required before `message`.
- `peer_id` is required.
- `message_id` is required.
- `text` is required.
- Unsupported websocket frame type.
## Terminal State Machine
Implement the terminal client as a small state machine.
```text
DISCONNECTED
  -> connect websocket
CONNECTING
  -> websocket open, send connect frame
WAIT_CONNECTED
  -> receive connected
READY
  -> send message frame
WAIT_ACK
  -> receive ack
WAIT_REPLY
  -> receive assistant message
READY
```
On WebSocket close or network failure, transition to `DISCONNECTED` and reconnect with backoff.
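The state machine above can be expressed as a transition table, sketched here. The event names are hypothetical labels for the arrows in the diagram, and the `ack_cached_reply` shortcut from `WAIT_ACK` straight to `READY` is an assumption covering a duplicate ack that already carries the reply.

```python
from enum import Enum, auto


class State(Enum):
    DISCONNECTED = auto()
    CONNECTING = auto()
    WAIT_CONNECTED = auto()
    READY = auto()
    WAIT_ACK = auto()
    WAIT_REPLY = auto()


TRANSITIONS = {
    (State.DISCONNECTED, "dial"): State.CONNECTING,
    (State.CONNECTING, "socket_open"): State.WAIT_CONNECTED,
    (State.WAIT_CONNECTED, "connected"): State.READY,
    (State.READY, "send_message"): State.WAIT_ACK,
    (State.WAIT_ACK, "ack"): State.WAIT_REPLY,
    (State.WAIT_ACK, "ack_cached_reply"): State.READY,  # assumed shortcut
    (State.WAIT_REPLY, "assistant_message"): State.READY,
}


def next_state(state: State, event: str) -> State:
    """Return the state that follows `event`; reject transitions the table lacks."""
    if event == "socket_closed":
        return State.DISCONNECTED  # close or network failure from any state
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in state {state.name}") from None
```

Rejecting unexpected events makes protocol violations, such as an `ack` arriving while `READY`, visible in logs instead of silently corrupting the client state.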
Recommended reconnect policy:
- Start at 1 second.
- Double the delay after each failed attempt, up to a maximum of 30 seconds.
- Reset backoff after a successful `connected` frame.
On reconnect, use the same `peer_id`.
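The reconnect policy above amounts to a capped exponential backoff. The sketch below is one possible shape; the `Backoff` class is hypothetical, and adding jitter is left out because the policy does not mention it.

```python
class Backoff:
    """Capped exponential backoff: 1 s, 2 s, 4 s, ... up to 30 s."""

    def __init__(self, initial=1.0, maximum=30.0):
        self.initial = initial
        self.maximum = maximum
        self._current = initial

    def next_delay(self) -> float:
        """Return the delay to sleep now, then double it for the next attempt."""
        delay = self._current
        self._current = min(self._current * 2, self.maximum)
        return delay

    def reset(self) -> None:
        """Call after a successful connected frame."""
        self._current = self.initial
```

Resetting only after `connected`, rather than after the socket opens, keeps the delay growing when the socket opens but the handshake never completes.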
## Terminal Implementation Requirements
The terminal-side code should provide:
- A configurable Beaver WebSocket URL.
- A stable `peer_id`.
- A configurable `device_name`.
- A monotonic or otherwise unique `message_id` generator.
- JSON encoding and decoding.
- Connect frame on socket open.
- Ping/pong heartbeat.
- Reconnect with backoff.
- A queue or guard so only one user text turn is in flight at a time for the first pass.
- Logging for `session_id`, `message_id`, `run_id`, and errors.
The terminal-side code does not need:
- Multi-room session logic.
- Hermes session management.
- LiveKit `AgentSession`.
- Audio chunking.
- Tool calls.
- OAuth or token refresh.
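The one-turn-in-flight requirement listed above could be met with a small queue, sketched here. `TurnQueue` is a hypothetical helper; a simpler guard that rejects input while a turn is pending would satisfy the requirement as well.

```python
from collections import deque


class TurnQueue:
    """Allow one user turn in flight; queue later texts until it completes."""

    def __init__(self):
        self._pending = deque()
        self._busy = False

    def submit(self, text: str):
        """Return text if it may be sent now; otherwise queue it and return None."""
        if self._busy:
            self._pending.append(text)
            return None
        self._busy = True
        return text

    def complete(self):
        """Call when the in-flight turn ends; return the next text to send, if any."""
        if self._pending:
            return self._pending.popleft()  # the queue stays busy with this text
        self._busy = False
        return None
```

`complete()` would be called when the assistant `message` arrives, when a cached reply is shown, or when the turn fails with an error.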
## Example Client Pseudocode
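```python
import asyncio

# Pseudocode: connect(), send_json(), receive_json(), log(), next_backoff(),
# reset_backoff(), read_send_receive_loop(), and the load/save helpers stand in
# for whatever the terminal runtime and WebSocket library provide.

peer_id = load_or_create_peer_id()
counter = load_counter()


async def run_terminal_client():
    while True:
        try:
            async with connect(BEAVER_WS_URL) as ws:
                # Send the connect frame immediately after the socket opens.
                await ws.send_json({
                    "type": "connect",
                    "peer_id": peer_id,
                    "device_name": DEVICE_NAME,
                    "capabilities": ["text"],
                })
                connected = await ws.receive_json()
                if connected.get("type") != "connected":
                    raise RuntimeError(f"expected connected, got {connected!r}")
                log("session_id", connected["session_id"])
                reset_backoff()  # reset after a successful connected frame
                await read_send_receive_loop(ws)
        except Exception as exc:
            log("websocket disconnected", exc)
        # Back off before reconnecting, whether the socket failed or closed cleanly.
        await asyncio.sleep(next_backoff())


async def send_user_text(ws, text):
    global counter
    counter += 1
    save_counter(counter)  # persist first so the id is never reused
    message_id = f"{peer_id}-{counter:06d}"
    await ws.send_json({
        "type": "message",
        "message_id": message_id,
        "text": text,
    })
    while True:
        frame = await ws.receive_json()
        frame_type = frame.get("type")
        if frame_type == "ack" and frame.get("message_id") == message_id:
            if frame.get("error"):
                raise RuntimeError(frame["error"])
            if frame.get("reply"):
                # Duplicate that already completed: use the cached reply.
                return frame["reply"]
            continue  # accepted or still pending: keep waiting
        if frame_type == "message" and frame.get("role") == "assistant":
            if frame.get("message_id") == message_id:
                if frame.get("finish_reason") == "error":
                    raise RuntimeError(frame.get("text") or "run failed")
                return frame.get("text", "")
        if frame_type == "error":
            raise RuntimeError(frame.get("error", "unknown error"))
```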
Adapt the pseudocode to the terminal runtime language and WebSocket library.
## Manual Test With websocat
If `websocat` is available, a developer can manually test the protocol:
```bash
websocat ws://127.0.0.1:8080/api/channels/terminal-dev/ws
```
Then paste:
```json
{"type":"connect","peer_id":"device-001","device_name":"desk-terminal","capabilities":["text"]}
```
Expected response:
```json
{"type":"connected","channel_id":"terminal-dev","session_id":"terminal-dev:local:device-001"}
```
Then paste:
```json
{"type":"message","message_id":"device-001-000001","text":"hello"}
```
Expected responses:
```json
{"type":"ack","message_id":"device-001-000001","session_id":"terminal-dev:local:device-001","accepted":true}
```
Then, after Beaver finishes the run:
```json
{"type":"message","role":"assistant","message_id":"device-001-000001","run_id":"...","text":"...","finish_reason":"stop"}
```
## Acceptance Checklist For Terminal-Side Codex
- The terminal opens the configured Beaver WebSocket URL.
- The terminal sends `connect` immediately after open.
- The terminal receives and logs `connected.session_id`.
- The terminal sends text using a unique `message_id`.
- The terminal receives `ack`.
- The terminal receives and displays assistant `message.text`.
- The terminal handles `ping`/`pong`.
- The terminal reconnects with the same `peer_id`.
- The terminal does not use REST chat or `/ws/{session_id}`.
- The terminal implementation remains text-only for the first pass.
When this checklist passes against Beaver, the first-stage device integration is accepted from the terminal side.