docs: design terminal websocket channel

2026-06-01 16:41:19 +08:00
parent 826db8ec2e
commit 6a6ddc21c0


@ -0,0 +1,279 @@
# Terminal WebSocket Channel Design
Date: 2026-06-01
## Goal
Add a text-only WebSocket channel adapter so a small terminal device can connect to Beaver and exchange messages through the channel runtime.
This is a first-stage acceptance path for proving Beaver can talk to the terminal device. The terminal must enter through `ChannelRuntime` and `MessageBus`; it must not use the existing Web UI `/ws/{session_id}` direct-chat path.
## Non-Goals
- Do not implement audio, camera, screen, image, or multimodal payloads.
- Do not stream token deltas to the terminal in this phase.
- Do not add AuthZ or device registration in this phase.
- Do not implement the Hermes LiveKit LLM adapter in this phase.
- Do not route terminal messages directly to `AgentService`.
## Recommended Architecture
Add a channel-native WebSocket adapter named `TerminalWebSocketAdapter`.
The Web backend exposes:
```text
/api/channels/{channel_id}/ws
```
The route resolves the configured channel adapter from `ChannelRuntime` and delegates the accepted WebSocket to the adapter. The adapter owns terminal connection state, normalizes incoming frames into `InboundMessage`, and receives `OutboundMessage` objects through `ChannelManager.dispatch_outbound()`.
The path remains bus-first:
```text
terminal websocket
-> TerminalWebSocketAdapter
-> ChannelRuntime.accept_inbound()
-> MessageBus.inbound
-> ChannelRuntime bridge
-> AgentService.handle_inbound_message()
-> MessageBus.outbound
-> ChannelManager.dispatch_outbound()
-> TerminalWebSocketAdapter.send()
-> terminal websocket
```
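The bus-first hop order above can be illustrated with a small, self-contained sketch. It uses two `asyncio.Queue` objects as stand-ins for `MessageBus.inbound` and `MessageBus.outbound`. The function name `run_bus_demo`, the echo "agent", and the dict-shaped messages are all assumptions made for illustration and are not Beaver's actual classes.

```python
import asyncio
from typing import Any, Dict, List


async def run_bus_demo(session_id: str, text: str) -> List[Dict[str, Any]]:
    """Push one message through inbound -> agent -> outbound -> dispatcher."""
    inbound: asyncio.Queue = asyncio.Queue()   # stand-in for MessageBus.inbound
    outbound: asyncio.Queue = asyncio.Queue()  # stand-in for MessageBus.outbound
    delivered: List[Dict[str, Any]] = []

    async def agent_bridge() -> None:
        # Stand-in for the runtime bridge plus the agent: echo the text back.
        msg = await inbound.get()
        await outbound.put(
            {"session_id": msg["session_id"], "text": "echo: " + msg["text"]}
        )

    async def dispatcher() -> None:
        # Stand-in for dispatch_outbound() handing the reply to the adapter.
        delivered.append(await outbound.get())

    tasks = [
        asyncio.create_task(agent_bridge()),
        asyncio.create_task(dispatcher()),
    ]
    # The adapter never calls the agent directly; it only publishes inbound.
    await inbound.put({"session_id": session_id, "text": text})
    await asyncio.gather(*tasks)
    return delivered
```

The point of the sketch is that the adapter side only touches the two queues, which is the property the design asks for when it forbids routing terminal messages directly to `AgentService`.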
## Channel Configuration
The terminal channel uses the existing `BeaverConfig.channels` map.
Example:
```json
{
"channels": {
"terminal-dev": {
"enabled": true,
"kind": "terminal",
"mode": "websocket",
"accountId": "local",
"displayName": "Terminal Dev",
"config": {
"heartbeatSeconds": 30,
"maxMessageChars": 20000
}
}
}
}
```
`kind` is the platform family. `mode` is the transport mode. The adapter factory must instantiate `TerminalWebSocketAdapter` when `kind == "terminal"` and `mode == "websocket"`.
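One way to express the `kind`/`mode` selection rule is a registry keyed by the pair. The sketch below is a minimal illustration under stated assumptions: `ChannelConfig`, `ADAPTERS`, and `create_adapter` are hypothetical names, and the `TerminalWebSocketAdapter` class here is only a stand-in that stores its config.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple


@dataclass
class ChannelConfig:
    """Hypothetical in-memory view of one entry in the `channels` map."""
    channel_id: str
    kind: str
    mode: str
    account_id: str = "local"
    enabled: bool = True
    config: Dict[str, Any] = field(default_factory=dict)


class TerminalWebSocketAdapter:
    """Stand-in for the real adapter; it only keeps its config here."""

    def __init__(self, cfg: ChannelConfig) -> None:
        self.cfg = cfg


# Registry keyed by (kind, mode). Hypothetical, not Beaver's actual factory.
ADAPTERS: Dict[Tuple[str, str], Callable[[ChannelConfig], Any]] = {
    ("terminal", "websocket"): TerminalWebSocketAdapter,
}


def create_adapter(cfg: ChannelConfig) -> Any:
    """Instantiate the adapter registered for this kind/mode pair."""
    try:
        factory = ADAPTERS[(cfg.kind, cfg.mode)]
    except KeyError:
        raise ValueError(
            f"No adapter for kind={cfg.kind!r}, mode={cfg.mode!r}"
        ) from None
    return factory(cfg)
```

Keying on the pair rather than on `kind` alone leaves room for a later terminal transport with a different `mode` without changing the lookup.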
## Protocol
The protocol is JSON over WebSocket. All payloads are text-only.
The terminal starts with a connect frame:
```json
{
"type": "connect",
"peer_id": "device-001",
"device_name": "desk-terminal",
"capabilities": ["text"]
}
```
Beaver replies:
```json
{
"type": "connected",
"channel_id": "terminal-dev",
"session_id": "terminal-dev:local:device-001"
}
```
The terminal sends user text:
```json
{
"type": "message",
"message_id": "m-001",
"text": "你好"
}
```
Beaver acknowledges accepted inbound:
```json
{
"type": "ack",
"message_id": "m-001",
"session_id": "terminal-dev:local:device-001",
"accepted": true
}
```
Beaver sends the final assistant response:
```json
{
"type": "message",
"role": "assistant",
"message_id": "m-001",
"run_id": "run-id",
"text": "你好,我在。",
"finish_reason": "stop"
}
```
Ping/pong frames are supported:
```json
{"type": "ping"}
{"type": "pong"}
```
Unsupported frame types return an error frame and keep the connection open:
```json
{"type": "error", "error": "Unsupported websocket frame type: example"}
```
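From the terminal's side, the frames above are small JSON documents. The following sketch shows how a device might build the `connect` and `message` frames and recognize the final assistant reply. The helper names are hypothetical; only the field names come from the protocol described above.

```python
import json
from typing import List, Optional


def connect_frame(
    peer_id: str, device_name: str, capabilities: Optional[List[str]] = None
) -> str:
    """Serialize the initial connect frame; defaults to text-only."""
    return json.dumps(
        {
            "type": "connect",
            "peer_id": peer_id,
            "device_name": device_name,
            "capabilities": capabilities or ["text"],
        },
        ensure_ascii=False,
    )


def message_frame(message_id: str, text: str) -> str:
    """Serialize a user text frame; ensure_ascii=False keeps CJK readable."""
    return json.dumps(
        {"type": "message", "message_id": message_id, "text": text},
        ensure_ascii=False,
    )


def is_final_reply(raw: str, message_id: str) -> bool:
    """True when `raw` is the assistant message answering `message_id`."""
    frame = json.loads(raw)
    return (
        frame.get("type") == "message"
        and frame.get("role") == "assistant"
        and frame.get("message_id") == message_id
    )
```

Because the ack and the assistant reply both carry the inbound `message_id`, a terminal can match on `type` and `role` to tell them apart, as `is_final_reply` does.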
## Identity And Session Mapping
The adapter builds a `ChannelIdentity` from the connect and message frames:
- `channel_id`: path/config channel id, such as `terminal-dev`
- `kind`: `terminal`
- `account_id`: channel config account id, such as `local`
- `peer_id`: terminal `peer_id`
- `peer_type`: `terminal`
- `message_id`: message frame `message_id`
- `thread_id`: optional message or connect frame field
- `user_id`: optional message or connect frame field
The session id stays aligned with channel runtime v1:
```text
<channel_id>:<account_id>:<peer_id>[:<thread_id>]
```
For the first terminal rollout, a terminal connection is treated as one active peer. A reconnect with the same `peer_id` reuses the same session id.
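The session id format can be sketched as a small pure function. The name `build_session_id` is hypothetical; the joining rule follows the template above, with `thread_id` appended only when present.

```python
from typing import Optional


def build_session_id(
    channel_id: str,
    account_id: str,
    peer_id: str,
    thread_id: Optional[str] = None,
) -> str:
    """Join identity parts as <channel_id>:<account_id>:<peer_id>[:<thread_id>]."""
    parts = [channel_id, account_id, peer_id]
    if thread_id:
        parts.append(thread_id)
    return ":".join(parts)
```

Since the function depends only on its inputs, a reconnect that presents the same `peer_id` on the same channel yields the same session id, which is the reuse behavior described above.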
## Delivery Semantics
Inbound messages are accepted through `ChannelRuntime.accept_inbound()`.
If dedupe sees a duplicate message id:
- return an ack with `duplicate: true`
- include cached `reply` when the prior run is done
- include `pending: true` when the prior run is still processing
- do not publish a second inbound message
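The duplicate rules above can be sketched with a small in-memory cache. This is an illustration only: the class `DedupeCache`, keying by `(session_id, message_id)`, and omitting `accepted` from duplicate acks are assumptions, since the design does not specify the key scope or whether `accepted` is set for duplicates.

```python
from typing import Any, Callable, Dict, Tuple


class DedupeCache:
    """Hypothetical dedupe store; keying by (session_id, message_id) is assumed."""

    def __init__(self) -> None:
        self._runs: Dict[Tuple[str, str], Dict[str, Any]] = {}

    def accept(
        self, session_id: str, message_id: str, publish: Callable[[str], None]
    ) -> Dict[str, Any]:
        """Return the ack frame; publish inbound only for first-seen ids."""
        ack: Dict[str, Any] = {
            "type": "ack",
            "message_id": message_id,
            "session_id": session_id,
        }
        run = self._runs.get((session_id, message_id))
        if run is None:
            self._runs[(session_id, message_id)] = {"done": False, "reply": None}
            publish(message_id)  # first sighting: goes onto the inbound bus
            ack["accepted"] = True
            return ack
        ack["duplicate"] = True  # never published a second time
        if run["done"]:
            ack["reply"] = run["reply"]  # prior run finished: cached reply
        else:
            ack["pending"] = True  # prior run still processing
        return ack

    def finish(self, session_id: str, message_id: str, reply: str) -> None:
        """Record the final reply so later duplicates can return it."""
        self._runs[(session_id, message_id)] = {"done": True, "reply": reply}
```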
Outbound delivery is connection-bound. `TerminalWebSocketAdapter.send()` looks up the active connection for the outbound session or peer. If a connection is found, the adapter sends the final assistant message over it. If no connection is available, the adapter marks the outbound message as unclaimed so that the runtime records `outbound_unclaimed`.
No retry queue is required in this phase.
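A minimal sketch of connection-bound delivery follows. The `ConnectionRegistry` class, the `TextSocket` protocol with an async `send_text` method, and returning the event name as a string are assumptions for illustration; the real `send()` signature and the way the runtime is told about unclaimed messages may differ.

```python
import asyncio
import json
from typing import Any, Dict, Protocol


class TextSocket(Protocol):
    """Hypothetical minimal interface for an accepted WebSocket."""

    async def send_text(self, data: str) -> None: ...


class ConnectionRegistry:
    """Tracks at most one active connection per session id (assumed)."""

    def __init__(self) -> None:
        self._by_session: Dict[str, TextSocket] = {}

    def register(self, session_id: str, ws: TextSocket) -> None:
        # A reconnect with the same session id replaces the old socket.
        self._by_session[session_id] = ws

    def unregister(self, session_id: str) -> None:
        self._by_session.pop(session_id, None)

    async def send(self, session_id: str, frame: Dict[str, Any]) -> str:
        """Deliver if connected; otherwise report unclaimed. No retry queue."""
        ws = self._by_session.get(session_id)
        if ws is None:
            return "outbound_unclaimed"
        await ws.send_text(json.dumps(frame, ensure_ascii=False))
        return "outbound_delivered"
```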
## Runtime Status And Events
`/api/status` and `/api/channels` include terminal channels with:
- `channel_id`
- `kind`
- `mode`
- `display_name`
- `enabled`
- `state`
- `account_id`
- `last_event_at`
- `websocket_url`
- `capabilities`, including `receive_text`, `send_text`, and `persistent_connection`
- `connected_peers`
Channel events should record:
- `adapter_started`
- `terminal_connected`
- `terminal_disconnected`
- `inbound_accepted`
- `inbound_duplicate`
- `direct_run_started`
- `direct_run_finished`
- `outbound_delivered`
- `outbound_unclaimed`
- `adapter_stopped`
Do not store raw terminal payloads or full message text in the event log. Existing text preview behavior is enough.
## Nginx And Deployment
The existing `/api/channels/` nginx location must support WebSocket upgrade because terminal WebSockets live under that prefix.
The location should include:
```nginx
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection $connection_upgrade;
proxy_read_timeout 3600;
proxy_send_timeout 3600;
```
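The 1800-second timeout used by synchronous webhooks can stay, but the WebSocket upgrade headers are required for terminal devices. Note that `$connection_upgrade` is not a built-in nginx variable; it must be defined by a `map $http_upgrade $connection_upgrade` block in the `http` context.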
## Error Handling
Before connect:
- only `connect` and `ping` are accepted
- `message` returns an error requiring connect first
On connect:
- a missing `peer_id` is rejected with an error frame, and the adapter may close the connection
- unsupported capabilities are ignored for now as long as text is available
On message:
- missing `message_id` returns an error
- missing or blank `text` returns an error
- oversized text returns an error based on the configured limit (`maxMessageChars` in the channel config example)
On disconnect:
- remove the active connection
- record `terminal_disconnected`
- do not cancel an already running Beaver direct run
If the run completes after disconnect, outbound is recorded as `outbound_unclaimed`.
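The before-connect, on-connect, and on-message rules above can be sketched as one validation function. This is a sketch under assumptions: the function name, the error strings other than the unsupported-type message, and accepting a repeated `connect` on an already connected socket are illustrative choices not specified by the design.

```python
from typing import Any, Dict, Optional


def validate_frame(
    frame: Dict[str, Any], connected: bool, max_message_chars: int = 20000
) -> Optional[str]:
    """Return an error string for the error frame, or None if acceptable."""
    ftype = frame.get("type")
    if not connected:
        # Before connect, only `connect` and `ping` are accepted.
        if ftype == "message":
            return "Send a connect frame before message frames"
        if ftype not in ("connect", "ping"):
            return f"Unsupported websocket frame type: {ftype}"
    if ftype == "connect":
        if not frame.get("peer_id"):
            return "connect frame requires peer_id"
        return None  # extra capabilities are ignored in this phase
    if ftype in ("ping", "pong"):
        return None
    if ftype == "message":
        if not frame.get("message_id"):
            return "message frame requires message_id"
        text = frame.get("text")
        if not isinstance(text, str) or not text.strip():
            return "message frame requires non-blank text"
        if len(text) > max_message_chars:
            return f"text exceeds {max_message_chars} characters"
        return None
    return f"Unsupported websocket frame type: {ftype}"
```

Returning an error string instead of raising keeps the caller free to send the error frame and leave the connection open, which matches the handling of unsupported frame types.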
## Testing
Add focused backend tests:
- WebSocket connect returns `connected` with stable session id.
- Message frame publishes through runtime and returns ack plus assistant message.
- Duplicate message id does not publish a second inbound and returns duplicate status.
- Disconnect before outbound records `outbound_unclaimed`.
- Unknown frame type returns an error and keeps the connection alive.
- Channel status exposes `websocket_url` and connected peer count.
- Config loader accepts `kind=terminal`, `mode=websocket` through existing channel config.
Run the existing backend unit suite and frontend type/test checks after implementation.
## Acceptance Criteria
The first-stage acceptance is complete when a small terminal can:
1. Connect to `/api/channels/terminal-dev/ws`.
2. Send a `connect` frame with a stable `peer_id`.
3. Send a text `message` frame.
4. Receive an ack.
5. Receive the final assistant text response from Beaver.
6. Reconnect with the same `peer_id` and keep the same Beaver session id.
7. Show connection and message events in Beaver channel status/events.
This validates the Beaver-to-terminal path through the new channel runtime without introducing AuthZ, multimodal payloads, or Hermes LiveKit LLM work.