beaver_project/2026-06-01-hermes-gateway-llm-design.md

# Hermes Gateway LLM Design

Date: 2026-06-01

## Goal

Replace the OpenAI-compatible LLM call path in `custom/custom_agent.py` with a LiveKit LLM
adapter that talks to NousResearch Hermes Agent through the OpenClaw gateway protocol.

The integration must keep the existing custom agent behavior:

- Chinese room-locator and general assistant instructions
- Emotion prefix parsing with `<emotion=...>`
- Memory recall for room-locator queries
- Optional vision-frame attachment
- LiveKit ASR, TTS, VAD, turn handling, metrics, and interruption behavior

The Hermes session strategy is `per_room`: one LiveKit room should map to one Hermes gateway
session for the lifetime of that room.

## Non-Goals

- Do not replace LiveKit `AgentSession`, ASR, TTS, VAD, or room I/O.
- Do not move room-locator classification into Hermes Agent.
- Do not implement Hermes-side tools in the first pass.
- Do not require an OpenAI-compatible proxy in front of the gateway.

## Recommended Architecture

Add a new custom LiveKit LLM implementation in `custom/hermes_gateway.py`.

The adapter will implement the LiveKit `llm.LLM` interface and return a custom `LLMStream`.
The stream will own a single gateway request/response cycle while the LLM object owns the
per-room gateway session state.

`custom/custom_agent.py` will continue to call `selected_llm.chat(...)` through
`_run_selected_llm()`. That preserves the existing `llm_node()` pipeline and keeps Hermes
behind the same abstraction as OpenAI-compatible models.

## Components

### HermesGatewayLLM

Responsibilities:

- Store gateway configuration: URL, auth token, agent identifier, request timeout, and reconnect
  policy.
- Lazily create one Hermes gateway session per LiveKit room.
- Expose `model` as the configured Hermes agent/model identifier.
- Expose `provider` as `hermes-gateway`.
- Create `HermesGatewayLLMStream` from `chat(...)`.
- Close any persistent WebSocket/session resources in `aclose()`.

### HermesGatewayLLMStream

Responsibilities:

- Serialize LiveKit `ChatContext` into the gateway request payload.
- Send the latest turn to the per-room Hermes session.
- Consume gateway events until the turn completes or fails.
- Yield LiveKit `llm.ChatChunk` objects for assistant text deltas.
- Surface recoverable connection failures through the normal LiveKit LLM error path.

### custom_agent.py Wiring

Add env-driven provider selection:

- `CUSTOM_LLM_PROVIDER=openai` keeps the current behavior.
- `CUSTOM_LLM_PROVIDER=hermes_gateway` constructs `HermesGatewayLLM`.

New Hermes-specific env vars:

- `CUSTOM_HERMES_GATEWAY_URL`
- `CUSTOM_HERMES_API_KEY`
- `CUSTOM_HERMES_AGENT_ID`
- `CUSTOM_HERMES_SESSION_MODE=per_room`
- `CUSTOM_HERMES_REQUEST_TIMEOUT`
- `CUSTOM_HERMES_VERIFY_SSL`

When `CUSTOM_LLM_PROVIDER=hermes_gateway`, `base_llm`, `text_llm`, and `vision_llm` should all
point at the same Hermes adapter. Separate Hermes text/vision agent IDs are out of scope for this
design.

## Data Flow

1. User speaks or sends text.
2. Existing LiveKit/STT flow updates `ChatContext`.
3. `CustomAgent.llm_node()` selects `general` or `room_locator`.
4. Existing code injects the appropriate instructions and emotion-prefix requirement.
5. Existing code optionally augments the latest user message with memory context.
6. Existing code optionally attaches a fresh vision frame.
7. `_run_selected_llm()` calls `HermesGatewayLLM.chat(...)`.
8. The Hermes adapter sends the request to the per-room gateway session.
9. Gateway text events are converted to `llm.ChatChunk` deltas.
10. Existing emotion observation and TTS stripping continue unchanged.

## ChatContext Serialization

Text messages should be serialized first.

Supported LiveKit content:

- `str`: send as normal message content.
- instruction/config updates: preserve the final active instructions as the leading instruction
  message in the gateway payload. If the deployed gateway only accepts user/assistant messages,
  prepend the instruction text to the latest user message before sending.
- image content: attempt to send through the gateway image/multimodal field. If the deployed
  Hermes gateway rejects or ignores image content, log a warning and fall back to text-only
  generation for that turn.

Function tool calls should not be sent in the first implementation. If tool messages appear, log
that they were omitted.

## per_room Session Lifecycle

The adapter should derive a stable room key from the active LiveKit session or job context. If a
room name/SID is not available, fall back to one adapter-local session.

For each room key:

1. Open or reuse a gateway connection.
2. Send the gateway `connect` handshake if needed.
3. Create a Hermes session once.
4. Reuse that Hermes session for all future turns from the same room.
5. Close the gateway connection when the LiveKit LLM is closed.

This lets Hermes maintain its own conversational state while LiveKit still keeps the visible
conversation history.

## Gateway Event Mapping

Map streaming text events to LiveKit chunks:

- Gateway assistant text delta -> `llm.ChatChunk(delta=llm.ChoiceDelta(content=delta))`
- Gateway final assistant message -> emit any remaining text not already streamed
- Gateway usage metadata -> `llm.CompletionUsage` when token counts are available
- Gateway tool/action events -> log at debug/info level in the first implementation
- Gateway error event -> raise a LiveKit `APIError` or `APIConnectionError`
- Gateway completion event -> finish the async iterator

The implementation should make the event parser tolerant of protocol field-name differences by
isolating event normalization in one helper function. Unknown event types should be logged and
ignored unless they indicate failure.

## Error Handling

- Missing Hermes env vars should fail fast at startup when provider is `hermes_gateway`.
- Gateway connect/session-create failures should raise connection errors.
- A failed request should not discard the per-room session unless the gateway reports that the
  session is invalid or closed.
- If the gateway connection closes mid-turn, reconnect once and retry only if no assistant text
  has been yielded yet.
- If assistant text has already been yielded, fail the turn instead of replaying partial output.

## Testing

Add focused tests around the adapter:

- Serializes simple system/user/assistant chat context.
- Creates one gateway session and reuses it across two turns for the same room.
- Converts text deltas into `llm.ChatChunk` content.
- Handles final full-message events without duplicate text.
- Raises on gateway error events.
- Logs and skips unsupported image/tool content.

Add a small wiring test or import-level test for `CUSTOM_LLM_PROVIDER=hermes_gateway` if the
custom module is testable without external services.

## Rollout

1. Implement the adapter behind `CUSTOM_LLM_PROVIDER=hermes_gateway`.
2. Keep `openai` as the default provider.
3. Run unit tests for the adapter and a syntax/type smoke check on `custom/custom_agent.py`.
4. Test manually with a local gateway using `python custom/custom_agent.py console` or the
   existing LiveKit development mode.
5. If vision payloads are unsupported by the deployed gateway, document that the first Hermes
   rollout is text-only for vision turns.