7.1 KiB
Hermes Gateway LLM Design
Date: 2026-06-01
Goal
Replace the OpenAI-compatible LLM call path in custom/custom_agent.py with a LiveKit LLM
adapter that talks to NousResearch Hermes Agent through the OpenClaw gateway protocol.
The integration must keep the existing custom agent behavior:
- Chinese room-locator and general assistant instructions
- Emotion prefix parsing with
<emotion=...> - Memory recall for room-locator queries
- Optional vision-frame attachment
- LiveKit ASR, TTS, VAD, turn handling, metrics, and interruption behavior
The Hermes session strategy is per_room: one LiveKit room should map to one Hermes gateway
session for the lifetime of that room.
Non-Goals
- Do not replace LiveKit
AgentSession, ASR, TTS, VAD, or room I/O. - Do not move room-locator classification into Hermes Agent.
- Do not implement Hermes-side tools in the first pass.
- Do not require an OpenAI-compatible proxy in front of the gateway.
Recommended Architecture
Add a new custom LiveKit LLM implementation in custom/hermes_gateway.py.
The adapter will implement the LiveKit llm.LLM interface and return a custom LLMStream.
The stream will own a single gateway request/response cycle while the LLM object owns the
per-room gateway session state.
custom/custom_agent.py will continue to call selected_llm.chat(...) through
_run_selected_llm(). That preserves the existing llm_node() pipeline and keeps Hermes
behind the same abstraction as OpenAI-compatible models.
Components
HermesGatewayLLM
Responsibilities:
- Store gateway configuration: URL, auth token, agent identifier, request timeout, and reconnect policy.
- Lazily create one Hermes gateway session per LiveKit room.
- Expose
modelas the configured Hermes agent/model identifier. - Expose
providerashermes-gateway. - Create
HermesGatewayLLMStreamfromchat(...). - Close any persistent WebSocket/session resources in
aclose().
HermesGatewayLLMStream
Responsibilities:
- Serialize LiveKit
ChatContextinto the gateway request payload. - Send the latest turn to the per-room Hermes session.
- Consume gateway events until the turn completes or fails.
- Yield LiveKit
llm.ChatChunkobjects for assistant text deltas. - Surface recoverable connection failures through the normal LiveKit LLM error path.
custom_agent.py Wiring
Add env-driven provider selection:
CUSTOM_LLM_PROVIDER=openaikeeps the current behavior.CUSTOM_LLM_PROVIDER=hermes_gatewayconstructsHermesGatewayLLM.
New Hermes-specific env vars:
CUSTOM_HERMES_GATEWAY_URLCUSTOM_HERMES_API_KEYCUSTOM_HERMES_AGENT_IDCUSTOM_HERMES_SESSION_MODE=per_roomCUSTOM_HERMES_REQUEST_TIMEOUTCUSTOM_HERMES_VERIFY_SSL
When CUSTOM_LLM_PROVIDER=hermes_gateway, base_llm, text_llm, and vision_llm should all
point at the same Hermes adapter. Separate Hermes text/vision agent IDs are out of scope for this
design.
Data Flow
- User speaks or sends text.
- Existing LiveKit/STT flow updates
ChatContext. CustomAgent.llm_node()selectsgeneralorroom_locator.- Existing code injects the appropriate instructions and emotion-prefix requirement.
- Existing code optionally augments the latest user message with memory context.
- Existing code optionally attaches a fresh vision frame.
_run_selected_llm()callsHermesGatewayLLM.chat(...).- The Hermes adapter sends the request to the per-room gateway session.
- Gateway text events are converted to
llm.ChatChunkdeltas. - Existing emotion observation and TTS stripping continue unchanged.
ChatContext Serialization
Text messages should be serialized first.
Supported LiveKit content:
str: send as normal message content.- instruction/config updates: preserve the final active instructions as the leading instruction message in the gateway payload. If the deployed gateway only accepts user/assistant messages, prepend the instruction text to the latest user message before sending.
- image content: attempt to send through the gateway image/multimodal field. If the deployed Hermes gateway rejects or ignores image content, log a warning and fall back to text-only generation for that turn.
Function tool calls should not be sent in the first implementation. If tool messages appear, log that they were omitted.
per_room Session Lifecycle
The adapter should derive a stable room key from the active LiveKit session or job context. If a room name/SID is not available, fall back to one adapter-local session.
For each room key:
- Open or reuse a gateway connection.
- Send the gateway
connecthandshake if needed. - Create a Hermes session once.
- Reuse that Hermes session for all future turns from the same room.
- Close the gateway connection when the LiveKit LLM is closed.
This lets Hermes maintain its own conversational state while LiveKit still keeps the visible conversation history.
Gateway Event Mapping
Map streaming text events to LiveKit chunks:
- Gateway assistant text delta ->
llm.ChatChunk(delta=llm.ChoiceDelta(content=delta)) - Gateway final assistant message -> emit any remaining text not already streamed
- Gateway usage metadata ->
llm.CompletionUsagewhen token counts are available - Gateway tool/action events -> log at debug/info level in the first implementation
- Gateway error event -> raise a LiveKit
APIErrororAPIConnectionError - Gateway completion event -> finish the async iterator
The implementation should make the event parser tolerant of protocol field-name differences by isolating event normalization in one helper function. Unknown event types should be logged and ignored unless they indicate failure.
Error Handling
- Missing Hermes env vars should fail fast at startup when provider is
hermes_gateway. - Gateway connect/session-create failures should raise connection errors.
- A failed request should not discard the per-room session unless the gateway reports that the session is invalid or closed.
- If the gateway connection closes mid-turn, reconnect once and retry only if no assistant text has been yielded yet.
- If assistant text has already been yielded, fail the turn instead of replaying partial output.
Testing
Add focused tests around the adapter:
- Serializes simple system/user/assistant chat context.
- Creates one gateway session and reuses it across two turns for the same room.
- Converts text deltas into
llm.ChatChunkcontent. - Handles final full-message events without duplicate text.
- Raises on gateway error events.
- Logs and skips unsupported image/tool content.
Add a small wiring test or import-level test for CUSTOM_LLM_PROVIDER=hermes_gateway if the
custom module is testable without external services.
Rollout
- Implement the adapter behind
CUSTOM_LLM_PROVIDER=hermes_gateway. - Keep
openaias the default provider. - Run unit tests for the adapter and a syntax/type smoke check on
custom/custom_agent.py. - Test manually with a local gateway using
python custom/custom_agent.py consoleor the existing LiveKit development mode. - If vision payloads are unsupported by the deployed gateway, document that the first Hermes rollout is text-only for vision turns.