Files

steven_li 826db8ec2e ```

feat(llm): 添加 Hermes Gateway LLM 设计文档
```

2026-06-01 16:05:15 +08:00

7.1 KiB

Raw Blame History

Hermes Gateway LLM Design

Date: 2026-06-01

Goal

Replace the OpenAI-compatible LLM call path in custom/custom_agent.py with a LiveKit LLM adapter that talks to NousResearch Hermes Agent through the OpenClaw gateway protocol.

The integration must keep the existing custom agent behavior:

Chinese room-locator and general assistant instructions
Emotion prefix parsing with <emotion=...>
Memory recall for room-locator queries
Optional vision-frame attachment
LiveKit ASR, TTS, VAD, turn handling, metrics, and interruption behavior

The Hermes session strategy is per_room: one LiveKit room should map to one Hermes gateway session for the lifetime of that room.

Non-Goals

Do not replace LiveKit AgentSession, ASR, TTS, VAD, or room I/O.
Do not move room-locator classification into Hermes Agent.
Do not implement Hermes-side tools in the first pass.
Do not require an OpenAI-compatible proxy in front of the gateway.

Recommended Architecture

Add a new custom LiveKit LLM implementation in custom/hermes_gateway.py.

The adapter will implement the LiveKit llm.LLM interface and return a custom LLMStream. The stream will own a single gateway request/response cycle while the LLM object owns the per-room gateway session state.

custom/custom_agent.py will continue to call selected_llm.chat(...) through _run_selected_llm(). That preserves the existing llm_node() pipeline and keeps Hermes behind the same abstraction as OpenAI-compatible models.

Components

HermesGatewayLLM

Responsibilities:

Store gateway configuration: URL, auth token, agent identifier, request timeout, and reconnect policy.
Lazily create one Hermes gateway session per LiveKit room.
Expose model as the configured Hermes agent/model identifier.
Expose provider as hermes-gateway.
Create HermesGatewayLLMStream from chat(...).
Close any persistent WebSocket/session resources in aclose().

HermesGatewayLLMStream

Responsibilities:

Serialize LiveKit ChatContext into the gateway request payload.
Send the latest turn to the per-room Hermes session.
Consume gateway events until the turn completes or fails.
Yield LiveKit llm.ChatChunk objects for assistant text deltas.
Surface recoverable connection failures through the normal LiveKit LLM error path.

custom_agent.py Wiring

Add env-driven provider selection:

CUSTOM_LLM_PROVIDER=openai keeps the current behavior.
CUSTOM_LLM_PROVIDER=hermes_gateway constructs HermesGatewayLLM.

New Hermes-specific env vars:

CUSTOM_HERMES_GATEWAY_URL
CUSTOM_HERMES_API_KEY
CUSTOM_HERMES_AGENT_ID
CUSTOM_HERMES_SESSION_MODE=per_room
CUSTOM_HERMES_REQUEST_TIMEOUT
CUSTOM_HERMES_VERIFY_SSL

When CUSTOM_LLM_PROVIDER=hermes_gateway, base_llm, text_llm, and vision_llm should all point at the same Hermes adapter. Separate Hermes text/vision agent IDs are out of scope for this design.

Data Flow

User speaks or sends text.
Existing LiveKit/STT flow updates ChatContext.
CustomAgent.llm_node() selects general or room_locator.
Existing code injects the appropriate instructions and emotion-prefix requirement.
Existing code optionally augments the latest user message with memory context.
Existing code optionally attaches a fresh vision frame.
_run_selected_llm() calls HermesGatewayLLM.chat(...).
The Hermes adapter sends the request to the per-room gateway session.
Gateway text events are converted to llm.ChatChunk deltas.
Existing emotion observation and TTS stripping continue unchanged.

ChatContext Serialization

Text messages should be serialized first.

Supported LiveKit content:

str: send as normal message content.
instruction/config updates: preserve the final active instructions as the leading instruction message in the gateway payload. If the deployed gateway only accepts user/assistant messages, prepend the instruction text to the latest user message before sending.
image content: attempt to send through the gateway image/multimodal field. If the deployed Hermes gateway rejects or ignores image content, log a warning and fall back to text-only generation for that turn.

Function tool calls should not be sent in the first implementation. If tool messages appear, log that they were omitted.

per_room Session Lifecycle

The adapter should derive a stable room key from the active LiveKit session or job context. If a room name/SID is not available, fall back to one adapter-local session.

For each room key:

Open or reuse a gateway connection.
Send the gateway connect handshake if needed.
Create a Hermes session once.
Reuse that Hermes session for all future turns from the same room.
Close the gateway connection when the LiveKit LLM is closed.

This lets Hermes maintain its own conversational state while LiveKit still keeps the visible conversation history.

Gateway Event Mapping

Map streaming text events to LiveKit chunks:

Gateway assistant text delta -> llm.ChatChunk(delta=llm.ChoiceDelta(content=delta))
Gateway final assistant message -> emit any remaining text not already streamed
Gateway usage metadata -> llm.CompletionUsage when token counts are available
Gateway tool/action events -> log at debug/info level in the first implementation
Gateway error event -> raise a LiveKit APIError or APIConnectionError
Gateway completion event -> finish the async iterator

The implementation should make the event parser tolerant of protocol field-name differences by isolating event normalization in one helper function. Unknown event types should be logged and ignored unless they indicate failure.

Error Handling

Missing Hermes env vars should fail fast at startup when provider is hermes_gateway.
Gateway connect/session-create failures should raise connection errors.
A failed request should not discard the per-room session unless the gateway reports that the session is invalid or closed.
If the gateway connection closes mid-turn, reconnect once and retry only if no assistant text has been yielded yet.
If assistant text has already been yielded, fail the turn instead of replaying partial output.

Testing

Add focused tests around the adapter:

Serializes simple system/user/assistant chat context.
Creates one gateway session and reuses it across two turns for the same room.
Converts text deltas into llm.ChatChunk content.
Handles final full-message events without duplicate text.
Raises on gateway error events.
Logs and skips unsupported image/tool content.

Add a small wiring test or import-level test for CUSTOM_LLM_PROVIDER=hermes_gateway if the custom module is testable without external services.

Rollout

Implement the adapter behind CUSTOM_LLM_PROVIDER=hermes_gateway.
Keep openai as the default provider.
Run unit tests for the adapter and a syntax/type smoke check on custom/custom_agent.py.
Test manually with a local gateway using python custom/custom_agent.py console or the existing LiveKit development mode.
If vision payloads are unsupported by the deployed gateway, document that the first Hermes rollout is text-only for vision turns.

7.1 KiB Raw Blame History