178 lines
7.1 KiB
Markdown
178 lines
7.1 KiB
Markdown
# Hermes Gateway LLM Design
|
|
|
|
Date: 2026-06-01
|
|
|
|
## Goal
|
|
|
|
Replace the OpenAI-compatible LLM call path in `custom/custom_agent.py` with a LiveKit LLM
|
|
adapter that talks to NousResearch Hermes Agent through the OpenClaw gateway protocol.
|
|
|
|
The integration must keep the existing custom agent behavior:
|
|
|
|
- Chinese room-locator and general assistant instructions
|
|
- Emotion prefix parsing with `<emotion=...>`
|
|
- Memory recall for room-locator queries
|
|
- Optional vision-frame attachment
|
|
- LiveKit ASR, TTS, VAD, turn handling, metrics, and interruption behavior
|
|
|
|
The Hermes session strategy is `per_room`: one LiveKit room should map to one Hermes gateway
|
|
session for the lifetime of that room.
|
|
|
|
## Non-Goals
|
|
|
|
- Do not replace LiveKit `AgentSession`, ASR, TTS, VAD, or room I/O.
|
|
- Do not move room-locator classification into Hermes Agent.
|
|
- Do not implement Hermes-side tools in the first pass.
|
|
- Do not require an OpenAI-compatible proxy in front of the gateway.
|
|
|
|
## Recommended Architecture
|
|
|
|
Add a new custom LiveKit LLM implementation in `custom/hermes_gateway.py`.
|
|
|
|
The adapter will implement the LiveKit `llm.LLM` interface and return a custom `LLMStream`.
|
|
The stream will own a single gateway request/response cycle while the LLM object owns the
|
|
per-room gateway session state.
|
|
|
|
`custom/custom_agent.py` will continue to call `selected_llm.chat(...)` through
|
|
`_run_selected_llm()`. That preserves the existing `llm_node()` pipeline and keeps Hermes
|
|
behind the same abstraction as OpenAI-compatible models.
|
|
|
|
## Components
|
|
|
|
### HermesGatewayLLM
|
|
|
|
Responsibilities:
|
|
|
|
- Store gateway configuration: URL, auth token, agent identifier, request timeout, and reconnect
|
|
policy.
|
|
- Lazily create one Hermes gateway session per LiveKit room.
|
|
- Expose `model` as the configured Hermes agent/model identifier.
|
|
- Expose `provider` as `hermes-gateway`.
|
|
- Create `HermesGatewayLLMStream` from `chat(...)`.
|
|
- Close any persistent WebSocket/session resources in `aclose()`.
|
|
|
|
### HermesGatewayLLMStream
|
|
|
|
Responsibilities:
|
|
|
|
- Serialize LiveKit `ChatContext` into the gateway request payload.
|
|
- Send the latest turn to the per-room Hermes session.
|
|
- Consume gateway events until the turn completes or fails.
|
|
- Yield LiveKit `llm.ChatChunk` objects for assistant text deltas.
|
|
- Surface recoverable connection failures through the normal LiveKit LLM error path.
|
|
|
|
### custom_agent.py Wiring
|
|
|
|
Add env-driven provider selection:
|
|
|
|
- `CUSTOM_LLM_PROVIDER=openai` keeps the current behavior.
|
|
- `CUSTOM_LLM_PROVIDER=hermes_gateway` constructs `HermesGatewayLLM`.
|
|
|
|
New Hermes-specific env vars:
|
|
|
|
- `CUSTOM_HERMES_GATEWAY_URL`
|
|
- `CUSTOM_HERMES_API_KEY`
|
|
- `CUSTOM_HERMES_AGENT_ID`
|
|
- `CUSTOM_HERMES_SESSION_MODE=per_room`
|
|
- `CUSTOM_HERMES_REQUEST_TIMEOUT`
|
|
- `CUSTOM_HERMES_VERIFY_SSL`
|
|
|
|
When `CUSTOM_LLM_PROVIDER=hermes_gateway`, `base_llm`, `text_llm`, and `vision_llm` should all
|
|
point at the same Hermes adapter. Separate Hermes text/vision agent IDs are out of scope for this
|
|
design.
|
|
|
|
## Data Flow
|
|
|
|
1. User speaks or sends text.
|
|
2. Existing LiveKit/STT flow updates `ChatContext`.
|
|
3. `CustomAgent.llm_node()` selects `general` or `room_locator`.
|
|
4. Existing code injects the appropriate instructions and emotion-prefix requirement.
|
|
5. Existing code optionally augments the latest user message with memory context.
|
|
6. Existing code optionally attaches a fresh vision frame.
|
|
7. `_run_selected_llm()` calls `HermesGatewayLLM.chat(...)`.
|
|
8. The Hermes adapter sends the request to the per-room gateway session.
|
|
9. Gateway text events are converted to `llm.ChatChunk` deltas.
|
|
10. Existing emotion observation and TTS stripping continue unchanged.
|
|
|
|
## ChatContext Serialization
|
|
|
|
Text messages should be serialized first.
|
|
|
|
Supported LiveKit content:
|
|
|
|
- `str`: send as normal message content.
|
|
- instruction/config updates: preserve the final active instructions as the leading instruction
|
|
message in the gateway payload. If the deployed gateway only accepts user/assistant messages,
|
|
prepend the instruction text to the latest user message before sending.
|
|
- image content: attempt to send through the gateway image/multimodal field. If the deployed
|
|
Hermes gateway rejects or ignores image content, log a warning and fall back to text-only
|
|
generation for that turn.
|
|
|
|
Function tool calls should not be sent in the first implementation. If tool messages appear, log
|
|
that they were omitted.
|
|
|
|
## per_room Session Lifecycle
|
|
|
|
The adapter should derive a stable room key from the active LiveKit session or job context. If a
|
|
room name/SID is not available, fall back to one adapter-local session.
|
|
|
|
For each room key:
|
|
|
|
1. Open or reuse a gateway connection.
|
|
2. Send the gateway `connect` handshake if needed.
|
|
3. Create a Hermes session once.
|
|
4. Reuse that Hermes session for all future turns from the same room.
|
|
5. Close the gateway connection when the LiveKit LLM is closed.
|
|
|
|
This lets Hermes maintain its own conversational state while LiveKit still keeps the visible
|
|
conversation history.
|
|
|
|
## Gateway Event Mapping
|
|
|
|
Map streaming text events to LiveKit chunks:
|
|
|
|
- Gateway assistant text delta -> `llm.ChatChunk(delta=llm.ChoiceDelta(content=delta))`
|
|
- Gateway final assistant message -> emit any remaining text not already streamed
|
|
- Gateway usage metadata -> `llm.CompletionUsage` when token counts are available
|
|
- Gateway tool/action events -> log at debug/info level in the first implementation
|
|
- Gateway error event -> raise a LiveKit `APIError` or `APIConnectionError`
|
|
- Gateway completion event -> finish the async iterator
|
|
|
|
The implementation should make the event parser tolerant of protocol field-name differences by
|
|
isolating event normalization in one helper function. Unknown event types should be logged and
|
|
ignored unless they indicate failure.
|
|
|
|
## Error Handling
|
|
|
|
- Missing Hermes env vars should fail fast at startup when provider is `hermes_gateway`.
|
|
- Gateway connect/session-create failures should raise connection errors.
|
|
- A failed request should not discard the per-room session unless the gateway reports that the
|
|
session is invalid or closed.
|
|
- If the gateway connection closes mid-turn, reconnect once and retry only if no assistant text
|
|
has been yielded yet.
|
|
- If assistant text has already been yielded, fail the turn instead of replaying partial output.
|
|
|
|
## Testing
|
|
|
|
Add focused tests around the adapter:
|
|
|
|
- Serializes simple system/user/assistant chat context.
|
|
- Creates one gateway session and reuses it across two turns for the same room.
|
|
- Converts text deltas into `llm.ChatChunk` content.
|
|
- Handles final full-message events without duplicate text.
|
|
- Raises on gateway error events.
|
|
- Logs and skips unsupported image/tool content.
|
|
|
|
Add a small wiring test or import-level test for `CUSTOM_LLM_PROVIDER=hermes_gateway` if the
|
|
custom module is testable without external services.
|
|
|
|
## Rollout
|
|
|
|
1. Implement the adapter behind `CUSTOM_LLM_PROVIDER=hermes_gateway`.
|
|
2. Keep `openai` as the default provider.
|
|
3. Run unit tests for the adapter and a syntax/type smoke check on `custom/custom_agent.py`.
|
|
4. Test manually with a local gateway using `python custom/custom_agent.py console` or the
|
|
existing LiveKit development mode.
|
|
5. If vision payloads are unsupported by the deployed gateway, document that the first Hermes
|
|
rollout is text-only for vision turns.
|