Update Chinese and English documentation for custom board setup and MCP protocol

- Added a comprehensive guide for creating custom boards in the XiaoZhi AI project, detailing directory structure, configuration files, and initialization code.
- Introduced a new document explaining the MCP protocol for IoT control, including message formats and interaction flows.
- Updated existing documentation to reflect changes in tool registration and usage examples for the MCP protocol.
- Enhanced README files for better clarity and consistency across languages.
Terrence
2026-04-17 03:36:37 +08:00
parent 69b1a978e9
commit 87f6faee79
15 changed files with 2910 additions and 919 deletions


@ -1,76 +1,77 @@
# MQTT + UDP 混合通信协议文档
# MQTT + UDP Hybrid Communication Protocol
基于代码实现整理的 MQTT + UDP 混合通信协议文档,概述设备端与服务器之间如何通过 MQTT 进行控制消息传输,通过 UDP 进行音频数据传输的交互方式。
This document describes the MQTT + UDP hybrid protocol used between the device and the server, based on the current implementation: MQTT carries control messages, UDP carries real-time audio.
---
## 1. 协议概览
## 1. Overview
本协议采用混合传输方式:
- **MQTT**用于控制消息、状态同步、JSON 数据交换
- **UDP**:用于实时音频数据传输,支持加密
The protocol uses two channels:
### 1.1 协议特点
- **MQTT** - control messages, state synchronization, JSON payloads.
- **UDP** - real-time audio, encrypted.
- **双通道设计**:控制与数据分离,确保实时性
- **加密传输**UDP 音频数据使用 AES-CTR 加密
- **序列号保护**:防止数据包重放和乱序
- **自动重连**MQTT 连接断开时自动重连
### 1.1 Key characteristics
- **Dual channel design** - control is separated from data so audio has low latency.
- **Encrypted transport** - UDP audio is encrypted with AES-CTR.
- **Sequence numbers** - guard against replay and reordering.
- **Automatic reconnect** - MQTT reconnects on disconnect.
---
## 2. 总体流程概览
## 2. End-to-end Flow
```mermaid
sequenceDiagram
participant Device as ESP32 设备
participant MQTT as MQTT 服务器
participant UDP as UDP 服务器
participant Device as ESP32 device
participant MQTT as MQTT broker
participant UDP as UDP server
Note over Device, UDP: 1. 建立 MQTT 连接
Note over Device, UDP: 1. Establish MQTT connection
Device->>MQTT: MQTT Connect
MQTT->>Device: Connected
Note over Device, UDP: 2. 请求音频通道
Device->>MQTT: Hello Message (type: "hello", transport: "udp")
MQTT->>Device: Hello Response (UDP 连接信息 + 加密密钥)
Note over Device, UDP: 2. Request audio channel
Device->>MQTT: Hello message (type: "hello", transport: "udp")
MQTT->>Device: Hello response (UDP endpoint + encryption keys)
Note over Device, UDP: 3. 建立 UDP 连接
Note over Device, UDP: 3. Establish UDP connection
Device->>UDP: UDP Connect
UDP->>Device: Connected
Note over Device, UDP: 4. 音频数据传输
loop 音频流传输
Device->>UDP: 加密音频数据 (Opus)
UDP->>Device: 加密音频数据 (Opus)
Note over Device, UDP: 4. Audio streaming
loop Audio stream
Device->>UDP: Encrypted audio (Opus)
UDP->>Device: Encrypted audio (Opus)
end
Note over Device, UDP: 5. 控制消息交换
par 控制消息
Device->>MQTT: Listen/TTS/MCP 消息
MQTT->>Device: STT/TTS/MCP 响应
Note over Device, UDP: 5. Control messages
par Control
Device->>MQTT: Listen / TTS / MCP messages
MQTT->>Device: STT / TTS / MCP / Alert responses
end
Note over Device, UDP: 6. 关闭连接
Device->>MQTT: Goodbye Message
Note over Device, UDP: 6. Teardown
Device->>MQTT: Goodbye
Device->>UDP: Disconnect
```
---
## 3. MQTT 控制通道
## 3. MQTT Control Channel
### 3.1 连接建立
### 3.1 Connection
设备通过 MQTT 连接到服务器,连接参数包括:
- **Endpoint**MQTT 服务器地址和端口
- **Client ID**:设备唯一标识符
- **Username/Password**:认证凭据
- **Keep Alive**心跳间隔默认240秒
The device connects to the broker using:
- **Endpoint** - broker host and port.
- **Client ID** - device identifier.
- **Username / Password** - credentials.
- **Keep Alive** - heartbeat interval (default 240 s).
### 3.2 Hello 消息交换
### 3.2 Hello exchange
#### 3.2.1 设备端发送 Hello
#### 3.2.1 Device -> Server
```json
{
@ -78,7 +79,8 @@ sequenceDiagram
"version": 3,
"transport": "udp",
"features": {
"mcp": true
"mcp": true,
"aec": true
},
"audio_params": {
"format": "opus",
@ -89,7 +91,9 @@ sequenceDiagram
}
```
#### 3.2.2 服务器响应 Hello
`features.mcp` is always set; `features.aec` is set when `CONFIG_USE_SERVER_AEC` is enabled.
#### 3.2.2 Server -> Device
```json
{
@ -111,17 +115,17 @@ sequenceDiagram
}
```
**字段说明:**
- `udp.server`UDP 服务器地址
- `udp.port`UDP 服务器端口
- `udp.key`AES 加密密钥(十六进制字符串)
- `udp.nonce`AES 加密随机数(十六进制字符串)
Field reference:
- `udp.server` - UDP server address.
- `udp.port` - UDP server port.
- `udp.key` - AES key, hex-encoded.
- `udp.nonce` - AES nonce, hex-encoded.
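Because `udp.key` and `udp.nonce` arrive as hex strings, the device has to turn them into raw bytes before it can set up AES-CTR. The following is a minimal sketch of that decoding step, not the project's actual code; `HexValue` and `DecodeHex` are hypothetical helper names, and C++17 is assumed for `std::optional`.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Convert one hex digit to its value, or -1 if the character is not hex.
// (Hypothetical helper, not taken from the project.)
static int HexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decode a hex string such as "0a1b" into raw bytes.
// Returns std::nullopt for odd lengths or non-hex characters.
std::optional<std::vector<uint8_t>> DecodeHex(const std::string& hex) {
    if (hex.size() % 2 != 0) return std::nullopt;
    std::vector<uint8_t> out;
    out.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        int hi = HexValue(hex[i]);
        int lo = HexValue(hex[i + 1]);
        if (hi < 0 || lo < 0) return std::nullopt;
        out.push_back(static_cast<uint8_t>((hi << 4) | lo));
    }
    return out;
}
```

A 128-bit key or nonce should decode to exactly 16 bytes, so a caller would typically also reject any result whose `size()` is not 16.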
### 3.3 JSON 消息类型
### 3.3 JSON message types
#### 3.3.1 设备端→服务器
#### 3.3.1 Device -> Server
1. **Listen 消息**
1. **Listen**
```json
{
"session_id": "xxx",
@ -131,7 +135,7 @@ sequenceDiagram
}
```
2. **Abort 消息**
2. **Abort**
```json
{
"session_id": "xxx",
@ -140,7 +144,7 @@ sequenceDiagram
}
```
3. **MCP 消息**
3. **MCP**
```json
{
"session_id": "xxx",
@ -148,12 +152,12 @@ sequenceDiagram
"payload": {
"jsonrpc": "2.0",
"id": 1,
"result": {...}
"result": {}
}
}
```
4. **Goodbye 消息**
4. **Goodbye**
```json
{
"session_id": "xxx",
@ -161,71 +165,84 @@ sequenceDiagram
}
```
#### 3.3.2 服务器→设备端
#### 3.3.2 Server -> Device
支持的消息类型与 WebSocket 协议一致,包括:
- **STT**:语音识别结果
- **TTS**:语音合成控制
- **LLM**:情感表达控制
- **MCP**:物联网控制
- **System**:系统控制
- **Custom**:自定义消息(可选)
Semantics match the WebSocket protocol. Supported types:
- **STT** - speech recognition result.
- **TTS** - TTS lifecycle (`start`, `stop`, `sentence_start`).
- **LLM** - emotion update for the UI.
- **MCP** - IoT control.
- **System** - system control, e.g. `"command": "reboot"`.
- **Alert** - show an alert on the UI; fields: `status`, `message`, `emotion`.
- **Goodbye** - server-initiated shutdown of the audio session. The device responds by closing the UDP channel without sending its own goodbye.
- **Custom** (optional, enabled via `CONFIG_RECEIVE_CUSTOM_MESSAGE`).
Example alert:
```json
{
"session_id": "xxx",
"type": "alert",
"status": "Warning",
"message": "Battery low",
"emotion": "sad"
}
```
---
## 4. UDP 音频通道
## 4. UDP Audio Channel
### 4.1 连接建立
### 4.1 Establishing the channel
设备收到 MQTT Hello 响应后,使用其中的 UDP 连接信息建立音频通道:
1. 解析 UDP 服务器地址和端口
2. 解析加密密钥和随机数
3. 初始化 AES-CTR 加密上下文
4. 建立 UDP 连接
After the device receives the MQTT hello response, it:
1. Parses the UDP host and port.
2. Parses the AES key and nonce.
3. Initializes the AES-CTR context.
4. Opens the UDP socket.
### 4.2 音频数据格式
### 4.2 Audio packet format
#### 4.2.1 加密音频包结构
#### 4.2.1 Encrypted audio packet
```
|type 1byte|flags 1byte|payload_len 2bytes|ssrc 4bytes|timestamp 4bytes|sequence 4bytes|
|type 1B|flags 1B|payload_len 2B|ssrc 4B|timestamp 4B|sequence 4B|
|payload payload_len bytes|
```
**字段说明:**
- `type`:数据包类型,固定为 0x01
- `flags`:标志位,当前未使用
- `payload_len`:负载长度(网络字节序)
- `ssrc`:同步源标识符
- `timestamp`:时间戳(网络字节序)
- `sequence`:序列号(网络字节序)
- `payload`:加密的 Opus 音频数据
Field reference:
- `type`: packet type, always `0x01`.
- `flags`: flags, currently unused.
- `payload_len`: payload length (network byte order).
- `ssrc`: synchronization source identifier.
- `timestamp`: timestamp (network byte order).
- `sequence`: sequence number (network byte order).
- `payload`: encrypted Opus audio data.
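The 16-byte header described above can be packed and parsed as in the sketch below. It is illustrative only: `AudioPacketHeader`, `PackHeader` and `ParseHeader` are hypothetical names, and `ssrc` is assumed to be big-endian like the other multi-byte fields, which the field reference does not state.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>

constexpr std::size_t kHeaderSize = 16;
constexpr uint8_t kAudioPacketType = 0x01;

// Host-order view of the header fields (hypothetical type).
struct AudioPacketHeader {
    uint8_t type = kAudioPacketType;
    uint8_t flags = 0;
    uint16_t payload_len = 0;
    uint32_t ssrc = 0;
    uint32_t timestamp = 0;
    uint32_t sequence = 0;
};

// Read a 32-bit value stored in network byte order (big-endian).
inline uint32_t ReadU32(const uint8_t* p) {
    return (static_cast<uint32_t>(p[0]) << 24) | (static_cast<uint32_t>(p[1]) << 16) |
           (static_cast<uint32_t>(p[2]) << 8) | static_cast<uint32_t>(p[3]);
}

// Write a 32-bit value in network byte order (big-endian).
inline void WriteU32(uint8_t* p, uint32_t v) {
    p[0] = static_cast<uint8_t>(v >> 24);
    p[1] = static_cast<uint8_t>(v >> 16);
    p[2] = static_cast<uint8_t>(v >> 8);
    p[3] = static_cast<uint8_t>(v);
}

// Serialize the header into its 16-byte wire form.
std::array<uint8_t, kHeaderSize> PackHeader(const AudioPacketHeader& h) {
    std::array<uint8_t, kHeaderSize> out{};
    out[0] = h.type;
    out[1] = h.flags;
    out[2] = static_cast<uint8_t>(h.payload_len >> 8);
    out[3] = static_cast<uint8_t>(h.payload_len & 0xff);
    WriteU32(&out[4], h.ssrc);
    WriteU32(&out[8], h.timestamp);
    WriteU32(&out[12], h.sequence);
    return out;
}

// Parse a received datagram. Returns std::nullopt when the packet is too
// short, has an unexpected type, or is shorter than payload_len claims.
std::optional<AudioPacketHeader> ParseHeader(const uint8_t* data, std::size_t size) {
    if (size < kHeaderSize || data[0] != kAudioPacketType) return std::nullopt;
    AudioPacketHeader h;
    h.type = data[0];
    h.flags = data[1];
    h.payload_len = static_cast<uint16_t>((data[2] << 8) | data[3]);
    h.ssrc = ReadU32(data + 4);
    h.timestamp = ReadU32(data + 8);
    h.sequence = ReadU32(data + 12);
    if (size - kHeaderSize < h.payload_len) return std::nullopt;
    return h;
}
```

Shifting bytes explicitly avoids depending on the host's endianness or on struct padding, which is why the sketch does not cast the buffer to a struct.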
#### 4.2.2 加密算法
#### 4.2.2 Encryption
使用 **AES-CTR** 模式加密:
- **密钥**128位由服务器提供
- **随机数**128位由服务器提供
- **计数器**:包含时间戳和序列号信息
Uses **AES-CTR** with:
- **Key**: 128-bit, provided by the server.
- **Nonce**: 128-bit, provided by the server.
- **Counter**: built from the timestamp and sequence number.
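The document does not spell out how the counter is assembled, so the sketch below shows only one plausible layout, stated as an assumption: start from the 16-byte server nonce and overwrite the byte positions that correspond to `payload_len`, `timestamp` and `sequence` in the packet header. `BuildCounterBlock` is a hypothetical name; check the byte offsets against the implementation before relying on them.

```cpp
#include <array>
#include <cstdint>

using Block = std::array<uint8_t, 16>;

// Build the initial AES-CTR counter block for one packet.
// ASSUMED layout (not confirmed by the document): bytes 2-3 carry the
// payload length, bytes 8-11 the timestamp, bytes 12-15 the sequence
// number, all big-endian; every other byte is copied from the nonce.
Block BuildCounterBlock(const Block& server_nonce, uint16_t payload_len,
                        uint32_t timestamp, uint32_t sequence) {
    Block block = server_nonce;
    block[2] = static_cast<uint8_t>(payload_len >> 8);
    block[3] = static_cast<uint8_t>(payload_len & 0xff);
    for (int i = 0; i < 4; ++i) {
        // Shift so that the most significant byte lands first.
        int shift = 8 * (3 - i);
        block[8 + i] = static_cast<uint8_t>(timestamp >> shift);
        block[12 + i] = static_cast<uint8_t>(sequence >> shift);
    }
    return block;
}
```

Whatever the exact layout, the property that matters for CTR mode is that no two packets under the same key share a counter block, which a per-packet sequence number provides as long as it never repeats.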
### 4.3 序列号管理
### 4.3 Sequence number management
- **发送端**`local_sequence_` 单调递增
- **接收端**`remote_sequence_` 验证连续性
- **防重放**:拒绝序列号小于期望值的数据包
- **容错处理**:允许轻微的序列号跳跃,记录警告
- **Sender**: `local_sequence_` is incremented monotonically.
- **Receiver**: `remote_sequence_` validates continuity.
- **Anti-replay**: packets with sequence numbers below the expected value are dropped.
- **Tolerance**: small gaps are logged as warnings but still accepted.
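The receiver rules above can be sketched as a small state machine. This is an illustration under assumptions rather than the project's code: `SequenceTracker` and `SequenceVerdict` are hypothetical names, the first expected sequence number is passed in by the caller, and 32-bit wraparound is deliberately ignored.

```cpp
#include <cstdint>

enum class SequenceVerdict {
    kAccept,         // exactly the expected sequence number
    kAcceptWithGap,  // ahead of the expected value: log a warning, still process
    kDrop            // below the expected value: treat as replayed or stale
};

// Tracks the next expected sequence number for incoming packets.
// Wraparound of the 32-bit counter is not handled in this sketch.
class SequenceTracker {
public:
    explicit SequenceTracker(uint32_t first_expected) : expected_(first_expected) {}

    SequenceVerdict Check(uint32_t sequence) {
        if (sequence < expected_) {
            return SequenceVerdict::kDrop;
        }
        SequenceVerdict verdict = (sequence == expected_)
                                      ? SequenceVerdict::kAccept
                                      : SequenceVerdict::kAcceptWithGap;
        // Accepted packets always move the expectation forward.
        expected_ = sequence + 1;
        return verdict;
    }

private:
    uint32_t expected_;
};
```

For example, after packets 1 and 2 are accepted, packet 5 is accepted with a gap warning, and a late packet 3 is then dropped because the expected value has already advanced to 6.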
### 4.4 错误处理
### 4.4 Error handling
1. **解密失败**:记录错误,丢弃数据包
2. **序列号异常**:记录警告,但仍处理数据包
3. **数据包格式错误**:记录错误,丢弃数据包
1. **Decryption failure** - log an error and drop the packet.
2. **Sequence gap** - log a warning, continue processing the packet.
3. **Malformed packet** - log an error and drop.
---
## 5. 状态管理
## 5. State Management
### 5.1 连接状态
### 5.1 Connection state
```mermaid
stateDiagram
@ -233,21 +250,21 @@ stateDiagram
[*] --> Disconnected
Disconnected --> MqttConnecting: StartMqttClient()
MqttConnecting --> MqttConnected: MQTT Connected
MqttConnecting --> Disconnected: Connect Failed
MqttConnecting --> Disconnected: Connect failed
MqttConnected --> RequestingChannel: OpenAudioChannel()
RequestingChannel --> ChannelOpened: Hello Exchange Success
RequestingChannel --> MqttConnected: Hello Timeout/Failed
ChannelOpened --> UdpConnected: UDP Connect Success
UdpConnected --> AudioStreaming: Start Audio Transfer
AudioStreaming --> UdpConnected: Stop Audio Transfer
UdpConnected --> ChannelOpened: UDP Disconnect
RequestingChannel --> ChannelOpened: Hello exchange success
RequestingChannel --> MqttConnected: Hello timeout / failed
ChannelOpened --> UdpConnected: UDP connect success
UdpConnected --> AudioStreaming: Start audio
AudioStreaming --> UdpConnected: Stop audio
UdpConnected --> ChannelOpened: UDP disconnect
ChannelOpened --> MqttConnected: CloseAudioChannel()
MqttConnected --> Disconnected: MQTT Disconnect
MqttConnected --> Disconnected: MQTT disconnect
```
### 5.2 状态检查
### 5.2 State check
设备通过以下条件判断音频通道是否可用:
The device determines whether the audio channel is available with:
```cpp
bool IsAudioChannelOpened() const {
return udp_ != nullptr && !error_occurred_ && !IsTimeout();
@ -256,138 +273,137 @@ bool IsAudioChannelOpened() const {
---
## 6. 配置参数
## 6. Configuration Parameters
### 6.1 MQTT 配置
### 6.1 MQTT settings
从设置中读取的配置项:
- `endpoint`MQTT 服务器地址
- `client_id`:客户端标识符
- `username`:用户名
- `password`:密码
- `keepalive`心跳间隔默认240秒
- `publish_topic`:发布主题
Read from storage:
- `endpoint` - broker address.
- `client_id` - client identifier.
- `username` - user name.
- `password` - password.
- `keepalive` - keep-alive interval (default 240 s).
- `publish_topic` - publish topic.
### 6.2 音频参数
### 6.2 Audio parameters
- **格式**Opus
- **采样率**16000 Hz设备端/ 24000 Hz服务器端
- **声道数**1单声道
- **帧时长**60ms
- **Format**: Opus
- **Sample rate**: 16 kHz device / 24 kHz server
- **Channels**: 1 (mono)
- **Frame duration**: 60 ms
---
## 7. 错误处理与重连
## 7. Error Handling and Reconnection
### 7.1 MQTT 重连机制
### 7.1 MQTT reconnect
- 连接失败时自动重试
- 支持错误上报控制
- 断线时触发清理流程
- Automatic retry on connect failure.
- Optional error reporting.
- Clean-up runs on disconnect.
### 7.2 UDP 连接管理
### 7.2 UDP connection
- 连接失败时不自动重试
- 依赖 MQTT 通道重新协商
- 支持连接状态查询
- No automatic retry; depends on re-negotiation via MQTT.
- Status can be queried at any time.
### 7.3 超时处理
### 7.3 Timeouts
基类 `Protocol` 提供超时检测:
- 默认超时时间:120
- 基于最后接收时间计算
- 超时时自动标记为不可用
The base `Protocol` class provides timeout detection:
- Default timeout: 120 s.
- Based on the time since the last incoming packet.
- After a timeout the channel is marked unavailable.
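A timeout check of this kind could look like the sketch below. `ChannelTimeout` and `OnIncoming` are hypothetical names, not the members of the real `Protocol` class; the current time is passed in explicitly so the logic can be exercised without waiting.

```cpp
#include <chrono>

// Tracks the time of the last incoming packet and reports a timeout
// once more than `limit` has elapsed since then (120 s by default).
class ChannelTimeout {
public:
    using Clock = std::chrono::steady_clock;

    explicit ChannelTimeout(std::chrono::seconds limit = std::chrono::seconds(120))
        : limit_(limit) {}

    // Call whenever a packet arrives on the channel.
    void OnIncoming(Clock::time_point now) { last_incoming_ = now; }

    // True when the channel should be treated as unavailable.
    bool IsTimeout(Clock::time_point now) const {
        return now - last_incoming_ > limit_;
    }

private:
    std::chrono::seconds limit_;
    Clock::time_point last_incoming_{};
};
```

In production code the caller would pass `ChannelTimeout::Clock::now()`. A steady clock is used here because it is not affected by wall-clock adjustments such as an NTP sync.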
---
## 8. 安全考虑
## 8. Security
### 8.1 传输加密
### 8.1 Transport encryption
- **MQTT**:支持 TLS/SSL 加密(端口8883
- **UDP**:使用 AES-CTR 加密音频数据
- **MQTT**: supports TLS/SSL (port 8883).
- **UDP**: AES-CTR on audio payloads.
### 8.2 认证机制
### 8.2 Authentication
- **MQTT**:用户名/密码认证
- **UDP**:通过 MQTT 通道分发密钥
- **MQTT**: user name / password.
- **UDP**: keys are distributed via the MQTT channel.
### 8.3 防重放攻击
### 8.3 Anti-replay
- 序列号单调递增
- 拒绝过期数据包
- 时间戳验证
- Monotonically increasing sequence numbers.
- Stale packets are dropped.
- Timestamps are validated.
---
## 9. 性能优化
## 9. Performance Notes
### 9.1 并发控制
### 9.1 Concurrency
使用互斥锁保护 UDP 连接:
A mutex protects the UDP connection:
```cpp
std::lock_guard<std::mutex> lock(channel_mutex_);
```
### 9.2 内存管理
### 9.2 Memory management
- 动态创建/销毁网络对象
- 智能指针管理音频数据包
- 及时释放加密上下文
- Network objects are created and destroyed dynamically.
- Audio packets are managed with smart pointers.
- Encryption contexts are released promptly.
### 9.3 网络优化
### 9.3 Network optimizations
- UDP 连接复用
- 数据包大小优化
- 序列号连续性检查
- UDP connection reuse.
- Reasonable packet sizes.
- Sequence continuity checks.
---
## 10. WebSocket 协议的比较
## 10. Comparison with WebSocket
| 特性 | MQTT + UDP | WebSocket |
|------|------------|-----------|
| 控制通道 | MQTT | WebSocket |
| 音频通道 | UDP (加密) | WebSocket (二进制) |
| 实时性 | 高 (UDP) | 中等 |
| 可靠性 | 中等 | 高 |
| 复杂度 | 高 | 低 |
| 加密 | AES-CTR | TLS |
| 防火墙友好度 | 低 | 高 |
| Feature | MQTT + UDP | WebSocket |
|---------|------------|-----------|
| Control channel | MQTT | WebSocket |
| Audio channel | UDP (encrypted) | WebSocket (binary) |
| Latency | Low (UDP) | Medium |
| Reliability | Medium | High |
| Complexity | High | Low |
| Encryption | AES-CTR | TLS |
| Firewall friendliness | Low | High |
---
## 11. 部署建议
## 11. Deployment Notes
### 11.1 网络环境
### 11.1 Network
- 确保 UDP 端口可达
- 配置防火墙规则
- 考虑 NAT 穿透
- Ensure UDP ports are reachable.
- Configure firewall rules accordingly.
- Plan for NAT traversal if needed.
### 11.2 服务器配置
### 11.2 Server infrastructure
- MQTT Broker 配置
- UDP 服务器部署
- 密钥管理系统
- MQTT broker configuration.
- UDP server deployment.
- Key management.
### 11.3 监控指标
### 11.3 Monitoring
- 连接成功率
- 音频传输延迟
- 数据包丢失率
- 解密失败率
- Connection success rate.
- Audio transmission latency.
- Packet loss.
- Decryption failures.
---
## 12. 总结
## 12. Summary
MQTT + UDP 混合协议通过以下设计实现高效的音视频通信:
The MQTT + UDP hybrid protocol achieves efficient audio communication through:
- **分离式架构**:控制与数据通道分离,各司其职
- **加密保护**AES-CTR 确保音频数据安全传输
- **序列化管理**:防止重放攻击和数据乱序
- **自动恢复**:支持连接断开后的自动重连
- **性能优化**UDP 传输保证音频数据的实时性
- **Split architecture** - separate control and data channels with clear responsibilities.
- **Encryption** - AES-CTR protects audio payloads.
- **Sequence management** - prevents replay and reordering.
- **Automatic recovery** - MQTT reconnects on failure.
- **Performance** - UDP keeps audio latency low.
该协议适用于对实时性要求较高的语音交互场景,但需要在网络复杂度和传输性能之间做出权衡。
The protocol is a good fit for low-latency voice interaction, at the cost of higher network complexity than pure WebSocket.