# MCP (Model Context Protocol) Interaction Flow

NOTICE: This document was AI-assisted; when implementing a backend, always cross-check the details against the code.

In this project, MCP is used between the backend API (MCP client) and the ESP32 device (MCP server) to let the backend discover and invoke the device's capabilities (tools).

## Message Format

From `main/protocols/protocol.cc` and `main/mcp_server.cc`, MCP messages are wrapped inside the underlying transport (WebSocket or MQTT). The inner payload follows the [JSON-RPC 2.0](https://www.jsonrpc.org/specification) specification.

Overall message layout:

```json
{
  "session_id": "...", // session id
  "type": "mcp",       // fixed value "mcp"
  "payload": {         // JSON-RPC 2.0 payload
    "jsonrpc": "2.0",
    "method": "...",   // method name ("initialize", "tools/list", "tools/call", ...)
    "params": { ... }, // arguments (for requests)
    "id": ...,         // request id (for requests and responses)
    "result": { ... }, // success result (response)
    "error": { ... }   // error (response)
  }
}
```

The `payload` follows standard JSON-RPC 2.0:

- `jsonrpc`: always `"2.0"`.
- `method`: the method name (requests).
- `params`: structured parameters, usually an object (requests).
- `id`: request identifier; echoed back in responses.
- `result`: success value (responses).
- `error`: error information (responses).

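The envelope described above can be sketched on the backend side as a pair of small helpers. This is a minimal illustration in Python, not code from the project: the function names `wrap_mcp` and `unwrap_mcp` are hypothetical, and the sketch assumes the transport delivers each message as a JSON string.

```python
import json


def wrap_mcp(session_id, payload):
    """Wrap a JSON-RPC 2.0 payload in the transport-level "mcp" envelope."""
    return json.dumps({"session_id": session_id, "type": "mcp", "payload": payload})


def unwrap_mcp(raw):
    """Return the JSON-RPC payload of an "mcp" message, or None for other types."""
    message = json.loads(raw)
    if message.get("type") != "mcp":
        return None  # e.g. a transport "hello"; not an MCP message
    payload = message.get("payload")
    if not isinstance(payload, dict) or payload.get("jsonrpc") != "2.0":
        raise ValueError("malformed MCP payload")
    return payload
```

Keeping the envelope handling separate from the JSON-RPC handling means the same payload logic can be reused over WebSocket and MQTT.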
## Interaction Flow

MCP interactions are driven by the client (backend) discovering and invoking tools on the device.

1. **Connection and capability announcement**

    - **When**: after the device boots and connects to the backend.
    - **Direction**: device -> backend.
    - **Message**: the device sends the transport hello, advertising supported capabilities. MCP support is signaled via `"mcp": true` in the `features` map.
    - **Example (transport hello, not an MCP payload):**
    ```json
    {
      "type": "hello",
      "version": 1,
      "features": {
        "mcp": true
      },
      "transport": "websocket",
      "audio_params": { ... },
      "session_id": "..."
    }
    ```

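A backend might gate all MCP traffic on the hello check described in this step. The helper below is a hedged sketch: `supports_mcp` is a hypothetical name, and treating a hello without a `features` map as "no MCP" is an assumption of this example rather than documented device behavior.

```python
def supports_mcp(hello):
    """Return True when a parsed transport hello advertises MCP support."""
    if hello.get("type") != "hello":
        return False
    features = hello.get("features")
    # Assumption: a missing or non-object "features" field means no MCP support.
    return isinstance(features, dict) and features.get("mcp") is True
```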
2. **Initialize the MCP session**

    - **When**: after the backend sees that the device supports MCP. Usually the first MCP request.
    - **Direction**: backend -> device.
    - **Method**: `initialize`
    - **Message (MCP payload):**
    ```json
    {
      "jsonrpc": "2.0",
      "method": "initialize",
      "params": {
        "capabilities": {
          // optional client capabilities
          "vision": {
            "url": "...",  // camera image upload endpoint (must be an http URL, not a websocket URL)
            "token": "..." // token for the upload URL
          }
          // ... other client capabilities
        }
      },
      "id": 1
    }
    ```

    - **Device response:**
    ```json
    {
      "jsonrpc": "2.0",
      "id": 1,
      "result": {
        "protocolVersion": "2024-11-05",
        "capabilities": {
          "tools": {}
        },
        "serverInfo": {
          "name": "...",   // device name (BOARD_NAME)
          "version": "..." // firmware version
        }
      }
    }
    ```

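The `initialize` request above could be assembled as follows. This is an illustrative sketch only: `build_initialize` is a hypothetical helper, and accepting `https` alongside `http` for the vision upload URL is an assumption of this example (the text only rules out WebSocket URLs).

```python
from urllib.parse import urlparse


def build_initialize(request_id, vision=None):
    """Build the JSON-RPC payload for the MCP "initialize" request.

    vision, if given, is a dict with "url" and "token" keys.
    """
    capabilities = {}
    if vision is not None:
        # The upload endpoint must not be a websocket URL.
        # Assumption: both http and https are acceptable schemes.
        if urlparse(vision["url"]).scheme not in ("http", "https"):
            raise ValueError("vision url must be an http(s) URL")
        capabilities["vision"] = {"url": vision["url"], "token": vision["token"]}
    return {
        "jsonrpc": "2.0",
        "method": "initialize",
        "params": {"capabilities": capabilities},
        "id": request_id,
    }
```

Rejecting a bad URL before sending avoids a round trip to the device for an error the backend can detect locally.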
3. **Discover the tools**

    - **When**: whenever the backend needs the list of callable tools and their signatures.
    - **Direction**: backend -> device.
    - **Method**: `tools/list`
    - **Request parameters**:
        - `cursor` (string, optional): pagination cursor. Empty on the first request.
        - `withUserTools` (boolean, optional, default `false`): if `true`, the device also includes "user-only" tools (see "User-only tools" below) in the listing. This is typically used by a companion app that lets the user trigger privileged actions directly.
    - **Message (MCP payload):**
    ```json
    {
      "jsonrpc": "2.0",
      "method": "tools/list",
      "params": {
        "cursor": "",
        "withUserTools": false
      },
      "id": 2
    }
    ```
    - **Device response:**
    ```json
    {
      "jsonrpc": "2.0",
      "id": 2,
      "result": {
        "tools": [
          {
            "name": "self.get_device_status",
            "description": "...",
            "inputSchema": { ... }
          },
          {
            "name": "self.audio_speaker.set_volume",
            "description": "...",
            "inputSchema": { ... }
          }
          // ... more tools
        ],
        "nextCursor": "..."
      }
    }
    ```
    - **Pagination**: when `nextCursor` is non-empty, the backend must send another `tools/list` request with that cursor to fetch the next page.

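The pagination rule in this step amounts to a loop that keeps requesting pages until `nextCursor` comes back empty. The sketch below assumes a hypothetical `send_request(payload)` callable that sends one JSON-RPC payload to the device and returns the matching response payload; `fake_device` is an in-memory stand-in used only to make the example runnable, and its tool names are invented.

```python
import itertools


def list_all_tools(send_request, with_user_tools=False, start_id=2):
    """Collect every tool by following nextCursor until it is empty."""
    ids = itertools.count(start_id)
    tools, cursor = [], ""
    while True:
        response = send_request({
            "jsonrpc": "2.0",
            "method": "tools/list",
            "params": {"cursor": cursor, "withUserTools": with_user_tools},
            "id": next(ids),
        })
        if "error" in response:
            raise RuntimeError(response["error"].get("message", "tools/list failed"))
        result = response["result"]
        tools.extend(result.get("tools", []))
        cursor = result.get("nextCursor", "")
        if not cursor:  # empty or missing cursor: this was the last page
            return tools


# Hypothetical two-page device, keyed by the cursor it receives.
PAGES = {"": (["tool_a", "tool_b"], "page2"), "page2": (["tool_c"], "")}


def fake_device(payload):
    names, next_cursor = PAGES[payload["params"]["cursor"]]
    result = {"tools": [{"name": n} for n in names], "nextCursor": next_cursor}
    return {"jsonrpc": "2.0", "id": payload["id"], "result": result}


all_names = [tool["name"] for tool in list_all_tools(fake_device)]
```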
4. **Call a tool**

    - **When**: the backend wants to execute a specific device function.
    - **Direction**: backend -> device.
    - **Method**: `tools/call`
    - **Message (MCP payload):**
    ```json
    {
      "jsonrpc": "2.0",
      "method": "tools/call",
      "params": {
        "name": "self.audio_speaker.set_volume",
        "arguments": {
          "volume": 50
        }
      },
      "id": 3
    }
    ```
    - **Successful response:**
    ```json
    {
      "jsonrpc": "2.0",
      "id": 3,
      "result": {
        "content": [
          { "type": "text", "text": "true" }
        ],
        "isError": false
      }
    }
    ```
    - **Error response:**
    ```json
    {
      "jsonrpc": "2.0",
      "id": 3,
      "error": {
        "code": -32601,
        "message": "Unknown tool: self.non_existent_tool"
      }
    }
    ```

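The two response shapes above can be reduced to a single outcome on the backend. The following is a rough sketch with hypothetical helper names (`build_tool_call`, `parse_tool_result`); joining the `text` items of `content` with newlines is a simplification chosen for this example.

```python
def build_tool_call(request_id, name, arguments):
    """Build the JSON-RPC payload for a "tools/call" request."""
    return {
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
        "id": request_id,
    }


def parse_tool_result(response):
    """Return (ok, detail) for a tools/call response payload."""
    if "error" in response:
        error = response["error"]
        return False, f'{error.get("code")}: {error.get("message")}'
    result = response.get("result", {})
    texts = [
        item.get("text", "")
        for item in result.get("content", [])
        if item.get("type") == "text"
    ]
    # A result can still report a tool-level failure through isError.
    return not result.get("isError", False), "\n".join(texts)
```

Checking both `error` and `isError` matters because a JSON-RPC level success does not by itself mean the tool succeeded.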
5. **Device-initiated notifications**

    - **When**: the device wants to inform the backend of internal events (e.g. state transitions). `Application::SendMcpMessage` is the outbound entry point.
    - **Direction**: device -> backend.
    - **Method**: conventionally `notifications/...` or any custom method.
    - **Message (MCP payload)**: JSON-RPC notifications have no `id`.
    ```json
    {
      "jsonrpc": "2.0",
      "method": "notifications/state_changed",
      "params": {
        "newState": "idle",
        "oldState": "connecting"
      }
    }
    ```
    - **Backend handling**: process the notification without replying.

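Because notifications carry no `id`, a backend can tell them apart from responses by looking at which fields are present. The function below is a small sketch of that dispatch decision following JSON-RPC 2.0 conventions; `classify` and its return labels are hypothetical names for this example.

```python
def classify(payload):
    """Classify a JSON-RPC payload as request, notification, response, or invalid."""
    has_id = "id" in payload
    if "method" in payload:
        # A method with an id expects a reply; without an id it is a notification.
        return "request" if has_id else "notification"
    if has_id and ("result" in payload or "error" in payload):
        return "response"
    return "invalid"
```

A receive loop could match `"response"` payloads to pending requests by `id` and hand `"notification"` payloads to an event handler without sending anything back.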
## User-only Tools

The MCP server on the device maintains two kinds of tools:

- **Regular tools** - registered via `McpServer::AddTool`. Exposed to the backend (and hence the AI model) by default.
- **User-only tools** - registered via `McpServer::AddUserOnlyTool`. These are hidden from standard `tools/list` results, because they are privileged or user-facing actions that should not be invoked autonomously by the AI. Examples include system reboot, firmware upgrade, and screen snapshot upload.

The backend opts in to user-only tools by sending `tools/list` with `params.withUserTools = true`. Typical usage: a companion app screen that exposes these actions to the end user.

See [MCP IoT control usage](./mcp-usage.md) for how to register either kind of tool on the device side.

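A companion app that wants to show the privileged actions separately could compare the two listings. This sketch assumes that the listing requested with `withUserTools = true` contains the regular tools plus the user-only ones, so the difference is the user-only set; `user_only_tool_names` and the `example.reboot` tool name are hypothetical.

```python
def user_only_tool_names(regular_listing, full_listing):
    """Return the names present only in the withUserTools=true listing."""
    regular = {tool["name"] for tool in regular_listing}
    return sorted(
        tool["name"] for tool in full_listing if tool["name"] not in regular
    )


# Listings as they might look after collecting all pages of each request.
regular_tools = [{"name": "self.get_device_status"}]
full_tools = [{"name": "self.get_device_status"}, {"name": "example.reboot"}]
privileged = user_only_tool_names(regular_tools, full_tools)
```

Only the regular listing should be passed to the AI model; the names in `privileged` are meant for user-triggered actions.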
## Sequence Diagram

A simplified diagram of the main MCP message flow:

```mermaid
sequenceDiagram
    participant Device as ESP32 Device
    participant BackendAPI as Backend API (Client)

    Note over Device, BackendAPI: Establish WebSocket / MQTT

    Device->>BackendAPI: Hello (features.mcp = true)

    BackendAPI->>Device: MCP Initialize request
    Note over BackendAPI: method: initialize
    Note over BackendAPI: params: { capabilities: ... }

    Device->>BackendAPI: MCP Initialize response
    Note over Device: result: { protocolVersion, serverInfo, ... }

    BackendAPI->>Device: MCP tools/list request
    Note over BackendAPI: params: { cursor: "", withUserTools: false }

    Device->>BackendAPI: MCP tools/list response
    Note over Device: result: { tools: [...], nextCursor: ... }

    loop Optional pagination
        BackendAPI->>Device: MCP tools/list request
        Note over BackendAPI: params: { cursor: "..." }
        Device->>BackendAPI: MCP tools/list response
        Note over Device: result: { tools: [...], nextCursor: "" }
    end

    BackendAPI->>Device: MCP tools/call request
    Note over BackendAPI: params: { name, arguments }

    alt Call succeeds
        Device->>BackendAPI: MCP tools/call success response
        Note over Device: result: { content, isError: false }
    else Call fails
        Device->>BackendAPI: MCP tools/call error response
        Note over Device: error: { code, message }
    end

    opt Device notification
        Device->>BackendAPI: MCP notification
        Note over Device: method: notifications/...
    end
```

This document summarizes the MCP interaction flow in this project. For exact parameter shapes, behavior, and available tools, refer to `McpServer::AddCommonTools` / `AddUserOnlyTools` in `main/mcp_server.cc` and the per-board `InitializeTools` implementations.