# FunASR Dual-Mode API

A speech recognition (ASR) service built on FastAPI that integrates two FunASR inference modes to provide flexible speech transcription.
## Features

The service provides two main inference interfaces:

- **AutoModel Mode** (`/inference/funasr`):
  - Uses the `funasr.AutoModel` high-level interface.
  - Integrates VAD (Voice Activity Detection).
  - Supports hotword enhancement.
  - Supports ITN (Inverse Text Normalization).
  - Supports multi-language configuration.
- **Direct Model Mode** (`/inference/direct`):
  - Directly calls the underlying `FunASRNano` model.
  - Supports standard full inference.
  - Supports simulated streaming/chunk inference (Chunk Mode) for testing the model's incremental decoding capabilities.
## Environment Setup

### Dependency Installation

This project uses uv for dependency management. Make sure uv is installed, then run the following command in the project root directory:

```bash
uv sync
```
### Model Configuration

The default model path is `/models/Fun-ASR-Nano-2512`. If your model is located elsewhere, set the `MODEL_DIR` environment variable:

```bash
export MODEL_DIR="/your/absolute/path/to/model"
```
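The fallback behavior can be sketched as follows; `resolve_model_dir` is a hypothetical helper illustrating the documented behavior, not necessarily the name used in `api.py`:

```python
import os

# Default model path used by the service (from the README).
DEFAULT_MODEL_DIR = "/models/Fun-ASR-Nano-2512"

def resolve_model_dir(env=os.environ) -> str:
    """Return MODEL_DIR from the environment, falling back to the default.
    (Hypothetical helper; the actual name in api.py may differ.)"""
    return env.get("MODEL_DIR", DEFAULT_MODEL_DIR)
```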
### Start Service

Start the service directly with uv (default port 5000):

```bash
uv run api.py
```

On startup, the service automatically selects a computing device in the order CUDA > MPS > CPU.
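The documented device priority (CUDA > MPS > CPU) can be sketched like this; it illustrates the selection order only and is not necessarily the exact code in `api.py`:

```python
import torch

def pick_device() -> str:
    """Select the best available device in the order CUDA > MPS > CPU.
    (Sketch of the documented startup behavior.)"""
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"
```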
### Docker Startup

For Docker deployment, use a command like the following; a custom model path can be passed with `-e MODEL_DIR`:

```bash
docker run -d --restart always -p 5000:5000 --gpus "device=1" \
  -e MODEL_DIR="/models/Fun-ASR-Nano-2512" \
  --mount type=bind,source=/your/path/model/Fun-ASR-Nano-2512,target=/models/Fun-ASR-Nano-2512 \
  harbor.bwgdi.com/library/fun-asr:0.0.1
```
## API Documentation

### 1. FunASR Standard Inference Interface

- **URL**: `/inference/funasr`
- **Method**: `POST`
- **Content-Type**: `multipart/form-data`
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `file` | File | Yes | - | Audio file |
| `language` | String | No | `"中文"` | Target language |
| `itn` | String | No | `"true"` | Whether to enable Inverse Text Normalization (`true`/`false`) |
| `hotwords` | String | No | `""` | Hotwords to improve recognition of specific vocabulary |
Example:

```bash
curl -X POST "http://127.0.0.1:5000/inference/funasr" \
  -F "file=@/path/to/audio.wav" \
  -F "hotwords=开放时间"
```
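The same request can be made from Python. A minimal sketch using `requests`, assuming the service runs on the default local port:

```python
import requests

def build_form(language: str = "中文", itn: str = "true", hotwords: str = "") -> dict:
    """Form fields accepted by /inference/funasr (besides the file part)."""
    return {"language": language, "itn": itn, "hotwords": hotwords}

def transcribe(path: str, base_url: str = "http://127.0.0.1:5000", **kwargs) -> dict:
    """POST an audio file to /inference/funasr and return the parsed JSON.
    base_url assumes the default local deployment from this README."""
    with open(path, "rb") as f:
        resp = requests.post(
            f"{base_url}/inference/funasr",
            files={"file": f},
            data=build_form(**kwargs),
        )
    resp.raise_for_status()
    return resp.json()
```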
### 2. Direct Underlying Inference Interface

- **URL**: `/inference/direct`
- **Method**: `POST`
- **Content-Type**: `multipart/form-data`
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `file` | File | Yes | - | Audio file |
| `chunk_mode` | Boolean | No | `False` | Whether to enable chunk simulation mode (`true`/`false`) |
Example:

```bash
# Enable chunk simulation mode
curl -X POST "http://127.0.0.1:5000/inference/direct" \
  -F "file=@/path/to/audio.wav" \
  -F "chunk_mode=true"
```
Response:

```json
{
  "status": "success",
  "mode": "direct",
  "text": {
    "key": "rand_key_WgNZq6ITZM5jt",
    "text": "你好。",
    "text_tn": "你好",
    "label": "null",
    "ctc_text": "你好",
    "ctc_timestamps": [
      { "token": "你", "start_time": 1.8, "end_time": 1.86, "score": 0.908 },
      { "token": "好", "start_time": 2.16, "end_time": 2.22, "score": 0.988 }
    ],
    "timestamps": [
      { "token": "你", "start_time": 1.8, "end_time": 1.86, "score": 0.908 },
      { "token": "好", "start_time": 2.16, "end_time": 2.22, "score": 0.988 },
      { "token": "。", "start_time": 2.88, "end_time": 2.94, "score": 0.0 }
    ]
  }
}
```
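The per-token `timestamps` list makes the response easy to post-process, for example to extract word timings. A small sketch, where `sample` mirrors the example response above:

```python
def token_spans(response: dict) -> list:
    """Extract (token, start_time, end_time) tuples from a
    /inference/direct response of the shape shown above."""
    return [
        (t["token"], t["start_time"], t["end_time"])
        for t in response["text"]["timestamps"]
    ]

# Minimal dict mirroring the documented response:
sample = {
    "text": {
        "timestamps": [
            {"token": "你", "start_time": 1.8, "end_time": 1.86, "score": 0.908},
            {"token": "好", "start_time": 2.16, "end_time": 2.22, "score": 0.988},
            {"token": "。", "start_time": 2.88, "end_time": 2.94, "score": 0.0},
        ]
    }
}
```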