# FunASR Dual-Mode API

This is a speech recognition (ASR) service built on FastAPI. It integrates two FunASR inference modes to provide flexible speech transcription.

## Features

The service provides two main inference interfaces:

1. **AutoModel Mode (`/inference/funasr`)**:
    * Uses the `funasr.AutoModel` high-level interface.
    * Integrates VAD (Voice Activity Detection).
    * Supports hotword enhancement.
    * Supports ITN (Inverse Text Normalization).
    * Supports multi-language configuration.

2. **Direct Model Mode (`/inference/direct`)**:
    * Directly calls the underlying `FunASRNano` model.
    * Supports standard full inference.
    * Supports simulated streaming/chunk inference (Chunk Mode) for testing the model's incremental decoding capabilities.

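
As a rough illustration of what simulated chunk inference involves, the sketch below splits a buffer of audio samples into fixed-size chunks that could be fed to a model one at a time. The function name `iter_chunks`, the 16 kHz sample rate, and the 600 ms chunk length are illustrative assumptions, not values taken from this service.

```python
from typing import Iterator, Sequence


def iter_chunks(samples: Sequence[float], chunk_size: int) -> Iterator[Sequence[float]]:
    """Yield consecutive slices of `samples`, each at most `chunk_size` long."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


if __name__ == "__main__":
    sample_rate = 16_000                     # assumed sample rate in Hz
    chunk_size = sample_rate * 600 // 1000   # assumed 600 ms per chunk
    audio = [0.0] * (2 * sample_rate)        # two seconds of silence as stand-in audio
    for index, chunk in enumerate(iter_chunks(audio, chunk_size)):
        print(f"chunk {index}: {len(chunk)} samples")
```

The last chunk may be shorter than the others, so a real incremental decoder has to tolerate a partial final chunk.
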
## Environment Setup

### Dependency Installation

This project uses `uv` for dependency management. Please ensure `uv` is installed, then run the following command in the project root directory:

```bash
uv sync
```

### Model Configuration

The default model path is configured as `/models/Fun-ASR-Nano-2512`. If your model is located elsewhere, please set the environment variable `MODEL_DIR`:

```bash
export MODEL_DIR="/your/absolute/path/to/model"
```

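
On the Python side, this kind of override is typically read with a fallback to the default. The following is a minimal sketch of that pattern, not the service's actual code; the names `DEFAULT_MODEL_DIR` and `resolve_model_dir` are hypothetical.

```python
import os

# Default path documented above.
DEFAULT_MODEL_DIR = "/models/Fun-ASR-Nano-2512"


def resolve_model_dir(env=None) -> str:
    """Return MODEL_DIR from the environment, or the documented default."""
    env = os.environ if env is None else env
    # Treating an empty value as unset is an assumption of this sketch.
    return env.get("MODEL_DIR") or DEFAULT_MODEL_DIR


if __name__ == "__main__":
    print(resolve_model_dir())
```
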
## Start Service

Start the service directly with `uv` (default port 5000):

```bash
uv run api.py
```

On startup, the service automatically selects the computing device in priority order: CUDA, then MPS, then CPU.

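
The priority order can be sketched as follows. This is an illustration under the assumption that PyTorch is the backend being queried; the helper names `pick_device` and `detect_device` are hypothetical and do not come from `api.py`.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Apply the CUDA > MPS > CPU priority order."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch for available backends; fall back to CPU if it is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds may not expose the MPS backend at all.
    mps = getattr(torch.backends, "mps", None)
    mps_available = bool(mps is not None and mps.is_available())
    return pick_device(torch.cuda.is_available(), mps_available)


if __name__ == "__main__":
    print(detect_device())
```

Keeping the priority logic in a pure function makes it easy to check without a GPU.
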
### Docker Startup

If deploying with Docker, you can refer to the following command. You can specify a custom model path using `-e MODEL_DIR`:

```bash
docker run -d --restart always -p 5000:5000 --gpus "device=1" \
  -e MODEL_DIR="/models/Fun-ASR-Nano-2512" \
  --mount type=bind,source=/your/path/model/Fun-ASR-Nano-2512,target=/models/Fun-ASR-Nano-2512 \
  harbor.bwgdi.com/library/fun-asr:0.0.1
```

## API Documentation

### 1. FunASR Standard Inference Interface

* **URL**: `/inference/funasr`
* **Method**: `POST`
* **Content-Type**: `multipart/form-data`

| Parameter Name | Type | Required | Default | Description |
| :--- | :--- | :--- | :--- | :--- |
| `file` | File | Yes | - | Audio file |
| `language` | String | No | "中文" | Target language (the default value means Chinese) |
| `itn` | String | No | "true" | Whether to enable Inverse Text Normalization (`true`/`false`) |
| `hotwords` | String | No | "" | List of hotwords to improve recognition of specific vocabulary |

**Example**:

```bash
# Pass a hotword (开放时间, "opening hours")
curl -X POST "http://127.0.0.1:5000/inference/funasr" \
  -F "file=@/path/to/audio.wav" \
  -F "hotwords=开放时间"
```

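
The same request can be issued from Python. The sketch below assumes the third-party `requests` package is installed and that this endpoint returns a JSON body; the helper names `build_funasr_form` and `transcribe` are hypothetical, and the 300-second timeout is an arbitrary choice.

```python
def build_funasr_form(language: str = "中文", itn: bool = True, hotwords: str = "") -> dict:
    """Build the non-file form fields, using the documented defaults."""
    return {
        "language": language,
        "itn": "true" if itn else "false",  # the API takes ITN as a string
        "hotwords": hotwords,
    }


def transcribe(audio_path: str, base_url: str = "http://127.0.0.1:5000", **fields):
    """POST an audio file to /inference/funasr and return the decoded JSON body."""
    import requests  # third-party dependency, imported lazily

    with open(audio_path, "rb") as audio:
        response = requests.post(
            f"{base_url}/inference/funasr",
            data=build_funasr_form(**fields),
            files={"file": audio},  # sent as multipart/form-data
            timeout=300,
        )
    response.raise_for_status()
    return response.json()
```

A call such as `transcribe("/path/to/audio.wav", hotwords="开放时间")` would mirror the `curl` example.
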
### 2. Direct Underlying Inference Interface

* **URL**: `/inference/direct`
* **Method**: `POST`
* **Content-Type**: `multipart/form-data`

| Parameter Name | Type | Required | Default | Description |
| :--- | :--- | :--- | :--- | :--- |
| `file` | File | Yes | - | Audio file |
| `chunk_mode` | Boolean | No | False | Whether to enable chunk simulation mode (`true`/`false`) |

**Example**:

```bash
# Enable chunk simulation mode
curl -X POST "http://127.0.0.1:5000/inference/direct" \
  -F "file=@/path/to/audio.wav" \
  -F "chunk_mode=true"
```

**Response**:

```json
{
  "status": "success",
  "mode": "direct",
  "text": {
    "key": "rand_key_WgNZq6ITZM5jt",
    "text": "你好。",
    "text_tn": "你好",
    "label": "null",
    "ctc_text": "你好",
    "ctc_timestamps": [
      {
        "token": "你",
        "start_time": 1.8,
        "end_time": 1.86,
        "score": 0.908
      },
      {
        "token": "好",
        "start_time": 2.16,
        "end_time": 2.22,
        "score": 0.988
      }
    ],
    "timestamps": [
      {
        "token": "你",
        "start_time": 1.8,
        "end_time": 1.86,
        "score": 0.908
      },
      {
        "token": "好",
        "start_time": 2.16,
        "end_time": 2.22,
        "score": 0.988
      },
      {
        "token": "。",
        "start_time": 2.88,
        "end_time": 2.94,
        "score": 0.0
      }
    ]
  }
}
```

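
A client will usually want only the transcript and the per-token timing from this structure. The following sketch shows one way to pull those out of a decoded response; `summarize_response` is a hypothetical helper, and it assumes the time fields are in seconds, which the response itself does not state.

```python
import json


def summarize_response(payload: dict) -> dict:
    """Extract the transcript and per-token timing from a direct-mode response."""
    result = payload["text"]  # nested object holding the recognition result
    tokens = [
        {
            "token": item["token"],
            "start": item["start_time"],
            "duration": round(item["end_time"] - item["start_time"], 3),
        }
        for item in result.get("timestamps", [])
    ]
    return {"transcript": result["text"], "tokens": tokens}


if __name__ == "__main__":
    # Trimmed copy of the sample response, used as stand-in input.
    sample = json.loads(
        '{"status": "success", "mode": "direct", "text": {"text": "你好。", '
        '"timestamps": [{"token": "你", "start_time": 1.8, '
        '"end_time": 1.86, "score": 0.908}]}}'
    )
    print(summarize_response(sample))
```
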