# FunASR Dual-Mode API
This is a speech recognition (ASR) service built on FastAPI, integrating two inference modes of FunASR to provide flexible speech transcription capabilities.
## Features
The service provides two main inference interfaces:
1. **AutoModel Mode (`/inference/funasr`)**:
* Uses the `funasr.AutoModel` high-level interface.
* Integrates VAD (Voice Activity Detection).
* Supports Hotwords enhancement.
* Supports ITN (Inverse Text Normalization).
* Supports multi-language configuration.
2. **Direct Model Mode (`/inference/direct`)**:
* Directly calls the underlying `FunASRNano` model.
* Supports standard full inference.
* Supports simulated streaming/chunk inference (Chunk Mode) for testing the model's incremental decoding capabilities.
## Environment Setup
### Dependency Installation
This project uses `uv` for dependency management. Please ensure `uv` is installed, then run the following command in the project root directory:
```bash
uv sync
```
### Model Configuration
The default model path is configured as `/models/Fun-ASR-Nano-2512`. If your model is located elsewhere, please set the environment variable `MODEL_DIR`:
```bash
export MODEL_DIR="/your/absolute/path/to/model"
```
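The fallback logic is straightforward; as a minimal sketch (assumed behavior for illustration — the actual resolution in `api.py` may differ):

```python
import os

# Assumed default, matching the path documented above.
DEFAULT_MODEL_DIR = "/models/Fun-ASR-Nano-2512"

def resolve_model_dir() -> str:
    """Return MODEL_DIR from the environment, falling back to the default."""
    return os.environ.get("MODEL_DIR", DEFAULT_MODEL_DIR)
```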
## Start Service
You can start the service directly using the uv script (default port 5000):
```bash
uv run api.py
```
The service will automatically detect the computing device (CUDA > MPS > CPU) upon startup.
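The priority order can be sketched as a pure function (a simplified illustration only; in practice the availability flags would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`):

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Select a compute device with the priority CUDA > MPS > CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"
```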
### Docker Startup
If deploying with Docker, use a command like the following. A custom model path can be specified with `-e MODEL_DIR`:
```bash
docker run -d --restart always -p 5000:5000 --gpus "device=1" \
  -e MODEL_DIR="/models/Fun-ASR-Nano-2512" \
  --mount type=bind,source=/your/path/model/Fun-ASR-Nano-2512,target=/models/Fun-ASR-Nano-2512 \
  harbor.bwgdi.com/library/fun-asr:0.0.1
```
## API Documentation
### 1. FunASR Standard Inference Interface
* **URL**: `/inference/funasr`
* **Method**: `POST`
* **Content-Type**: `multipart/form-data`

| Parameter Name | Type | Required | Default | Description |
| :--- | :--- | :--- | :--- | :--- |
| `file` | File | Yes | - | Audio file |
| `language` | String | No | `"中文"` | Target language (the default, "中文", means Chinese) |
| `itn` | String | No | `"true"` | Whether to enable Inverse Text Normalization (`true`/`false`) |
| `hotwords` | String | No | `""` | List of hotwords to improve recognition of specific vocabulary |
**Example**:
```bash
curl -X POST "http://127.0.0.1:5000/inference/funasr" \
  -F "file=@/path/to/audio.wav" \
  -F "hotwords=开放时间"
```
### 2. Direct Underlying Inference Interface
* **URL**: `/inference/direct`
* **Method**: `POST`
* **Content-Type**: `multipart/form-data`

| Parameter Name | Type | Required | Default | Description |
| :--- | :--- | :--- | :--- | :--- |
| `file` | File | Yes | - | Audio file |
| `chunk_mode` | Boolean | No | `false` | Whether to enable chunk simulation mode (`true`/`false`) |
**Example**:
```bash
# Enable chunk simulation mode
curl -X POST "http://127.0.0.1:5000/inference/direct" \
  -F "file=@/path/to/audio.wav" \
  -F "chunk_mode=true"
```
**Response**:
```json
{
  "status": "success",
  "mode": "direct",
  "text": {
    "key": "rand_key_WgNZq6ITZM5jt",
    "text": "你好。",
    "text_tn": "你好",
    "label": "null",
    "ctc_text": "你好",
    "ctc_timestamps": [
      {
        "token": "你",
        "start_time": 1.8,
        "end_time": 1.86,
        "score": 0.908
      },
      {
        "token": "好",
        "start_time": 2.16,
        "end_time": 2.22,
        "score": 0.988
      }
    ],
    "timestamps": [
      {
        "token": "你",
        "start_time": 1.8,
        "end_time": 1.86,
        "score": 0.908
      },
      {
        "token": "好",
        "start_time": 2.16,
        "end_time": 2.22,
        "score": 0.988
      },
      {
        "token": "。",
        "start_time": 2.88,
        "end_time": 2.94,
        "score": 0.0
      }
    ]
  }
}
```
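Clients can derive per-token durations from the `timestamps` list. A minimal sketch, using a trimmed copy of the example response above (field names come from the response shown; the helper name is hypothetical):

```python
# Trimmed copy of the /inference/direct response shape shown above.
response = {
    "status": "success",
    "mode": "direct",
    "text": {
        "text": "你好。",
        "timestamps": [
            {"token": "你", "start_time": 1.8, "end_time": 1.86, "score": 0.908},
            {"token": "好", "start_time": 2.16, "end_time": 2.22, "score": 0.988},
        ],
    },
}

def token_durations(resp: dict) -> list[tuple[str, float]]:
    """Return (token, duration_in_seconds) pairs from the timestamps list."""
    return [
        (t["token"], round(t["end_time"] - t["start_time"], 3))
        for t in resp["text"]["timestamps"]
    ]
```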