# FunASR Dual-Mode API

This is a speech recognition (ASR) service built on FastAPI, integrating two inference modes of FunASR to provide flexible speech transcription capabilities.

## Features

The service provides two main inference interfaces:

1. **AutoModel Mode (`/inference/funasr`)**:
    * Uses the `funasr.AutoModel` high-level interface.
    * Integrates VAD (Voice Activity Detection).
    * Supports hotword enhancement.
    * Supports ITN (Inverse Text Normalization).
    * Supports multi-language configuration.
2. **Direct Model Mode (`/inference/direct`)**:
    * Directly calls the underlying `FunASRNano` model.
    * Supports standard full inference.
    * Supports simulated streaming/chunk inference (chunk mode) for testing the model's incremental decoding capabilities.

## Environment Setup

### Dependency Installation

This project uses `uv` for dependency management. Make sure `uv` is installed, then run the following command in the project root directory:

```bash
uv sync
```

### Model Configuration

The default model path is `/models/Fun-ASR-Nano-2512`. If your model is located elsewhere, set the `MODEL_DIR` environment variable:

```bash
export MODEL_DIR="/your/absolute/path/to/model"
```

## Start Service

You can start the service directly using the uv script (default port 5000):

```bash
uv run api.py
```

The service automatically detects the computing device (CUDA > MPS > CPU) on startup.

### Docker Startup

If deploying with Docker, you can refer to the following command. Specify a custom model path with `-e MODEL_DIR`:

```bash
docker run -d --restart always -p 5000:5000 --gpus "device=1" \
  -e MODEL_DIR="/models/Fun-ASR-Nano-2512" \
  --mount type=bind,source=/your/path/model/Fun-ASR-Nano-2512,target=/models/Fun-ASR-Nano-2512 \
  harbor.bwgdi.com/library/fun-asr:0.0.1
```
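The CUDA > MPS > CPU fallback mentioned above can be sketched as follows. This is an illustrative snippet, not the actual code in `api.py`; the function name `pick_device` is hypothetical:

```python
def pick_device() -> str:
    """Return the best available compute device string: CUDA > MPS > CPU.

    Hypothetical sketch of the startup device-detection order; the real
    logic in api.py may differ.
    """
    try:
        import torch
    except ImportError:
        return "cpu"  # no torch available: fall back to CPU
    if torch.cuda.is_available():
        return "cuda"
    # MPS is PyTorch's Metal backend on Apple Silicon.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(pick_device())
```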
## API Documentation

### 1. FunASR Standard Inference Interface

* **URL**: `/inference/funasr`
* **Method**: `POST`
* **Content-Type**: `multipart/form-data`

| Parameter Name | Type | Required | Default | Description |
| :--- | :--- | :--- | :--- | :--- |
| `file` | File | Yes | - | Audio file |
| `language` | String | No | "中文" | Target language |
| `itn` | String | No | "true" | Whether to enable Inverse Text Normalization (true/false) |
| `hotwords` | String | No | "" | List of hotwords to improve recognition accuracy for specific vocabulary |

**Example**:

```bash
curl -X POST "http://127.0.0.1:5000/inference/funasr" \
  -F "file=@/path/to/audio.wav" \
  -F "hotwords=开放时间"
```

### 2. Direct Underlying Inference Interface

* **URL**: `/inference/direct`
* **Method**: `POST`
* **Content-Type**: `multipart/form-data`

| Parameter Name | Type | Required | Default | Description |
| :--- | :--- | :--- | :--- | :--- |
| `file` | File | Yes | - | Audio file |
| `chunk_mode` | Boolean | No | False | Whether to enable chunk simulation mode (true/false) |

**Example**:

```bash
# Enable chunk simulation mode
curl -X POST "http://127.0.0.1:5000/inference/direct" \
  -F "file=@/path/to/audio.wav" \
  -F "chunk_mode=true"
```

**Response**:

```json
{
  "status": "success",
  "mode": "direct",
  "text": {
    "key": "rand_key_WgNZq6ITZM5jt",
    "text": "你好。",
    "text_tn": "你好",
    "label": "null",
    "ctc_text": "你好",
    "ctc_timestamps": [
      { "token": "你", "start_time": 1.8, "end_time": 1.86, "score": 0.908 },
      { "token": "好", "start_time": 2.16, "end_time": 2.22, "score": 0.988 }
    ],
    "timestamps": [
      { "token": "你", "start_time": 1.8, "end_time": 1.86, "score": 0.908 },
      { "token": "好", "start_time": 2.16, "end_time": 2.22, "score": 0.988 },
      { "token": "。", "start_time": 2.88, "end_time": 2.94, "score": 0.0 }
    ]
  }
}
```
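Beyond `curl`, the endpoints can be called from Python. Below is a minimal client sketch; the `transcribe` and `format_timestamps` helpers are illustrative (not part of this project), the file path is a placeholder, and the third-party `requests` package is assumed:

```python
def format_timestamps(timestamps):
    """Render the per-token `timestamps` entries from /inference/direct as readable lines."""
    return [
        f"{t['start_time']:.2f}s-{t['end_time']:.2f}s {t['token']} (score={t['score']:.3f})"
        for t in timestamps
    ]


def transcribe(path, url="http://127.0.0.1:5000/inference/direct", chunk_mode=False):
    """Upload an audio file and return the parsed JSON response.

    Requires `requests` (e.g. `uv add requests`); the URL assumes the
    default local deployment described above.
    """
    import requests  # third-party dependency

    with open(path, "rb") as f:
        resp = requests.post(
            url,
            files={"file": f},
            data={"chunk_mode": "true" if chunk_mode else "false"},
        )
    resp.raise_for_status()
    return resp.json()


# Example: format the `timestamps` array from the sample response above.
sample = [
    {"token": "你", "start_time": 1.8, "end_time": 1.86, "score": 0.908},
    {"token": "好", "start_time": 2.16, "end_time": 2.22, "score": 0.988},
]
for line in format_timestamps(sample):
    print(line)  # e.g. 1.80s-1.86s 你 (score=0.908)
```

With a running service, `transcribe("/path/to/audio.wav", chunk_mode=True)` would return the JSON shown in the response example, and `result["text"]["timestamps"]` can be fed to `format_timestamps` directly.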