# FunASR Dual-Mode API

A speech recognition (ASR) service built on FastAPI that integrates two FunASR inference modes to provide flexible speech transcription.
## Features

The service provides two main inference interfaces:

- **AutoModel Mode** (`/inference/funasr`):
  - Uses the `funasr.AutoModel` high-level interface.
  - Integrates VAD (Voice Activity Detection).
  - Supports hotword enhancement.
  - Supports ITN (Inverse Text Normalization).
  - Supports multi-language configuration.
- **Direct Model Mode** (`/inference/direct`):
  - Directly calls the underlying `FunASRNano` model.
  - Supports standard full inference.
  - Supports simulated streaming/chunk inference (Chunk Mode) for testing the model's incremental decoding capabilities.
## Environment Setup

### Dependency Installation

This project uses uv for dependency management. Make sure uv is installed, then run the following command in the project root directory:

```bash
uv sync
```
### Model Configuration

The default model path is `/models/Fun-ASR-Nano-2512`. If your model is located elsewhere, set the `MODEL_DIR` environment variable:

```bash
export MODEL_DIR="/your/absolute/path/to/model"
```
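The fallback behavior can be sketched as follows; `resolve_model_dir` is a hypothetical helper illustrating the documented behavior, not necessarily the name used in `api.py`:

```python
import os

# Default model path used by the service (from the README).
DEFAULT_MODEL_DIR = "/models/Fun-ASR-Nano-2512"

def resolve_model_dir(env=os.environ) -> str:
    """Return MODEL_DIR from the environment, falling back to the default.
    (Hypothetical helper; the actual name in api.py may differ.)"""
    return env.get("MODEL_DIR", DEFAULT_MODEL_DIR)
```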
### Start Service

Start the service directly with uv (default port 5000):

```bash
uv run api.py
```

On startup, the service automatically selects a computing device in the order CUDA > MPS > CPU.
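The documented device priority (CUDA > MPS > CPU) can be sketched like this; it illustrates the selection order only and is not necessarily the exact code in `api.py`:

```python
import torch

def pick_device() -> str:
    """Select the best available device in the order CUDA > MPS > CPU.
    (Sketch of the documented startup behavior.)"""
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"
```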
### Docker Startup

For Docker deployment, use a command like the following; a custom model path can be passed with `-e MODEL_DIR`:

```bash
docker run -d --restart always -p 5000:5000 --gpus "device=1" \
  -e MODEL_DIR="/models/Fun-ASR-Nano-2512" \
  --mount type=bind,source=/your/path/model/Fun-ASR-Nano-2512,target=/models/Fun-ASR-Nano-2512 \
  harbor.bwgdi.com/library/fun-asr:0.0.1
```
## API Documentation

### 1. FunASR Standard Inference Interface

- **URL**: `/inference/funasr`
- **Method**: `POST`
- **Content-Type**: `multipart/form-data`
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `file` | File | Yes | - | Audio file |
| `language` | String | No | `"中文"` | Target language |
| `itn` | String | No | `"true"` | Whether to enable Inverse Text Normalization (`true`/`false`) |
| `hotwords` | String | No | `""` | Hotwords to improve recognition of specific vocabulary |
Example:

```bash
curl -X POST "http://127.0.0.1:5000/inference/funasr" \
  -F "file=@/path/to/audio.wav" \
  -F "hotwords=开放时间"
```
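The same request can be made from Python. A minimal sketch using `requests`, assuming the service runs on the default local port:

```python
import requests

def build_form(language: str = "中文", itn: str = "true", hotwords: str = "") -> dict:
    """Form fields accepted by /inference/funasr (besides the file part)."""
    return {"language": language, "itn": itn, "hotwords": hotwords}

def transcribe(path: str, base_url: str = "http://127.0.0.1:5000", **kwargs) -> dict:
    """POST an audio file to /inference/funasr and return the parsed JSON.
    base_url assumes the default local deployment from this README."""
    with open(path, "rb") as f:
        resp = requests.post(
            f"{base_url}/inference/funasr",
            files={"file": f},
            data=build_form(**kwargs),
        )
    resp.raise_for_status()
    return resp.json()
```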
### 2. Direct Underlying Inference Interface

- **URL**: `/inference/direct`
- **Method**: `POST`
- **Content-Type**: `multipart/form-data`
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `file` | File | Yes | - | Audio file |
| `chunk_mode` | Boolean | No | `False` | Whether to enable chunk simulation mode (`true`/`false`) |
Example:

```bash
# Enable chunk simulation mode
curl -X POST "http://127.0.0.1:5000/inference/direct" \
  -F "file=@/path/to/audio.wav" \
  -F "chunk_mode=true"
```
Response:

```json
{
  "status": "success",
  "mode": "direct",
  "text": {
    "key": "rand_key_WgNZq6ITZM5jt",
    "text": "你好。",
    "text_tn": "你好",
    "label": "null",
    "ctc_text": "你好",
    "ctc_timestamps": [
      { "token": "你", "start_time": 1.8, "end_time": 1.86, "score": 0.908 },
      { "token": "好", "start_time": 2.16, "end_time": 2.22, "score": 0.988 }
    ],
    "timestamps": [
      { "token": "你", "start_time": 1.8, "end_time": 1.86, "score": 0.908 },
      { "token": "好", "start_time": 2.16, "end_time": 2.22, "score": 0.988 },
      { "token": "。", "start_time": 2.88, "end_time": 2.94, "score": 0.0 }
    ]
  }
}
```
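The per-token `timestamps` list makes the response easy to post-process, for example to extract word timings. A small sketch, where `sample` mirrors the example response above:

```python
def token_spans(response: dict) -> list:
    """Extract (token, start_time, end_time) tuples from a
    /inference/direct response of the shape shown above."""
    return [
        (t["token"], t["start_time"], t["end_time"])
        for t in response["text"]["timestamps"]
    ]

# Minimal dict mirroring the documented response:
sample = {
    "text": {
        "timestamps": [
            {"token": "你", "start_time": 1.8, "end_time": 1.86, "score": 0.908},
            {"token": "好", "start_time": 2.16, "end_time": 2.22, "score": 0.988},
            {"token": "。", "start_time": 2.88, "end_time": 2.94, "score": 0.0},
        ]
    }
}
```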