
FunASR Dual-Mode API

This is an automatic speech recognition (ASR) service built on FastAPI. It integrates two FunASR inference modes to provide flexible speech transcription capabilities.

Features

The service provides two main inference interfaces:

  1. AutoModel Mode (/inference/funasr):

    • Uses the funasr.AutoModel high-level interface.
    • Integrates VAD (Voice Activity Detection).
    • Supports Hotwords enhancement.
    • Supports ITN (Inverse Text Normalization).
    • Supports multi-language configuration.
  2. Direct Model Mode (/inference/direct):

    • Directly calls the underlying FunASRNano model.
    • Supports standard full inference.
    • Supports simulated streaming/chunk inference (Chunk Mode) for testing the model's incremental decoding capabilities.

Environment Setup

Dependency Installation

This project uses uv for dependency management. Please ensure uv is installed, then run the following command in the project root directory:

uv sync

Model Configuration

The default model path is configured as /models/Fun-ASR-Nano-2512. If your model is located elsewhere, please set the environment variable MODEL_DIR:

export MODEL_DIR="/your/absolute/path/to/model"
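Internally, the service presumably resolves the model path from this environment variable with a fallback to the documented default. A minimal sketch (the variable name and default path come from the text above; the exact resolution code in api.py is an assumption):

```python
import os

# Resolve the model directory, falling back to the documented default path.
MODEL_DIR = os.environ.get("MODEL_DIR", "/models/Fun-ASR-Nano-2512")
```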

Start Service

You can start the service directly with uv (default port 5000):

uv run api.py

The service will automatically detect the computing device (CUDA > MPS > CPU) upon startup.
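The CUDA > MPS > CPU priority described above can be sketched as follows. This assumes PyTorch backs the detection, which the text does not state explicitly:

```python
def detect_device() -> str:
    """Return the best available compute device in CUDA > MPS > CPU order.

    Illustrative sketch of the startup check; assumes a PyTorch runtime.
    """
    try:
        import torch
        if torch.cuda.is_available():
            return "cuda"
        mps = getattr(torch.backends, "mps", None)
        if mps is not None and mps.is_available():
            return "mps"
    except ImportError:
        # No torch available: fall through to CPU.
        pass
    return "cpu"
```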

Docker Startup

If deploying with Docker, you can adapt the following command. Use -e MODEL_DIR to specify a custom model path:

docker run -d --restart always -p 5000:5000 --gpus "device=1" \
  -e MODEL_DIR="/models/Fun-ASR-Nano-2512" \
  --mount type=bind,source=/your/path/model/Fun-ASR-Nano-2512,target=/models/Fun-ASR-Nano-2512 \
  harbor.bwgdi.com/library/fun-asr:0.0.1

API Documentation

1. FunASR Standard Inference Interface

  • URL: /inference/funasr
  • Method: POST
  • Content-Type: multipart/form-data
| Parameter Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| file | File | Yes | - | Audio file |
| language | String | No | "中文" | Target language |
| itn | String | No | "true" | Whether to enable Inverse Text Normalization (true/false) |
| hotwords | String | No | "" | List of hotwords to improve recognition rate of specific vocabulary |

Example:

curl -X POST "http://127.0.0.1:5000/inference/funasr" \
  -F "file=@/path/to/audio.wav" \
  -F "hotwords=开放时间"
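The same request can be issued from Python. This is a hedged sketch, not code from the project: the helper names (build_form, transcribe) are hypothetical, and the endpoint, field names, and defaults mirror the parameter table above:

```python
BASE_URL = "http://127.0.0.1:5000"  # adjust to your deployment

def build_form(hotwords: str = "", language: str = "中文",
               itn: str = "true") -> dict:
    """Form fields for /inference/funasr, mirroring the parameter table."""
    return {"hotwords": hotwords, "language": language, "itn": itn}

def transcribe(path: str, **fields) -> dict:
    """POST an audio file to /inference/funasr and return the parsed JSON."""
    import requests  # third-party; install with `uv add requests` if missing
    with open(path, "rb") as f:
        resp = requests.post(
            f"{BASE_URL}/inference/funasr",
            files={"file": f},
            data=build_form(**fields),
        )
    resp.raise_for_status()
    return resp.json()
```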

2. Direct Underlying Inference Interface

  • URL: /inference/direct
  • Method: POST
  • Content-Type: multipart/form-data
| Parameter Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| file | File | Yes | - | Audio file |
| chunk_mode | Boolean | No | False | Whether to enable chunk simulation mode (true/false) |

Example:

# Enable chunk simulation mode
curl -X POST "http://127.0.0.1:5000/inference/direct" \
  -F "file=@/path/to/audio.wav" \
  -F "chunk_mode=true"

Response:

{
    "status": "success",
    "mode": "direct",
    "text": {
        "key": "rand_key_WgNZq6ITZM5jt",
        "text": "你好。",
        "text_tn": "你好",
        "label": "null",
        "ctc_text": "你好",
        "ctc_timestamps": [
            {
                "token": "你",
                "start_time": 1.8,
                "end_time": 1.86,
                "score": 0.908
            },
            {
                "token": "好",
                "start_time": 2.16,
                "end_time": 2.22,
                "score": 0.988
            }
        ],
        "timestamps": [
            {
                "token": "你",
                "start_time": 1.8,
                "end_time": 1.86,
                "score": 0.908
            },
            {
                "token": "好",
                "start_time": 2.16,
                "end_time": 2.22,
                "score": 0.988
            },
            {
                "token": "。",
                "start_time": 2.88,
                "end_time": 2.94,
                "score": 0.0
            }
        ]
    }
}
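Clients can consume the timestamps array directly, for example to rebuild the transcript and sum the voiced duration. This is illustrative; the field names are taken from the sample response above, and flatten is a hypothetical helper:

```python
# Sample token-level timestamps, copied from the response above.
sample = {
    "timestamps": [
        {"token": "你", "start_time": 1.8, "end_time": 1.86, "score": 0.908},
        {"token": "好", "start_time": 2.16, "end_time": 2.22, "score": 0.988},
        {"token": "。", "start_time": 2.88, "end_time": 2.94, "score": 0.0},
    ]
}

def flatten(timestamps):
    """Join tokens into a transcript and sum per-token voiced duration."""
    text = "".join(t["token"] for t in timestamps)
    voiced = round(sum(t["end_time"] - t["start_time"] for t in timestamps), 2)
    return text, voiced

text, voiced = flatten(sample["timestamps"])
```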