add finetune docs

This commit is contained in:
pengzhendong
2025-12-29 16:44:09 +08:00
parent 9371f30e65
commit 0882fd55e7
4 changed files with 124 additions and 4 deletions

View File

@ -147,6 +147,10 @@ if __name__ == "__main__":
</details> </details>
# Finetune
Please refer to [docs/finetune.md](docs/finetune.md)
# Performance 📝 # Performance 📝
We evaluated Fun-ASR against other state-of-the-art models on open-source benchmarks, Chinese dialect datasets, and industry-specific test sets. The results demonstrate that Fun-ASR achieves superior performance across various scenarios. We evaluated Fun-ASR against other state-of-the-art models on open-source benchmarks, Chinese dialect datasets, and industry-specific test sets. The results demonstrate that Fun-ASR achieves superior performance across various scenarios.
@ -193,12 +197,12 @@ We evaluated Fun-ASR against other state-of-the-art models on open-source benchm
```bibtex ```bibtex
@misc{an2025funasrtechnicalreport, @misc{an2025funasrtechnicalreport,
title={Fun-ASR Technical Report}, title={Fun-ASR Technical Report},
author={Keyu An and Yanni Chen and Zhigao Chen and Chong Deng and Zhihao Du and Changfeng Gao and Zhifu Gao and Bo Gong and Xiangang Li and Yabin Li and Ying Liu and Xiang Lv and Yunjie Ji and Yiheng Jiang and Bin Ma and Haoneng Luo and Chongjia Ni and Zexu Pan and Yiping Peng and Zhendong Peng and Peiyao Wang and Hao Wang and Haoxu Wang and Wen Wang and Wupeng Wang and Yuzhong Wu and Biao Tian and Zhentao Tan and Nan Yang and Bin Yuan and Jieping Ye and Jixing Yu and Qinglin Zhang and Kun Zou and Han Zhao and Shengkui Zhao and Jingren Zhou and Yanqiao Zhu}, author={Keyu An and Yanni Chen and Zhigao Chen and Chong Deng and Zhihao Du and Changfeng Gao and Zhifu Gao and Bo Gong and Xiangang Li and Yabin Li and Ying Liu and Xiang Lv and Yunjie Ji and Yiheng Jiang and Bin Ma and Haoneng Luo and Chongjia Ni and Zexu Pan and Yiping Peng and Zhendong Peng and Peiyao Wang and Hao Wang and Haoxu Wang and Wen Wang and Wupeng Wang and Yuzhong Wu and Biao Tian and Zhentao Tan and Nan Yang and Bin Yuan and Jieping Ye and Jixing Yu and Qinglin Zhang and Kun Zou and Han Zhao and Shengkui Zhao and Jingren Zhou and Yanqiao Zhu},
year={2025}, year={2025},
eprint={2509.12508}, eprint={2509.12508},
archivePrefix={arXiv}, archivePrefix={arXiv},
primaryClass={cs.CL}, primaryClass={cs.CL},
url={https://arxiv.org/abs/2509.12508}, url={https://arxiv.org/abs/2509.12508},
} }
``` ```

View File

@ -147,6 +147,10 @@ if __name__ == "__main__":
</details> </details>
# 微调
详情请参考 [docs/finetune_zh.md](docs/finetune.md)
# 性能评测 📝 # 性能评测 📝
我们在开源基准数据集、中文方言测试集和工业测试集上,比较了 Fun-ASR 与其他模型的多语言语音识别性能。Fun-ASR 模型均具有明显的效果优势。 我们在开源基准数据集、中文方言测试集和工业测试集上,比较了 Fun-ASR 与其他模型的多语言语音识别性能。Fun-ASR 模型均具有明显的效果优势。
@ -193,12 +197,12 @@ if __name__ == "__main__":
```bibtex ```bibtex
@misc{an2025funasrtechnicalreport, @misc{an2025funasrtechnicalreport,
title={Fun-ASR Technical Report}, title={Fun-ASR Technical Report},
author={Keyu An and Yanni Chen and Zhigao Chen and Chong Deng and Zhihao Du and Changfeng Gao and Zhifu Gao and Bo Gong and Xiangang Li and Yabin Li and Ying Liu and Xiang Lv and Yunjie Ji and Yiheng Jiang and Bin Ma and Haoneng Luo and Chongjia Ni and Zexu Pan and Yiping Peng and Zhendong Peng and Peiyao Wang and Hao Wang and Haoxu Wang and Wen Wang and Wupeng Wang and Yuzhong Wu and Biao Tian and Zhentao Tan and Nan Yang and Bin Yuan and Jieping Ye and Jixing Yu and Qinglin Zhang and Kun Zou and Han Zhao and Shengkui Zhao and Jingren Zhou and Yanqiao Zhu}, author={Keyu An and Yanni Chen and Zhigao Chen and Chong Deng and Zhihao Du and Changfeng Gao and Zhifu Gao and Bo Gong and Xiangang Li and Yabin Li and Ying Liu and Xiang Lv and Yunjie Ji and Yiheng Jiang and Bin Ma and Haoneng Luo and Chongjia Ni and Zexu Pan and Yiping Peng and Zhendong Peng and Peiyao Wang and Hao Wang and Haoxu Wang and Wen Wang and Wupeng Wang and Yuzhong Wu and Biao Tian and Zhentao Tan and Nan Yang and Bin Yuan and Jieping Ye and Jixing Yu and Qinglin Zhang and Kun Zou and Han Zhao and Shengkui Zhao and Jingren Zhou and Yanqiao Zhu},
year={2025}, year={2025},
eprint={2509.12508}, eprint={2509.12508},
archivePrefix={arXiv}, archivePrefix={arXiv},
primaryClass={cs.CL}, primaryClass={cs.CL},
url={https://arxiv.org/abs/2509.12508}, url={https://arxiv.org/abs/2509.12508},
} }
``` ```

54
docs/finetune.md Normal file
View File

@ -0,0 +1,54 @@
# Finetune
## Requirements
```
pip install git+https://github.com/modelscope/FunASR
```
## Data Prepare
Data examples
```
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "语音转写:<|startofspeech|>!https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/BAC009S0764W0121.wav<|endofspeech|>"}, {"role": "assistant", "content": "甚至出现交易几乎停滞的情况"}], "speech_length": 418, "text_length": 6}
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "语音转写:<|startofspeech|>!https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/BAC009S0916W0489.wav<|endofspeech|>"}, {"role": "assistant", "content": "湖北一公司以员工名义贷款数十员工负债千万"}], "speech_length": 572, "text_length": 11}
```
Full ref to `data/train_example.jsonl`
Description
- `messages[1]["content"]`: audio file with speech recognition prompt
- `messages[2]["content"]`: transcription
- `speech_length`: number of fbank frames of the audio file
- `text_length`: number of tokens of the transcription (tokenized by `Qwen3-0.6B`)
`train_text.txt`
```
BAC009S0764W0121 甚至出现交易几乎停滞的情况
BAC009S0916W0489 湖北一公司以员工名义贷款数十员工负债千万
```
`train_wav.scp`
```
BAC009S0764W0121 https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/BAC009S0764W0121.wav
BAC009S0916W0489 https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/BAC009S0916W0489.wav
```
`Command`
```
python tools/scp2jsonl.py \
--scp-file /path/to/train_wav.scp \
--transcript-file /path/to/train_text.txt \
--jsonl-file data/train_example.jsonl
```
## Finetune
```
bash finetune.sh
```

58
docs/fintune_zh.md Normal file
View File

@ -0,0 +1,58 @@
# 微调
## 安装训练环境
```
pip install git+https://github.com/modelscope/FunASR
```
## 数据准备
数据格式需要包括如下几个字段:
```
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "语音转写:<|startofspeech|>!https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/BAC009S0764W0121.wav<|endofspeech|>"}, {"role": "assistant", "content": "甚至出现交易几乎停滞的情况"}], "speech_length": 418, "text_length": 6}
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "语音转写:<|startofspeech|>!https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/BAC009S0916W0489.wav<|endofspeech|>"}, {"role": "assistant", "content": "湖北一公司以员工名义贷款数十员工负债千万"}], "speech_length": 572, "text_length": 11}
```
详细可以参考:`data/train_example.jsonl`
数据准备细节介绍:
- `messages[1]["content"]`: 音频文件的路径 + 语音识别的 prompt
- `messages[2]["content"]`: 音频文件标注文本
- `speech_length`: 音频文件的 fbank 帧数
- `text_length`: 音频文件标注文本的 token 数 (用 `Qwen3-0.6B` 编码)
`train_text.txt`
左边为数据唯一 ID需与 `train_wav.scp` 中的 ID 一一对应 右边为音频文件标注文本,格式如下:
```
BAC009S0764W0121 甚至出现交易几乎停滞的情况
BAC009S0916W0489 湖北一公司以员工名义贷款数十员工负债千万
```
`train_wav.scp`
左边为数据唯一 ID需与 `train_text.txt` 中的 ID 一一对应 右边为音频文件的路径,格式如下
```
BAC009S0764W0121 https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/BAC009S0764W0121.wav
BAC009S0916W0489 https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/BAC009S0916W0489.wav
```
`生成指令`
```
python tools/scp2jsonl.py \
--scp-file /path/to/train_wav.scp \
--transcript-file /path/to/train_text.txt \
--jsonl-file data/train_example.jsonl
```
## 启动训练
```
bash finetune.sh
```