Update README_zh.md
This commit is contained in:
34
README_zh.md
34
README_zh.md
@ -140,6 +140,40 @@ if __name__ == "__main__":
|
|||||||
|
|
||||||
我们在开源基准数据集、中文方言测试集和工业测试集上,比较了 Fun-ASR 与其他模型的多语言语音识别性能。Fun-ASR 模型均具有明显的效果优势。
|
我们在开源基准数据集、中文方言测试集和工业测试集上,比较了 Fun-ASR 与其他模型的多语言语音识别性能。Fun-ASR 模型均具有明显的效果优势。
|
||||||
|
|
||||||
|
### 1. 开源数据集性能 (WER %)
|
||||||
|
|
||||||
|
| Test set | GLM-ASR-nano | GLM-ASR-nano* | Whisper-large-v3 | Seed-ASR | Seed-ASR* | Kimi-Audio | Step-Audio2 | FireRed-ASR | Fun-ASR-nano | Fun-ASR |
|
||||||
|
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|
||||||
|
| **Model Size** | 1.5B | 1.5B | 1.6B | - | - | - | - | 1.1B | 0.8B | 7.7B |
|
||||||
|
| **OpenSource** | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ |
|
||||||
|
| AIShell1 | 1.81 | 2.17 | 4.72 | 0.68 | 1.63 | 0.71 | 0.63 | 0.54 | 1.80 | 1.22 |
|
||||||
|
| AIShell2 | - | 3.47 | 4.68 | 2.27 | 2.76 | 2.86 | 2.10 | 2.58 | 2.75 | 2.39 |
|
||||||
|
| Fleurs-zh | - | 3.65 | 5.18 | 3.43 | 3.23 | 3.11 | 2.68 | 4.81 | 2.56 | 2.53 |
|
||||||
|
| Fleurs-en | 5.78 | 6.95 | 6.23 | 9.39 | 9.39 | 6.99 | 3.03 | 10.79 | 5.96 | 4.74 |
|
||||||
|
| Librispeech-clean | 2.00 | 2.17 | 1.86 | 1.58 | 2.8 | 1.32 | 1.17 | 1.84 | 1.76 | 1.51 |
|
||||||
|
| Librispeech-other | 4.19 | 4.43 | 3.43 | 2.84 | 5.69 | 2.63 | 2.42 | 4.52 | 4.33 | 3.03 |
|
||||||
|
| WenetSpeech Meeting | 6.73 | 8.21 | 18.39 | 5.69 | 7.07 | 6.24 | 4.75 | 4.95 | 6.60 | 6.17 |
|
||||||
|
| WenetSpeech Net | - | 6.33 | 11.89 | 4.66 | 4.84 | 6.45 | 4.67 | 4.94 | 6.01 | 5.46 |
|
||||||
|
|
||||||
|
> *注:Seed-ASR\* 结果使用 volcengine 上的官方 API 评估;GLM-ASR-nano\* 结果使用开源 checkpoint 评估。*
|
||||||
|
|
||||||
|
### 2. 工业数据集性能 (WER %)
|
||||||
|
|
||||||
|
| Test set | GLM-ASR-Nano | Whisper-large-v3 | Seed-ASR | FireRed-ASR | Kimi-Audio | Paraformer v2 | Fun-ASR-nano | Fun-ASR |
|
||||||
|
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|
||||||
|
| **Model Size** | 1.5B | 1.6B | - | 1.1B | 8B | 0.2B | 0.8B | 7.7B |
|
||||||
|
| **OpenSource** | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ |
|
||||||
|
| Nearfield | 16.95 | 16.58 | 7.20 | 10.10 | 9.02 | 8.11 | 7.79 | 6.31 |
|
||||||
|
| Farfield | 9.44 | 22.21 | 4.59 | 7.49 | 10.95 | 9.55 | 5.79 | 4.34 |
|
||||||
|
| Complex Background | 23.79 | 32.57 | 12.90 | 15.56 | 15.56 | 15.19 | 14.59 | 11.45 |
|
||||||
|
| English General | 16.47 | 18.56 | 15.65 | 21.62 | 18.12 | 19.48 | 15.28 | 13.73 |
|
||||||
|
| Opensource | 4.67 | 7.05 | 3.83 | 5.31 | 3.79 | 6.23 | 4.22 | 3.38 |
|
||||||
|
| Dialect | 54.21 | 66.14 | 29.45 | 52.82 | 71.94 | 41.16 | 28.18 | 15.21 |
|
||||||
|
| Accent | 19.78 | 36.03 | 10.23 | 14.05 | 27.20 | 17.80 | 12.90 | 10.31 |
|
||||||
|
| Lyrics | 46.56 | 54.82 | 30.26 | 42.87 | 65.18 | 50.14 | 30.85 | 21.00 |
|
||||||
|
| Hiphop | 43.32 | 46.56 | 29.46 | 33.88 | 57.25 | 43.79 | 30.87 | 28.58 |
|
||||||
|
| **Average** | **26.13** | **33.39** | **15.95** | **22.63** | **31.00** | **23.49** | **16.72** | **12.70** |
|
||||||
|
|
||||||
<div align="center">
|
<div align="center">
|
||||||
<img src="images/compare_zh.png" width="800" />
|
<img src="images/compare_zh.png" width="800" />
|
||||||
</div>
|
</div>
|
||||||
|
|||||||
Reference in New Issue
Block a user