jonatasgrosman
committed on
Commit
•
5f50cff
1
Parent(s):
466e44d
update README + add evaluation
README.md
CHANGED
@@ -9,6 +9,7 @@ datasets:
|
|
9 |
- mozilla-foundation/common_voice_11_0
|
10 |
metrics:
|
11 |
- wer
|
|
|
12 |
model-index:
|
13 |
- name: Whisper Large Chinese (Mandarin)
|
14 |
results:
|
@@ -19,76 +20,81 @@ model-index:
|
|
19 |
name: mozilla-foundation/common_voice_11_0 zh-CN
|
20 |
type: mozilla-foundation/common_voice_11_0
|
21 |
config: zh-CN
|
22 |
-
split:
|
23 |
args: zh-CN
|
24 |
metrics:
|
25 |
-
- name:
|
26 |
type: wer
|
27 |
-
value:
|
28 |
---
|
29 |
|
30 |
-
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
|
31 |
-
should probably proofread and complete it, then remove this comment. -->
|
32 |
-
|
33 |
# Whisper Large Chinese (Mandarin)
|
34 |
|
35 |
-
This model is a fine-tuned version of [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) on the mozilla-foundation/common_voice_11_0
|
36 |
-
It achieves the following results on the evaluation set:
|
37 |
-
- Loss: 0.2435
|
38 |
-
- Wer: 51.6742
|
39 |
-
- Cer: 8.5279
|
40 |
-
|
41 |
-
## Model description
|
42 |
|
43 |
-
|
44 |
|
45 |
-
|
46 |
|
47 |
-
|
48 |
|
49 |
-
|
50 |
|
51 |
-
|
52 |
|
53 |
-
|
54 |
|
55 |
-
|
56 |
|
57 |
-
|
58 |
-
- learning_rate: 5e-06
|
59 |
-
- train_batch_size: 16
|
60 |
-
- eval_batch_size: 8
|
61 |
-
- seed: 42
|
62 |
-
- gradient_accumulation_steps: 2
|
63 |
-
- total_train_batch_size: 32
|
64 |
-
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
|
65 |
-
- lr_scheduler_type: linear
|
66 |
-
- lr_scheduler_warmup_steps: 2000
|
67 |
-
- training_steps: 20000
|
68 |
-
- mixed_precision_training: Native AMP
|
69 |
|
70 |
-
|
71 |
|
72 |
-
|
73 |
-
|:-------------:|:-----:|:-----:|:---------------:|:-------:|:-------:|
|
74 |
-
| 0.3314 | 0.83 | 1000 | 0.2110 | 65.7014 | 10.8047 |
|
75 |
-
| 0.2747 | 1.66 | 2000 | 0.2005 | 58.1900 | 9.4191 |
|
76 |
-
| 0.1989 | 2.49 | 3000 | 0.1983 | 56.1991 | 9.0939 |
|
77 |
-
| 0.1142 | 3.31 | 4000 | 0.2076 | 55.0226 | 9.1589 |
|
78 |
-
| 0.0747 | 4.14 | 5000 | 0.2131 | 56.3801 | 9.0483 |
|
79 |
-
| 0.0709 | 4.97 | 6000 | 0.2165 | 54.6606 | 8.9768 |
|
80 |
-
| 0.0432 | 5.8 | 7000 | 0.2222 | 54.0271 | 8.9508 |
|
81 |
-
| 0.0261 | 6.63 | 8000 | 0.2299 | 54.4796 | 9.0353 |
|
82 |
-
| 0.0152 | 7.46 | 9000 | 0.2290 | 52.7602 | 8.8076 |
|
83 |
-
| 0.0054 | 8.28 | 10000 | 0.2435 | 51.6742 | 8.5279 |
|
84 |
-
| 0.0028 | 9.11 | 11000 | 0.2421 | 53.0317 | 8.9833 |
|
85 |
-
| 0.0045 | 9.94 | 12000 | 0.2462 | 52.9412 | 8.7751 |
|
86 |
-
| 0.0016 | 10.77 | 13000 | 0.2501 | 52.3077 | 8.9573 |
|
87 |
|
88 |
|
89 |
-
###
|
90 |
|
91 |
-
|
92 |
-
|
93 |
-
-
|
94 |
-
-
|
9 |
- mozilla-foundation/common_voice_11_0
|
10 |
metrics:
|
11 |
- wer
|
12 |
+
- cer
|
13 |
model-index:
|
14 |
- name: Whisper Large Chinese (Mandarin)
|
15 |
results:
|
|
|
20 |
name: mozilla-foundation/common_voice_11_0 zh-CN
|
21 |
type: mozilla-foundation/common_voice_11_0
|
22 |
config: zh-CN
|
23 |
+
split: test
|
24 |
args: zh-CN
|
25 |
metrics:
|
26 |
+
- name: WER
|
27 |
type: wer
|
28 |
+
value: 55.02141421204441
|
29 |
+
- name: CER
|
30 |
+
type: cer
|
31 |
+
value: 9.550758567294045
|
32 |
+
- task:
|
33 |
+
name: Automatic Speech Recognition
|
34 |
+
type: automatic-speech-recognition
|
35 |
+
dataset:
|
36 |
+
name: google/fleurs cmn_hans_cn
|
37 |
+
type: google/fleurs
|
38 |
+
config: cmn_hans_cn
|
39 |
+
split: test
|
40 |
+
args: cmn_hans_cn
|
41 |
+
metrics:
|
42 |
+
- name: WER
|
43 |
+
type: wer
|
44 |
+
value: 70.62596203181118
|
45 |
+
- name: CER
|
46 |
+
type: cer
|
47 |
+
value: 11.761282471826888
|
48 |
---
|
49 |
|
50 |
# Whisper Large Chinese (Mandarin)
|
51 |
|
52 |
+
This model is a fine-tuned version of [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) on Chinese (Mandarin) using the train and validation splits of [Common Voice 11](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0). Not all of the validation split was used for training: I held out 1k samples from it for evaluation during fine-tuning. When using this model, make sure that your speech input is sampled at 16 kHz.
|
53 |
|
54 |
+
## Usage
|
55 |
|
56 |
+
```python
|
57 |
|
58 |
+
from transformers import pipeline
|
59 |
|
60 |
+
transcriber = pipeline(
|
61 |
+
"automatic-speech-recognition",
|
62 |
+
model="jonatasgrosman/whisper-large-zh-cv11"
|
63 |
+
)
|
64 |
|
65 |
+
transcriber.model.config.forced_decoder_ids = (
|
66 |
+
transcriber.tokenizer.get_decoder_prompt_ids(
|
67 |
+
language="zh",
|
68 |
+
task="transcribe"
|
69 |
+
)
|
70 |
+
)
|
71 |
|
72 |
+
transcription = transcriber("path/to/my_audio.wav")
|
73 |
|
74 |
+
```
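The model card asks for speech input sampled at 16 kHz. As a minimal sketch, assuming the input is an uncompressed WAV file, the sample rate can be checked with the Python standard library before transcription. The names `EXPECTED_RATE`, `wav_sample_rate`, and `is_expected_rate` are hypothetical helpers introduced here, not part of the model or of `transformers`.

```python
import wave

EXPECTED_RATE = 16_000  # Hz, the input rate the model card asks for (assumed constant name)


def wav_sample_rate(path: str) -> int:
    """Return the sample rate (in Hz) stored in a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_expected_rate(path: str, expected: int = EXPECTED_RATE) -> bool:
    """Return True when the file is already sampled at the expected rate."""
    return wav_sample_rate(path) == expected
```

A file that fails this check would need to be resampled with an audio tool of your choice before it is transcribed.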
|
75 |
|
76 |
+
## Evaluation
|
77 |
|
78 |
+
I evaluated the model on the test splits of two datasets: [Common Voice 11](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0) (the dataset used for fine-tuning) and [Fleurs](https://huggingface.co/datasets/google/fleurs) (not seen during fine-tuning). Because Whisper can transcribe casing and punctuation, I evaluated the model in two scenarios: one using the raw text and one using normalized text (lowercased, with punctuation removed). Additionally, for Fleurs, I evaluated the model on only the samples that contain no numerical values. Fleurs writes numerical values differently from Common Voice, the dataset used for fine-tuning, so this difference is expected to hurt the model's performance on such transcriptions in Fleurs.
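To make the two scoring scenarios concrete, here is a minimal sketch of text normalization and of corpus-level CER and WER. It is an illustration under stated assumptions, not the evaluation script used for this commit: the helpers `normalize`, `edit_distance`, `error_rate`, `cer`, and `wer` are hypothetical, punctuation is detected through Unicode categories, and WER tokens are obtained by splitting on whitespace.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase the text and replace punctuation with spaces (assumed rule)."""
    no_punct = "".join(
        " " if unicodedata.category(ch).startswith("P") else ch
        for ch in text.lower()
    )
    # Collapse the runs of whitespace left behind by removed punctuation.
    return re.sub(r"\s+", " ", no_punct).strip()


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def error_rate(references, predictions, tokenize) -> float:
    """Total edits divided by total reference tokens over the whole corpus."""
    edits = 0
    total = 0
    for ref, hyp in zip(references, predictions):
        ref_tokens, hyp_tokens = tokenize(ref), tokenize(hyp)
        edits += edit_distance(ref_tokens, hyp_tokens)
        total += len(ref_tokens)
    return edits / total


def cer(references, predictions) -> float:
    """Character error rate: tokens are individual characters."""
    return error_rate(references, predictions, list)


def wer(references, predictions) -> float:
    """Word error rate: tokens are whitespace-separated words."""
    return error_rate(references, predictions, str.split)
```

Under this whitespace tokenization, a Chinese sentence written without spaces counts as a single word, so one wrong character makes the whole sentence an error. This may partly explain why the WER values reported here are so much higher than the CER values.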
|
79 |
|
80 |
+
### Common Voice 11
|
81 |
|
82 |
+
| | CER | WER |
|
83 |
+
| --- | --- | --- |
|
84 |
+
| [jonatasgrosman/whisper-large-zh-cv11](https://huggingface.co/jonatasgrosman/whisper-large-zh-cv11) | 9.31 | 55.94 |
|
85 |
+
| [jonatasgrosman/whisper-large-zh-cv11](https://huggingface.co/jonatasgrosman/whisper-large-zh-cv11) + text normalization | 9.55 | 55.02 |
|
86 |
+
| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) | 33.33 | 101.80 |
|
87 |
+
| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) + text normalization | 29.90 | 95.91 |
|
88 |
|
89 |
+
### Fleurs
|
90 |
|
91 |
+
| | CER | WER |
|
92 |
+
| --- | --- | --- |
|
93 |
+
| [jonatasgrosman/whisper-large-zh-cv11](https://huggingface.co/jonatasgrosman/whisper-large-zh-cv11) | 15.00 | 93.45 |
|
94 |
+
| [jonatasgrosman/whisper-large-zh-cv11](https://huggingface.co/jonatasgrosman/whisper-large-zh-cv11) + text normalization | 11.76 | 70.63 |
|
95 |
+
| [jonatasgrosman/whisper-large-zh-cv11](https://huggingface.co/jonatasgrosman/whisper-large-zh-cv11) + keep only non-numeric samples | 10.95 | 87.91 |
|
96 |
+
| [jonatasgrosman/whisper-large-zh-cv11](https://huggingface.co/jonatasgrosman/whisper-large-zh-cv11) + text normalization + keep only non-numeric samples | 7.83 | 62.12 |
|
97 |
+
| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) | 23.49 | 101.28 |
|
98 |
+
| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) + text normalization | 17.58 | 83.22 |
|
99 |
+
| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) + keep only non-numeric samples | 21.03 | 101.95 |
|
100 |
+
| [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) + text normalization + keep only non-numeric samples | 15.22 | 79.28 |
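The "keep only non-numeric samples" rows score a subset of the test set. The exact selection rule is not shown in this commit, so the sketch below assumes a sample counts as numeric when its reference transcription contains at least one digit character; `has_digit` and `keep_non_numeric` are hypothetical helpers.

```python
def has_digit(text: str) -> bool:
    """Return True if the text contains any digit character (e.g. '120')."""
    return any(ch.isdigit() for ch in text)


def keep_non_numeric(references, predictions):
    """Drop every (reference, prediction) pair whose reference has a digit.

    Assumption: filtering looks at the reference only, not the prediction.
    """
    kept = [(ref, hyp) for ref, hyp in zip(references, predictions)
            if not has_digit(ref)]
    refs = [ref for ref, _ in kept]
    hyps = [hyp for _, hyp in kept]
    return refs, hyps
```

Note that `str.isdigit` does not treat Chinese numerals such as 三 as digits, so under this assumed rule only references written with Arabic-style digits are filtered out.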
|
evaluation_cv11_test.json
CHANGED
@@ -2,6 +2,8 @@
|
|
2 |
"raw": {
|
3 |
"cer": 0.09311360578930278,
|
4 |
"wer": 0.5594405594405595,
|
5 |
"references": [
|
6 |
"否",
|
7 |
"宋朝末年年间定居粉岭围。",
|
@@ -21172,6 +21174,8 @@
|
|
21172 |
"normalized": {
|
21173 |
"cer": 0.09550758567294045,
|
21174 |
"wer": 0.5502141421204441,
|
21175 |
"references": [
|
21176 |
"否",
|
21177 |
"宋朝末年年间定居粉岭围",
|
|
|
2 |
"raw": {
|
3 |
"cer": 0.09311360578930278,
|
4 |
"wer": 0.5594405594405595,
|
5 |
+
"non_numeric_samples_cer": 0.09311360578930278,
|
6 |
+
"non_numeric_samples_wer": 0.5594405594405595,
|
7 |
"references": [
|
8 |
"否",
|
9 |
"宋朝末年年间定居粉岭围。",
|
|
|
21174 |
"normalized": {
|
21175 |
"cer": 0.09550758567294045,
|
21176 |
"wer": 0.5502141421204441,
|
21177 |
+
"non_numeric_samples_cer": 0.09550758567294045,
|
21178 |
+
"non_numeric_samples_wer": 0.5502141421204441,
|
21179 |
"references": [
|
21180 |
"否",
|
21181 |
"宋朝末年年间定居粉岭围",
|
evaluation_fleurs_test.json
CHANGED
@@ -2,6 +2,8 @@
|
|
2 |
"raw": {
|
3 |
"cer": 0.1500187149095446,
|
4 |
"wer": 0.9344808439755691,
|
5 |
"references": [
|
6 |
"1940 年 8 月 15 日,盟军攻入法国南部,这次进攻被称为“龙骑兵行动”。",
|
7 |
"该群岛位于南极半岛以北 120 公里处。最大的岛屿是乔治国王岛,这里是“繁星村(Villa Las Estrellas)”的定居点。",
|
@@ -1900,6 +1902,8 @@
|
|
1900 |
"normalized": {
|
1901 |
"cer": 0.11761282471826888,
|
1902 |
"wer": 0.7062596203181118,
|
1903 |
"references": [
|
1904 |
"1940 年 8 月 15 日 盟军攻入法国南部 这次进攻被称为 龙骑兵行动",
|
1905 |
"该群岛位于南极半岛以北 120 公里处 最大的岛屿是乔治国王岛 这里是 繁星村 villa las estrellas 的定居点",
|
|
|
2 |
"raw": {
|
3 |
"cer": 0.1500187149095446,
|
4 |
"wer": 0.9344808439755691,
|
5 |
+
"non_numeric_samples_cer": 0.10947220549869556,
|
6 |
+
"non_numeric_samples_wer": 0.8790983606557377,
|
7 |
"references": [
|
8 |
"1940 年 8 月 15 日,盟军攻入法国南部,这次进攻被称为“龙骑兵行动”。",
|
9 |
"该群岛位于南极半岛以北 120 公里处。最大的岛屿是乔治国王岛,这里是“繁星村(Villa Las Estrellas)”的定居点。",
|
|
|
1902 |
"normalized": {
|
1903 |
"cer": 0.11761282471826888,
|
1904 |
"wer": 0.7062596203181118,
|
1905 |
+
"non_numeric_samples_cer": 0.07828692280578076,
|
1906 |
+
"non_numeric_samples_wer": 0.6211941478845393,
|
1907 |
"references": [
|
1908 |
"1940 年 8 月 15 日 盟军攻入法国南部 这次进攻被称为 龙骑兵行动",
|
1909 |
"该群岛位于南极半岛以北 120 公里处 最大的岛屿是乔治国王岛 这里是 繁星村 villa las estrellas 的定居点",
|
evaluation_whisper-large-v2_cv11_test.json
CHANGED
@@ -2,6 +2,8 @@
|
|
2 |
"raw": {
|
3 |
"cer": 0.33334879621468666,
|
4 |
"wer": 1.0180495180495182,
|
5 |
"references": [
|
6 |
"否",
|
7 |
"宋朝末年年间定居粉岭围。",
|
@@ -21172,6 +21174,8 @@
|
|
21172 |
"normalized": {
|
21173 |
"cer": 0.299039578225524,
|
21174 |
"wer": 0.9590944847478368,
|
21175 |
"references": [
|
21176 |
"否",
|
21177 |
"宋朝末年年间定居粉岭围",
|
|
|
2 |
"raw": {
|
3 |
"cer": 0.33334879621468666,
|
4 |
"wer": 1.0180495180495182,
|
5 |
+
"non_numeric_samples_cer": 0.33334879621468666,
|
6 |
+
"non_numeric_samples_wer": 1.0180495180495182,
|
7 |
"references": [
|
8 |
"否",
|
9 |
"宋朝末年年间定居粉岭围。",
|
|
|
21174 |
"normalized": {
|
21175 |
"cer": 0.299039578225524,
|
21176 |
"wer": 0.9590944847478368,
|
21177 |
+
"non_numeric_samples_cer": 0.299039578225524,
|
21178 |
+
"non_numeric_samples_wer": 0.9590944847478368,
|
21179 |
"references": [
|
21180 |
"否",
|
21181 |
"宋朝末年年间定居粉岭围",
|
evaluation_whisper-large-v2_fleurs_test.json
CHANGED
@@ -2,6 +2,8 @@
|
|
2 |
"raw": {
|
3 |
"cer": 0.23488459139114162,
|
4 |
"wer": 1.0127706829539145,
|
5 |
"references": [
|
6 |
"1940 年 8 月 15 日,盟军攻入法国南部,这次进攻被称为“龙骑兵行动”。",
|
7 |
"该群岛位于南极半岛以北 120 公里处。最大的岛屿是乔治国王岛,这里是“繁星村(Villa Las Estrellas)”的定居点。",
|
@@ -1900,6 +1902,8 @@
|
|
1900 |
"normalized": {
|
1901 |
"cer": 0.17581080366118196,
|
1902 |
"wer": 0.8322216521292971,
|
1903 |
"references": [
|
1904 |
"1940 年 8 月 15 日 盟军攻入法国南部 这次进攻被称为 龙骑兵行动",
|
1905 |
"该群岛位于南极半岛以北 120 公里处 最大的岛屿是乔治国王岛 这里是 繁星村 villa las estrellas 的定居点",
|
|
|
2 |
"raw": {
|
3 |
"cer": 0.23488459139114162,
|
4 |
"wer": 1.0127706829539145,
|
5 |
+
"non_numeric_samples_cer": 0.21031507124222357,
|
6 |
+
"non_numeric_samples_wer": 1.0194672131147542,
|
7 |
"references": [
|
8 |
"1940 年 8 月 15 日,盟军攻入法国南部,这次进攻被称为“龙骑兵行动”。",
|
9 |
"该群岛位于南极半岛以北 120 公里处。最大的岛屿是乔治国王岛,这里是“繁星村(Villa Las Estrellas)”的定居点。",
|
|
|
1902 |
"normalized": {
|
1903 |
"cer": 0.17581080366118196,
|
1904 |
"wer": 0.8322216521292971,
|
1905 |
+
"non_numeric_samples_cer": 0.15216778286922805,
|
1906 |
+
"non_numeric_samples_wer": 0.7928034796362199,
|
1907 |
"references": [
|
1908 |
"1940 年 8 月 15 日 盟军攻入法国南部 这次进攻被称为 龙骑兵行动",
|
1909 |
"该群岛位于南极半岛以北 120 公里处 最大的岛屿是乔治国王岛 这里是 繁星村 villa las estrellas 的定居点",
|
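The evaluation JSON files store error rates as fractions, while the README tables report them as percentages with two decimals. A minimal sketch of that conversion follows, using the `normalized` values from `evaluation_fleurs_test.json` in this commit; the variable `fleurs_normalized` and the helper `as_percent` are hypothetical names.

```python
# Values copied from the "normalized" section of evaluation_fleurs_test.json.
fleurs_normalized = {
    "cer": 0.11761282471826888,
    "wer": 0.7062596203181118,
    "non_numeric_samples_cer": 0.07828692280578076,
    "non_numeric_samples_wer": 0.6211941478845393,
}


def as_percent(fraction: float) -> str:
    """Format a fractional error rate as a percentage with two decimals."""
    return f"{fraction * 100:.2f}"


# Build one table-style row: metric name followed by its percentage.
row = {name: as_percent(value) for name, value in fleurs_normalized.items()}
```

The resulting values (11.76, 70.63, 7.83, and 62.12) match the text-normalized rows for this model in the Fleurs table of the README.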