hubert-large-ll60k-librispeech-clean-100h-demo-dist
This model was trained from scratch on an unspecified dataset (the dataset field is recorded as `None`). It achieves the following results on the evaluation set at the final training step (step 5600):
- CER (character error rate): 0.0316
- Loss: 0.2143
- WER (word error rate): 0.0995
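For readers unfamiliar with the two error rates above, here is a minimal sketch of how they are commonly defined: the edit distance between reference and hypothesis, divided by the reference length, counted over words for WER and over characters for CER. The helper names (`edit_distance`, `wer`, `cer`) and the example sentences are illustrative assumptions. This card does not state which metric implementation or text normalization produced its numbers.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference token
                curr[j - 1] + 1,     # insert a hypothesis token
                prev[j - 1] + cost,  # substitute, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate for one utterance (assumes a non-empty reference)."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate for one utterance, with spaces counted as characters."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical example: one substituted word and one dropped word.
reference = "the cat sat on the mat"
hypothesis = "the cat sit on mat"
print(f"WER = {wer(reference, hypothesis):.4f}")
print(f"CER = {cer(reference, hypothesis):.4f}")
```

Corpus-level scores are usually computed by summing edit distances and reference lengths over all utterances before dividing, so they can differ from the mean of per-utterance rates.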
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 32
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 256
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 50.0
- mixed_precision_training: Native AMP
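The arithmetic behind the batch sizes and the learning-rate schedule listed above can be sketched as follows. This is an illustration under stated assumptions, not the training script: gradient accumulation is assumed to be 1 (it is not listed, and 32 × 8 already gives 256), and the schedule length is assumed to be 5600 steps, the last step reported in the training results. The function names are hypothetical.

```python
PER_DEVICE_TRAIN_BATCH_SIZE = 32
PER_DEVICE_EVAL_BATCH_SIZE = 8
NUM_DEVICES = 8
PEAK_LR = 3e-4
WARMUP_STEPS = 500
TOTAL_STEPS = 5600  # assumption: last step in the training results table


def total_batch_size(per_device, num_devices, grad_accum_steps=1):
    """Effective batch size across all devices (grad_accum_steps=1 is assumed)."""
    return per_device * num_devices * grad_accum_steps


def linear_lr(step):
    """Linear warmup from 0 to PEAK_LR, then linear decay to 0 at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


print(total_batch_size(PER_DEVICE_TRAIN_BATCH_SIZE, NUM_DEVICES))  # 256
print(total_batch_size(PER_DEVICE_EVAL_BATCH_SIZE, NUM_DEVICES))   # 64
for step in (0, 250, 500, 3050, 5600):
    print(step, f"{linear_lr(step):.2e}")
```

Under these assumptions the learning rate peaks at step 500, roughly epoch 4.5, and decays linearly over the remaining 5100 steps.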
Training results
Training Loss | Epoch | Step | Cer | Validation Loss | Wer |
---|---|---|---|---|---|
2.911 | 0.89 | 100 | 1.0 | 2.9202 | 1.0 |
2.6638 | 1.79 | 200 | 1.0 | 2.6310 | 1.0 |
0.3898 | 2.68 | 300 | 0.0968 | 0.3892 | 0.3366 |
0.2156 | 3.57 | 400 | 0.0591 | 0.2250 | 0.2090 |
0.1517 | 4.46 | 500 | 0.0474 | 0.1834 | 0.1695 |
0.1059 | 5.36 | 600 | 0.0428 | 0.1668 | 0.1502 |
0.0825 | 6.25 | 700 | 0.0393 | 0.1662 | 0.1406 |
0.0679 | 7.14 | 800 | 0.0393 | 0.1747 | 0.1357 |
0.0602 | 8.04 | 900 | 0.0390 | 0.1767 | 0.1334 |
0.0587 | 8.93 | 1000 | 0.0376 | 0.1708 | 0.1292 |
0.0517 | 9.82 | 1100 | 0.0372 | 0.1677 | 0.1255 |
0.0413 | 10.71 | 1200 | 0.0361 | 0.1771 | 0.1234 |
0.0418 | 11.61 | 1300 | 0.0358 | 0.1731 | 0.1229 |
0.0424 | 12.5 | 1400 | 0.0348 | 0.1796 | 0.1191 |
0.0469 | 13.39 | 1500 | 0.0358 | 0.1848 | 0.1207 |
0.0414 | 14.29 | 1600 | 0.0367 | 0.1863 | 0.1213 |
0.0338 | 15.18 | 1700 | 0.0347 | 0.1889 | 0.1177 |
0.0334 | 16.07 | 1800 | 0.0360 | 0.1900 | 0.1188 |
0.0315 | 16.96 | 1900 | 0.0346 | 0.1901 | 0.1158 |
0.0317 | 17.86 | 2000 | 0.0341 | 0.1790 | 0.1134 |
0.0264 | 18.75 | 2100 | 0.0356 | 0.1864 | 0.1159 |
0.0271 | 19.64 | 2200 | 0.0341 | 0.1861 | 0.1150 |
0.0272 | 20.54 | 2300 | 0.0339 | 0.1945 | 0.1129 |
0.0278 | 21.43 | 2400 | 0.0343 | 0.1950 | 0.1131 |
0.0254 | 22.32 | 2500 | 0.0330 | 0.2015 | 0.1097 |
0.0204 | 23.21 | 2600 | 0.0326 | 0.1952 | 0.1069 |
0.0259 | 24.11 | 2700 | 0.0330 | 0.1976 | 0.1103 |
0.0325 | 25.0 | 2800 | 0.0328 | 0.1958 | 0.1088 |
0.0359 | 25.89 | 2900 | 0.0346 | 0.1908 | 0.1105 |
0.0265 | 26.79 | 3000 | 0.0337 | 0.1991 | 0.1096 |
0.0223 | 27.68 | 3100 | 0.0345 | 0.1948 | 0.1107 |
0.025 | 28.57 | 3200 | 0.0330 | 0.2046 | 0.1077 |
0.0242 | 29.46 | 3300 | 0.0335 | 0.2055 | 0.1072 |
0.0187 | 30.36 | 3400 | 0.0307 | 0.1980 | 0.1021 |
0.0219 | 31.25 | 3500 | 0.0322 | 0.1998 | 0.1054 |
0.0198 | 32.14 | 3600 | 0.0322 | 0.2104 | 0.1048 |
0.0181 | 33.04 | 3700 | 0.0325 | 0.2093 | 0.1050 |
0.0166 | 33.93 | 3800 | 0.0315 | 0.2120 | 0.1032 |
0.0212 | 34.82 | 3900 | 0.0300 | 0.2021 | 0.1003 |
0.0214 | 35.71 | 4000 | 0.0316 | 0.2045 | 0.1033 |
0.016 | 36.61 | 4100 | 0.0302 | 0.2022 | 0.1000 |
0.0169 | 37.5 | 4200 | 0.0299 | 0.2060 | 0.0996 |
0.0191 | 38.39 | 4300 | 0.0307 | 0.2114 | 0.1006 |
0.0218 | 39.29 | 4400 | 0.0314 | 0.2066 | 0.1015 |
0.0182 | 40.18 | 4500 | 0.0300 | 0.2054 | 0.0988 |
0.0185 | 41.07 | 4600 | 0.0303 | 0.2050 | 0.0994 |
0.0171 | 41.96 | 4700 | 0.0306 | 0.2136 | 0.0994 |
0.0171 | 42.86 | 4800 | 0.0318 | 0.2062 | 0.1007 |
0.0161 | 43.75 | 4900 | 0.0319 | 0.2101 | 0.1013 |
0.0168 | 44.64 | 5000 | 0.0306 | 0.2111 | 0.0985 |
0.015 | 45.54 | 5100 | 0.0318 | 0.2110 | 0.1003 |
0.0126 | 46.43 | 5200 | 0.0319 | 0.2086 | 0.0999 |
0.0153 | 47.32 | 5300 | 0.0310 | 0.2095 | 0.0981 |
0.0172 | 48.21 | 5400 | 0.0310 | 0.2130 | 0.0985 |
0.017 | 49.11 | 5500 | 0.0316 | 0.2137 | 0.0994 |
0.0152 | 50.0 | 5600 | 0.0316 | 0.2143 | 0.0995 |
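The table shows validation loss bottoming out early while CER and WER keep improving, so the "best" checkpoint depends on which metric is used for selection. The sketch below copies a few rows from the table, including the minimum of each metric, and picks the best step per metric. The tuple layout and the `best_step` helper are illustrative assumptions.

```python
# (step, cer, validation_loss, wer), copied from the training results table.
ROWS = [
    (300, 0.0968, 0.3892, 0.3366),
    (700, 0.0393, 0.1662, 0.1406),
    (4200, 0.0299, 0.2060, 0.0996),
    (5300, 0.0310, 0.2095, 0.0981),
    (5600, 0.0316, 0.2143, 0.0995),
]

METRIC_COLUMN = {"cer": 1, "validation_loss": 2, "wer": 3}


def best_step(rows, metric):
    """Return the step whose value for `metric` is lowest."""
    column = METRIC_COLUMN[metric]
    return min(rows, key=lambda row: row[column])[0]


for metric in METRIC_COLUMN:
    print(metric, best_step(ROWS, metric))
```

On these rows the lowest validation loss is at step 700, the lowest CER at step 4200, and the lowest WER at step 5300. The metrics reported at the top of the card correspond to the final step, 5600.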
Framework versions
- Transformers 4.39.0.dev0
- Pytorch 2.0.1+cu117
- Datasets 2.8.0
- Tokenizers 0.15.2