---
model-index:
- name: mHuBERT-147-br
  results: []
language:
- br
metrics:
- wer
pipeline_tag: automatic-speech-recognition
---


# mHuBERT-147-br

This model was fine-tuned on the Mozilla Common Voice 15 Breton dataset and the [Roadennoù](https://github.com/gweltou/roadennou) dataset.
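A minimal usage sketch with the 🤗 Transformers `pipeline` API; the repository id below is a placeholder, not the confirmed path of this checkpoint, and the input is assumed to be a mono 16 kHz audio file:

```python
from transformers import pipeline

# Placeholder repository id; replace with the actual path of this model.
asr = pipeline("automatic-speech-recognition", model="your-namespace/mHuBERT-147-br")

# Transcribe a Breton audio file (mono WAV assumed).
print(asr("sample.wav")["text"])
```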

## Model description

This model was trained to assess the performance of mHuBERT-147 as a base model for fine-tuning a Breton ASR system.

## Intended uses & limitations

This model is a research model and shouldn't be used in production.

## Training and evaluation data

90% of the Roadennoù dataset was used for training; the remaining 10% was used for validation, together with the MCV15-br validation split.
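A sketch of how such a split could be reproduced with 🤗 Datasets; the dataset paths and the use of the training seed (42, listed below) are assumptions, since the actual loading code is not published in this card:

```python
from datasets import load_dataset

# Hypothetical loading; assumes Roadennoù is laid out as a local audiofolder.
roadennou = load_dataset("audiofolder", data_dir="roadennou")["train"]

# 90/10 train/validation split (seed 42 assumed for reproducibility).
split = roadennou.train_test_split(test_size=0.1, seed=42)
train_ds, roadennou_val = split["train"], split["test"]

# The MCV15-br validation split is evaluated alongside roadennou_val.
mcv15_val = load_dataset("mozilla-foundation/common_voice_15_0", "br", split="validation")
```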

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 3.8e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 52
- mixed_precision_training: Native AMP
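
These values map directly onto 🤗 Transformers `TrainingArguments`; a sketch, with the output directory as a placeholder:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",                # placeholder
    learning_rate=3.8e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,   # effective train batch size: 8 * 2 = 16
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=52,
    fp16=True,                       # "Native AMP" mixed precision
)
# The Adam betas (0.9, 0.999) and epsilon 1e-08 above are the Trainer defaults.
```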

### Framework versions

- Transformers 4.39.1
- Pytorch 2.0.1+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2