---
language: vi
datasets:
- vivos
- common_voice
metrics:
- wer
pipeline_tag: automatic-speech-recognition
tags:
- audio
- speech
- speechbrain
- Transformer
license: cc-by-nc-4.0
widget:
- example_title: Example 1
src: https://huggingface.co/dragonSwing/wav2vec2-base-vn-270h/raw/main/example.mp3
- example_title: Example 2
src: https://huggingface.co/dragonSwing/wav2vec2-base-vn-270h/raw/main/example2.mp3
model-index:
- name: Wav2vec2 Base Vietnamese 270h
results:
- task:
name: Speech Recognition
type: automatic-speech-recognition
dataset:
name: Common Voice vi
type: common_voice
args: vi
metrics:
- name: Test WER
type: wer
value: 9.66
- task:
name: Speech Recognition
type: automatic-speech-recognition
dataset:
name: Common Voice 7.0
type: mozilla-foundation/common_voice_7_0
args: vi
metrics:
- name: Test WER
type: wer
value: 5.57
- task:
name: Speech Recognition
type: automatic-speech-recognition
dataset:
name: Common Voice 8.0
type: mozilla-foundation/common_voice_8_0
args: vi
metrics:
- name: Test WER
type: wer
value: 5.76
- task:
name: Speech Recognition
type: automatic-speech-recognition
dataset:
name: VIVOS
type: vivos
args: vi
metrics:
- name: Test WER
type: wer
value: 3.70
---
# FINE-TUNE WAV2VEC 2.0 FOR SPEECH RECOGNITION
## Table of contents
1. [Documentation](#documentation)
2. [Installation](#installation)
3. [Usage](#usage)
4. [Logs and Visualization](#logs-and-visualization)
## Documentation
If you need a simple way to fine-tune the Wav2vec 2.0 model for speech recognition on your own datasets, you have come to the right place.
All documentation related to this repo can be found here:
- [Wav2vec2ForCTC](https://huggingface.co/docs/transformers/model_doc/wav2vec2#transformers.Wav2Vec2ForCTC)
- [Tutorial](https://huggingface.co/blog/fine-tune-wav2vec2-english)
- [Code reference](https://github.com/huggingface/transformers/blob/main/examples/pytorch/speech-recognition/run_speech_recognition_ctc.py)
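For orientation, here is how a checkpoint produced by CTC fine-tuning is typically run for inference with Wav2Vec2ForCTC. This is a minimal sketch, not code from this repo: the checkpoint id `facebook/wav2vec2-base-960h` and the file `audio/sample.wav` are placeholders, so substitute your own fine-tuned model and audio.
```
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Placeholder checkpoint; swap in your own fine-tuned model directory.
model_id = "facebook/wav2vec2-base-960h"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)
model.eval()

# Load audio (placeholder path) and resample to the 16 kHz rate Wav2vec 2.0 expects.
waveform, sample_rate = torchaudio.load("audio/sample.wav")
if sample_rate != 16_000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

# Take the first channel in case the file is stereo.
inputs = processor(waveform[0].numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding: most likely token per frame, then collapse repeats/blanks.
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```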
## Installation
```
pip install -r requirements.txt
```
## Usage
1. Prepare your dataset
- Your dataset can be in .txt or .csv format.
- The `path` and `transcript` columns are compulsory. The `path` column contains the paths to your stored audio files; depending on where your dataset lives, these can be either absolute or relative paths. The `transcript` column contains the corresponding transcript for each audio file.
- Check out our [data_example.csv](dataset/data_example.csv) file for more information; a sketch that generates such a file appears after this list.
2. Configure the config.toml file to match your dataset and training setup
3. Run
- Start training:
```
python train.py -c config.toml
```
- Resume training from a previous run:
```
python train.py -c config.toml -r
```
- Load a specific model and start training:
```
python train.py -c config.toml -p path/to/your/model.tar
```
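For step 1 above, the sketch below writes a tiny dataset file with the two compulsory columns. Everything in it is hypothetical (file names, audio paths, transcripts); only the `path`/`transcript` column layout is what this repo expects.
```
import pandas as pd

# Two compulsory columns: "path" (absolute or relative audio paths)
# and "transcript" (the matching text). All rows below are hypothetical.
rows = [
    {"path": "audio/sample_0001.wav", "transcript": "xin chào"},
    {"path": "audio/sample_0002.wav", "transcript": "hẹn gặp lại"},
]
pd.DataFrame(rows).to_csv("my_data.csv", index=False)
```
Point the config.toml from step 2 at the resulting file.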
## Logs and Visualization
Logs are stored during training, and you can visualize them with TensorBoard by running this command:
```
# point --logdir at the save directory specified in the config file
tensorboard --logdir ~/saved/
# optionally specify a port, e.g. 8080
tensorboard --logdir ~/saved/ --port 8080
```