erkhem-gantulga/whisper-small-mn

	---
	language:
	- mn
	base_model: openai/whisper-small
	tags:
	- audio
	- automatic-speech-recognition
	library_name: transformers
	metrics:
	- wer
	model-index:
	- name: Whisper Small Mn - Erkhembayar Gantulga
	  results: []
	datasets:
	- mozilla-foundation/common_voice_17_0
	- google/fleurs
	pipeline_tag: automatic-speech-recognition
	license: apache-2.0
	---


	# Whisper Small Mn - Erkhembayar Gantulga

	This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) for Mongolian automatic speech recognition, trained on the Mongolian subsets of the Common Voice 17.0 and Google Fleurs datasets.
	It achieves the following results on the evaluation set:
	- Loss: 0.1561
	- Wer: 19.4492
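
As a rough illustration of the metric reported above, the sketch below computes a word error rate in plain Python: the word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words. It assumes the reported WER is a percentage and that words are split on whitespace. The function name `word_error_rate` is hypothetical, and the card does not say which implementation or text normalization produced the reported number.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word error rate, in percent, of hypothesis against reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One word missing from a four-word reference gives one error out of four words.
score = word_error_rate("сайн байна уу та", "сайн байна та")
```

Corpus-level WER is usually computed by summing errors and reference words over all utterances rather than averaging per-utterance scores, so this per-sentence function is only a building block.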

	## Training and evaluation data

	Datasets used for training:
	- [Common Voice 17.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0)
	- [Google Fleurs](https://huggingface.co/datasets/google/fleurs)

	The Common Voice 17.0 and Google Fleurs datasets were combined for training as follows:

	```python
	from datasets import Audio, DatasetDict, concatenate_datasets, load_dataset

	# Mongolian subset of Common Voice 17.0 (gated, so a Hugging Face token is needed).
	# Note: Common Voice's "validated" split typically also contains the clips from
	# train, validation, and test, so this split string may duplicate clips and
	# overlap with the test split.
	common_voice = DatasetDict()
	common_voice["train"] = load_dataset(
	    "mozilla-foundation/common_voice_17_0", "mn", split="train+validation+validated", token=True
	)
	common_voice["test"] = load_dataset(
	    "mozilla-foundation/common_voice_17_0", "mn", split="test", token=True
	)

	# Resample to 16 kHz, the sampling rate Whisper expects.
	common_voice = common_voice.cast_column("audio", Audio(sampling_rate=16000))

	# Keep only the "audio" and "sentence" columns.
	common_voice = common_voice.remove_columns(
	    ["accent", "age", "client_id", "down_votes", "gender", "locale", "path", "segment", "up_votes", "variant"]
	)

	# Mongolian subset of Google Fleurs.
	google_fleurs = DatasetDict()
	google_fleurs["train"] = load_dataset("google/fleurs", "mn_mn", split="train+validation", token=True)
	google_fleurs["test"] = load_dataset("google/fleurs", "mn_mn", split="test", token=True)

	# Drop the extra metadata and rename the text column to match Common Voice.
	google_fleurs = google_fleurs.remove_columns(
	    ["id", "num_samples", "path", "raw_transcription", "gender", "lang_id", "language", "lang_group_id"]
	)
	google_fleurs = google_fleurs.rename_column("transcription", "sentence")

	# Concatenate the two corpora split by split.
	dataset = DatasetDict()
	dataset["train"] = concatenate_datasets([common_voice["train"], google_fleurs["train"]])
	dataset["test"] = concatenate_datasets([common_voice["test"], google_fleurs["test"]])
	```

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-05
	- train_batch_size: 16
	- eval_batch_size: 8
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 100
	- training_steps: 4000
	- mixed_precision_training: Native AMP
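
To make the learning-rate settings above concrete, here is a minimal sketch of a linear schedule with warmup, using the listed values (peak rate 1e-05, 100 warmup steps, 4000 training steps). The function name `linear_schedule_lr` is hypothetical, and the formula is the usual definition of a linear warmup followed by linear decay to zero, not code taken from the training run.

```python
def linear_schedule_lr(
    step: int,
    base_lr: float = 1e-5,
    warmup_steps: int = 100,
    total_steps: int = 4000,
) -> float:
    """Learning rate at a given optimizer step for linear warmup plus linear decay."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay: fall linearly from base_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


# Learning rate at each evaluation step listed in the training results.
lr_at_eval_steps = {step: linear_schedule_lr(step) for step in range(500, 4001, 500)}
```

Under these assumptions the rate peaks at step 100 and has already decayed to about half its peak around step 2050, which is consistent with the steadily shrinking training loss in the later evaluation rows.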

	### Training results

	| Training Loss | Epoch  | Step | Validation Loss | WER     |
	|:-------------:|:------:|:----:|:---------------:|:-------:|
	| 0.4118        | 0.4912 | 500  | 0.4810          | 50.3500 |
	| 0.283         | 0.9823 | 1000 | 0.3347          | 38.6233 |
	| 0.1778        | 1.4735 | 1500 | 0.2738          | 33.5240 |
	| 0.1412        | 1.9646 | 2000 | 0.2216          | 27.8363 |
	| 0.0676        | 2.4558 | 2500 | 0.1967          | 24.3823 |
	| 0.0602        | 2.9470 | 3000 | 0.1711          | 21.7428 |
	| 0.0363        | 3.4381 | 3500 | 0.1624          | 20.4108 |
	| 0.0332        | 3.9293 | 4000 | 0.1561          | 19.4492 |
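
The step and epoch columns can be used for a quick back-of-the-envelope check of the training-set size and the overall improvement. The sketch below assumes a single device and no gradient accumulation, so that one step consumes `train_batch_size` = 16 examples; the card does not state either of these, and the variable names are illustrative only.

```python
# (step, epoch, WER) for the first and last evaluation rows of the table.
first_eval = (500, 0.4912, 50.3500)
final_eval = (4000, 3.9293, 19.4492)

train_batch_size = 16  # from the hyperparameter list

# Optimizer steps needed to pass over the training set once.
steps_per_epoch = final_eval[0] / final_eval[1]

# Approximate number of training examples, under the single-device,
# no-gradient-accumulation assumption stated above.
approx_train_examples = steps_per_epoch * train_batch_size

# Relative WER reduction between the first and the last evaluation, in percent.
relative_wer_reduction = 100.0 * (first_eval[2] - final_eval[2]) / first_eval[2]

print(f"steps per epoch: {steps_per_epoch:.0f}")
print(f"approx. training examples: {approx_train_examples:.0f}")
print(f"relative WER reduction: {relative_wer_reduction:.1f}%")
```

This works out to roughly 1018 steps per epoch, so the 4000 training steps cover a little under four passes over the combined training set.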


	### Framework versions

	- Transformers 4.44.0
	- Pytorch 2.3.1+cu118
	- Datasets 2.20.0
	- Tokenizers 0.19.1