---
license: apache-2.0
base_model: openai/whisper-small
tags:
- generated_from_trainer
datasets:
- mozilla-foundation/common_voice_15_0
- mozilla-foundation/common_voice_13_0
language:
- hi
metrics:
- cer
- wer
library_name: transformers
pipeline_tag: automatic-speech-recognition
model-index:
- name: whisper-small-hi-cv
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 15
      type: mozilla-foundation/common_voice_15_0
      args: hi
    metrics:
    - name: Test WER
      type: wer
      value: 13.9913
    - name: Test CER
      type: cer
      value: 5.8844

  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 13
      type: mozilla-foundation/common_voice_13_0
      args: hi
    metrics:
    - name: Test WER
      type: wer
      value: 23.3824
    - name: Test CER
      type: cer
      value: 10.5288
---

# whisper-small-hi-cv

This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the Hindi (`hi`) subset of the Common Voice 15 dataset.
It achieves the following results on the evaluation set:
- WER: 13.9913
- CER: 5.8844

View the results in the Kaggle notebook: https://www.kaggle.com/code/kingabzpro/whisper-hindi-eval
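
For quick transcription outside of a full evaluation run, the model can be loaded through the `transformers` ASR pipeline. The snippet below is a minimal sketch, not part of the original training or evaluation code: the helper name `transcribe`, its parameters, and the idea of passing a local audio file path are assumptions made for illustration.

```python
def transcribe(audio_path, model_id="kingabzpro/whisper-small-hi-cv", device=-1):
    """Transcribe one audio file with the fine-tuned Whisper checkpoint.

    `transcribe` is a hypothetical helper for illustration only.
    device=-1 runs on CPU; pass 0 to use the first CUDA GPU.
    """
    # Imported lazily so that defining the helper does not require transformers.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id, device=device)
    # For a file path input, the pipeline decodes and resamples the audio itself.
    result = asr(audio_path)
    return result["text"]
```

A call such as `transcribe("sample.wav")`, where `sample.wav` stands in for any local Hindi recording, should return the transcription as a string. Decoding a file path this way relies on ffmpeg being installed.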

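## Evaluation

The following script evaluates the model on the Hindi test split of Common Voice 13 and reports WER and CER.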

```python
import torch
from datasets import Audio, load_dataset
from evaluate import load  # replaces `datasets.load_metric`, which was removed in datasets 3.0
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Fall back to CPU when no GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"

test_dataset = load_dataset("mozilla-foundation/common_voice_13_0", "hi", split="test")
wer = load("wer")
cer = load("cer")

processor = WhisperProcessor.from_pretrained("kingabzpro/whisper-small-hi-cv")
model = WhisperForConditionalGeneration.from_pretrained("kingabzpro/whisper-small-hi-cv").to(device)

# Whisper expects 16 kHz audio
test_dataset = test_dataset.cast_column("audio", Audio(sampling_rate=16000))

def map_to_pred(batch):
    audio = batch["audio"]
    input_features = processor(audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt").input_features
    # Apply the same text normalization to the reference and the prediction
    batch["reference"] = processor.tokenizer._normalize(batch["sentence"])

    with torch.no_grad():
        predicted_ids = model.generate(input_features.to(device))[0]
    transcription = processor.decode(predicted_ids)
    batch["prediction"] = processor.tokenizer._normalize(transcription)
    return batch

result = test_dataset.map(map_to_pred)

# Metrics are returned as fractions; scale to percent and print four decimals
print("WER: {:.4f}".format(100 * wer.compute(predictions=result["prediction"], references=result["reference"])))
print("CER: {:.4f}".format(100 * cer.compute(predictions=result["prediction"], references=result["reference"])))
```
```bash
WER: 23.3824
CER: 10.5288
```
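
To make the WER and CER figures above easier to interpret, here is a simplified, dependency-free sketch of how such error rates are commonly computed: the edit distance between reference and prediction, summed over the corpus and divided by the total reference length. It is an illustration only, not the implementation used by the `evaluate` library; the function names `edit_distance` and `error_rate`, the toy English sentences, and the choice to count spaces as characters for CER are all assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def error_rate(references, predictions, unit="word"):
    """Corpus-level error rate in percent: total edits / total reference units."""
    split = (lambda s: s.split()) if unit == "word" else list
    edits = 0
    total = 0
    for ref, hyp in zip(references, predictions):
        ref_units, hyp_units = split(ref), split(hyp)
        edits += edit_distance(ref_units, hyp_units)
        total += len(ref_units)
    return 100 * edits / total


# Toy example (hypothetical data): one substituted word and one dropped word
refs = ["the cat sat on the mat"]
hyps = ["the cat sit on mat"]
wer = error_rate(refs, hyps, unit="word")  # 2 edits over 6 reference words
cer = error_rate(refs, hyps, unit="char")  # 5 edits over 22 reference characters
print(f"WER: {wer:.4f}")
print(f"CER: {cer:.4f}")
```

Because the edits are summed over the whole corpus before dividing, long utterances weigh more than short ones, which is why a corpus-level score can differ from the mean of per-sentence scores.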