---
base_model:
- openai/whisper-large-v3
language:
- en
- zh
- de
- es
- ru
- ko
- fr
- ja
- pt
- tr
- pl
- ca
- nl
- ar
- sv
- it
- id
- hi
- fi
- vi
- he
- uk
- el
- ms
- cs
- ro
- da
- hu
- ta
- 'no'
- th
- ur
- hr
- bg
- lt
- la
- mi
- ml
- cy
- sk
- te
- fa
- lv
- bn
- sr
- az
- sl
- kn
- et
- mk
- br
- eu
- is
- hy
- ne
- mn
- bs
- kk
- sq
- sw
- gl
- mr
- pa
- si
- km
- sn
- yo
- so
- af
- oc
- ka
- be
- tg
- sd
- gu
- am
- yi
- lo
- uz
- fo
- ht
- ps
- tk
- nn
- mt
- sa
- lb
- my
- bo
- tl
- mg
- as
- tt
- haw
- ln
- ha
- ba
- jw
- su
library_name: transformers
license: apache-2.0
pipeline_tag: automatic-speech-recognition
tags:
- asr
- Pytorch
- pruned
- audio
- automatic-speech-recognition
---

# Whisper-large-v3-no-numbers

## Model info

This is a version of the [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) model without number tokens (token ids corresponding to numbers are excluded). **No** fine-tuning was used. Phrases with spoken numbers will be transcribed with the numbers written out as words. This can be useful for TTS data preparation.

**Example**: instead of **"25"**, this model will transcribe the phrase as **"twenty five"**.

## Usage

Tested with `transformers` version `4.45.2`.

The model can be used like the original Whisper:

```python
>>> from transformers import WhisperProcessor, WhisperForConditionalGeneration
>>> import torchaudio

>>> # load audio
>>> wav, sr = torchaudio.load("audio.wav")

>>> # resample to 16 kHz if necessary
>>> wav = torchaudio.functional.resample(wav, sr, 16000)

>>> # load model and processor
>>> processor = WhisperProcessor.from_pretrained("waveletdeboshir/whisper-large-v3-no-numbers")
>>> model = WhisperForConditionalGeneration.from_pretrained("waveletdeboshir/whisper-large-v3-no-numbers")

>>> # extract log-mel input features
>>> input_features = processor(wav[0], sampling_rate=16000, return_tensors="pt").input_features

>>> # generate token ids
>>> predicted_ids = model.generate(input_features)

>>> # decode token ids to text
>>> transcription = processor.batch_decode(predicted_ids, skip_special_tokens=False)
['<|startoftranscript|><|en|><|transcribe|><|notimestamps|> Twenty seven years. <|endoftext|>']
```

The context tokens can be removed from the start of the transcription by setting `skip_special_tokens=True`.
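For instance, reusing `predicted_ids` from the example above, decoding with `skip_special_tokens=True` should return only the plain text (a minimal sketch; the exact leading/trailing whitespace may differ):

```python
>>> # decode again, this time dropping the context/special tokens
>>> transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
>>> transcription
[' Twenty seven years.']
```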