Just can't run! I copied this from your example, and it just raises the following error:
```
AttributeError: 'GenerationConfig' object has no attribute 'lang_to_id'
```
Seeing the same issue.
Great catch - fixed in https://huggingface.co/distil-whisper/distil-medium.en/commit/26f298e3a65ea076cbe4498ff70b84d33a8cca32
This does not solve the problem during fine-tuning.
@sanchit-gandhi I still get the same error whenever my code enters the eval loop during fine-tuning.
I am facing the same issue when running the evaluation.
Hi @Owos & @thoool - this seems to work for me; here's a repro: https://github.com/Vaibhavs10/scratchpad/blob/main/distil_whisper_medium_repro.ipynb
Can you try upgrading your version of transformers, or else please share a reproducible snippet!
I also made a bunch of language detection fixes to the Whisper fine-tuning blog post and Colab - could you try using the latest versions to ensure you receive the bug fixes?
- Blog post: https://huggingface.co/blog/fine-tune-whisper
- Colab: https://colab.research.google.com/github/sanchit-gandhi/notebooks/blob/main/fine_tune_whisper.ipynb
- Script: https://github.com/huggingface/transformers/tree/main/examples/pytorch/speech-recognition#single-gpu-whisper-training
Let me know if the issue persists!
I just upgraded transformers from 4.38.2 to 4.41.2; however, the error persists.
My setup is somewhat different because I have been trying to fine-tune a German version of Distil-Whisper, like so:
```bash
accelerate launch run_distillation.py \
  --model_name_or_path "./distil-large-v3-init" \
  --teacher_model_name_or_path "openai/whisper-large-v3" \
  --train_dataset_name "mozilla-foundation/common_voice_17_0" \
  --train_dataset_config_name "de" \
  --train_split_name "train" \
  --text_column_name "sentence" \
  --eval_dataset_name "mozilla-foundation/common_voice_17_0" \
  --eval_dataset_config_name "de" \
  --eval_split_name "validation" \
  --eval_text_column_name "sentence" \
  --eval_steps 1_000 \
  --save_steps 1_000 \
  --warmup_steps 100 \
  --learning_rate 0.0001 \
  --lr_scheduler_type "constant_with_warmup" \
  --timestamp_probability 0.2 \
  --condition_on_prev_probability 0.2 \
  --language "de" \
  --task "transcribe" \
  --logging_steps 25 \
  --save_total_limit 3 \
  --max_steps 100_000 \
  --wer_threshold 20 \
  --per_device_train_batch_size 32 \
  --per_device_eval_batch_size 32 \
  --dataloader_num_workers 2 \
  --preprocessing_num_workers 2 \
  --ddp_timeout 7200 \
  --dtype "bfloat16" \
  --attn_implementation "sdpa" \
  --output_dir "./" \
  --do_train \
  --do_eval \
  --gradient_checkpointing \
  --overwrite_output_dir \
  --predict_with_generate \
  --freeze_encoder \
  --freeze_embed_positions \
  --use_pseudo_labels=False
```
For the evaluation, I run the following command from inside my checkpoint folder:
```bash
python run_eval.py \
  --model_name_or_path "./" \
  --dataset_name "mozilla-foundation/common_voice_17_0" \
  --dataset_config_name "de" \
  --dataset_split_name "test" \
  --text_column_name "sentence" \
  --batch_size 16 \
  --dtype "bfloat16" \
  --generation_max_length 256 \
  --language "de" \
  --attn_implementation "sdpa" \
  --streaming
```
Sure, here is the full traceback:
```
Traceback (most recent call last):
  File "/home/operation/whisper_finetune/distil-whisper/training/checkpoint-35000-epoch-1/run_eval.py", line 825, in <module>
    main()
  File "/home/operation/whisper_finetune/distil-whisper/training/checkpoint-35000-epoch-1/run_eval.py", line 572, in main
    language = language_to_id(data_args.language, model.generation_config) if data_args.language else None
  File "/home/operation/whisper_finetune/distil-whisper/training/checkpoint-35000-epoch-1/run_eval.py", line 378, in language_to_id
    if language in generation_config.lang_to_id.keys():
AttributeError: 'GenerationConfig' object has no attribute 'lang_to_id'
```
Are you passing the `language` argument to `run_eval.py` when evaluating an English-only checkpoint? Note that the `language` argument should only be passed for multilingual checkpoints. I've opened a PR to throw a better warning here: https://github.com/huggingface/distil-whisper/pull/139
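For reference, here's a minimal sketch for checking whether a checkpoint is multilingual before passing `--language` (using `openai/whisper-large-v3` purely as an example; swap in your own checkpoint):

```python
from transformers import GenerationConfig

# Load the generation config of the checkpoint you want to evaluate.
generation_config = GenerationConfig.from_pretrained("openai/whisper-large-v3")

# Multilingual Whisper checkpoints carry a `lang_to_id` mapping; English-only
# checkpoints do not, which is exactly what triggers the AttributeError above.
if hasattr(generation_config, "lang_to_id"):
    print("Multilingual checkpoint - passing --language is fine.")
else:
    print("English-only checkpoint - do not pass --language.")
```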
Otherwise, you're likely using a model with an outdated generation config for distillation! Could you update the generation config to match that of the original pre-trained model?
```python
from transformers import GenerationConfig, AutoConfig

# fill me with the hub model id of the checkpoint you're distilling
MODEL_NAME = "sanchit-gandhi/whisper-small-hi"

# infer the original pre-trained model from the vocab size
vocab_size = AutoConfig.from_pretrained(MODEL_NAME).vocab_size
if vocab_size == 51864:
    original_model = "openai/whisper-tiny.en"
elif vocab_size == 51865:
    original_model = "openai/whisper-tiny"
else:
    original_model = "openai/whisper-large-v3"

# load updated generation config
generation_config = GenerationConfig.from_pretrained(original_model)

# push updated generation config to the Hub
generation_config.push_to_hub(MODEL_NAME)
```
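If your checkpoint only exists on disk rather than on the Hub, a sketch of the equivalent local fix (the directory path here is the `--model_name_or_path` from your training command) would be:

```python
from transformers import GenerationConfig

# Pull the up-to-date generation config from the original pre-trained model...
generation_config = GenerationConfig.from_pretrained("openai/whisper-large-v3")

# ...and save it into the local checkpoint directory instead of pushing to the Hub.
generation_config.save_pretrained("./distil-large-v3-init")
```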
I am not quite sure if I understand this correctly. The model that I used as a teacher model is `--teacher_model_name_or_path "openai/whisper-large-v3"`, and I set `--language "de"` while using `--train_dataset_name "mozilla-foundation/common_voice_17_0"`. So I end up with a German distilled version of whisper-large-v3, which is stored locally.
When executing the `run_eval.py` file, I indeed pass `--language "de"`, just like I did during training. Do you mean I don't have to set `language`, as I now have a German version and no longer a multilingual version of Whisper?
FWIW:
```bash
python run_eval.py \
  --model_name_or_path "./" \
  --dataset_name "mozilla-foundation/common_voice_17_0" \
  --dataset_config_name "de" \
  --dataset_split_name "test" \
  --text_column_name "sentence" \
  --batch_size 16 \
  --dtype "bfloat16" \
  --generation_max_length 256 \
  --attn_implementation "sdpa" \
  --streaming \
  --return_timestamps False
```
seems to circumvent the problem. That being said, I now face this error:
```
Start benchmarking common_voice_17_0/test...
Reading metadata...: 16183it [00:00, 41952.06it/s] | 0/1 [00:00<?, ?it/s]
/home/operation/miniconda3/envs/whisper-finetune/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:537: UserWarning: `do_sample` is set to `False`. However, `top_k` is set to `0` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_k`.
warnings.warn(...: 1it [00:00, 3.35it/s]
Samples: 16183it [13:45, 19.60it/s]
Datasets: 0%| | 0/1 [13:45<?, ?it/s]
Traceback (most recent call last):
  File "/home/operation/whisper_finetune/distil-whisper/training/checkpoint-35000-epoch-1/run_eval.py", line 825, in <module>
    main()
  File "/home/operation/whisper_finetune/distil-whisper/training/checkpoint-35000-epoch-1/run_eval.py", line 763, in main
    norm_transcriptions = [normalizer(pred) for pred in transcriptions]
  File "/home/operation/whisper_finetune/distil-whisper/training/checkpoint-35000-epoch-1/run_eval.py", line 763, in <listcomp>
    norm_transcriptions = [normalizer(pred) for pred in transcriptions]
  File "/home/operation/miniconda3/envs/whisper-finetune/lib/python3.10/site-packages/transformers/models/whisper/english_normalizer.py", line 587, in __call__
    s = self.standardize_spellings(s)
  File "/home/operation/miniconda3/envs/whisper-finetune/lib/python3.10/site-packages/transformers/models/whisper/english_normalizer.py", line 507, in __call__
    return " ".join(self.mapping.get(word, word) for word in s.split())
  File "/home/operation/miniconda3/envs/whisper-finetune/lib/python3.10/site-packages/transformers/models/whisper/english_normalizer.py", line 507, in <genexpr>
    return " ".join(self.mapping.get(word, word) for word in s.split())
AttributeError: 'NoneType' object has no attribute 'get'
```
Ah, I see what's happening! The checkpoint you're evaluating is an intermediate checkpoint (i.e. one saved partway through training with `accelerator.save_state`). This saves the model weights to `checkpoint-35000-epoch-1`, but not the config, tokenizer, feature extractor, or generation config.
To remedy this, could you copy the corresponding files into this checkpoint dir?
```python
from transformers import GenerationConfig, WhisperConfig, WhisperProcessor

BASE_DIR = "/home/operation/whisper_finetune/distil-whisper/training/"
CHECKPOINT = "checkpoint-35000-epoch-1"

# load the config, processor, and generation config from the top-level training dir
config = WhisperConfig.from_pretrained(BASE_DIR)
processor = WhisperProcessor.from_pretrained(BASE_DIR)
generation_config = GenerationConfig.from_pretrained(BASE_DIR)

# save them into the intermediate checkpoint dir
config.save_pretrained(BASE_DIR + CHECKPOINT)
processor.save_pretrained(BASE_DIR + CHECKPOINT)
generation_config.save_pretrained(BASE_DIR + CHECKPOINT)
```
You should then be able to run evaluation using the scripts you shared above.
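As a quick sanity check (just a sketch, reusing the paths from the snippet above), the checkpoint directory should now load end-to-end:

```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor

BASE_DIR = "/home/operation/whisper_finetune/distil-whisper/training/"
CHECKPOINT = "checkpoint-35000-epoch-1"

# Both loads will fail if any of the copied files are missing.
model = WhisperForConditionalGeneration.from_pretrained(BASE_DIR + CHECKPOINT)
processor = WhisperProcessor.from_pretrained(BASE_DIR + CHECKPOINT)

# For a multilingual base model, the generation config should expose `lang_to_id`.
print(hasattr(model.generation_config, "lang_to_id"))
```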
What do you think about updating the distillation script to save the config/processor/generation config during intermediate saves, @eustlb? It would be useful for evaluating intermediate checkpoints; a rough sketch of what I mean is below.
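Something along these lines, as a hypothetical sketch (the helper name and arguments are mine, not the actual distil-whisper code):

```python
def save_checkpoint_artifacts(output_dir, model, processor):
    """Persist everything needed to reload an intermediate checkpoint standalone."""
    # accelerator.save_state(output_dir) already writes the model weights; here we
    # additionally write the config, generation config, and processor files.
    model.config.save_pretrained(output_dir)
    model.generation_config.save_pretrained(output_dir)
    processor.save_pretrained(output_dir)
```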
That worked just fine, thanks @sanchit-gandhi!
Agreed, @sanchit-gandhi! I'll update the distillation script.