openai/whisper-large-v3 · Issue when trying to run Whisper offline from locally saved pretrained model

Jun 7

Hi,

I am trying to run Whisper locally, loading the model from files I downloaded to a folder.

I downloaded the model for offline use following the instructions suggested here. My code is below:

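  import os

  from transformers import AutoTokenizer, AutoModelForSpeechSeq2Seq

  MODEL_FROM_FILE = os.path.join('models', 'whisper-large-v3')

  # Download tokenizer and model from the Hub once...
  tokenizer = AutoTokenizer.from_pretrained("openai/whisper-large-v3")
  model = AutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-large-v3")

  # ...then save a local copy. Note: the tokenizer does not write
  # preprocessor_config.json (the feature-extractor settings).
  tokenizer.save_pretrained(MODEL_FROM_FILE)
  model.save_pretrained(MODEL_FROM_FILE)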
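The snippet above saves the tokenizer files and the model, but not the feature-extractor settings (preprocessor_config.json) that Whisper's processor loads. A minimal sketch that saves the whole processor instead, assuming a transformers version where AutoProcessor resolves to WhisperProcessor for this checkpoint; the helper name save_whisper_for_offline is hypothetical:

```python
import os


def save_whisper_for_offline(repo_id: str, target_dir: str) -> None:
    """Download a Whisper checkpoint once and save a self-contained local copy.

    Hypothetical helper: saving the processor (tokenizer + feature extractor)
    also writes preprocessor_config.json, which AutoTokenizer alone does not.
    """
    # Imported inside the function so the helper can be defined without
    # loading transformers up front.
    from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

    os.makedirs(target_dir, exist_ok=True)

    processor = AutoProcessor.from_pretrained(repo_id)
    model = AutoModelForSpeechSeq2Seq.from_pretrained(repo_id)

    # Tokenizer files plus preprocessor_config.json.
    processor.save_pretrained(target_dir)
    # Model config and weights, as safetensors to match use_safetensors=True at load time.
    model.save_pretrained(target_dir, safe_serialization=True)
```

It would be run once while online, e.g. `save_whisper_for_offline("openai/whisper-large-v3", os.path.join("models", "whisper-large-v3"))`. Alternatively, `huggingface_hub.snapshot_download(repo_id=..., local_dir=...)` mirrors every file in the repository as-is.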

The first problem I encountered was a missing file, which produced the following error:

models/whisper-large-v3 does not appear to have a file named preprocessor_config.json. Checkout 'https://huggingface.co/models/whisper-large-v3/main' for available files.

Downloading the file manually (from here) seems to get past this error, but then another one came up:
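Before loading offline, it can help to confirm that the local folder contains every file the loaders will look for, instead of discovering them one error at a time. A small sketch; the EXPECTED_FILES list is an assumption based on the files a Whisper checkpoint on the Hub typically ships, and missing_files is a hypothetical helper:

```python
import os
import tempfile
from typing import List, Sequence

# Assumption: files a transformers Whisper checkpoint folder normally contains.
# Weights are not listed because they may be a single file or sharded.
EXPECTED_FILES = (
    "config.json",
    "preprocessor_config.json",
    "tokenizer_config.json",
    "vocab.json",
    "merges.txt",
)


def missing_files(model_dir: str, expected: Sequence[str] = EXPECTED_FILES) -> List[str]:
    """Return the expected file names that do not exist in model_dir."""
    return [name for name in expected
            if not os.path.isfile(os.path.join(model_dir, name))]


# Demo: a folder containing only config.json is missing everything else.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "config.json"), "w").close()
    demo_missing = missing_files(demo_dir)

print(demo_missing)
# ['preprocessor_config.json', 'tokenizer_config.json', 'vocab.json', 'merges.txt']
```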

probability tensor contains either `inf`, `nan` or element < 0

Searching online, I found that it might be related to the device I am running on or to a misconfiguration of my model.
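That message is the one `torch.multinomial` raises when the probabilities it receives contain inf, nan, or negative entries. To my understanding, transformers' `generate()` only samples with `torch.multinomial` when `do_sample=True` (as in the `generate_kwargs` of the code below), so the error suggests the next-token probabilities were already non-finite. A minimal reproduction in plain PyTorch, independent of Whisper; the probability values are made up:

```python
import torch

# Made-up next-token probabilities containing a NaN, e.g. after a numerical overflow.
probs = torch.tensor([0.25, float("nan"), 0.75])

# A cheap guard: are all entries finite?
has_non_finite = not bool(torch.isfinite(probs).all())

error_message = None
try:
    # Draw one token, the way sampling-based decoding does at each step.
    torch.multinomial(probs, num_samples=1)
except RuntimeError as err:
    error_message = str(err)

print(has_non_finite)  # True
print(error_message)
```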

The first thing I tried was running on the CPU instead of cuda:0, which is how I normally run it when not offline. Here is the original code:

import os

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

DEVICE = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

MODEL_FROM_FILE = os.path.join('models', 'whisper-large-v3')

# Load the weights from the local folder only; never contact the Hub.
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    MODEL_FROM_FILE, torch_dtype=torch_dtype, low_cpu_mem_usage=True,
    use_safetensors=True, local_files_only=True)
model.to(DEVICE)

# Tokenizer + feature extractor; the latter reads preprocessor_config.json.
processor = AutoProcessor.from_pretrained(MODEL_FROM_FILE)

asr = pipeline(
    task="automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    torch_dtype=torch_dtype,
    device=DEVICE,
)

audio_file = "audio.wav"  # placeholder: path to the recording being transcribed
temperature = 0.3
result = asr(audio_file,
             chunk_length_s=30,  # 30-second chunks
             batch_size=4,
             return_timestamps=True,
             generate_kwargs={"language": "english", "do_sample": True, "temperature": temperature})
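Besides `local_files_only=True`, the Hugging Face libraries can be switched to offline mode globally through environment variables, so that no network call is attempted at all. A minimal sketch, assuming the variables are set before transformers is imported, since they are read at import time:

```python
import os

# Must run before importing transformers / huggingface_hub.
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"

# Later imports, e.g. `from transformers import pipeline`, now run in offline mode.
```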

Changing DEVICE and torch_dtype as shown below seems to solve the problem:

DEVICE = "cpu"
torch_dtype = torch.float32
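One plausible explanation for why float32 helps, not confirmed for this case: float16 can only represent magnitudes up to 65504, so a large activation overflows to inf, and a softmax over a row containing inf produces nan, which then trips the sampling check. An illustration in plain PyTorch with made-up values:

```python
import torch

# Largest finite float16 value.
fp16_max = torch.finfo(torch.float16).max

# 70000 does not fit in float16 and becomes inf; it is fine in float32.
overflowed = torch.tensor([70000.0]).to(torch.float16)
same_in_fp32 = torch.tensor([70000.0], dtype=torch.float32)

# A single inf logit turns the softmax row into NaN (inf - inf is undefined).
logits = torch.tensor([float("inf"), 1.0, 2.0])
probs = torch.softmax(logits, dim=-1)

print(fp16_max)      # 65504.0
print(overflowed)    # tensor([inf], dtype=torch.float16)
print(same_in_fp32)  # tensor([70000.])
print(probs)
```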

The torch packages installed on my machine are:

torch==1.13.1+cu117 
torchvision==0.14.1+cu117 
torchaudio==0.13.1+cu117
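For this kind of issue, the transformers version and whether CUDA is usable matter as much as the torch build. A small sketch that collects them; it assumes only that torch is installed and reports transformers as missing otherwise, and environment_report is a hypothetical helper:

```python
import importlib.metadata
from typing import Dict

import torch


def environment_report() -> Dict[str, str]:
    """Collect version information relevant to an fp16 inference problem."""
    report = {
        "torch": torch.__version__,
        "torch_cuda_build": str(torch.version.cuda),  # "None" for CPU-only builds
        "cuda_available": str(torch.cuda.is_available()),
    }
    try:
        report["transformers"] = importlib.metadata.version("transformers")
    except importlib.metadata.PackageNotFoundError:
        report["transformers"] = "not installed"
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```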

Even though this works around the problem, it's not an acceptable solution.

Any ideas about what might be the problem?

WeRChampion

Jun 8

What are the specifications of your device?

georgis-agent

Jun 10

Processor: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz 2.71 GHz
Installed RAM: 32.0 GB (31.8 GB usable)
System type: 64-bit operating system, x64-based processor
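The list above covers the CPU, but the failing configuration runs on cuda:0 in float16, so the GPU model is likely the more relevant detail. A sketch that reports it through `torch.cuda`; gpu_summary is a hypothetical helper:

```python
from typing import List

import torch


def gpu_summary() -> List[str]:
    """Describe each visible CUDA device, or say that none is available."""
    if not torch.cuda.is_available():
        return ["no CUDA device visible to torch"]
    lines = []
    for index in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(index)
        lines.append(
            f"cuda:{index} {props.name}, "
            f"{props.total_memory / 1024**3:.1f} GiB, "
            f"compute capability {props.major}.{props.minor}"
        )
    return lines


for line in gpu_summary():
    print(line)
```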

Is there something else you might need?

georgis-agent

Jun 10

edited Jun 10

It might be worth mentioning that the code runs fine when I download the model from Hugging Face; the problem only occurs when I load it from local files.
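Since loading from the Hub works and loading the local copy does not, one way to narrow it down is to compare the local JSON files (config.json, generation_config.json, preprocessor_config.json) against the Hub versions. A sketch of the comparison step; json_diff is a hypothetical helper, and the demo configs use illustrative values only:

```python
import json
import os
import tempfile
from typing import Any, Dict, Tuple

_MISSING = "<missing>"


def json_diff(path_a: str, path_b: str) -> Dict[str, Tuple[Any, Any]]:
    """Return top-level keys whose values differ between two JSON files."""
    with open(path_a, encoding="utf-8") as fa, open(path_b, encoding="utf-8") as fb:
        a, b = json.load(fa), json.load(fb)
    return {
        key: (a.get(key, _MISSING), b.get(key, _MISSING))
        for key in sorted(set(a) | set(b))
        if a.get(key, _MISSING) != b.get(key, _MISSING)
    }


# Demo with two small illustrative configs.
with tempfile.TemporaryDirectory() as demo_dir:
    local_path = os.path.join(demo_dir, "local.json")
    hub_path = os.path.join(demo_dir, "hub.json")
    with open(local_path, "w", encoding="utf-8") as f:
        json.dump({"feature_size": 80, "sampling_rate": 16000}, f)
    with open(hub_path, "w", encoding="utf-8") as f:
        json.dump({"feature_size": 128, "sampling_rate": 16000}, f)
    demo_diff = json_diff(local_path, hub_path)

print(demo_diff)  # {'feature_size': (80, 128)}
```

With network access, the Hub copy of a file could be fetched with `huggingface_hub.hf_hub_download("openai/whisper-large-v3", "preprocessor_config.json")`, which returns a local path that can be passed as `path_b`.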