RuntimeError: CUDA error: device-side assert triggered

#11
by Dramilio - opened

CUDA error when the model is loaded across multiple GPUs with device_map="auto", as the tutorial intended.

Code:
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)

Error:
----> 6 outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
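
For anyone trying to reproduce or debug this, here is a minimal self-contained sketch; the loading call and dtype are illustrative, the last few lines are the same snippet as above, and CUDA_LAUNCH_BLOCKING=1 is the flag the error message suggests:

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # set before CUDA initializes so the failing kernel shows up in the traceback

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "google/gemma-2-9b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype=torch.bfloat16)

chat = [{"role": "user", "content": "..."}]  # placeholder; the assert only seems to show up with long prompts, see below
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))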

I can confirm this issue.

I believe this is the same issue as:
https://huggingface.co/google/gemma-2-9b-it/discussions/14

I think it has something to do with the sliding window, but I couldn't fix it last night in an hour or two. I'll try to revisit it when I have time, but if anyone else has a chance, hopefully this helps narrow it down.

Yeah, it occurs when the input exceeds a certain size. I tried it with max_sequence_length = 4096 and truncation = True, but it still didn't work.
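
In case it helps, max_sequence_length isn't a standard tokenizer argument; truncation at tokenization time is usually requested with truncation=True and max_length. A rough sketch of that (4096 is just the value mentioned above):

inputs = tokenizer(prompt, add_special_tokens=False, return_tensors="pt",
                   truncation=True, max_length=4096)  # cap the encoded prompt at 4096 tokens
outputs = model.generate(**inputs.to(model.device), max_new_tokens=150)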

Same error here. I tried to run it on the CPU, but got the following error instead:
IndexError: index 4480 is out of bounds for dimension 0 with size 4096
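
That size-4096 buffer lines up with Gemma 2's sliding window. A quick diagnostic, assuming the loaded config exposes sliding_window (it does for the Gemma 2 configs in transformers), with inputs being the encoded prompt from the snippet above:

print("prompt tokens:", inputs.shape[-1])                                 # length of the encoded prompt
print("sliding_window:", getattr(model.config, "sliding_window", None))   # 4096 for Gemma 2
print("max_position_embeddings:", model.config.max_position_embeddings)   # 8192 for Gemma 2

If the prompt length is over the sliding window, that would explain the out-of-bounds index above.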
