xxl model running on single a6000 vs 2*3090
#19
by lukaemon · opened
Running this code, taken from the docs with no changes, works as expected on a single A6000: <pad> Wie alt sind Sie?</s>
from transformers import T5Tokenizer, T5ForConditionalGeneration
tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xxl")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xxl", device_map="auto")
input_text = "translate English to German: How old are you?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
But it generates garbage on 2*3090: <pad><pad>crawl himemme person tsch center Preis Tau the residency now
Can someone point me in a direction for getting this working on 2*3090? What's a possible source of the problem?
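One thing that might help narrow it down (just a guess on my side, not from the docs): inspect how device_map="auto" actually split the model across the two 3090s. The mapping is exposed on the loaded model:
# assumes the model above was loaded with device_map="auto"
print(model.hf_device_map)  # module -> device, e.g. some blocks on GPU 0, some on GPU 1, possibly some on "cpu"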
I got it to work. device_map='auto' won't work with the xxl model on my 2*3090 PC.
This works fine.
import torch
from transformers import T5ForConditionalGeneration

checkpoint = "google/flan-t5-xxl"
model = T5ForConditionalGeneration.from_pretrained(
    checkpoint,
    low_cpu_mem_usage=True,
    torch_dtype=torch.bfloat16)
model.parallelize()  # naive model parallelism: spreads the layers over both GPUs
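For reference, generation then looks the same as in the doc snippet; the only assumption is that the inputs go to cuda:0, since parallelize() keeps the embedding and first blocks on the first GPU:
input_text = "translate English to German: How old are you?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda:0")
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))  # expected: <pad> Wie alt sind Sie?</s>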
lukaemon changed discussion status to closed