xxl model running on single a6000 vs 2*3090
#19
by lukaemon · opened
Running this code, taken from the docs with no changes, works as expected on a single A6000: <pad> Wie alt sind Sie?</s>
from transformers import T5Tokenizer, T5ForConditionalGeneration
tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xxl")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xxl", device_map="auto")
input_text = "translate English to German: How old are you?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
But it generates garbage on 2*3090: <pad><pad>crawl himemme person tsch center Preis Tau the residency now
Can someone point me in a direction for getting this working on 2*3090? What's a possible source of the problem?
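One thing that might help narrow it down (just a guess on my side, not from the docs): inspect how device_map="auto" actually split the model across the two 3090s. The mapping is exposed on the loaded model:
# assumes the model above was loaded with device_map="auto"
print(model.hf_device_map)  # module -> device, e.g. some blocks on GPU 0, some on GPU 1, possibly some on "cpu"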
I got it to work. device_map='auto' won't work with the xxl model on my 2*3090 PC.
This works fine.
import torch
from transformers import T5ForConditionalGeneration

checkpoint = "google/flan-t5-xxl"
model = T5ForConditionalGeneration.from_pretrained(
    checkpoint,
    low_cpu_mem_usage=True,
    torch_dtype=torch.bfloat16)
model.parallelize()  # naive model parallelism: spreads the layers over both GPUs
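For reference, generation then looks the same as in the doc snippet; the only assumption is that the inputs go to cuda:0, since parallelize() keeps the embedding and first blocks on the first GPU:
input_text = "translate English to German: How old are you?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda:0")
outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))  # expected: <pad> Wie alt sind Sie?</s>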
lukaemon changed discussion status to closed