Out of memory issue.

#34
by kxgong - opened

Hi, I used the recommended method (from_pretrained(***)) to load Mixtral-8x7B, but it fails with an out-of-memory error.

I am running this command on 8 x A100 GPUs. What is the problem?

Thank you.

Hi @kxgong
I suggest loading the model in half precision (torch_dtype=torch.float16) or in 4-bit precision (load_in_4bit=True) so that it is loaded in the most memory-efficient way possible.
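
As a rough sketch of why the dtype matters: weight memory is approximately the parameter count times the bytes per parameter. The snippet below assumes Mixtral-8x7B has about 46.7B parameters and uses the Hub id mistralai/Mixtral-8x7B-v0.1 only as a stand-in for the elided from_pretrained argument; weight_memory_gb and load_model are hypothetical helper names. The estimate covers weights only (no activations, gradients or optimizer state). The 4-bit path passes load_in_4bit=True through BitsAndBytesConfig, which recent transformers versions prefer over the bare keyword, and device_map="auto" (requires accelerate) spreads the layers across the available GPUs.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory (in GB, 1e9 bytes) needed to hold the weights alone."""
    return num_params * bits_per_param / 8 / 1e9


def load_model(model_id: str = "mistralai/Mixtral-8x7B-v0.1", four_bit: bool = False):
    """Load a causal LM in float16 or 4-bit precision.

    Imports are local so the memory estimate above runs without torch/transformers.
    """
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    if four_bit:
        # 4-bit weights via bitsandbytes; computation runs in float16.
        quant_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_compute_dtype=torch.float16,
        )
        return AutoModelForCausalLM.from_pretrained(
            model_id, quantization_config=quant_config, device_map="auto"
        )
    # Half precision: 2 bytes per parameter instead of the 4 used by float32.
    return AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )


if __name__ == "__main__":
    params = 46.7e9  # assumed total parameter count for Mixtral-8x7B
    for name, bits in [("float32", 32), ("float16", 16), ("4-bit", 4)]:
        print(f"{name}: ~{weight_memory_gb(params, bits):.1f} GB of weights")
```

Under that assumption, float32 weights need roughly 187 GB, float16 roughly 93 GB and 4-bit roughly 23 GB, so even the float16 weights exceed a single A100 (40 or 80 GB) and must be split across GPUs.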

Thank you. I am using Mixtral-8x7B for training, so I wonder whether using 4-bit precision will cause a performance drop.

@kxgong If you use QLoRA, you shouldn't expect a performance drop compared to full fine-tuning. You can read more about QLoRA here: https://huggingface.co/blog/4bit-transformers-bitsandbytes and get started running QLoRA with resources such as this blog post: https://pytorch.org/blog/finetune-llms/
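
A minimal QLoRA-style setup, sketched under assumptions: it uses the peft library, the same assumed Hub id mistralai/Mixtral-8x7B-v0.1 as a stand-in for the elided model name, and illustrative LoRA hyperparameters (rank 16, alpha 32, the attention projections q_proj/k_proj/v_proj/o_proj as targets) that are not taken from either blog post. The base weights stay frozen in 4-bit NF4 and only the small LoRA adapter matrices are trained. build_qlora_model is a hypothetical helper name.

```python
def build_qlora_model(model_id: str = "mistralai/Mixtral-8x7B-v0.1"):
    """Return a 4-bit base model wrapped with trainable LoRA adapters."""
    # Local imports: requires torch, transformers, bitsandbytes, accelerate and peft.
    import torch
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # NF4 quantization with double quantization, as used by QLoRA.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=bnb_config, device_map="auto"
    )
    # Freezes the quantized base weights and, by default,
    # enables gradient checkpointing to save activation memory.
    model = prepare_model_for_kbit_training(model)

    lora_config = LoraConfig(
        r=16,  # illustrative rank, not a value recommended in this thread
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # only the LoRA weights are trainable
    return model
```

Because only the adapters receive gradients and optimizer state, the training memory stays close to the 4-bit weight footprint rather than the much larger cost of full fine-tuning.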

Thanks for your help.