How much VRAM did you use?

#2
by ShukantP

Hi @v2ray , I'm curious what your VRAM requirements were after gradient checkpointing + expert tensor fix? Did you run training on the model?

It depends on the batch size. I don't remember the exact VRAM usage anymore, but you really need 8x A100 80GB or 8x H100 to fine-tune this model efficiently. And the model isn't actually that good; I think it's better to just fine-tune Mixtral 8x22B or LLaMA 3 70B.

Thanks @v2ray. Were you able to do full-parameter fine-tuning with one node of 8x A100?

Owner

I haven't tried, but I'd guess it's possible with a very low batch size.
My test runs used LoRA.
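A back-of-envelope sketch of why full fine-tuning is so much tighter than LoRA on one node. The byte counts (bf16 weights/grads plus fp32 AdamW states ≈ 16 bytes/param), the 132B parameter count, and the 1% trainable fraction are my assumptions for illustration, not numbers from this thread, and activations are ignored entirely:

```python
# Rough per-parameter training memory, ignoring activations.
# Assumptions (not from this thread): bf16 weights + bf16 grads
# + fp32 AdamW moments + fp32 master copy ~= 16 bytes/param.

def full_ft_gb(n_params: float, bytes_per_param: int = 16) -> float:
    """Approximate GB needed to fully fine-tune n_params parameters."""
    return n_params * bytes_per_param / 1e9

def lora_gb(n_params: float, trainable_frac: float = 0.01) -> float:
    """Frozen bf16 base (2 bytes/param) + full optimizer states
    only for the small LoRA adapter (assumed ~1% of params)."""
    return n_params * 2 / 1e9 + full_ft_gb(n_params * trainable_frac)

n = 132e9  # hypothetical parameter count for a large MoE
print(f"full fine-tune: ~{full_ft_gb(n):.0f} GB across the node")
print(f"LoRA:           ~{lora_gb(n):.0f} GB across the node")
```

Under these assumptions, full fine-tuning wants a few terabytes of state while 8x A100 80GB offers 640 GB total, which is why it only becomes plausible with aggressive sharding and a very low batch size, whereas LoRA fits comfortably.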
