How much VRAM did you use?
#2 by ShukantP - opened
It depends on the batch size. I've already forgotten the exact VRAM usage, but you really need 8x A100 80GB or 8x H100 to fine-tune this model efficiently. And the model isn't that good anyway; I think it's better to just fine-tune Mixtral 8x22B or LLaMA 3 70B instead.
I haven't tried it, but I guess it's possible; you'd need a very low batch size though.
My test runs used LoRA.
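For reference, here's a minimal sketch of the kind of LoRA setup I mean, assuming the Hugging Face `transformers` + `peft` stack; the model name, rank, and batch settings are illustrative, not the exact values I used.

```python
import torch
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model

# Placeholder model id, swap in the model you're actually fine-tuning.
model_name = "your/model-id"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# LoRA keeps only a small set of adapter weights trainable, which is what
# makes fine-tuning feasible without a full 8x A100/H100 node.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

training_args = TrainingArguments(
    output_dir="lora-out",
    per_device_train_batch_size=1,   # very low batch size to fit in VRAM
    gradient_accumulation_steps=16,  # compensate for the tiny batch
    gradient_checkpointing=True,     # trade compute for memory
    bf16=True,
)
```

With a batch size of 1 plus gradient accumulation and checkpointing, memory usage is dominated by the frozen base weights rather than optimizer state, which is why LoRA fits where full fine-tuning doesn't.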