How much VRAM did you use?

by ShukantP - opened May 25

May 25

Hi @v2ray, I'm curious what your VRAM requirements were after applying gradient checkpointing and the expert tensor fix. Did you run training on the model?

v2ray

Owner May 25

edited May 25

It depends on the batch size. I've already forgotten the exact VRAM usage, but you really need 8x A100 80GB or 8x H100 to fine-tune this model efficiently. The model is actually not that good, though; I think it's better to just fine-tune Mixtral 8x22B or LLaMA 3 70B.

ShukantP

May 26

Thanks @v2ray. Were you able to do full-parameter fine-tuning on a single node of 8x A100?

v2ray

Owner May 26

I haven't tried, but I guess it's possible; you would need a very low batch size.
My test training runs used LoRA.
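
For a rough sense of why LoRA is the easier path, here is a back-of-the-envelope sketch, not a measurement from my runs. It assumes the common mixed-precision Adam accounting of about 16 bytes per trainable parameter (bf16 weights and gradients plus fp32 master weights and two Adam moments) and 2 bytes per frozen bf16 weight. Activations, which are what the batch size and gradient checkpointing affect, are not counted. The function names and the 100M adapter size are made up for illustration, and the 70B figure is just the nominal LLaMA 3 70B parameter count.

```python
GB = 1e9  # decimal gigabytes, as in "80GB" cards


def full_finetune_gb(n_params, bytes_per_param=16):
    """Weights + gradients + Adam states for full fine-tuning.

    Assumes ~16 bytes per parameter (mixed-precision Adam); activations excluded.
    """
    return n_params * bytes_per_param / GB


def lora_gb(n_params, n_trainable, weight_bytes=2, trainable_bytes=16):
    """Frozen bf16 base weights plus fully tracked LoRA adapter parameters.

    n_trainable is the adapter parameter count (hypothetical here); activations excluded.
    """
    return (n_params * weight_bytes + n_trainable * trainable_bytes) / GB


if __name__ == "__main__":
    node_gb = 8 * 80  # one node of 8x 80GB cards
    n_params = 70e9   # nominal LLaMA 3 70B size, used only as an example
    print(f"node capacity:      {node_gb} GB")
    print(f"full fine-tune:     {full_finetune_gb(n_params):.0f} GB before activations")
    print(f"LoRA (100M adapter): {lora_gb(n_params, 100e6):.1f} GB before activations")
```

Under these assumptions the full fine-tuning state alone is larger than one 8x 80GB node, so on top of a very low batch size you would likely also need something like optimizer offloading or a lighter optimizer, while LoRA leaves most of the memory for activations.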
