Finetuning setup

#4
by Andriy - opened

Hi! We are trying to reproduce your experiment. I see that you finetuned this model on 8xH100 with 4k-token inputs. What setup allowed you to fit the model, the gradients, and the inputs into 8x80GB? Was it DeepSpeed or something else? Please share some details. Thanks!
