Finetuning setup

by Andriy - opened Feb 23

Hi! We are trying to reproduce your experiment. I see that you finetuned this model on 8xH100 with 4k-token inputs. What setup allowed you to fit the model, the gradients, and the inputs into 8x80GB? Was it DeepSpeed or something else? Please share some details. Thanks!