Running on multi-node infrastructure

#17
by pvalois - opened

Hello all,

Is there any resource that could help me run this model in a multi-node environment? I am not an expert in this area. I have access to a multi-node H100 setup (1 GPU per node) that I launch jobs on via mpirun, but I am unable to get the model to load correctly across all available GPUs using accelerate alone.

I would really appreciate any help or guidance.

Thanks for the amazing work!
