Running on multi-node infrastructure
#17, opened by pvalois
Hello all,
Is there any resource that could help me run this in a multi-node environment? I am not an expert in this area, but I have access to a multi-node H100 cluster (1 GPU per node) that I launch jobs on via mpirun. So far I have been unable to get the model to load correctly across all available GPUs using accelerate alone.
I would really appreciate any help or guidance.
Thanks for the amazing work!