low memory usage

#10
by Knut-J

Is there any way to run this with less memory? My GPU only has 24 GB, and I get this error message: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 462.00 MiB. GPU

@Knut-J 24 GB isn't gonna cut it for this beast of a model. At 72B parameters, NVLM-D needs roughly 144 GB just for the weights in bfloat16. But don't give up yet! Try these tricks:

  • CPU offloading: Load the model with device_map="auto" so layers that don't fit on the GPU spill over to CPU RAM. It'll be slow as molasses, but it might just work (first sketch below).
  • 8-bit quantization: Load the weights in int8 via bitsandbytes. It'll sacrifice some quality, but hey, beggars can't be choosers (second sketch below).
  • Last resort: Downgrade to a smaller model. Sometimes you gotta know when to fold 'em.
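
Here's a minimal sketch of the offloading option. I'm assuming the model ID nvidia/NVLM-D-72B and that you have transformers plus accelerate installed; the max_memory figures are illustrative, so adjust them to your hardware:

```python
# CPU offloading sketch: device_map="auto" lets Accelerate place layers on
# the GPU until it runs out, then spills the rest onto CPU RAM.
import torch
from transformers import AutoModel

model_id = "nvidia/NVLM-D-72B"  # assumed repo ID; adjust to what you're loading

model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # half-precision weights halve the footprint
    device_map="auto",            # spill layers that don't fit onto the CPU
    max_memory={0: "22GiB", "cpu": "120GiB"},  # illustrative; leave GPU headroom
    low_cpu_mem_usage=True,       # stream weights in instead of double-loading
    trust_remote_code=True,       # NVLM-D ships custom modeling code
)
```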
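
And a sketch of the 8-bit route. Note that newer transformers versions want a BitsAndBytesConfig passed as quantization_config rather than the bare load_in_8bit=True flag; this assumes bitsandbytes is installed:

```python
# 8-bit quantization sketch: weights are stored as int8, cutting memory
# roughly in half compared to bf16.
from transformers import AutoModel, BitsAndBytesConfig

model_id = "nvidia/NVLM-D-72B"  # assumed repo ID, as above

quant_config = BitsAndBytesConfig(
    load_in_8bit=True,                       # store weights in int8
    llm_int8_enable_fp32_cpu_offload=True,   # allow overflow layers on CPU
)

model = AutoModel.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
    trust_remote_code=True,
)
```

Keep in mind that even in int8, 72B parameters is still roughly 72 GB of weights, so most of the model will sit in CPU RAM on a 24 GB card either way. You'll likely be measuring seconds per token, not tokens per second.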

Fair warning: These hacks might make your inference slower than a snail on tranquilizers. But if you're dead set on using this model, it's worth a shot. Good luck!
