SeanScripts/Molmo-72B-0924-nf4

Image-Text-to-Text

text-generation

4-bit precision

SeanScripts committed 13 days ago

Commit 320db5c • 1 Parent(s): d94b85e

Update README.md

Files changed (1)

README.md +16 -3

@@ -1,3 +1,16 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+base_model:
+- allenai/Molmo-72B-0924
+pipeline_tag: image-text-to-text
+library_name: transformers
+---
+Quantized with NF4 double quantization from [allenai/Molmo-72B-0924](https://huggingface.co/allenai/Molmo-72B-0924) using BitsAndBytes.
+The vision backbone modules were not quantized to NF4 and remain in FP16. For now they need to be run in FP32 because of a layer norm precision loss issue, and they should be offloaded to CPU, otherwise you will run out of memory on 48 GB of VRAM.
+This model just *barely* fits in 48 GB (tested on 2 x 3090, where it gets about 6 tok/s). It probably cannot handle a very long sequence length, but at least it works.
+With two 24 GB cards, this requires a very specific device map to work. On a single card with 48 GB of VRAM, I imagine it works much more smoothly.
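
The README change mentions a "very specific device map" for two 24 GB cards but does not spell one out. As a rough sketch only, a map of that kind could be built like this. The module names (`model.vision_backbone`, `model.transformer.blocks.N`, and so on), the block count of 80, and the split point are all assumptions for illustration and are not taken from the commit; `build_device_map` is a hypothetical helper. Check the real names against the loaded model's `named_modules()` before relying on it.

```python
def build_device_map(num_blocks=80, blocks_on_gpu0=38, prefix="model"):
    """Build a module-name -> device dict for two GPUs plus CPU offload.

    All module names here are ASSUMED, not confirmed by the README:
    the vision backbone goes to CPU, the transformer blocks are split
    between GPU 0 and GPU 1.
    """
    if not 0 <= blocks_on_gpu0 <= num_blocks:
        raise ValueError("blocks_on_gpu0 must be between 0 and num_blocks")

    device_map = {
        f"{prefix}.vision_backbone": "cpu",  # kept off the GPUs, per the README
        f"{prefix}.transformer.wte": 0,      # assumed name of the embedding module
        f"{prefix}.transformer.ln_f": 1,     # assumed name of the final norm
        f"{prefix}.transformer.ff_out": 1,   # assumed name of the output projection
    }
    # The first `blocks_on_gpu0` blocks go to GPU 0, the rest to GPU 1.
    for i in range(num_blocks):
        name = f"{prefix}.transformer.blocks.{i}"
        device_map[name] = 0 if i < blocks_on_gpu0 else 1
    return device_map


if __name__ == "__main__":
    dm = build_device_map()
    on_gpu0 = sum(1 for device in dm.values() if device == 0)
    on_gpu1 = sum(1 for device in dm.values() if device == 1)
    print(f"modules on GPU 0: {on_gpu0}, on GPU 1: {on_gpu1}")
```

A dict like this is what `from_pretrained` in transformers accepts through its `device_map=` argument. The uneven default split is only a guess at leaving headroom on the first card; the right split point would have to be found by trial on the actual hardware.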