SeanScripts commited on
Commit
320db5c
1 Parent(s): d94b85e

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +16 -3
README.md CHANGED
@@ -1,3 +1,16 @@
1
- ---
2
- license: apache-2.0
3
- ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: apache-2.0
3
+ base_model:
4
+ - allenai/Molmo-72B-0924
5
+ pipeline_tag: image-text-to-text
6
+ library_name: transformers
7
+ ---
8
+
9
+ Quantized with NF4 double quantization from [allenai/Molmo-72B-0924](https://huggingface.co/allenai/Molmo-72B-0924) using BitsAndBytes.
10
+
11
+ Vision backbone modules were not quantized to NF4 (though they are still FP16), and need to be run in FP32 at the moment (layer norm precision loss issue), and should be offloaded to CPU or you'll run out of memory on 48 GB VRAM.
12
+
13
+ This model just *barely* fits in 48 GB (tested on 2 x 3090, and gets about 6 tok/s). It probably doesn't have a very high max sequence length, but at least it works.
14
+
15
+ For 2 cards with 24 GB VRAM, this requires a very specific device map to work. For single cards with 48 GB VRAM, I imagine it works much more smoothly.
16
+