not run #1
opened by sdyy
It does not run on a Colab T4.
Please hold on while I give it a try.
The GPU memory of Colab's T4 is 16 GB, i.e. 128 Gbit. The model VPTQ-community/Qwen2.5-32B-Instruct-v8-k65536-65536-woft is equivalent to 4-bit quantization, so the weights alone need 32 * 4 = 128 Gbit = 16 GB. On top of that, Torch's implementation needs additional GPU memory for the KV cache, so the model likely won't fit. I suggest you try the corresponding 2-bit variant instead (32B @ 2 bit), whose weights need only 32 * 2 = 64 Gbit = 8 GB.
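A quick back-of-the-envelope check of the arithmetic above (a sketch; the 16 GB T4 figure and the 4-bit vs. 2-bit widths come from this thread, and the 2 GB KV-cache allowance is an illustrative assumption, not a measured value):

```python
def weights_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (decimal)."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9

T4_MEMORY_GB = 16.0   # Colab T4
KV_CACHE_GB = 2.0     # rough allowance for KV cache and activations (assumption)

for bits in (4, 2):
    size = weights_size_gb(32, bits)
    fits = size + KV_CACHE_GB <= T4_MEMORY_GB
    print(f"32B @ {bits}-bit: weights ~{size:.0f} GB, fits on T4 with KV cache: {fits}")
```

Running this shows the 4-bit weights already consume the full 16 GB, leaving no room for the KV cache, while the 2-bit weights (~8 GB) leave ample headroom.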