not run

#1
by sdyy - opened

Doesn't run on Colab T4.

VPTQ-community org

Please hold on while I give it a try.

VPTQ-community org

Colab's T4 has 16 GB of GPU memory, which is 128 Gbit. VPTQ-community/Qwen2.5-32B-Instruct-v8-k65536-65536-woft is equivalent to 4-bit quantization, so the weights alone need 32 * 4 = 128 Gbit (16 GB). On top of that, Torch's implementation needs additional GPU memory for the kv cache, so this model likely won't fit. I suggest you try the 2-bit variant, VPTQ-community/Qwen2.5-32B-Instruct-v16-k65536-65536-woft (32B @ 2 bit).
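The arithmetic above can be sketched as a small helper. This is just an illustration of the estimate (`quantized_weight_size_gb` is a hypothetical name, not part of any library), ignoring kv cache and activation overhead:

```python
def quantized_weight_size_gb(num_params_billion: float, bits_per_weight: float) -> float:
    """Estimate the weight footprint in GB (1 GB = 8 Gbit)."""
    gbits = num_params_billion * bits_per_weight  # e.g. 32B params @ 4 bit = 128 Gbit
    return gbits / 8

# A 32B model at 4 bit already fills a 16 GB T4, leaving nothing for the kv cache:
print(quantized_weight_size_gb(32, 4))  # -> 16.0 GB
# At 2 bit the weights take only 8 GB, leaving headroom for the kv cache:
print(quantized_weight_size_gb(32, 2))  # -> 8.0 GB
```

In practice you should budget extra memory beyond this figure for the kv cache, activations, and CUDA context.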
