not run #1
opened by sdyy
It does not run on a Colab T4.
Please hold on while I give it a try.
The GPU memory of Colab's T4 is 16 GB, i.e. 128 Gbit. The model VPTQ-community/Qwen2.5-32B-Instruct-v8-k65536-65536-woft is equivalent to 4-bit quantization, so the weights alone need 32 * 4 = 128 Gbit = 16 GB. On top of that, Torch's implementation needs additional GPU memory for the KV cache, so the model likely won't fit. I suggest you try the corresponding 2-bit variant instead (32B @ 2 bit), whose weights need only 32 * 2 = 64 Gbit = 8 GB.
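A quick back-of-the-envelope check of the arithmetic above (a sketch; the 16 GB T4 figure and the 4-bit vs. 2-bit widths come from this thread, and the 2 GB KV-cache allowance is an illustrative assumption, not a measured value):

```python
def weights_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (decimal)."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9

T4_MEMORY_GB = 16.0   # Colab T4
KV_CACHE_GB = 2.0     # rough allowance for KV cache and activations (assumption)

for bits in (4, 2):
    size = weights_size_gb(32, bits)
    fits = size + KV_CACHE_GB <= T4_MEMORY_GB
    print(f"32B @ {bits}-bit: weights ~{size:.0f} GB, fits on T4 with KV cache: {fits}")
```

Running this shows the 4-bit weights already consume the full 16 GB, leaving no room for the KV cache, while the 2-bit weights (~8 GB) leave ample headroom.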