GPTQ and AWQ Support for ZeroGPU
Hey, I was wondering whether ZeroGPU supports AWQ and GPTQ quantization, considering that they are dedicated GPU quantization formats. I tried a lot of different ways to host my Qwen2-VL 72B Instruct AWQ model, but nothing seems to be working. If anyone could lend me a hand with this issue, I would be really thankful.
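For context, the basic pattern I've been trying looks roughly like this. It's only a minimal sketch with my own placeholders (autoawq in requirements.txt, a smaller 7B AWQ model ID instead of the 72B one, a 120-second GPU duration), and whether the AWQ CUDA kernels behave under ZeroGPU's deferred initialization is exactly what I'm unsure about:

```python
# Minimal ZeroGPU Space sketch for an AWQ-quantized Qwen2-VL model.
# Assumptions: `autoawq` is in requirements.txt; the model ID below is a smaller
# placeholder -- whether the 72B AWQ variant fits at all is the open question.
import spaces
import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

MODEL_ID = "Qwen/Qwen2-VL-7B-Instruct-AWQ"  # placeholder; the thread is about the 72B AWQ

# Load once at startup; ZeroGPU defers the actual CUDA allocation until a GPU is attached.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16
).to("cuda")
processor = AutoProcessor.from_pretrained(MODEL_ID)

@spaces.GPU(duration=120)  # GPU is attached only while this function runs
def answer(image, prompt):
    messages = [{"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": prompt},
    ]}]
    text = processor.apply_chat_template(messages, add_generation_prompt=True)
    inputs = processor(text=[text], images=[image], return_tensors="pt").to("cuda")
    output = model.generate(**inputs, max_new_tokens=256)
    return processor.batch_decode(
        output[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
    )[0]
```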
I'll try to help debug it if you can share the code.
However, it isn't always possible to fix, since the specifications have changed considerably from the previous ZeroGPU Spaces...
Sure, here you go: https://huggingface.co/spaces/akhil2808/Qwen2_VL72B_OCR
Also, the main question is: are GPTQ and AWQ formats even supported by ZeroGPU?
I committed a version that at least boots.
However, inference does not work.
It might work if the entire AWQ model were small enough to load into CUDA, but when I tried that with the 70B model, it crashed due to lack of VRAM. 🤢
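For scale, a rough back-of-the-envelope estimate (my own approximate numbers, not measured):

```python
# Very rough VRAM estimate for a 72B model in 4-bit AWQ (all numbers are approximations).
params = 72e9
weights_gb = params * 0.5 / 1e9   # ~4 bits (0.5 bytes) per parameter -> ~36 GB of weights alone
overhead_gb = 8                   # guess for KV cache, vision tower activations, CUDA context
print(f"~{weights_gb + overhead_gb:.0f} GB needed")  # ~44 GB, already over a 40 GB A100 slice
```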
A similar algorithm that came out recently did manage to work in a ZeroGPU Space. I'm not sure which one it was...
Edit:
I remember now, it was AQLM.
https://discuss.huggingface.co/t/error-running-model-in-zerogpu/109819
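For reference, loading an AQLM checkpoint on ZeroGPU would look roughly like this. This is only a sketch under my own assumptions (the aqlm package in requirements.txt, and an example 7B AQLM checkpoint as a placeholder rather than anything Qwen2-VL-sized):

```python
# Hedged sketch: AQLM-quantized model on ZeroGPU (example checkpoint, not Qwen2-VL).
# Assumes the `aqlm` package is listed in requirements.txt.
import spaces
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

AQLM_ID = "ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf"  # placeholder AQLM checkpoint

model = AutoModelForCausalLM.from_pretrained(AQLM_ID, torch_dtype=torch.float16).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(AQLM_ID)

@spaces.GPU
def generate(prompt: str) -> str:
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    output = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```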
Also, the main question is: are GPTQ and AWQ formats even supported by ZeroGPU?
The format itself is supported anyway.
What's probably not supported is loading 72B vision models on ZeroGPU, quantized or not.
I thought the available VRAM was 40 GB? The GPU specs list 80 GB, though.
@xi0v Oh, is there a rule like that? You can't load models beyond a certain parameter count? Because this is less than a 13-billion-parameter model, which I think is small enough to fit in the 80 GB of VRAM on the A100 that ZeroGPU uses under the hood.
Well, ZeroGPU is limited in terms of computational power (hence it being free but with a quota), and ZeroGPU uses a 40 GB A100, not an 80 GB one, if I recall correctly. 13B models work with no problem. What you tried to use is a 72B model with vision capabilities (which makes it need even more compute to run).
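One way to check what the Space actually gets is to query the device from inside a GPU-decorated function; a minimal sketch (assuming it runs inside a ZeroGPU Space):

```python
import spaces
import torch

@spaces.GPU
def gpu_info() -> str:
    # Report the device name and total VRAM that ZeroGPU actually attaches.
    props = torch.cuda.get_device_properties(0)
    return f"{props.name}, {props.total_memory / 1024**3:.1f} GiB VRAM"
```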