GPTQ 4Bit Llama 3.2-3B-Instruct with 100% Accuracy recovery
#29
by
Qubitium
- opened
I am happy to announce that users that want even faster inference of Llama 3.2 3B Instruct with even lower vram requirements can now production deploy via vLLM/SGLang using our highly accurate gptq 4bit quantized model
https://x.com/ModelCloudAi/status/1852249758913724752
https://huggingface.co/ModelCloud/Llama-3.2-3B-Instruct-gptqmodel-4bit-vortext-v3