This is the INT4 Llama-3-8b model quantized by per-channel QQQ. QQQ is an innovative and hardware-optimized W4A8 quantization solution. For more details, please refer to our code repo and our paper.
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.