This is the INT4 Llama-3-8b model quantized by per-group QQQ with a group size of 128. QQQ is a novel, hardware-optimized W4A8 quantization solution. For more details, please refer to our code repo and our paper.
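To illustrate what "per-group quantization with group size 128" means, here is a minimal sketch of symmetric per-group INT4 quantization in plain NumPy. This is not QQQ's actual implementation (QQQ uses custom W4A8 kernels and a more elaborate scheme described in the paper); it only shows the basic idea that every 128 consecutive weights share a single scale, and function names here are illustrative.

```python
import numpy as np

def quantize_per_group(w, group_size=128, n_bits=4):
    # Symmetric per-group quantization: each group of `group_size`
    # consecutive weights shares one scale derived from its max magnitude.
    qmax = 2 ** (n_bits - 1) - 1          # 7 for INT4
    groups = w.reshape(-1, group_size)    # one row per group
    scales = np.abs(groups).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(groups / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    # Reconstruct approximate weights from INT4 codes and per-group scales.
    return (q * scales).astype(np.float32)

# Demo: quantize a small weight matrix whose row length is a
# multiple of the group size, then measure reconstruction error.
np.random.seed(0)
w = np.random.randn(2, 256).astype(np.float32)
q, s = quantize_per_group(w.reshape(-1))
w_hat = dequantize(q, s).reshape(w.shape)
```

With group size 128, each row of 256 weights above is split into two groups, so outliers in one group do not inflate the quantization scale of the other — the main benefit of per-group over per-tensor scaling.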