	# Block-AP (EfficientQAT w/o E2E-QP)

	[EfficientQAT](https://arxiv.org/abs/2407.11062) involves two consecutive training phases: Block-wise training of all parameters (Block-AP) and end-to-end training of quantization parameters (E2E-QP).

	This repo provides the quantized checkpoints produced by Block-AP. You can use them to reproduce our results or to carry out follow-up research.

	## Performance

	| Model | Quantization | WikiText2 PPL | Avg. Accuracy | Model Size (GB) | Hub link |
	|-------|--------------|---------------|---------------|-----------------|----------|
	| Llama-2-7B | fp16 | 5.47 | 64.86 | 13.2 | - |
	| Llama-2-7B | w4g128 | 5.56 | 64.07 | 3.7 | [Link](https://huggingface.co/ChenMnZ/Llama-2-7b-BlockAP-w4g128) |
	| Llama-2-7B | w3g128 | 5.89 | 63.96 | 3.1 | [Link](https://huggingface.co/ChenMnZ/Llama-2-7b-BlockAP-w3g128) |
	| Llama-2-7B | w2g64 | 7.65 | 59.54 | 2.3 | [Link](https://huggingface.co/ChenMnZ/Llama-2-7b-BlockAP-w2g64) |
	| Llama-2-7B | w2g128 | 7.94 | 58.72 | 2.2 | [Link](https://huggingface.co/ChenMnZ/Llama-2-7b-BlockAP-w2g128) |
	| Llama-2-13B | fp16 | 4.88 | 67.81 | 25.4 | - |
	| Llama-2-13B | w4g128 | 4.96 | 67.27 | 6.8 | [Link](https://huggingface.co/ChenMnZ/Llama-2-13b-BlockAP-w4g128) |
	| Llama-2-13B | w3g128 | 5.20 | 67.30 | 5.6 | [Link](https://huggingface.co/ChenMnZ/Llama-2-13b-BlockAP-w3g128) |
	| Llama-2-13B | w2g64 | 6.55 | 63.10 | 4.0 | [Link](https://huggingface.co/ChenMnZ/Llama-2-13b-BlockAP-w2g64) |
	| Llama-2-13B | w2g128 | 6.68 | 63.49 | 3.8 | [Link](https://huggingface.co/ChenMnZ/Llama-2-13b-BlockAP-w2g128) |
	| Llama-2-70B | fp16 | 3.32 | 72.41 | 131.6 | - |
	| Llama-2-70B | w4g128 | 3.41 | 72.54 | 35.8 | [Link](https://huggingface.co/ChenMnZ/Llama-2-70b-BlockAP-w4g128) |
	| Llama-2-70B | w3g128 | 3.65 | 71.88 | 29.1 | [Link](https://huggingface.co/ChenMnZ/Llama-2-70b-BlockAP-w3g128) |
	| Llama-2-70B | w2g64 | 4.96 | 69.44 | 20.1 | [Link](https://huggingface.co/ChenMnZ/Llama-2-70b-BlockAP-w2g64) |
	| Llama-2-70B | w2g128 | 5.26 | 68.73 | 18.9 | [Link](https://huggingface.co/ChenMnZ/Llama-2-70b-BlockAP-w2g128) |
	| Llama-3-8B | fp16 | 6.14 | 68.58 | 13.0 | - |
	| Llama-3-8B | w4g128 | 6.50 | 68.43 | 5.4 | [Link](https://huggingface.co/ChenMnZ/Llama-3-8b-BlockAP-w4g128) |
	| Llama-3-8B | w3g128 | 7.34 | 66.72 | 4.7 | [Link](https://huggingface.co/ChenMnZ/Llama-3-8b-BlockAP-w3g128) |
	| Llama-3-8B | w2g64 | 12.47 | 58.65 | 3.9 | [Link](https://huggingface.co/ChenMnZ/Llama-3-8b-BlockAP-w2g64) |
	| Llama-3-8B | w2g128 | 13.25 | 58.23 | 3.8 | [Link](https://huggingface.co/ChenMnZ/Llama-3-8b-BlockAP-w2g128) |
	| Llama-3-70B | fp16 | 2.85 | 75.33 | 137.8 | - |
	| Llama-3-70B | w4g128 | 3.18 | 74.50 | 38.9 | [Link](https://huggingface.co/ChenMnZ/Llama-3-70b-BlockAP-w4g128) |
	| Llama-3-70B | w3g128 | 4.88 | 71.90 | 32.2 | [Link](https://huggingface.co/ChenMnZ/Llama-3-70b-BlockAP-w3g128) |
	| Llama-3-70B | w2g64 | 13.75 | 66.70 | 23.2 | [Link](https://huggingface.co/ChenMnZ/Llama-3-70b-BlockAP-w2g64) |
	| Llama-3-70B | w2g128 | 16.79 | 65.06 | 22.0 | [Link](https://huggingface.co/ChenMnZ/Llama-3-70b-BlockAP-w2g128) |
	| Llama-3-8B-Instruct | fp16 | 8.29 | 68.43 | 13.0 | - |
	| Llama-3-8B-Instruct | w4g128 | 8.76 | 67.80 | 5.4 | [Link](https://huggingface.co/ChenMnZ/Llama-3-8b-instruct-BlockAP-w4g128) |
	| Llama-3-8B-Instruct | w3g128 | 9.83 | 66.54 | 4.7 | [Link](https://huggingface.co/ChenMnZ/Llama-3-8b-instruct-BlockAP-w3g128) |
	| Llama-3-8B-Instruct | w2g64 | 16.77 | 58.62 | 3.9 | [Link](https://huggingface.co/ChenMnZ/Llama-3-8b-instruct-BlockAP-w2g64) |
	| Llama-3-8B-Instruct | w2g128 | 18.02 | 57.19 | 3.8 | [Link](https://huggingface.co/ChenMnZ/Llama-3-8b-instruct-BlockAP-w2g128) |
	| Llama-3-70B-Instruct | fp16 | 5.33 | 73.78 | 137.8 | - |
	| Llama-3-70B-Instruct | w4g128 | 5.77 | 73.52 | 38.9 | [Link](https://huggingface.co/ChenMnZ/Llama-3-70b-instruct-BlockAP-w4g128) |
	| Llama-3-70B-Instruct | w3g128 | 7.25 | 69.80 | 32.2 | [Link](https://huggingface.co/ChenMnZ/Llama-3-70b-instruct-BlockAP-w3g128) |
	| Llama-3-70B-Instruct | w2g64 | 12.48 | 65.60 | 23.2 | [Link](https://huggingface.co/ChenMnZ/Llama-3-70b-instruct-BlockAP-w2g64) |
	| Llama-3-70B-Instruct | w2g128 | 13.48 | 61.75 | 22.0 | [Link](https://huggingface.co/ChenMnZ/Llama-3-70b-instruct-BlockAP-w2g128) |
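
To help read the Quantization and Model Size columns, here is a minimal sketch. It assumes the usual reading of the tags (`wN` for N-bit weights, `gM` for a quantization group size of M), which this card does not spell out. The helper names `parse_quant_tag` and `compression_ratio` are illustrative only and are not part of EfficientQAT. The sizes are copied from the Llama-2-7B rows of the table.

```python
from __future__ import annotations

import re


def parse_quant_tag(tag: str) -> tuple[int, int]:
    """Split a tag such as 'w4g128' into (weight_bits, group_size).

    Assumes the 'w<bits>g<group size>' convention; hypothetical helper.
    """
    match = re.fullmatch(r"w(\d+)g(\d+)", tag)
    if match is None:
        raise ValueError(f"unrecognized quantization tag: {tag!r}")
    return int(match.group(1)), int(match.group(2))


def compression_ratio(fp16_size_gb: float, quant_size_gb: float) -> float:
    """Return how many times smaller the quantized checkpoint is than fp16."""
    return fp16_size_gb / quant_size_gb


# Model sizes in GB, taken from the Llama-2-7B rows of the table.
FP16_SIZE_GB = 13.2
QUANT_SIZES_GB = {"w4g128": 3.7, "w3g128": 3.1, "w2g64": 2.3, "w2g128": 2.2}

for tag, size_gb in QUANT_SIZES_GB.items():
    bits, group = parse_quant_tag(tag)
    ratio = compression_ratio(FP16_SIZE_GB, size_gb)
    print(f"{tag}: {bits}-bit weights, group size {group}, "
          f"{ratio:.2f}x smaller than fp16")
```

At equal bit width, a smaller group size (w2g64 versus w2g128) gives a slightly larger checkpoint in every row of the table, which fits this reading of the tags.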


	## Usage
	Please refer to [https://github.com/OpenGVLab/EfficientQAT](https://github.com/OpenGVLab/EfficientQAT) for details. These checkpoints can be used as the starting point for the [subsequent E2E-QP training](https://github.com/OpenGVLab/EfficientQAT?tab=readme-ov-file#training), or [run for inference](https://github.com/OpenGVLab/EfficientQAT?tab=readme-ov-file#inference) directly.
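
Before following either path, the checkpoint files need to be on disk. The sketch below shows one way to fetch them with `huggingface_hub.snapshot_download`. It assumes the repository naming pattern visible in the Hub links of the Performance table. `blockap_repo_id` and `download_checkpoint` are hypothetical helpers, not part of EfficientQAT, and loading the downloaded weights still goes through the EfficientQAT code linked above.

```python
def blockap_repo_id(family: str, size: str, quant_tag: str,
                    instruct: bool = False) -> str:
    """Build a Hub repo ID following the pattern used in the table's links.

    Hypothetical helper: the pattern is inferred from the links, not
    documented by the authors.
    """
    suffix = "-instruct" if instruct else ""
    return f"ChenMnZ/{family}-{size.lower()}{suffix}-BlockAP-{quant_tag}"


def download_checkpoint(repo_id: str) -> str:
    """Download every file of the repo and return the local directory."""
    # Imported lazily so the ID helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id)


if __name__ == "__main__":
    repo_id = blockap_repo_id("Llama-3", "8B", "w2g64")
    print(repo_id)
    # Uncomment to download (needs network access and a few GB of disk):
    # local_dir = download_checkpoint(repo_id)
    # print(local_dir)
```

The download call is left commented out so that the snippet runs offline. The returned directory can then be passed as the model path to the EfficientQAT training or inference scripts.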
