Collections including paper arxiv:2306.00978

- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
  Paper • 2407.11062 • Published • 8
- GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
  Paper • 2210.17323 • Published • 7
- AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
  Paper • 2306.00978 • Published • 8

- AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
  Paper • 2306.00978 • Published • 8
- GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
  Paper • 2210.17323 • Published • 7
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 602

- SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
  Paper • 2312.15166 • Published • 56
- Llama 2: Open Foundation and Fine-Tuned Chat Models
  Paper • 2307.09288 • Published • 242
- LoRA: Low-Rank Adaptation of Large Language Models
  Paper • 2106.09685 • Published • 30
- QLoRA: Efficient Finetuning of Quantized LLMs
  Paper • 2305.14314 • Published • 45

- QuIP: 2-Bit Quantization of Large Language Models With Guarantees
  Paper • 2307.13304 • Published • 2
- SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
  Paper • 2306.03078 • Published • 3
- OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
  Paper • 2308.13137 • Published • 17
- AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
  Paper • 2306.00978 • Published • 8

- FP8-LM: Training FP8 Large Language Models
  Paper • 2310.18313 • Published • 31
- LLM-FP4: 4-Bit Floating-Point Quantized Transformers
  Paper • 2310.16836 • Published • 13
- TEQ: Trainable Equivalent Transformation for Quantization of LLMs
  Paper • 2310.10944 • Published • 9
- ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers
  Paper • 2309.16119 • Published • 1

- LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
  Paper • 2208.07339 • Published • 4
- GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
  Paper • 2210.17323 • Published • 7
- SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
  Paper • 2211.10438 • Published • 2
- AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
  Paper • 2306.00978 • Published • 8