Less is More: Task-aware Layer-wise Distillation for Language Model Compression Paper • 2210.01351 • Published Oct 4, 2022 • 2
A Survey on Knowledge Distillation of Large Language Models Paper • 2402.13116 • Published Feb 20, 2024 • 2
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling Paper • 2311.00430 • Published Nov 1, 2023 • 57
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes Paper • 2306.13649 • Published Jun 23, 2023 • 16
Compact Language Models via Pruning and Knowledge Distillation Paper • 2407.14679 • Published Jul 19, 2024 • 38
LLM Pruning and Distillation in Practice: The Minitron Approach Paper • 2408.11796 • Published Aug 21, 2024 • 55
DistiLLM: Towards Streamlined Distillation for Large Language Models Paper • 2402.03898 • Published Feb 6, 2024 • 1
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes Paper • 2305.02301 • Published May 3, 2023 • 2
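Several of the papers in this collection build on the standard logit-matching distillation objective. The sketch below is a minimal, generic example of a temperature-scaled KL distillation loss in PyTorch; it is not the exact objective of any specific paper listed above, and the function name and default temperature are illustrative assumptions.

```python
# Minimal sketch of a temperature-scaled logit-distillation loss (generic KD,
# not the exact objective of any paper in this collection). The function name
# and the default temperature are illustrative assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 to keep gradient magnitudes comparable across temperatures."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # "batchmean" reduction gives the per-sample KL divergence averaged over the batch.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

# Usage example with random logits (hypothetical vocab size 32000, 4 token positions):
if __name__ == "__main__":
    student = torch.randn(4, 32000, requires_grad=True)
    with torch.no_grad():
        teacher = torch.randn(4, 32000)
    loss = distillation_loss(student, teacher)
    loss.backward()
    print(loss.item())
```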