Instruction Pre-Training: Language Models are Supervised Multitask Learners Paper • 2406.14491 • Published Jun 20 • 85
Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization Paper • 2409.12903 • Published Sep 19 • 21
A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models Paper • 2410.13841 • Published 23 days ago • 14