FLM-101B: An Open LLM and How to Train It with $100K Budget Paper • 2309.03852 • Published Sep 7, 2023
Composable Function-preserving Expansions for Transformer Architectures Paper • 2308.06103 • Published Aug 11, 2023
YaRN: Efficient Context Window Extension of Large Language Models Paper • 2309.00071 • Published Aug 31, 2023
TinyStories: How Small Can Language Models Be and Still Speak Coherent English? Paper • 2305.07759 • Published May 12, 2023
Textbooks Are All You Need II: phi-1.5 technical report Paper • 2309.05463 • Published Sep 11, 2023
ZeRO++: Extremely Efficient Collective Communication for Giant Model Training Paper • 2306.10209 • Published Jun 16, 2023
2x Faster Language Model Pre-training via Masked Structural Growth Paper • 2305.02869 • Published May 4, 2023
When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale Paper • 2309.04564 • Published Sep 8, 2023
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models Paper • 2309.12307 • Published Sep 21, 2023
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset Paper • 2309.11998 • Published Sep 21, 2023