Mixture-of-Depths: Dynamically allocating compute in transformer-based language models • Paper • 2404.02258 • Published Apr 2 • 104
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking • Paper • 2403.09629 • Published Mar 14 • 72
Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation • Paper • 2401.10838 • Published Jan 19 • 8