Mixture-of-Depths: Dynamically allocating compute in transformer-based language models Paper • 2404.02258 • Published Apr 2 • 104
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences Paper • 2404.03715 • Published Apr 4 • 60
Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published Apr 30 • 73
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series Paper • 2405.19327 • Published May 29 • 45
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing Paper • 2406.08464 • Published Jun 12 • 65
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems Paper • 2407.01370 • Published Jul 1 • 85
Searching for Best Practices in Retrieval-Augmented Generation Paper • 2407.01219 • Published Jul 1 • 11
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models Paper • 2309.03883 • Published Sep 7, 2023 • 33
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters Paper • 2408.03314 • Published Aug 6 • 33
Writing in the Margins: Better Inference Pattern for Long Context Retrieval Paper • 2408.14906 • Published Aug 27 • 138