DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search Paper • 2410.03864 • Published Oct 4 • 10
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning Paper • 2407.00617 • Published Jun 30 • 7
Scaling Synthetic Data Creation with 1,000,000,000 Personas Paper • 2406.20094 • Published Jun 28 • 94
Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning Paper • 2406.12050 • Published Jun 17 • 18
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing Paper • 2404.12253 • Published Apr 18 • 53
Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models Paper • 2308.00304 • Published Aug 1, 2023 • 22