Collections
Collections including paper arxiv:2405.00664
- Collection:
  - Better & Faster Large Language Models via Multi-token Prediction • Paper • 2404.19737 • Published • 73
  - Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3 • Paper • 2405.00664 • Published • 18
  - LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report • Paper • 2405.00732 • Published • 118
- Collection:
  - Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning • Paper • 2310.20587 • Published • 16
  - JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention • Paper • 2310.00535 • Published • 2
  - Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla • Paper • 2307.09458 • Published • 10
  - The Impact of Depth and Width on Transformer Language Model Generalization • Paper • 2310.19956 • Published • 9
- Collection:
  - Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping • Paper • 2402.14083 • Published • 47
  - Linear Transformers are Versatile In-Context Learners • Paper • 2402.14180 • Published • 6
  - Training-Free Long-Context Scaling of Large Language Models • Paper • 2402.17463 • Published • 19
  - The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits • Paper • 2402.17764 • Published • 602