Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published Apr 30
jonathanjordan21/mos-mamba-18x130m-trainer-dgx-lora-sft-merged Text Generation • Updated 28 days ago