AutoTrain: No-code training for state-of-the-art models Paper • 2410.15735 • Published Oct 21 • 56
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models Paper • 2411.04996 • Published Nov 7 • 43
TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices Paper • 2410.00531 • Published Oct 1 • 28
Gated Slot Attention for Efficient Linear-Time Sequence Modeling Paper • 2409.07146 • Published Sep 11 • 19
Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization Paper • 2409.12903 • Published Sep 19 • 21
Configurable Foundation Models: Building LLMs from a Modular Perspective Paper • 2409.02877 • Published Sep 4 • 27
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8 • 155
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery Paper • 2408.06292 • Published Aug 12 • 115
To Code, or Not To Code? Exploring Impact of Code in Pre-training Paper • 2408.10914 • Published Aug 20 • 40
GPUDrive: Data-driven, multi-agent driving simulation at 1 million FPS Paper • 2408.01584 • Published Aug 2 • 7