Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion • arXiv:2310.03502 • Published Oct 5, 2023 • 77 upvotes
Small-scale proxies for large-scale Transformer training instabilities • arXiv:2309.14322 • Published Sep 25, 2023 • 19 upvotes
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models • arXiv:2309.14717 • Published Sep 26, 2023 • 44 upvotes
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model • arXiv:2309.16058 • Published Sep 27, 2023 • 55 upvotes
MotionLM: Multi-Agent Motion Forecasting as Language Modeling • arXiv:2309.16534 • Published Sep 28, 2023 • 15 upvotes
Stabilizing RLHF through Advantage Model and Selective Rehearsal • arXiv:2309.10202 • Published Sep 18, 2023 • 9 upvotes
Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions • arXiv:2309.10150 • Published Sep 18, 2023 • 24 upvotes
Textbooks Are All You Need II: phi-1.5 technical report • arXiv:2309.05463 • Published Sep 11, 2023 • 87 upvotes
MADLAD-400: A Multilingual And Document-Level Large Audited Dataset • arXiv:2309.04662 • Published Sep 9, 2023 • 22 upvotes
When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale • arXiv:2309.04564 • Published Sep 8, 2023 • 15 upvotes
Scaling Laws for Sparsely-Connected Foundation Models • arXiv:2309.08520 • Published Sep 15, 2023 • 13 upvotes
Composable Function-preserving Expansions for Transformer Architectures • arXiv:2308.06103 • Published Aug 11, 2023 • 19 upvotes
AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning • arXiv:2308.03526 • Published Aug 7, 2023 • 25 upvotes