Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism Paper • 1909.08053 • Published Sep 17, 2019 • 2