The Impact of Depth and Width on Transformer Language Model Generalization • Paper • 2310.19956 • Published Oct 30, 2023