The Impact of Positional Encoding on Length Generalization in Transformers Paper • 2305.19466 • Published May 31, 2023 • 2
Teaching Transformers Causal Reasoning through Axiomatic Training Paper • 2407.07612 • Published Jul 10, 2024 • 2