RoBERTa: A Robustly Optimized BERT Pretraining Approach • arXiv:1907.11692 • Published Jul 26, 2019
Leveraging Pre-trained Checkpoints for Sequence Generation Tasks • arXiv:1907.12461 • Published Jul 29, 2019
Transformer Language Models without Positional Encodings Still Learn Positional Information • arXiv:2203.16634 • Published Mar 30, 2022