Headless Language Models: Learning without Predicting with Contrastive Weight Tying • Paper • 2309.08351 • Published Sep 15, 2023