TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters Paper • 2410.23168 • Published 22 days ago • 22