Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free • Paper • 2410.10814 • Published 23 days ago • 48
MiniPLM: Knowledge Distillation for Pre-Training Language Models • Paper • 2410.17215 • Published 15 days ago • 12
CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution • Paper • 2410.16256 • Published 16 days ago • 58
CCI3.0-HQ: a large-scale Chinese dataset of high quality designed for pre-training large language models • Paper • 2410.18505 • Published 13 days ago • 8