Wow, impressive 340B model by NVIDIA with a nice permissive license! 🚀 The technical report is full of insights, and it seems to use a learning rate schedule other than cosine, probably a variant of WSD (warmup-stable-decay). Hope to get more info on that! 👀
nvidia/nemotron-4-340b-666b7ebaf1b3867caf2f1911
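For readers who haven't met the schedule yet, here is a rough sketch of how a warmup-stable-decay (WSD) schedule differs from cosine: the rate ramps up, sits flat at its peak for most of training, then drops over a final decay phase. Everything here is an assumption for illustration (the function names, the peak rate, the warmup and decay fractions, and the linear decay shape); none of it is taken from the Nemotron report or from the models listed below.

```python
import math


def wsd_lr(step, total_steps, peak_lr=6e-4, warmup_frac=0.01,
           decay_frac=0.2, min_lr=0.0):
    """Warmup-stable-decay: linear warmup, flat plateau, linear decay.

    All default values are illustrative assumptions.
    """
    warmup_steps = int(total_steps * warmup_frac)
    decay_steps = int(total_steps * decay_frac)
    decay_start = total_steps - decay_steps
    if step < warmup_steps:
        # Linear ramp from peak_lr / warmup_steps up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    if step < decay_start:
        # Stable phase: constant learning rate.
        return peak_lr
    # Decay phase: linear interpolation from peak_lr down to min_lr.
    progress = min(1.0, (step - decay_start) / max(1, decay_steps))
    return peak_lr + (min_lr - peak_lr) * progress


def cosine_lr(step, total_steps, peak_lr=6e-4, warmup_frac=0.01, min_lr=0.0):
    """Cosine schedule with the same linear warmup, for comparison."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 1000
    for s in (0, 5, 100, 500, 800, 900, 1000):
        print(s, wsd_lr(s, total), cosine_lr(s, total))
```

One practical appeal of the flat plateau is that a checkpoint from the stable phase can be decayed at different points, so a longer run can reuse a shorter run's plateau instead of restarting with a new cosine horizon.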
LLM.C Fineweb vs Edu-Fineweb
eliebak/wsd_124M_150B_edu • Text Generation • Updated Jun 11 • 11
eliebak/wsd_124M_150B_fw • Text Generation • Updated Jun 11 • 7
eliebak/wsd_124M_300B_edu • Text Generation • Updated Jun 11 • 11
eliebak/wsd_124M_300B_fw • Text Generation • Updated Jun 11 • 8