m-ric posted an update Sep 11
๐—”๐—ฟ๐—ฐ๐—ฒ๐—ฒ ๐—ฟ๐—ฒ๐—น๐—ฒ๐—ฎ๐˜€๐—ฒ๐˜€ ๐—ฆ๐˜‚๐—ฝ๐—ฒ๐—ฟ๐—ก๐—ผ๐˜ƒ๐—ฎ, ๐—ฏ๐—ฒ๐˜๐˜๐—ฒ๐—ฟ ๐—ณ๐—ถ๐—ป๐—ฒ-๐˜๐˜‚๐—ป๐—ฒ ๐—ผ๐—ณ ๐—Ÿ๐—น๐—ฎ๐—บ๐—ฎ-๐Ÿฏ.๐Ÿญ-๐Ÿณ๐Ÿฌ๐—•!

2๏ธโƒฃ versions: 70B and 8B
🧠 Trained by distilling logits from Llama-3.1-405B (see the sketch after this list)
๐Ÿฅ Used a clever compression method to reduce dataset weight from 2.9 Petabytes down to 50GB (may share it in a paper)
⚙️ Not all benchmarks improve: GPQA and MUSR drop slightly
🤗 8B weights are available on HF (not the 70B)
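
For intuition, here is a minimal, hypothetical sketch of logit distillation in PyTorch. This is not Arcee's actual pipeline; the temperature and mixing weight alpha are illustrative defaults. The idea: train the student to match the teacher's softened output distribution, optionally mixed with the usual hard-label cross-entropy.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # student_logits, teacher_logits: (batch, seq_len, vocab_size)
    # labels: (batch, seq_len) token ids
    # temperature and alpha are hypothetical hyperparameters, not from the post.
    t = temperature

    # Soften both distributions so the student also learns the teacher's
    # relative preferences among non-top tokens.
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_soft_student = F.log_softmax(student_logits / t, dim=-1)

    # KL(teacher || student) on the softened logits, scaled by t^2 to keep
    # gradient magnitudes comparable across temperatures.
    kl = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * t * t

    # Standard next-token cross-entropy on the hard labels.
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)), labels.view(-1)
    )
    return alpha * kl + (1 - alpha) * ce
```

In an offline setup like this, the 405B teacher's logits are precomputed over the whole corpus; at 16-bit precision, a full ~128K-entry vocabulary distribution per token adds up to petabytes fast, which is presumably what made the compression step above necessary.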

Read their blog post 👉 https://blog.arcee.ai/arcee-supernova-training-pipeline-and-model-composition/
Model weights (8B) 👉 arcee-ai/Llama-3.1-SuperNova-Lite
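
The Lite model loads like any other causal LM on the Hub. A quick try-out sketch with 🤗 Transformers; the prompt and generation settings here are just illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "arcee-ai/Llama-3.1-SuperNova-Lite"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # place layers on available GPU(s)/CPU
)

messages = [{"role": "user",
             "content": "Explain logit distillation in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```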