𝗔𝗿𝗰𝗲𝗲 𝗿𝗲𝗹𝗲𝗮𝘀𝗲𝘀 𝗦𝘂𝗽𝗲𝗿𝗡𝗼𝘃𝗮, 𝗯𝗲𝘁𝘁𝗲𝗿 𝗳𝗶𝗻𝗲-𝘁𝘂𝗻𝗲 𝗼𝗳 𝗟𝗹𝗮𝗺𝗮-𝟯.𝟭-𝟳𝟬𝗕!
2️⃣ versions: 70B and 8B
🧠 Trained by distilling logits from Llama-3.1-405B (rough sketch of the idea below)
🔥 Used a clever compression method to shrink the distillation dataset from 2.9 petabytes down to 50GB (they may share it in a paper)
⚖️ Not all benchmarks improve: GPQA and MUSR go down slightly
🤗 8B weights are available on HF (not the 70B)
Read their blog post 👉 https://blog.arcee.ai/arcee-supernova-training-pipeline-and-model-composition/
Model weights (8B) 👉 arcee-ai/Llama-3.1-SuperNova-Lite
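
For intuition only, here is a minimal sketch of what logit distillation with a compressed teacher-logit store could look like. Arcee has not published their pipeline or compression method, so everything below is an assumption: the top-k trick as a stand-in for their compression, the plain KL loss, and all function names are hypothetical.

```python
# Hypothetical sketch: logit distillation from stored top-k teacher logits.
# Not Arcee's actual method; top-k storage is just one plausible way to cut
# a logits dataset by orders of magnitude.
import torch
import torch.nn.functional as F

def compress_teacher_logits(teacher_logits: torch.Tensor, k: int = 64):
    """Keep only the top-k logits per token instead of the full vocab row."""
    values, indices = teacher_logits.topk(k, dim=-1)
    return values, indices  # these compact tensors are what gets stored on disk

def distillation_loss(student_logits, topk_values, topk_indices, temperature=1.0):
    """KL(teacher || student), computed only over the stored top-k vocab slots."""
    # Pick out the student's logits at the teacher's top-k token ids.
    student_topk = student_logits.gather(-1, topk_indices)
    teacher_probs = F.softmax(topk_values / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_topk / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature**2

# Toy usage: batch of 2 sequences, 5 tokens, Llama-3.1's 128,256-token vocab.
teacher = torch.randn(2, 5, 128256)                       # would come from Llama-3.1-405B
student = torch.randn(2, 5, 128256, requires_grad=True)   # stand-in for the 70B/8B student
vals, idx = compress_teacher_logits(teacher, k=64)
loss = distillation_loss(student, vals, idx)
loss.backward()
```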