Hunyuan-Large just released by Tencent: Largest ever open MoE LLM, only 52B active parameters but beats Llama-3.1-405B on most academic benchmarks 🚀
⚡ Mixture of Experts (MoE) architecture: 389B parameters in total, but only 52B are activated for any input
🧪 Trained on 7T tokens, including 1.5T tokens of synthetic data
🏗️ Architecture: novel "recycle routing" prevents token dropping when experts are overloaded (see the routing sketch after this list)
🏆 Great benchmark results: surpasses Llama-3-405B-Instruct on most benchmarks although it has 8x fewer active parameters
‣ Impressive performance on MATH: 77.4
📏 Large context length: up to 256K tokens
🔒 License:
‣ Commercial use allowed, except if your products have >100M monthly active users
‣ No access in the EU
🤗 Model weights available on HF! (example loading snippet at the end of this post)
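To make the routing idea concrete, here is a minimal sketch of capacity-aware expert selection with "recycling": when a token's preferred expert is already full, the token is re-assigned to its next-best expert with spare capacity instead of being dropped. This is not Hunyuan-Large's actual implementation; the expert count, capacity, and exact recycling rule below are illustrative assumptions based on the description above.

```python
import torch
import torch.nn.functional as F

def route_with_recycling(x, gate, num_experts=16, capacity=4):
    """Assign each token to one expert; overflow tokens are 'recycled'
    to their next-best expert instead of being dropped.

    x:    (num_tokens, d_model) token hidden states
    gate: nn.Linear(d_model, num_experts) router
    """
    probs = F.softmax(gate(x), dim=-1)                       # routing scores
    ranked = torch.argsort(probs, dim=-1, descending=True)   # per-token preference order

    load = [0] * num_experts                                  # tokens assigned per expert
    assignment = torch.full((x.size(0),), -1, dtype=torch.long)
    for t in range(x.size(0)):
        for e in ranked[t].tolist():      # try the preferred expert first ...
            if load[e] < capacity:        # ... then recycle to the next expert
                assignment[t] = e         #     that still has spare capacity
                load[e] += 1
                break
    # Only the chosen expert's FFN runs for each token, which is why a
    # 389B-parameter model can activate only ~52B parameters per input.
    return assignment, probs

# Toy usage: 32 tokens, hidden size 128, 16 experts with capacity 4 each.
gate = torch.nn.Linear(128, 16)
tokens = torch.randn(32, 128)
assignment, probs = route_with_recycling(tokens, gate)
print(assignment)  # expert index chosen for each token (-1 only if every expert is full)
```

The production router is of course vectorized and trained jointly with the experts; the loop above only illustrates why an overloaded expert does not have to discard tokens.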
Read the full paper here 👉 Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent (2411.02265)
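If you want to try the weights, here is a hypothetical loading snippet with transformers; the repo id tencent/Tencent-Hunyuan-Large, the dtype/device settings, and the need for trust_remote_code are assumptions, so check the model card on the Hub for the exact repository name, supported transformers version, and hardware requirements.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tencent/Tencent-Hunyuan-Large"   # assumed repo id -- verify on the Hub

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # keep the checkpoint's native precision
    device_map="auto",       # shard across available GPUs (needs accelerate)
    trust_remote_code=True,  # the repo likely ships custom MoE modeling code
)

prompt = "Explain mixture-of-experts routing in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```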