Very interesting model just released by MyShell:
jetmoe/jetmoe-8b. It's an 8B-parameter MoE LLM with only 2.2B active parameters, which makes it really efficient.
Main characteristics:
- impressive performance for its size (beating meta-llama/Llama-2-7b and huggyllama/llama-13b)
- combines Mixture of Attention heads (MoA) and Mixture of MLP Experts (MoE): 8 experts, with 2 active for each token
- trained on a relatively modest 1.25T tokens from publicly available datasets; the training recipe follows MiniCPM's two-phase training method => the first time I've seen this for a 2B+ model
- $100k to train
- open weights - openly shared recipes - open datasets - open code => ♡
- still plenty of room to improve performance (if only by training longer)
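To make the "8 experts, 2 active per token" idea concrete, here is a minimal pure-Python sketch of generic top-2 expert routing. It is not JetMoE's actual code: the linear router weights `gate_w`, the toy scaling experts, and the helper names (`top_k_route`, `moe_layer`, `make_expert`) are hypothetical, and the attention-head side (MoA) is not shown.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def top_k_route(logits, k=2):
    """Pick the k highest-scoring experts and renormalize their gate weights.

    Softmax over only the selected logits equals the full softmax
    restricted to those experts and renormalized to sum to 1.
    """
    idx = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in idx])
    return idx, weights


def moe_layer(x, gate_w, experts, k=2):
    """Route one token vector x through its top-k experts (hypothetical router)."""
    # One router score per expert: dot product of the token with that expert's gate vector.
    logits = [sum(wi * xi for wi, xi in zip(w, x)) for w in gate_w]
    idx, weights = top_k_route(logits, k)
    out = [0.0] * len(x)
    # Only the selected experts are evaluated; the others are skipped entirely.
    for i, w in zip(idx, weights):
        y = experts[i](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, idx


def make_expert(scale):
    """Toy expert: scales its input (stands in for an MLP)."""
    return lambda v: [scale * vi for vi in v]


# Usage: 8 toy experts, 2 active per token.
random.seed(0)
dim, n_experts = 4, 8
gate_w = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_experts)]
experts = [make_expert(float(i + 1)) for i in range(n_experts)]
token = [0.5, -1.0, 0.25, 2.0]
y, chosen = moe_layer(token, gate_w, experts)
print("active experts:", chosen)
```

Since only 2 of the 8 experts run for each token, most expert parameters sit idle on any given forward pass, which is the intuition behind the active parameter count being much smaller than the total.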
Links:
- report: https://research.myshell.ai/jetmoe
- model: jetmoe/jetmoe-8b
- code: https://github.com/myshell-ai/JetMoE
Note: I covered the MiniCPM schedule, Mixture-of-Experts (MoE), and many of the datasets used in this work in my recent little guide to building LLMs in 2024, so feel free to check it out if you want to learn more about these topics: https://www.youtube.com/watch?v=2-SPH9hIKT8
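For a rough idea of what a MiniCPM-style two-phase schedule looks like, here is a minimal sketch of a Warmup-Stable-Decay (WSD) learning-rate curve: a warmup, a long constant plateau (phase 1), then a decay (phase 2), during which the data mix is assumed to shift toward higher-quality data. The linear warmup and linear decay shapes, the step counts, and the helper names `wsd_lr` and `training_phase` are illustrative assumptions, not JetMoE's or MiniCPM's exact settings.

```python
def wsd_lr(step, peak_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    """Warmup-Stable-Decay learning rate.

    Assumed shapes: linear warmup to peak_lr, constant plateau,
    then linear decay to min_lr (clamped at min_lr afterwards).
    """
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    if step < warmup_steps + stable_steps:
        return peak_lr
    # Fraction of the decay phase completed, clamped to [0, 1].
    t = min(step - warmup_steps - stable_steps, decay_steps) / decay_steps
    return peak_lr + (min_lr - peak_lr) * t


def training_phase(step, warmup_steps, stable_steps):
    """Phase 1: warmup + stable on the large pretraining mix.
    Phase 2: decay, assumed to use a higher-quality data mix."""
    return 1 if step < warmup_steps + stable_steps else 2


# Usage: a toy 100-step run (10 warmup, 80 stable, 10 decay).
for s in range(0, 101, 10):
    print(s, training_phase(s, 10, 80), wsd_lr(s, 1e-3, 10, 80, 10))
```

The appeal of this shape is that the plateau can be extended (i.e. "training longer") without committing to a total step count up front; only the short decay phase has to be rerun at the end.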