Model Summary
PowerMoE-3B is a 3B-parameter sparse Mixture-of-Experts (sMoE) language model trained with the Power learning-rate scheduler. It sparsely activates 800M parameters for each token and is trained on a mix of open-source and proprietary datasets. PowerMoE-3B shows promising results compared to dense models with 2x the active parameters across various benchmarks, including natural language multiple-choice, code generation, and math reasoning.
Paper: https://arxiv.org/abs/2408.13359
This is a GGUF quantized version.
Usage
Running this model requires a recent build of llama.cpp.
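If the quantized file is not yet local, it can be fetched from the Hub first. A minimal sketch using huggingface_hub, assuming the repo id shown on this card and the filename used in the generation example below:

```python
# Sketch only: download the GGUF file from the Hub before running it.
# Assumes the repo id and quantized filename shown elsewhere on this card; adjust if they differ.
from huggingface_hub import hf_hub_download

gguf_path = hf_hub_download(
    repo_id="TobDeBer/PowerMoe-3b-GGUF",
    filename="PowerMoE4x800M_q3km.gguf",  # quantized file used in the generation example
)
print(gguf_path)  # local path to pass to llama-cli via -m
```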
Generation
This is a simple example of how to run the PowerMoE GGUF with llama-cli:
./llama-cli -m PowerMoE4x800M_q3km.gguf -p "How about a snack?"
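As an alternative to the llama-cli binary, the same file can be loaded through the llama-cpp-python bindings. A hedged sketch, assuming llama-cpp-python is installed and the GGUF file is in the working directory:

```python
# Sketch only: run the quantized model through llama-cpp-python
# instead of the llama-cli binary. Assumes `pip install llama-cpp-python`.
from llama_cpp import Llama

llm = Llama(
    model_path="PowerMoE4x800M_q3km.gguf",  # same file as in the llama-cli example
    n_ctx=2048,                             # context window; adjust as needed
)

out = llm("How about a snack?", max_tokens=64)
print(out["choices"][0]["text"])
```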
Model tree for TobDeBer/PowerMoe-3b-GGUF
- Base model: ibm/PowerMoE-3b

Evaluation results
| Benchmark | Metric | Score (self-reported) |
| --- | --- | --- |
| ARC | accuracy-norm | 58.1 |
| BoolQ | accuracy | 65.0 |
| Hellaswag | accuracy-norm | 71.5 |
| OpenBookQA | accuracy-norm | 41.0 |
| PIQA | accuracy-norm | 79.1 |
| Winogrande | accuracy-norm | 65.0 |
| MMLU (5-shot) | accuracy | 42.8 |
| GSM8k (5-shot) | accuracy | 25.9 |
| MATH (4-shot) | accuracy | 14.8 |
| HumanEval | pass@1 | 20.1 |