
Model Summary

PowerMoE-3B is a 3B-parameter sparse Mixture-of-Experts (sMoE) language model trained with the Power learning rate scheduler. It sparsely activates 800M parameters per token and is trained on a mix of open-source and proprietary datasets. Across benchmarks covering natural language multiple-choice tasks, code generation, and math reasoning, PowerMoE-3B shows promising results compared to dense models with twice the number of activated parameters. Paper: https://arxiv.org/abs/2408.13359

This repository provides a GGUF-quantized version of that model.

Usage

A recent build of llama.cpp is required to run this model.
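
If you do not already have a recent binary, a minimal build from source is sketched below. This assumes CMake and a C/C++ toolchain are installed; see the llama.cpp repository for full build options and hardware-specific flags.

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
# llama-cli and llama-server end up in build/bin/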

Generation

This is a simple example of how to run the PowerMoE GGUF with llama-cli:

./llama-cli -m PowerMoE4x800M_q3km.gguf -p "How about a snack?"
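
llama.cpp also includes an HTTP server. The following is a rough sketch of serving the same GGUF file and querying its native completion endpoint; the context size (-c 4096), port, and n_predict value are illustrative choices, not values taken from this card:

./llama-server -m PowerMoE4x800M_q3km.gguf -c 4096 --port 8080
# in another shell, request a completion from the running server
curl http://localhost:8080/completion -d '{"prompt": "How about a snack?", "n_predict": 64}'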

Model details

Format: GGUF
Model size: 3.51B params
Architecture: granite

Base model: ibm/PowerMoE-3b