
Sparse-Llama-3.1-8B-gsm8k-2of4

Model Overview

  • Model Architecture: Llama-3.1-8B
    • Input: Text
    • Output: Text
  • Model Optimizations:
    • Sparsity: 2:4
  • Release Date: 11/21/2024
  • Version: 1.0
  • License(s): llama3.1
  • Model Developers: Neural Magic

This is an AI model specialized in grade-school math, obtained by fine-tuning the 2:4 sparse Sparse-Llama-3.1-8B-2of4 on the GSM8k dataset. It achieves 66.9% 0-shot accuracy on the GSM8k test set, compared to 66.3% for the fine-tuned dense model Llama-3.1-8B-gsm8k, demonstrating over 100% accuracy recovery. In contrast, the pretrained Llama-3.1-8B achieves 50.7% 5-shot accuracy and the sparse foundational Sparse-Llama-3.1-8B-2of4 model achieves 56.3% 5-shot accuracy.
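The "over 100% accuracy recovery" figure can be checked with a short calculation. This sketch assumes recovery means the sparse model's accuracy divided by the dense fine-tuned model's accuracy; the function name `accuracy_recovery` is illustrative and not part of any library.

```python
def accuracy_recovery(sparse_acc: float, dense_acc: float) -> float:
    """Return the sparse model's accuracy as a percentage of the dense model's."""
    return 100.0 * sparse_acc / dense_acc


# 0-shot GSM8k accuracies quoted above: sparse 66.9%, dense fine-tuned 66.3%.
recovery = accuracy_recovery(66.9, 66.3)
print(f"{recovery:.1f}%")  # 100.9%
```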

Model Optimizations

This model inherits the optimizations from its parent, Sparse-Llama-3.1-8B-2of4. Namely, all linear operators within transformer blocks were pruned to the 2:4 sparsity pattern: in each group of four weights, two are retained while two are pruned.
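As an illustration of the 2:4 pattern only, the sketch below zeroes two weights in every consecutive group of four. It keeps the two largest-magnitude weights per group, which is an assumption made for the example; the model card does not state the selection criterion, and this is not the pruning method used to produce the parent model. The function name `prune_2of4` is hypothetical.

```python
def prune_2of4(row: list[float]) -> list[float]:
    """Zero two of every four consecutive weights (illustrative magnitude rule)."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Positions of the two largest-magnitude weights in this group.
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        pruned.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return pruned


weights = [0.1, -2.0, 0.3, 4.0, -0.5, 0.6, 0.05, -0.7]
print(prune_2of4(weights))
# [0.0, -2.0, 0.0, 4.0, 0.0, 0.6, 0.0, -0.7]
```

Each group of four ends up with at most two nonzero entries, which is the structure that 2:4-aware kernels rely on.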

Deployment with vLLM

This model can be deployed efficiently using the vLLM backend. vLLM also supports OpenAI-compatible serving. See the documentation for more details.
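A minimal sketch of offline generation with vLLM might look like the following. The prompt template in `build_prompt` and the sampling settings are assumptions for illustration, since the model card does not specify a prompt format; `build_prompt` and `generate_answer` are hypothetical helper names. Running `generate_answer` requires vLLM to be installed and enough GPU memory for an 8B-parameter model.

```python
MODEL_ID = "neuralmagic/Sparse-Llama-3.1-8B-gsm8k-2of4"


def build_prompt(question: str) -> str:
    """Wrap a question in a simple template (assumed format, not from the model card)."""
    return f"Question: {question}\nAnswer:"


def generate_answer(question: str, max_tokens: int = 256) -> str:
    """Generate one greedy completion for a question using vLLM."""
    # Imported lazily so the helper above can be used without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=MODEL_ID)
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)
    outputs = llm.generate([build_prompt(question)], params)
    return outputs[0].outputs[0].text
```

For example, `generate_answer("A baker sells 12 rolls a day for 5 days. How many rolls is that?")` would return the model's completion as a string.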

Evaluation

This model was evaluated with the lm-evaluation-harness.
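One way to run a comparable GSM8k evaluation is through the harness's Python entry point, `lm_eval.simple_evaluate`. The settings below (the `hf` backend, `bfloat16`, and the `gsm8k` task at 0-shot) are assumptions for this sketch; the exact configuration used to produce the reported numbers is not given here, so results may differ. `eval_kwargs` and `run_gsm8k` are hypothetical helper names.

```python
def eval_kwargs(model_id: str, num_fewshot: int = 0) -> dict:
    """Build keyword arguments for lm_eval.simple_evaluate (assumed settings)."""
    return {
        "model": "hf",  # Hugging Face transformers backend
        "model_args": f"pretrained={model_id},dtype=bfloat16",
        "tasks": ["gsm8k"],
        "num_fewshot": num_fewshot,
    }


def run_gsm8k(model_id: str, num_fewshot: int = 0) -> dict:
    """Run the GSM8k task and return its metrics dictionary."""
    # Imported lazily; requires the lm-evaluation-harness package.
    import lm_eval

    results = lm_eval.simple_evaluate(**eval_kwargs(model_id, num_fewshot))
    return results["results"]["gsm8k"]
```

Calling `run_gsm8k("neuralmagic/Sparse-Llama-3.1-8B-gsm8k-2of4")` downloads the model and dataset, so it needs network access and a suitable GPU.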

Accuracy

GSM8k Benchmark

| Metric | Llama-3.1-8B (5-shot) | Sparse-Llama-3.1-8B-2of4 (5-shot) | Llama-3.1-8B-gsm8k (0-shot) | Sparse-Llama-3.1-8B-gsm8k-2of4 (0-shot) |
|---|---|---|---|---|
| Accuracy | 50.7% | 56.3% | 66.3% | 66.9% |