# Model Card for mamba-2.8b-slimpj-OpenOrca_1ep

This is a finetune of mamba-2.8b-slimpj for instruction following using the OpenOrca dataset.

## Model Details

### Model Description

This is a finetune of the Mamba reference model mamba-2.8b-slimpj from the paper [Mamba: Linear-Time Sequence Modeling with Selective State Spaces](https://arxiv.org/abs/2312.00752). It has been fine-tuned for instruction following on the OpenOrca dataset, training for 1 epoch.

- Model type: Mamba State Space Model (mamba_ssm)
- Finetuned from model: https://huggingface.co/state-spaces/mamba-2.8b-slimpj

## Uses

This model is intended for evaluating fine-tuning results on Mamba models.

## Usage

### Prompt structure

The prompt structure used for fine-tuning is an Alpaca-style format:

"### Human:\n%question%\n\n### AI response:\n%response%"

## Training Details

### Training Data

https://huggingface.co/datasets/Open-Orca/OpenOrca

### Training Procedure

Trained using text-generation-webui with code from the mamba_ssm pull request.

#### Training Hyperparameters

- Training regime: Trained in bfloat16 with the following parameters:
```json
{
  "trained_model_name": "mamba-2.8b-slimpj-OpenOrc_1ep",
  "save_steps": 500000.0,
  "micro_batch_size": 4,
  "batch_size": 128,
  "epochs": 1.0,
  "learning_rate": "3e-4",
  "lr_scheduler_type": "linear",
  "cutoff_len": 256,
  "dataset": "OpenOrca",
  "eval_dataset": "None",
  "format": "openorca-format",
  "warmup_steps": 100.0,
  "optimizer": "paged_adamw_8bit",
  "hard_cut_string": "\\n\\n\\n",
  "add_eos_token": false,
  "min_chars": 0.0
}
```
The reported train_loss was 0.6762700151924311.
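
As a point of clarification (and assuming the usual text-generation-webui convention, which the card does not spell out), batch_size is the effective batch per optimizer step while micro_batch_size is the per-pass batch, with gradient accumulation making up the difference:

```python
# Hypothetical illustration of how the two batch settings relate under the
# assumed text-generation-webui convention; not taken from the training code.
micro_batch_size = 4   # sequences per forward/backward pass
batch_size = 128       # effective sequences per optimizer step
gradient_accumulation_steps = batch_size // micro_batch_size
print(gradient_accumulation_steps)  # 32 micro-batches accumulated per update
```
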

## Results

### lm-evaluation-harness results for the final model

mamba_ssm (pretrained=mamba-2.8b-slimpj-OpenOrca), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (32)
| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 0 | acc | 0.2594 | ± | 0.0128 |
| | | none | 0 | acc_norm | 0.2935 | ± | 0.0133 |
| arc_easy | 1 | none | 0 | acc | 0.4390 | ± | 0.0102 |
| | | none | 0 | acc_norm | 0.4032 | ± | 0.0101 |
| boolq | 2 | none | 0 | acc | 0.5801 | ± | 0.0086 |
| lambada_openai | 1 | none | 0 | perplexity | 27.8582 | ± | 1.1183 |
| | | none | 0 | acc | 0.3683 | ± | 0.0067 |
| openbookqa | 1 | none | 0 | acc | 0.2500 | ± | 0.0194 |
| | | none | 0 | acc_norm | 0.3700 | ± | 0.0216 |
| piqa | 1 | none | 0 | acc | 0.6817 | ± | 0.0109 |
| | | none | 0 | acc_norm | 0.6839 | ± | 0.0108 |
| winogrande | 1 | none | 0 | acc | 0.5770 | ± | 0.0139 |

### lm-evaluation-harness results after half an epoch

mamba_ssm (pretrained=mamba-2.8b-slimpj-OpenOrca_1ep-checkpoints/checkpoint-500000), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (32)
| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 0 | acc | 0.2602 | ± | 0.0128 |
| | | none | 0 | acc_norm | 0.2833 | ± | 0.0132 |
| arc_easy | 1 | none | 0 | acc | 0.4533 | ± | 0.0102 |
| | | none | 0 | acc_norm | 0.4125 | ± | 0.0101 |
| boolq | 2 | none | 0 | acc | 0.4095 | ± | 0.0086 |
| lambada_openai | 1 | none | 0 | perplexity | 30.4832 | ± | 1.2403 |
| | | none | 0 | acc | 0.3551 | ± | 0.0067 |
| openbookqa | 1 | none | 0 | acc | 0.2420 | ± | 0.0192 |
| | | none | 0 | acc_norm | 0.3640 | ± | 0.0215 |
| piqa | 1 | none | 0 | acc | 0.6812 | ± | 0.0109 |
| | | none | 0 | acc_norm | 0.6730 | ± | 0.0109 |
| winogrande | 1 | none | 0 | acc | 0.5588 | ± | 0.0140 |

### Reference lm-evaluation-harness results for the base model mamba-2.8b-slimpj without fine-tuning

mamba_ssm (pretrained=mamba-2.8b-slimpj), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: auto (32)
| Tasks | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 0 | acc | 0.3882 | ± | 0.0142 |
| | | none | 0 | acc_norm | 0.4155 | ± | 0.0144 |
| arc_easy | 1 | none | 0 | acc | 0.7264 | ± | 0.0091 |
| | | none | 0 | acc_norm | 0.6814 | ± | 0.0096 |
| boolq | 2 | none | 0 | acc | 0.7107 | ± | 0.0079 |
| lambada_openai | 1 | none | 0 | perplexity | 5.8770 | ± | 0.1881 |
| | | none | 0 | acc | 0.6427 | ± | 0.0067 |
| openbookqa | 1 | none | 0 | acc | 0.2860 | ± | 0.0202 |
| | | none | 0 | acc_norm | 0.3980 | ± | 0.0219 |
| piqa | 1 | none | 0 | acc | 0.7709 | ± | 0.0098 |
| | | none | 0 | acc_norm | 0.7813 | ± | 0.0096 |
| winogrande | 1 | none | 0 | acc | 0.6614 | ± | 0.0133 |
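
For reference, results in this form can be produced with the lm-evaluation-harness Python API. The sketch below is an assumed invocation (a recent lm-eval release with the mamba_ssm model type registered, plus the mamba_ssm package installed); the exact command used for this card is not documented.

```python
# Hedged sketch of an evaluation run that would produce tables like the ones
# above. Assumes lm-evaluation-harness >= 0.4 with the "mamba_ssm" model type
# and the mamba_ssm package installed; the model path mirrors the header lines.
import lm_eval
from lm_eval.utils import make_table

results = lm_eval.simple_evaluate(
    model="mamba_ssm",
    model_args="pretrained=mamba-2.8b-slimpj-OpenOrca",
    tasks=[
        "arc_challenge", "arc_easy", "boolq", "lambada_openai",
        "openbookqa", "piqa", "winogrande",
    ],
    batch_size="auto",
)
print(make_table(results))  # renders a markdown table in the same layout
```
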

### Summary

The fine-tuned model's measured perplexity and accuracy are worse than the base model's, which is a known possible side effect of fine-tuning. Both perplexity and accuracy improved during the second half of training, so the initial regression was likely caused by forcing a prompt structure onto a base model that was trained only on unstructured text.

Answer quality as perceived by users has not yet been evaluated.

## Environmental Impact

- Hardware Type: RTX 3090
- Hours used: 118