metadata

license: apache-2.0
base_model: argilla/zephyr-7b-spin-iter2-v0
tags:
  - generated_from_trainer
model-index:
  - name: zephyr-7b-spin-iter3-v0
    results: []
datasets:
  - argilla/10k_prompts_SPIN_iter3_zephyr_top
  - argilla/10k_prompts_SPIN_iter2_zephyr_top
  - DIBT/10k_prompts_ranked

zephyr-7b-spin-iter3-v0

A model matching the results of SPIN with very little data (30x less), carefully curated by the amazing Data Is Better Together community

This model is a fine-tuned version of argilla/zephyr-7b-spin-iter2-v0 on the argilla/10k_prompts_SPIN_iter3_zephyr_top and the argilla/10k_prompts_SPIN_iter2_zephyr_top dataset.

Check this repo for full reproducible code using the original SPIN implementation and distilabel.

If you want to contribute to high quality datasets like this, contribute to the DIBT prompt collective initiative.

MT-Bench results

Model	1st Turn Score	2nd Turn Score	Average Score	SPIN paper Score
zephyr-7b-sft-full	6.6625	6.0250	6.34375	5.94
zephyr-7b-spin-iter0-v0	6.64375	6.1750	6.409375	6.46
zephyr-7b-spin-iter1-v0	6.90625	6.3000	6.603125	6.65
zephyr-7b-spin-iter2-v0	7.1375	6.3125	6.725000	6.78
zephyr-7b-spin-iter3-v0	7.09375	6.4500	6.771875	-

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-07
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 2
total_train_batch_size: 64
total_eval_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 2.0

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/real	Rewards/generated	Rewards/accuracies	Rewards/margins	Logps/generated	Logps/real	Logits/generated	Logits/real
0.2928	0.49	25	0.3951	-2.6212	-20.3268	0.9062	17.7056	-700.5638	-278.0876	-2.8098	-2.8090
0.1487	0.97	50	0.1319	-2.9077	-29.1459	0.9375	26.2382	-702.3276	-278.1449	-2.8218	-2.8066
0.006	1.46	75	0.1269	-2.6037	-29.1519	0.9583	26.5482	-702.3289	-278.0841	-2.8175	-2.8037
0.0086	1.94	100	0.1099	-2.9181	-29.6970	0.9271	26.7789	-702.4378	-278.1470	-2.8177	-2.8051

Framework versions

Transformers 4.37.0
Pytorch 2.1.2+cu121
Datasets 2.14.6
Tokenizers 0.15.2