metadata
license: apache-2.0
base_model: argilla/zephyr-7b-spin-iter2-v0
tags:
  - generated_from_trainer
model-index:
  - name: zephyr-7b-spin-iter3-v0
    results: []
datasets:
  - argilla/10k_prompts_SPIN_iter3_zephyr_top
  - argilla/10k_prompts_SPIN_iter2_zephyr_top
  - DIBT/10k_prompts_ranked

zephyr-7b-spin-iter3-v0

A model that matches the results of SPIN (Self-Play Fine-Tuning) with very little data (30x less), carefully curated by the Data Is Better Together community.

Built with Distilabel

This model is a fine-tuned version of argilla/zephyr-7b-spin-iter2-v0 on the argilla/10k_prompts_SPIN_iter3_zephyr_top and argilla/10k_prompts_SPIN_iter2_zephyr_top datasets.

Check this repo for fully reproducible code using the original SPIN implementation and distilabel.

If you want to help build high-quality datasets like this one, contribute to the DIBT prompt collective initiative.

MT-Bench results

| Model | 1st Turn Score | 2nd Turn Score | Average Score | SPIN Paper Score |
|---|---|---|---|---|
| zephyr-7b-sft-full | 6.6625 | 6.0250 | 6.34375 | 5.94 |
| zephyr-7b-spin-iter0-v0 | 6.64375 | 6.1750 | 6.409375 | 6.46 |
| zephyr-7b-spin-iter1-v0 | 6.90625 | 6.3000 | 6.603125 | 6.65 |
| zephyr-7b-spin-iter2-v0 | 7.1375 | 6.3125 | 6.725000 | 6.78 |
| zephyr-7b-spin-iter3-v0 | 7.09375 | 6.4500 | 6.771875 | - |
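As a quick arithmetic check, the Average Score column is the mean of the two turn scores. The sketch below only copies the numbers from the table, recomputes each average, and reports the gain of every iteration over the SFT baseline; the helper name `turn_average` is hypothetical and not part of any evaluation tool.

```python
import math

# (model, 1st-turn score, 2nd-turn score, reported average) copied from the table above
MT_BENCH = [
    ("zephyr-7b-sft-full", 6.6625, 6.0250, 6.34375),
    ("zephyr-7b-spin-iter0-v0", 6.64375, 6.1750, 6.409375),
    ("zephyr-7b-spin-iter1-v0", 6.90625, 6.3000, 6.603125),
    ("zephyr-7b-spin-iter2-v0", 7.1375, 6.3125, 6.725000),
    ("zephyr-7b-spin-iter3-v0", 7.09375, 6.4500, 6.771875),
]


def turn_average(first_turn: float, second_turn: float) -> float:
    """Hypothetical helper: Average Score as the mean of the two turn scores."""
    return (first_turn + second_turn) / 2


# The first row (the SFT model) serves as the baseline for the gains
baseline = turn_average(MT_BENCH[0][1], MT_BENCH[0][2])
for name, t1, t2, reported in MT_BENCH:
    avg = turn_average(t1, t2)
    assert math.isclose(avg, reported), name  # matches the reported column
    print(f"{name}: average {avg:.6f}, gain over SFT {avg - baseline:+.6f}")
```

By this measure the iter3 model averages about 0.43 points above the SFT starting point, with most of the gain coming from the first turn.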

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2.0
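The total batch sizes in the list follow from the per-device settings: 8 examples per device × 4 GPUs × 2 gradient-accumulation steps gives 64 for training, and 8 × 4 gives 32 for evaluation. The sketch below spells out that arithmetic and the shape of a linear schedule with a 10% warmup, written in the style of the Hugging Face Trainer's linear scheduler. The function name `linear_lr` and the 100-step total used in the usage loop are illustrative assumptions, not values taken from this card.

```python
import math

# Per-device settings copied from the hyperparameter list above
TRAIN_BATCH_SIZE = 8
EVAL_BATCH_SIZE = 8
NUM_DEVICES = 4
GRADIENT_ACCUMULATION_STEPS = 2
LEARNING_RATE = 1e-7
WARMUP_RATIO = 0.1

# Examples that contribute to one optimizer step across all GPUs
total_train_batch_size = TRAIN_BATCH_SIZE * NUM_DEVICES * GRADIENT_ACCUMULATION_STEPS
# Evaluation does not accumulate gradients, so only the device count multiplies
total_eval_batch_size = EVAL_BATCH_SIZE * NUM_DEVICES
print(total_train_batch_size, total_eval_batch_size)  # 64 32


def linear_lr(step: int, total_steps: int, base_lr: float = LEARNING_RATE,
              warmup_ratio: float = WARMUP_RATIO) -> float:
    """Hypothetical helper: linear warmup from 0 to base_lr, then linear decay to 0.

    Assumption: warmup length is ceil(total_steps * warmup_ratio), the way the
    Hugging Face Trainer converts a warmup ratio into a number of steps.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


# Hypothetical 100-step run: peak LR at step 10, half of it at step 55, zero at the end
for s in (0, 5, 10, 55, 100):
    print(s, linear_lr(s, total_steps=100))
```

Because the peak learning rate is only 1e-07, the warmup mainly avoids a sudden jump away from the iter2 weights at the start of training.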

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/real | Rewards/generated | Rewards/accuracies | Rewards/margins | Logps/generated | Logps/real | Logits/generated | Logits/real |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.2928 | 0.49 | 25 | 0.3951 | -2.6212 | -20.3268 | 0.9062 | 17.7056 | -700.5638 | -278.0876 | -2.8098 | -2.8090 |
| 0.1487 | 0.97 | 50 | 0.1319 | -2.9077 | -29.1459 | 0.9375 | 26.2382 | -702.3276 | -278.1449 | -2.8218 | -2.8066 |
| 0.006 | 1.46 | 75 | 0.1269 | -2.6037 | -29.1519 | 0.9583 | 26.5482 | -702.3289 | -278.0841 | -2.8175 | -2.8037 |
| 0.0086 | 1.94 | 100 | 0.1099 | -2.9181 | -29.6970 | 0.9271 | 26.7789 | -702.4378 | -278.1470 | -2.8177 | -2.8051 |
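In the table, Rewards/margins is the difference between Rewards/real and Rewards/generated at each evaluation step. The sketch below checks that relation on the logged values and then adds a hypothetical per-example loss in the spirit of SPIN's logistic objective, where the model is pushed to assign higher reward to the real (human-written) response than to the one generated by the reference model (the previous iteration). It assumes DPO-style rewards of the form beta × (policy log-probability − reference log-probability); the function names and the `beta=0.1` default are assumptions, not values from this card.

```python
import math

# (step, Rewards/real, Rewards/generated, Rewards/margins) copied from the table above
EVAL_ROWS = [
    (25, -2.6212, -20.3268, 17.7056),
    (50, -2.9077, -29.1459, 26.2382),
    (75, -2.6037, -29.1519, 26.5482),
    (100, -2.9181, -29.6970, 26.7789),
]

# The logged margin equals reward(real) - reward(generated), up to rounding.
for step, real, generated, margin in EVAL_ROWS:
    assert math.isclose(real - generated, margin, abs_tol=1e-3), step


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)


def spin_logistic_loss(policy_real, policy_gen, ref_real, ref_gen, beta=0.1):
    """Hypothetical per-example SPIN-style loss.

    Arguments are summed log-probabilities of the real and generated responses
    under the model being trained (policy_*) and the reference model (ref_*).
    Assumption: rewards are beta * (policy logp - reference logp).
    """
    reward_real = beta * (policy_real - ref_real)
    reward_gen = beta * (policy_gen - ref_gen)
    margin = reward_real - reward_gen
    # Logistic loss log(1 + exp(-margin)): small when the real response is clearly preferred
    return softplus(-margin), reward_real, reward_gen, margin


loss, r_real, r_gen, m = spin_logistic_loss(-100.0, -200.0, -110.0, -150.0)
print(f"loss={loss:.6f} reward_real={r_real:.2f} reward_gen={r_gen:.2f} margin={m:.2f}")
```

Under this reading, the large positive margins in the table mean the model already separates real from generated responses well on the evaluation set, which is consistent with the low training loss in the later steps.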

Framework versions

  • Transformers 4.37.0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2