---
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
- generated_from_trainer
model-index:
- name: spin-v-diverse
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# spin-v-diverse

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full). The training dataset was not recorded by the Trainer, which lists it as `None`.
At the final evaluation step (step 1500), it achieves the following results on the evaluation set:
- Loss: 0.0027
- Rewards/real: -2.6757
- Rewards/generated: -21.8763
- Rewards/accuracies: 1.0
- Rewards/margins: 19.2006
- Logps/generated: -346.5988
- Logps/real: -161.4224
- Logits/generated: -2.5880
- Logits/real: -2.4315
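
As a sanity check on the metrics above, the reported `Rewards/margins` value appears to be the gap between `Rewards/real` and `Rewards/generated`. The sketch below only redoes that arithmetic with the numbers from this card. The variable names are illustrative, and the interpretation of the margin as a simple difference is an assumption, not something the card states.

```python
# Final evaluation metrics copied from the list above.
rewards_real = -2.6757
rewards_generated = -21.8763
reported_margin = 19.2006

# Assumption: the margin is the real reward minus the generated reward.
margin = rewards_real - rewards_generated

print(f"computed margin: {margin:.4f}")
print(f"reported margin: {reported_margin:.4f}")
```

The two values agree to four decimal places, which is consistent with that reading of the metric.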

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
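
The batch-size and scheduler entries above can be related with a short sketch. It assumes that `total_train_batch_size` is the per-device batch size times the device count (with no gradient accumulation, since none is listed), and it re-implements a linear warmup followed by linear decay in plain Python for illustration. `linear_lr` and `TOTAL_STEPS` are hypothetical names and values, not taken from the training run.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 5e-07
TRAIN_BATCH_SIZE = 8   # per device
NUM_DEVICES = 4
WARMUP_RATIO = 0.1

# Assumption: gradient accumulation is 1, because the card does not list it.
effective_batch_size = TRAIN_BATCH_SIZE * NUM_DEVICES


def linear_lr(step: int, total_steps: int) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        # Ramp up from 0 to the peak learning rate.
        return LEARNING_RATE * step / max(1, warmup_steps)
    # Decay linearly from the peak to 0 at the last step.
    remaining = max(0, total_steps - step)
    return LEARNING_RATE * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 1000  # hypothetical step count, for illustration only

print("effective batch size:", effective_batch_size)
for step in (0, 50, 100, 550, 1000):
    print(step, linear_lr(step, TOTAL_STEPS))
```

With these assumptions the effective batch size matches the listed `total_train_batch_size` of 32, and the learning rate peaks at 5e-07 once warmup ends.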

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/real | Rewards/generated | Rewards/accuracies | Rewards/margins | Logps/generated | Logps/real | Logits/generated | Logits/real |
|:-------------:|:-----:|:----:|:---------------:|:------------:|:-----------------:|:------------------:|:---------------:|:---------------:|:----------:|:----------------:|:-----------:|
| 0.0257        | 0.06  | 100  | 0.0288          | 1.0058       | -5.7769           | 0.9928             | 6.7828          | -185.6055       | -124.6072  | -2.8843          | -2.6520     |
| 0.0096        | 0.13  | 200  | 0.0126          | -0.1554      | -12.6258          | 0.9984             | 12.4704         | -254.0941       | -136.2193  | -2.5945          | -2.2413     |
| 0.024         | 0.19  | 300  | 0.0126          | 0.1173       | -11.0946          | 0.9968             | 11.2119         | -238.7820       | -133.4925  | -2.7227          | -2.5040     |
| 0.0065        | 0.26  | 400  | 0.0082          | -0.1964      | -13.6305          | 0.9984             | 13.4341         | -264.1411       | -136.6298  | -2.7028          | -2.4738     |
| 0.0073        | 0.32  | 500  | 0.0081          | 0.0850       | -13.4368          | 0.9984             | 13.5218         | -262.2040       | -133.8156  | -2.6477          | -2.4285     |
| 0.0035        | 0.38  | 600  | 0.0071          | -2.8739      | -18.4641          | 1.0                | 15.5902         | -312.4772       | -163.4043  | -2.5956          | -2.3811     |
| 0.0097        | 0.45  | 700  | 0.0077          | -2.2908      | -16.9898          | 0.9984             | 14.6989         | -297.7338       | -157.5739  | -2.5210          | -2.2045     |
| 0.0052        | 0.51  | 800  | 0.0065          | -1.6983      | -19.8323          | 0.9992             | 18.1340         | -326.1593       | -151.6484  | -2.7183          | -2.5409     |
| 0.0037        | 0.58  | 900  | 0.0067          | -1.2826      | -16.6590          | 0.9984             | 15.3763         | -294.4258       | -147.4920  | -2.6881          | -2.5334     |
| 0.0023        | 0.64  | 1000 | 0.0047          | -1.9423      | -19.2263          | 1.0                | 17.2840         | -320.0990       | -154.0886  | -2.6404          | -2.4694     |
| 0.0041        | 0.7   | 1100 | 0.0050          | -2.4756      | -19.3047          | 1.0                | 16.8290         | -320.8827       | -159.4218  | -2.6368          | -2.4329     |
| 0.0033        | 0.77  | 1200 | 0.0037          | -2.8600      | -20.2625          | 1.0                | 17.4025         | -330.4614       | -163.2654  | -2.6240          | -2.4681     |
| 0.0042        | 0.83  | 1300 | 0.0032          | -2.6738      | -20.7669          | 1.0                | 18.0931         | -335.5057       | -161.4039  | -2.5974          | -2.4463     |
| 0.0031        | 0.9   | 1400 | 0.0030          | -2.1767      | -20.6456          | 0.9992             | 18.4690         | -334.2925       | -156.4323  | -2.6144          | -2.4595     |
| 0.0015        | 0.96  | 1500 | 0.0027          | -2.6757      | -21.8763          | 1.0                | 19.2006         | -346.5988       | -161.4224  | -2.5880          | -2.4315     |
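
To summarize the trend in the table, the following sketch picks out the evaluation step with the lowest validation loss and the relative drop from the first evaluation. The numbers are copied from the table above. The tuple layout and variable names are illustrative only.

```python
# (step, validation loss, rewards/margins) copied from the table above.
rows = [
    (100, 0.0288, 6.7828),
    (200, 0.0126, 12.4704),
    (300, 0.0126, 11.2119),
    (400, 0.0082, 13.4341),
    (500, 0.0081, 13.5218),
    (600, 0.0071, 15.5902),
    (700, 0.0077, 14.6989),
    (800, 0.0065, 18.1340),
    (900, 0.0067, 15.3763),
    (1000, 0.0047, 17.2840),
    (1100, 0.0050, 16.8290),
    (1200, 0.0037, 17.4025),
    (1300, 0.0032, 18.0931),
    (1400, 0.0030, 18.4690),
    (1500, 0.0027, 19.2006),
]

# Evaluation step with the lowest validation loss.
best_step, best_loss, _ = min(rows, key=lambda r: r[1])

# Evaluation step with the largest reward margin.
max_margin_step, _, max_margin = max(rows, key=lambda r: r[2])

# Relative reduction in validation loss from the first evaluation to the best one.
first_loss = rows[0][1]
relative_reduction = 1 - best_loss / first_loss

print(f"lowest validation loss {best_loss} at step {best_step}")
print(f"largest margin {max_margin} at step {max_margin_step}")
print(f"relative loss reduction: {relative_reduction:.1%}")
```

Both the lowest validation loss and the largest margin fall on the final evaluation at step 1500, so the last checkpoint is also the best one by these two measures.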


### Framework versions

- Transformers 4.37.0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2