---
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
  - generated_from_trainer
model-index:
  - name: spin-margin2
    results: []
---

# spin-margin2

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the None dataset. It achieves the following results on the evaluation set:

- Loss: 0.0010
- Rewards/real: -0.7975
- Rewards/generated: -20.4822
- Rewards/accuracies: 1.0
- Rewards/margins: 19.6846
- Logps/generated: -303.8466
- Logps/real: -141.0674
- Logits/generated: -2.6068
- Logits/real: -2.3492
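As a sanity check on the metrics above: in DPO/SPIN-style training the reward margin is the difference between the reward on real (chosen) responses and on generated (rejected) responses. A minimal sketch, using only the numbers reported in this card (the margin formula is the standard DPO/SPIN convention, not something stated in the card itself):

```python
# Reported evaluation metrics from the card above.
rewards_real = -0.7975
rewards_generated = -20.4822
reported_margin = 19.6846

# Margin = rewards/real - rewards/generated (DPO/SPIN convention).
margin = rewards_real - rewards_generated

# Agrees with the reported value up to rounding of the logged metrics.
assert abs(margin - reported_margin) < 1e-3
```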

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
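The learning-rate settings above describe a linear schedule with 10% warmup. A minimal sketch of that shape, assuming the usual Hugging Face convention (linear ramp from 0 to the peak rate over the warmup steps, then linear decay to 0); the function name and the 500-step total (the last step in the training log) are illustrative, not from the card:

```python
def linear_schedule_with_warmup(step, total_steps, warmup_ratio=0.1, base_lr=5e-7):
    """Linear warmup to base_lr, then linear decay to 0.
    Mirrors lr_scheduler_type=linear with lr_scheduler_warmup_ratio=0.1."""
    warmup_steps = int(warmup_ratio * total_steps)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# With 500 optimizer steps, warmup ends at step 50 (the peak learning rate),
# and the rate decays back to 0 by the final step.
print(linear_schedule_with_warmup(50, 500))   # peak: 5e-07
```

Note also that total_train_batch_size = train_batch_size × num_devices = 8 × 4 = 32, consistent with the values listed above.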

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/real | Rewards/generated | Rewards/accuracies | Rewards/margins | Logps/generated | Logps/real | Logits/generated | Logits/real |
|:-------------:|:-----:|:----:|:---------------:|:------------:|:-----------------:|:------------------:|:---------------:|:---------------:|:----------:|:----------------:|:-----------:|
| 0.0043        | 0.19  | 100  | 0.0049          | 0.9120       | -9.6012           | 1.0                | 10.5132         | -195.0367       | -123.9721  | -2.7982          | -2.5652     |
| 0.0034        | 0.39  | 200  | 0.0024          | -0.0739      | -14.1834          | 1.0                | 14.1095         | -240.8593       | -133.8314  | -2.8109          | -2.5347     |
| 0.0007        | 0.58  | 300  | 0.0012          | -0.2381      | -16.9127          | 1.0                | 16.6746         | -268.1524       | -135.4731  | -2.7308          | -2.4046     |
| 0.0016        | 0.78  | 400  | 0.0010          | -1.1878      | -19.5719          | 1.0                | 18.3841         | -294.7439       | -144.9703  | -2.6559          | -2.3917     |
| 0.0001        | 0.97  | 500  | 0.0010          | -0.7975      | -20.4822          | 1.0                | 19.6846         | -303.8466       | -141.0674  | -2.6068          | -2.3492     |

### Framework versions

- Transformers 4.37.0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2