---
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
  - generated_from_trainer
model-index:
  - name: spin-margin2
    results: []
---

# spin-margin2

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the None dataset. It achieves the following results on the evaluation set:

- Loss: 0.0010
- Rewards/real: -0.7975
- Rewards/generated: -20.4822
- Rewards/accuracies: 1.0
- Rewards/margins: 19.6846
- Logps/generated: -303.8466
- Logps/real: -141.0674
- Logits/generated: -2.6068
- Logits/real: -2.3492
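As a sanity check on the metrics above: in DPO/SPIN-style training the reward margin is the difference between the reward on real (chosen) responses and on generated (rejected) responses. A minimal sketch, using only the numbers reported in this card (the margin formula is the standard DPO/SPIN convention, not something stated in the card itself):

```python
# Reported evaluation metrics from the card above.
rewards_real = -0.7975
rewards_generated = -20.4822
reported_margin = 19.6846

# Margin = rewards/real - rewards/generated (DPO/SPIN convention).
margin = rewards_real - rewards_generated

# Agrees with the reported value up to rounding of the logged metrics.
assert abs(margin - reported_margin) < 1e-3
```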

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
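The learning-rate settings above describe a linear schedule with 10% warmup. A minimal sketch of that shape, assuming the usual Hugging Face convention (linear ramp from 0 to the peak rate over the warmup steps, then linear decay to 0); the function name and the 500-step total (the last step in the training log) are illustrative, not from the card:

```python
def linear_schedule_with_warmup(step, total_steps, warmup_ratio=0.1, base_lr=5e-7):
    """Linear warmup to base_lr, then linear decay to 0.
    Mirrors lr_scheduler_type=linear with lr_scheduler_warmup_ratio=0.1."""
    warmup_steps = int(warmup_ratio * total_steps)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# With 500 optimizer steps, warmup ends at step 50 (the peak learning rate),
# and the rate decays back to 0 by the final step.
print(linear_schedule_with_warmup(50, 500))   # peak: 5e-07
```

Note also that total_train_batch_size = train_batch_size × num_devices = 8 × 4 = 32, consistent with the values listed above.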

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/real | Rewards/generated | Rewards/accuracies | Rewards/margins | Logps/generated | Logps/real | Logits/generated | Logits/real |
|:-------------:|:-----:|:----:|:---------------:|:------------:|:-----------------:|:------------------:|:---------------:|:---------------:|:----------:|:----------------:|:-----------:|
| 0.0043        | 0.19  | 100  | 0.0049          | 0.9120       | -9.6012           | 1.0                | 10.5132         | -195.0367       | -123.9721  | -2.7982          | -2.5652     |
| 0.0034        | 0.39  | 200  | 0.0024          | -0.0739      | -14.1834          | 1.0                | 14.1095         | -240.8593       | -133.8314  | -2.8109          | -2.5347     |
| 0.0007        | 0.58  | 300  | 0.0012          | -0.2381      | -16.9127          | 1.0                | 16.6746         | -268.1524       | -135.4731  | -2.7308          | -2.4046     |
| 0.0016        | 0.78  | 400  | 0.0010          | -1.1878      | -19.5719          | 1.0                | 18.3841         | -294.7439       | -144.9703  | -2.6559          | -2.3917     |
| 0.0001        | 0.97  | 500  | 0.0010          | -0.7975      | -20.4822          | 1.0                | 19.6846         | -303.8466       | -141.0674  | -2.6068          | -2.3492     |

### Framework versions

- Transformers 4.37.0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2