---
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
- generated_from_trainer
- alignment-handbook
model-index:
- name: zephyr-7b-dpo-full
  results: []
---


# zephyr-7b-dpo-full

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset.
It achieves the following results on the evaluation set (metric definitions and a usage sketch follow the list):
- Loss: 0.6910
- Rewards/chosen: -3.9218
- Rewards/rejected: -8.2942
- Rewards/accuracies: 0.8125
- Rewards/margins: 4.3724
- Logps/rejected: -279.5480
- Logps/chosen: -293.9998
- Logits/rejected: -2.6725
- Logits/chosen: -2.7826
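
The reward columns follow trl's `DPOTrainer` logging conventions (an assumption; the trainer library is not named in the card): `Rewards/chosen` and `Rewards/rejected` are the mean implicit DPO rewards `beta * (logp_policy - logp_reference)` for the chosen and rejected completions, `Rewards/margins` is their mean difference, and `Rewards/accuracies` is the fraction of preference pairs where the chosen reward exceeds the rejected one.

Below is a minimal inference sketch. The repo id is a placeholder (substitute the actual Hub path of this model), and it assumes the chat template inherited from the Zephyr SFT base:

```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="your-org/zephyr-7b-dpo-full",  # hypothetical repo id; replace with the real one
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain direct preference optimization in one paragraph."},
]
# Render the conversation with the tokenizer's chat template before generating.
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
out = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
print(out[0]["generated_text"])
```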

## Model description

zephyr-7b-dpo-full is a 7B-parameter chat model produced by applying direct preference optimization (DPO) to [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full), apparently following the Zephyr recipe from the alignment-handbook (per the model tags). The DPO stage aligns the supervised fine-tuned model to human preferences expressed as chosen/rejected completion pairs.

## Intended uses & limitations

The model is intended for assistant-style chat and for research on preference optimization. The card does not document any safety-specific alignment stage, so outputs may be inaccurate or otherwise problematic; additional filtering or guardrails are advisable before deployment.

## Training and evaluation data

Training and evaluation used [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized), a binarized version of UltraFeedback in which each prompt is paired with one preferred (chosen) and one dispreferred (rejected) completion, as the snippet below shows.
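
A quick way to inspect the preference pairs (the split and column names reflect the dataset as published on the Hub and are stated here as assumptions, since the card does not record them):

```python
from datasets import load_dataset

# Preference split typically used for DPO training on this dataset.
prefs = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")
print(prefs.column_names)   # expected to include 'prompt', 'chosen', 'rejected'
print(prefs[0]["prompt"][:200])
```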

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training; a sketch of how they map onto `TrainingArguments` and trl's `DPOTrainer` follows the list:
- learning_rate: 5e-07
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 32
- total_train_batch_size: 64
- total_eval_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
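
These settings map naturally onto `transformers.TrainingArguments` driving trl's `DPOTrainer`. The sketch below is a reconstruction, not the exact training script: the use of trl, the DPO `beta`, and the dataset preprocessing are assumptions, and the 32-GPU launch (e.g. via `accelerate launch`) is omitted.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "alignment-handbook/zephyr-7b-sft-full"
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)

ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized")

def to_pairs(ex):
    # The raw splits store chosen/rejected as chat-message lists; DPOTrainer
    # expects plain strings, so keep only the final assistant turn (an
    # assumed simplification of the actual preprocessing).
    return {
        "prompt": ex["prompt"],
        "chosen": ex["chosen"][-1]["content"],
        "rejected": ex["rejected"][-1]["content"],
    }

cols = ds["train_prefs"].column_names
train_dataset = ds["train_prefs"].map(to_pairs, remove_columns=cols)
eval_dataset = ds["test_prefs"].map(to_pairs, remove_columns=cols)

args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full",
    learning_rate=5e-7,
    per_device_train_batch_size=2,  # x 32 GPUs -> total train batch size 64
    per_device_eval_batch_size=4,   # x 32 GPUs -> total eval batch size 128
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the defaults, so they
    # need not be set explicitly.
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=args,
    beta=0.1,  # assumption: the DPO beta is not recorded in the card
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```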

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5386        | 0.1   | 100  | 0.5208          | 0.0564         | -0.7521          | 0.7188             | 0.8085          | -204.1269      | -254.2179    | -3.0136         | -3.0550       |
| 0.4931        | 0.21  | 200  | 0.4882          | -0.0132        | -1.2683          | 0.7812             | 1.2551          | -209.2889      | -254.9136    | -3.1056         | -3.1407       |
| 0.479         | 0.31  | 300  | 0.5038          | -0.1035        | -1.4012          | 0.7812             | 1.2978          | -210.6186      | -255.8163    | -3.0809         | -3.1328       |
| 0.5052        | 0.41  | 400  | 0.5154          | -0.1923        | -1.8783          | 0.7969             | 1.6860          | -215.3891      | -256.7043    | -2.9104         | -2.9644       |
| 0.4513        | 0.52  | 500  | 0.4979          | 0.0207         | -1.6562          | 0.7969             | 1.6769          | -213.1682      | -254.5742    | -3.0061         | -3.0657       |
| 0.4905        | 0.62  | 600  | 0.4907          | -0.0944        | -1.5847          | 0.7656             | 1.4903          | -212.4527      | -255.7256    | -2.9374         | -3.0170       |
| 0.5609        | 0.72  | 700  | 0.4928          | -0.4249        | -1.7238          | 0.7656             | 1.2989          | -213.8441      | -259.0304    | -2.9475         | -3.0128       |
| 0.5338        | 0.83  | 800  | 0.4767          | -0.1567        | -1.9114          | 0.8125             | 1.7547          | -215.7200      | -256.3484    | -2.8455         | -2.9183       |
| 0.5039        | 0.93  | 900  | 0.4854          | -0.0886        | -1.6900          | 0.75               | 1.6014          | -213.5057      | -255.6674    | -2.8295         | -2.9093       |
| 0.0776        | 1.03  | 1000 | 0.4938          | -0.4848        | -2.5287          | 0.7656             | 2.0438          | -221.8927      | -259.6300    | -2.7580         | -2.8437       |
| 0.0901        | 1.14  | 1100 | 0.5071          | -1.0800        | -3.2419          | 0.7812             | 2.1619          | -229.0247      | -265.5817    | -2.8036         | -2.8858       |
| 0.0828        | 1.24  | 1200 | 0.5159          | -0.9682        | -3.4087          | 0.7812             | 2.4406          | -230.6935      | -264.4635    | -2.7961         | -2.8708       |
| 0.0916        | 1.34  | 1300 | 0.5222          | -1.0832        | -3.5535          | 0.7969             | 2.4703          | -232.1411      | -265.6135    | -2.8019         | -2.8754       |
| 0.0965        | 1.44  | 1400 | 0.5204          | -1.1951        | -3.5681          | 0.7969             | 2.3731          | -232.2874      | -266.7324    | -2.8058         | -2.8884       |
| 0.0716        | 1.55  | 1500 | 0.5381          | -1.6588        | -4.0838          | 0.7188             | 2.4250          | -237.4441      | -271.3697    | -2.7979         | -2.8862       |
| 0.0957        | 1.65  | 1600 | 0.5151          | -1.1746        | -3.7477          | 0.75               | 2.5731          | -234.0834      | -266.5278    | -2.7960         | -2.8976       |
| 0.0645        | 1.75  | 1700 | 0.5393          | -1.7591        | -4.6011          | 0.8125             | 2.8419          | -242.6167      | -272.3728    | -2.7483         | -2.8592       |
| 0.0838        | 1.86  | 1800 | 0.5385          | -1.6606        | -4.4648          | 0.7656             | 2.8042          | -241.2545      | -271.3875    | -2.7311         | -2.8383       |
| 0.1106        | 1.96  | 1900 | 0.5322          | -1.5621        | -3.9779          | 0.7969             | 2.4158          | -236.3850      | -270.4025    | -2.8194         | -2.9133       |
| 0.0174        | 2.06  | 2000 | 0.5921          | -2.4968        | -5.9514          | 0.7969             | 3.4546          | -256.1199      | -279.7498    | -2.7579         | -2.8631       |
| 0.0134        | 2.17  | 2100 | 0.6247          | -2.9002        | -6.4277          | 0.7969             | 3.5275          | -260.8829      | -283.7838    | -2.7316         | -2.8319       |
| 0.0148        | 2.27  | 2200 | 0.6402          | -3.2520        | -7.0627          | 0.7812             | 3.8106          | -267.2330      | -287.3020    | -2.6991         | -2.8064       |
| 0.0142        | 2.37  | 2300 | 0.6563          | -3.2715        | -7.1303          | 0.8281             | 3.8588          | -267.9088      | -287.4962    | -2.6871         | -2.7992       |
| 0.011         | 2.48  | 2400 | 0.6605          | -3.2996        | -7.2258          | 0.7969             | 3.9262          | -268.8643      | -287.7776    | -2.6555         | -2.7717       |
| 0.0065        | 2.58  | 2500 | 0.6935          | -3.6399        | -8.0232          | 0.8125             | 4.3832          | -276.8377      | -291.1808    | -2.6780         | -2.7902       |
| 0.0089        | 2.68  | 2600 | 0.6773          | -3.4822        | -7.8182          | 0.8125             | 4.3360          | -274.7881      | -289.6033    | -2.6885         | -2.7994       |
| 0.0102        | 2.79  | 2700 | 0.6813          | -3.5909        | -7.8097          | 0.8281             | 4.2187          | -274.7028      | -290.6908    | -2.6877         | -2.7970       |
| 0.0136        | 2.89  | 2800 | 0.6892          | -3.8236        | -8.1490          | 0.8125             | 4.3254          | -278.0957      | -293.0175    | -2.6765         | -2.7862       |
| 0.0091        | 2.99  | 2900 | 0.6913          | -3.9199        | -8.3004          | 0.8125             | 4.3806          | -279.6104      | -293.9802    | -2.6728         | -2.7830       |
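
Note that validation loss bottoms out near 0.48 during the first epoch and climbs to about 0.69 by the end of training, even as reward margins and accuracies keep improving; such divergence between eval loss and preference metrics is commonly observed in DPO runs, so the final checkpoint is not necessarily the best one by every criterion.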


### Framework versions

- Transformers 4.35.0
- Pytorch 2.1.0+cu118
- Datasets 2.14.6
- Tokenizers 0.14.1