---
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
- generated_from_trainer
model-index:
- name: zephyr-7b-dpo-full
results: []
---
# zephyr-7b-dpo-full
This model is a DPO fine-tune of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full); the preference dataset used for training is not recorded in this card.
It achieves the following results on the evaluation set (a brief note on how to read these DPO metrics follows the list):
- Loss: 0.6929
- Rewards/chosen: -2.2624
- Rewards/rejected: -5.6900
- Rewards/accuracies: 0.7619
- Rewards/margins: 3.4275
- Logps/rejected: -348.8656
- Logps/chosen: -389.8162
- Logits/rejected: -2.8188
- Logits/chosen: -2.8149
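These columns follow the conventions of DPO-style trainers such as TRL's `DPOTrainer` (assumed here; the exact training stack is not recorded in this card). Under the standard DPO objective, each completion's "reward" is the β-scaled log-probability ratio between the policy and the frozen reference model, and the loss for a chosen/rejected pair \((y_w, y_l)\) is

$$
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right).
$$

Under these conventions, `Rewards/chosen` and `Rewards/rejected` average the two β-scaled log-ratio terms over the evaluation set, `Rewards/margins` is their difference, and `Rewards/accuracies` is the fraction of pairs where the chosen reward exceeds the rejected one.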
## Model description
More information needed
## Intended uses & limitations
More information needed
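In the absence of documented usage, here is a minimal inference sketch using the `transformers` chat-template API. The repo id `yihang7/zephyr-7b-dpo-full` is inferred from this card's location and is an assumption; verify it before use.

```python
# Hypothetical usage sketch; the repo id below is assumed, not confirmed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yihang7/zephyr-7b-dpo-full"  # assumed from the card title
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

This assumes the tokenizer ships a chat template, as the base SFT model's does; if it does not, fall back to plain-text prompting.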
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a hypothetical mapping onto a TRL `DPOTrainer` setup is sketched after the list):
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
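The sketch below is one plausible reconstruction of this configuration; the actual training script, dataset, and DPO `beta` are not recorded in this card, and the API shown is TRL's `DPOTrainer` as of late 2023 (contemporary with Transformers 4.35).

```python
# Hypothetical reconstruction of the training setup described above.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "alignment-handbook/zephyr-7b-sft-full"
model = AutoModelForCausalLM.from_pretrained(base)      # policy to optimize
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder preference data; the card records no dataset. For this TRL
# version the chosen/rejected messages would still need flattening into
# plain "prompt"/"chosen"/"rejected" strings, which is omitted here.
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized")

args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full",
    learning_rate=5e-7,
    per_device_train_batch_size=8,  # x 8 GPUs = total train batch size 64
    per_device_eval_batch_size=4,   # x 8 GPUs = total eval batch size 32
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    evaluation_strategy="steps",
    eval_steps=100,                 # matches the 100-step cadence in the table
    logging_steps=100,
    bf16=True,                      # assumed; precision is not recorded
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=args,
    beta=0.1,                       # assumed TRL default; beta is not recorded
    train_dataset=dataset["train_prefs"],
    eval_dataset=dataset["test_prefs"],
    tokenizer=tokenizer,
)
trainer.train()
```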
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5504 | 0.10 | 100 | 0.5407 | 0.5287 | -0.1810 | 0.7579 | 0.7098 | -293.7762 | -361.9044 | -2.9360 | -2.9366 |
| 0.5410 | 0.21 | 200 | 0.5221 | 0.6692 | -0.5569 | 0.7698 | 1.2261 | -297.5352 | -360.5003 | -2.9786 | -2.9802 |
| 0.6034 | 0.31 | 300 | 0.5459 | 0.7375 | -0.4578 | 0.7619 | 1.1953 | -296.5442 | -359.8170 | -3.0234 | -3.0360 |
| 0.5944 | 0.41 | 400 | 0.5573 | 0.4979 | -0.8938 | 0.7698 | 1.3917 | -300.9036 | -362.2126 | -2.9639 | -2.9621 |
| 0.5512 | 0.52 | 500 | 0.5257 | 0.4355 | -1.0167 | 0.7579 | 1.4522 | -302.1330 | -362.8364 | -3.0485 | -3.0406 |
| 0.5879 | 0.62 | 600 | 0.5288 | 0.4707 | -0.9291 | 0.7579 | 1.3998 | -301.2572 | -362.4848 | -2.9911 | -2.9869 |
| 0.6773 | 0.72 | 700 | 0.5853 | 0.0472 | -0.9185 | 0.7460 | 0.9657 | -301.1505 | -366.7194 | -3.0564 | -3.0418 |
| 0.5263 | 0.83 | 800 | 0.5151 | 0.2246 | -1.1914 | 0.7619 | 1.4160 | -303.8796 | -364.9458 | -2.9662 | -2.9637 |
| 0.5366 | 0.93 | 900 | 0.5134 | 0.2511 | -1.0873 | 0.7500 | 1.3384 | -302.8385 | -364.6808 | -2.9824 | -2.9907 |
| 0.1034 | 1.03 | 1000 | 0.5107 | 0.3073 | -1.4321 | 0.7619 | 1.7394 | -306.2867 | -364.1185 | -2.9096 | -2.9202 |
| 0.1114 | 1.14 | 1100 | 0.5344 | 0.1332 | -1.8449 | 0.7460 | 1.9781 | -310.4148 | -365.8598 | -2.9561 | -2.9666 |
| 0.1338 | 1.24 | 1200 | 0.5350 | -0.0814 | -2.1418 | 0.7738 | 2.0604 | -313.3835 | -368.0058 | -2.9460 | -2.9508 |
| 0.0979 | 1.34 | 1300 | 0.5474 | -0.0945 | -2.2500 | 0.7659 | 2.1554 | -314.4657 | -368.1371 | -2.9172 | -2.9201 |
| 0.1366 | 1.44 | 1400 | 0.5440 | -0.4749 | -2.3968 | 0.7579 | 1.9219 | -315.9338 | -371.9403 | -2.9134 | -2.9144 |
| 0.1042 | 1.55 | 1500 | 0.5524 | -0.5014 | -2.6803 | 0.7698 | 2.1789 | -318.7686 | -372.2054 | -2.9361 | -2.9306 |
| 0.1313 | 1.65 | 1600 | 0.5333 | -0.2234 | -2.1867 | 0.7500 | 1.9634 | -313.8333 | -369.4255 | -2.9060 | -2.8999 |
| 0.1629 | 1.75 | 1700 | 0.5655 | -0.3904 | -2.7591 | 0.7500 | 2.3687 | -319.5572 | -371.0959 | -2.9182 | -2.9096 |
| 0.0993 | 1.86 | 1800 | 0.5605 | -0.7117 | -2.9701 | 0.7460 | 2.2584 | -321.6668 | -374.3084 | -2.8602 | -2.8477 |
| 0.1116 | 1.96 | 1900 | 0.5649 | -0.6379 | -2.7259 | 0.7540 | 2.0880 | -319.2250 | -373.5707 | -2.9277 | -2.9150 |
| 0.0193 | 2.06 | 2000 | 0.6122 | -0.9412 | -3.7861 | 0.7619 | 2.8449 | -329.8275 | -376.6041 | -2.8919 | -2.8825 |
| 0.0175 | 2.17 | 2100 | 0.6523 | -1.6027 | -4.6832 | 0.7659 | 3.0805 | -338.7977 | -383.2186 | -2.8474 | -2.8393 |
| 0.0131 | 2.27 | 2200 | 0.6702 | -1.8899 | -5.0304 | 0.7421 | 3.1406 | -342.2704 | -386.0904 | -2.8128 | -2.8069 |
| 0.0243 | 2.37 | 2300 | 0.6559 | -1.6715 | -4.7369 | 0.7698 | 3.0654 | -339.3347 | -383.9066 | -2.8547 | -2.8490 |
| 0.0142 | 2.48 | 2400 | 0.6734 | -1.9463 | -5.1224 | 0.7579 | 3.1761 | -343.1900 | -386.6547 | -2.8394 | -2.8352 |
| 0.0211 | 2.58 | 2500 | 0.6890 | -2.1114 | -5.5608 | 0.7698 | 3.4494 | -347.5744 | -388.3059 | -2.8369 | -2.8333 |
| 0.0110 | 2.68 | 2600 | 0.6999 | -2.3020 | -5.8073 | 0.7659 | 3.5053 | -350.0389 | -390.2114 | -2.8299 | -2.8258 |
| 0.0114 | 2.79 | 2700 | 0.6951 | -2.2382 | -5.6885 | 0.7698 | 3.4503 | -348.8512 | -389.5739 | -2.8207 | -2.8172 |
| 0.0437 | 2.89 | 2800 | 0.6911 | -2.2294 | -5.6156 | 0.7659 | 3.3861 | -348.1217 | -389.4860 | -2.8151 | -2.8117 |
| 0.0109 | 2.99 | 2900 | 0.6909 | -2.2776 | -5.6932 | 0.7659 | 3.4156 | -348.8980 | -389.9677 | -2.8187 | -2.8148 |
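Note that the training loss falls sharply at each epoch boundary (roughly 0.55 in epoch 1, 0.11 in epoch 2, 0.02 in epoch 3) while the validation loss drifts upward after epoch 1, a pattern often read as overfitting in multi-epoch DPO runs. As a quick consistency check on the columns (under the TRL conventions assumed above), the reward margin should equal the chosen reward minus the rejected reward; the final evaluation row bears this out:

```python
# Values copied from the step-2900 row of the table above.
rewards_chosen = -2.2776
rewards_rejected = -5.6932
margin = rewards_chosen - rewards_rejected
print(f"{margin:.4f}")  # 3.4156, matching the Rewards/margins column
```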
### Framework versions
- Transformers 4.35.0
- PyTorch 2.1.1+cu121
- Datasets 2.14.6
- Tokenizers 0.14.1