hongce-tech/openhermes-mistral-dpo-gptq

	---
	license: apache-2.0
	library_name: peft
	tags:
	- trl
	- dpo
	- generated_from_trainer
	base_model: TheBloke/OpenHermes-2-Mistral-7B-GPTQ
	model-index:
	- name: openhermes-mistral-dpo-gptq
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# openhermes-mistral-dpo-gptq

	This model is a fine-tuned version of [TheBloke/OpenHermes-2-Mistral-7B-GPTQ](https://huggingface.co/TheBloke/OpenHermes-2-Mistral-7B-GPTQ), trained with DPO as a PEFT adapter. The training dataset was not recorded.
	It achieves the following results on the evaluation set:
	- Loss: 0.4346
	- Rewards/chosen: 0.6886
	- Rewards/rejected: -0.1517
	- Rewards/accuracies: 0.875
	- Rewards/margins: 0.8403
	- Logps/rejected: -258.0681
	- Logps/chosen: -269.4644
	- Logits/rejected: -2.3873
	- Logits/chosen: -2.4450
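
The reported margin is the difference between the chosen and rejected rewards, so it can be checked from the numbers above. The sketch below also evaluates the loss at the mean margin. That second step assumes the run used the standard sigmoid DPO loss, which this card does not state, and the variable names are illustrative only.

```python
import math

# Final evaluation metrics copied from the list above.
rewards_chosen = 0.6886
rewards_rejected = -0.1517
reported_margin = 0.8403
reported_loss = 0.4346

# Rewards/margins is the mean of (chosen - rejected), so the means should agree.
margin = rewards_chosen - rewards_rejected
print(f"margin = {margin:.4f}")

# Assumption: sigmoid DPO loss, i.e. loss_i = -log(sigmoid(margin_i)).
# This function is convex, so its value at the mean margin is only a
# lower bound on the mean loss, not an exact reproduction of it.
loss_at_mean_margin = math.log1p(math.exp(-margin))
print(f"-log(sigmoid(mean margin)) = {loss_at_mean_margin:.4f}")
```

The recomputed margin matches the reported 0.8403, and the bound (roughly 0.36) sits below the reported loss of 0.4346, which is consistent with that assumption.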

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0002
	- train_batch_size: 4
	- eval_batch_size: 8
	- seed: 42
	- gradient_accumulation_steps: 2
	- total_train_batch_size: 8
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 2
	- training_steps: 100
	- mixed_precision_training: Native AMP
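
As a quick check on how these values fit together, the sketch below derives the total batch size and traces the shape of a linear schedule with warmup. It assumes a single training device (the card does not say how many were used), and `linear_lr` is a hypothetical helper that mirrors the usual warmup-then-decay rule rather than the Trainer's own code.

```python
# Values copied from the hyperparameter list above.
learning_rate = 2e-4
train_batch_size = 4
gradient_accumulation_steps = 2
warmup_steps = 2
training_steps = 100

# Assumption: one device, so per-device and global batch sizes coincide.
num_devices = 1
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print("total_train_batch_size =", total_train_batch_size)


def linear_lr(step: int) -> float:
    """Linear warmup to the peak rate, then linear decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    remaining = training_steps - step
    return learning_rate * max(0.0, remaining / (training_steps - warmup_steps))


for step in (0, 1, 2, 51, 100):
    print(step, f"{linear_lr(step):.2e}")
```

With only 2 warmup steps out of 100, the rate reaches its peak of 0.0002 almost immediately and then decays for the rest of the run.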

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
	|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
	| 0.6927 | 0.02 | 5 | 0.6723 | -0.0624 | -0.1130 | 0.5 | 0.0506 | -257.6814 | -276.9746 | -2.3921 | -2.4532 |
	| 0.6896 | 0.04 | 10 | 0.6814 | -0.0837 | -0.1949 | 0.5625 | 0.1113 | -258.5006 | -277.1875 | -2.3785 | -2.4393 |
	| 0.7286 | 0.06 | 15 | 0.7217 | -0.1116 | -0.2049 | 0.8125 | 0.0933 | -258.6005 | -277.4668 | -2.3732 | -2.4343 |
	| 0.6049 | 0.08 | 20 | 0.6488 | -0.5231 | -0.7234 | 0.9375 | 0.2003 | -263.7849 | -281.5815 | -2.3599 | -2.4201 |
	| 3.1019 | 0.1 | 25 | 0.6202 | -0.7269 | -1.0069 | 0.9375 | 0.2800 | -266.6205 | -283.6199 | -2.3529 | -2.4132 |
	| 3.4522 | 0.12 | 30 | 0.6238 | -0.8793 | -1.2160 | 0.875 | 0.3367 | -268.7114 | -285.1440 | -2.3418 | -2.4001 |
	| 1.7538 | 0.14 | 35 | 0.6336 | -0.5977 | -0.8794 | 0.875 | 0.2816 | -265.3451 | -282.3282 | -2.3479 | -2.4068 |
	| 0.6167 | 0.16 | 40 | 0.6979 | 0.0308 | -0.1700 | 0.8125 | 0.2008 | -258.2513 | -276.0429 | -2.3591 | -2.4196 |
	| 1.5103 | 0.18 | 45 | 0.7053 | 0.0521 | -0.1713 | 0.875 | 0.2233 | -258.2638 | -275.8300 | -2.3607 | -2.4207 |
	| 0.6762 | 0.2 | 50 | 0.7144 | 0.1606 | -0.1470 | 0.875 | 0.3076 | -258.0209 | -274.7448 | -2.3658 | -2.4243 |
	| 0.6587 | 0.22 | 55 | 0.7123 | 0.1399 | -0.2934 | 0.8125 | 0.4333 | -259.4854 | -274.9521 | -2.3670 | -2.4244 |
	| 0.7563 | 0.24 | 60 | 0.7987 | 0.4547 | 0.0155 | 0.8125 | 0.4391 | -256.3959 | -271.8042 | -2.3793 | -2.4378 |
	| 0.8208 | 0.26 | 65 | 0.8288 | 1.0234 | 0.5622 | 0.8125 | 0.4611 | -250.9289 | -266.1172 | -2.4012 | -2.4618 |
	| 0.9904 | 0.28 | 70 | 0.7683 | 1.4763 | 0.9615 | 0.8125 | 0.5148 | -246.9362 | -261.5881 | -2.4184 | -2.4798 |
	| 0.8327 | 0.3 | 75 | 0.6556 | 1.6107 | 1.0087 | 0.8125 | 0.6019 | -246.4639 | -260.2441 | -2.4218 | -2.4838 |
	| 0.8238 | 0.32 | 80 | 0.5524 | 1.5571 | 0.8762 | 0.8125 | 0.6809 | -247.7892 | -260.7801 | -2.4168 | -2.4797 |
	| 0.7712 | 0.34 | 85 | 0.5144 | 1.3444 | 0.6352 | 0.8125 | 0.7092 | -250.1996 | -262.9072 | -2.4079 | -2.4697 |
	| 0.691 | 0.36 | 90 | 0.4688 | 1.0225 | 0.2544 | 0.875 | 0.7682 | -254.0075 | -266.1254 | -2.3981 | -2.4588 |
	| 0.6386 | 0.38 | 95 | 0.4490 | 0.8498 | 0.0425 | 0.875 | 0.8074 | -256.1265 | -267.8524 | -2.3927 | -2.4521 |
	| 0.6413 | 0.4 | 100 | 0.4346 | 0.6886 | -0.1517 | 0.875 | 0.8403 | -258.0681 | -269.4644 | -2.3873 | -2.4450 |
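
The Epoch and Step columns give a rough idea of how much data the run covered, even though the dataset itself is not recorded. The sketch below is only an estimate: the epoch values are rounded to two decimals, and the batch size of 8 assumes the total batch size listed under the hyperparameters.

```python
# Step and epoch from the last row of the table above.
last_step, last_epoch = 100, 0.4
total_train_batch_size = 8  # from the hyperparameter list

# Epoch values are rounded, so these figures are approximate.
steps_per_epoch = last_step / last_epoch
approx_train_examples = steps_per_epoch * total_train_batch_size
examples_seen = last_step * total_train_batch_size

print(f"steps per epoch ~ {steps_per_epoch:.0f}")
print(f"training examples ~ {approx_train_examples:.0f}")
print(f"examples seen in {last_step} steps = {examples_seen}")
```

Under these assumptions the training split holds roughly 2000 preference pairs, of which the run saw 800, well under half of one epoch.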


	### Framework versions

	- PEFT 0.8.2
	- Transformers 4.37.2
	- PyTorch 2.0.1+cu117
	- Datasets 2.17.1
	- Tokenizers 0.15.2
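
With versions like those above, the adapter could be attached to its base model roughly as follows. This is a minimal sketch, not a tested recipe. The adapter ID is taken from the repository name, `load_model_and_tokenizer` is a hypothetical helper, and the code assumes a GPU plus a GPTQ backend are available in the environment.

```python
BASE_MODEL_ID = "TheBloke/OpenHermes-2-Mistral-7B-GPTQ"  # base_model in the card metadata
ADAPTER_ID = "hongce-tech/openhermes-mistral-dpo-gptq"   # repository name (assumed adapter ID)


def load_model_and_tokenizer():
    """Load the GPTQ base model and attach the DPO-trained PEFT adapter."""
    # Imported lazily so the heavy dependencies are only needed when loading.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    # Assumption: a GPU and a GPTQ backend (for example optimum with auto-gptq)
    # are installed, otherwise the quantized weights will not load.
    base_model = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID, device_map="auto")
    model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
    return model, tokenizer
```

Calling `model, tokenizer = load_model_and_tokenizer()` downloads both repositories on first use. Prompts should presumably follow the base model's chat format, which this card does not document.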