
gpt-imdb-cdpo_0.15-beta_0.1

This model is a fine-tuned version of lvwerra/gpt2-imdb on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5181
  • Rewards/chosen: -0.6104
  • Rewards/rejected: -1.9969
  • Rewards/accuracies: 0.9271
  • Rewards/margins: 1.3866
  • Logps/rejected: -283.6544
  • Logps/chosen: -241.3688
  • Logits/rejected: -36.1797
  • Logits/chosen: -37.0193
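
For context on the reward metrics above: assuming this model was trained with TRL's `DPOTrainer` (the metric names follow its conventions, though the card does not state this), the reward values are β-scaled log-probability ratios between the fine-tuned policy and the frozen `lvwerra/gpt2-imdb` reference model, with β = 0.1 per the model name. A sketch of the assumed definitions:

```latex
% Assumed TRL DPOTrainer conventions (not stated in this card):
% rewards are beta-scaled log-probability ratios between the trained
% policy \pi_\theta and the frozen reference model \pi_{\mathrm{ref}},
% for chosen completions y_w and rejected completions y_l.
r_{\mathrm{chosen}}   = \beta \bigl(\log \pi_\theta(y_w \mid x) - \log \pi_{\mathrm{ref}}(y_w \mid x)\bigr)
r_{\mathrm{rejected}} = \beta \bigl(\log \pi_\theta(y_l \mid x) - \log \pi_{\mathrm{ref}}(y_l \mid x)\bigr)
\mathrm{margin} = r_{\mathrm{chosen}} - r_{\mathrm{rejected}}
```

Under these conventions, Rewards/accuracies is the fraction of evaluation pairs where the chosen reward exceeds the rejected reward.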

Model description

More information needed

Intended uses & limitations

More information needed
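
While no intended uses are documented, the model is a GPT-2-sized causal language model and can be loaded like any `transformers` checkpoint. A minimal generation sketch (the repository id is taken from this card; the prompt and sampling parameters are illustrative):

```python
# Minimal sketch: load the checkpoint as a causal LM for text generation.
# The repository id comes from this card; generation settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Myashka/gpt-imdb-cdpo_0.15-beta_0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "This movie was"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    top_k=50,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```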

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 24
  • eval_batch_size: 24
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.99) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 150
  • num_epochs: 3
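
The sketch below shows how these settings might map onto a TRL `DPOTrainer` run (TRL ~0.7.x, contemporary with Transformers 4.35.2). This is an assumption, not the author's script: `beta=0.1` and the cDPO `label_smoothing=0.15` are inferred from the model name, and since the preference dataset is unknown, a tiny dummy dataset stands in for it.

```python
# Hypothetical reconstruction of the training setup from the hyperparameters
# listed above. beta=0.1 and label_smoothing=0.15 are inferred from the model
# name "cdpo_0.15-beta_0.1"; the real preference dataset is not documented,
# so a dummy dataset is used purely to make the sketch runnable.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model = AutoModelForCausalLM.from_pretrained("lvwerra/gpt2-imdb")
ref_model = AutoModelForCausalLM.from_pretrained("lvwerra/gpt2-imdb")
tokenizer = AutoTokenizer.from_pretrained("lvwerra/gpt2-imdb")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no dedicated pad token

# Placeholder preference data (prompt / chosen / rejected triples).
train_dataset = Dataset.from_dict({
    "prompt": ["This movie was"],
    "chosen": [" wonderful and genuinely moving."],
    "rejected": [" dull and entirely predictable."],
})

args = TrainingArguments(
    output_dir="gpt-imdb-cdpo_0.15-beta_0.1",
    learning_rate=1e-5,
    per_device_train_batch_size=24,
    per_device_eval_batch_size=24,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.99,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=150,
    num_train_epochs=3,
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=args,
    beta=0.1,              # inferred from the model name
    label_smoothing=0.15,  # cDPO label smoothing, inferred from the model name
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```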

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5541 | 0.21 | 500  | 0.5598 | -0.1801 | -1.1214 | 0.8417 | 0.9413 | -274.8995 | -237.0667 | -33.1267 | -34.0864 |
| 0.5399 | 0.42 | 1000 | 0.5555 | -0.4075 | -1.5309 | 0.8604 | 1.1234 | -278.9942 | -239.3399 | -36.6366 | -37.5032 |
| 0.5379 | 0.63 | 1500 | 0.5445 | -0.5885 | -1.8167 | 0.875  | 1.2282 | -281.8521 | -241.1506 | -34.0236 | -34.9075 |
| 0.5224 | 0.83 | 2000 | 0.5347 | -0.4581 | -1.7693 | 0.8917 | 1.3112 | -281.3783 | -239.8462 | -34.9412 | -35.8186 |
| 0.4992 | 1.04 | 2500 | 0.5318 | -0.5998 | -1.9222 | 0.9000 | 1.3224 | -282.9072 | -241.2631 | -34.8041 | -35.6967 |
| 0.5654 | 1.25 | 3000 | 0.5308 | -0.5502 | -1.9299 | 0.9021 | 1.3797 | -282.9844 | -240.7672 | -35.6718 | -36.5937 |
| 0.5382 | 1.46 | 3500 | 0.5247 | -0.4952 | -1.8522 | 0.9125 | 1.3570 | -282.2072 | -240.2172 | -35.7229 | -36.6547 |
| 0.5409 | 1.67 | 4000 | 0.5220 | -0.5742 | -1.9755 | 0.9292 | 1.4013 | -283.4403 | -241.0072 | -36.4780 | -37.3339 |
| 0.4911 | 1.88 | 4500 | 0.5186 | -0.6281 | -2.0249 | 0.9271 | 1.3967 | -283.9341 | -241.5466 | -36.1014 | -36.8989 |
| 0.5007 | 2.08 | 5000 | 0.5170 | -0.6115 | -2.0085 | 0.9312 | 1.3969 | -283.7699 | -241.3805 | -36.7092 | -37.5360 |
| 0.4714 | 2.29 | 5500 | 0.5166 | -0.5400 | -1.9265 | 0.9229 | 1.3865 | -282.9501 | -240.6650 | -36.1382 | -36.9914 |
| 0.5159 | 2.5  | 6000 | 0.5168 | -0.5925 | -1.9754 | 0.9271 | 1.3829 | -283.4395 | -241.1906 | -35.9587 | -36.8156 |
| 0.5103 | 2.71 | 6500 | 0.5171 | -0.6197 | -2.0190 | 0.9333 | 1.3993 | -283.8753 | -241.4619 | -36.0316 | -36.8825 |
| 0.5049 | 2.92 | 7000 | 0.5181 | -0.6104 | -1.9969 | 0.9271 | 1.3866 | -283.6544 | -241.3688 | -36.1797 | -37.0193 |

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.1.1
  • Datasets 2.15.0
  • Tokenizers 0.15.0
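
To reproduce this environment, the listed versions can be pinned, e.g. in a `requirements.txt` (note the PyPI package for Pytorch is `torch`):

```text
transformers==4.35.2
torch==2.1.1
datasets==2.15.0
tokenizers==0.15.0
```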