# Qwen2-7B-Instruct-SPPO-Function-call-v2.5
This model is a fine-tuned version of slm-research-vn/Qwen2-7B-Instruct-SPPO-Function-call-v2.1 on the slm-research-vn/dpo-format-function-calling-v3, slm-research-vn/dpo-format-glaive-code-assistant-v3-with-mistral-large-slm-iter4, and argilla/dpo-mix-7k datasets. It achieves the following results on the evaluation set:
- Loss: 0.3208
- Rewards/chosen: 1.7980
- Rewards/rejected: -0.0440
- Rewards/accuracies: 0.8853
- Rewards/margins: 1.8420
- Logps/rejected: -275.4126
- Logps/chosen: -225.6960
- Logits/rejected: -0.7099
- Logits/chosen: -0.6648
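The reward metrics above follow standard DPO bookkeeping: `Rewards/margins` is simply `Rewards/chosen` minus `Rewards/rejected`, and the per-example loss is the negative log-sigmoid of that (already beta-scaled) margin. A minimal sketch of the arithmetic (note the reported eval loss is averaged per example, so it will not equal the loss of the averaged margin):

```python
import math

def dpo_margin(reward_chosen: float, reward_rejected: float) -> float:
    # Rewards/margins as reported above: chosen minus rejected.
    return reward_chosen - reward_rejected

def dpo_sigmoid_loss(reward_chosen: float, reward_rejected: float) -> float:
    # -log(sigmoid(margin)); the rewards are assumed already beta-scaled.
    margin = reward_chosen - reward_rejected
    return math.log1p(math.exp(-margin))

margin = dpo_margin(1.7980, -0.0440)
print(round(margin, 4))  # 1.842, matching the reported Rewards/margins
```

`Rewards/accuracies` is then just the fraction of evaluation pairs where this margin is positive.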
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
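As a sanity check, the effective batch size and the implied learning-rate schedule can be sketched in plain Python (an illustration of the standard linear-warmup/cosine-decay shape, not the trainer's actual code):

```python
import math

# Effective batch size: per-device batch * devices * gradient accumulation.
train_batch_size = 1
num_devices = 8
gradient_accumulation_steps = 4
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)  # 32, matching total_train_batch_size above

def lr_at(step: int, total_steps: int, peak_lr: float = 1e-06,
          warmup_ratio: float = 0.1) -> float:
    """Linear warmup to peak_lr over warmup_ratio of training,
    then cosine decay to zero by the final step."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With `warmup_ratio: 0.1`, the learning rate peaks at 1e-06 one tenth of the way through the single epoch and decays to roughly zero by the last step.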
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6553 | 0.1048 | 100 | 0.6206 | 0.2426 | 0.0793 | 0.7735 | 0.1633 | -272.9460 | -256.8048 | -0.7286 | -0.6915 |
| 0.4736 | 0.2095 | 200 | 0.4579 | 1.2344 | 0.4185 | 0.8353 | 0.8160 | -266.1621 | -236.9672 | -0.6975 | -0.6532 |
| 0.4158 | 0.3143 | 300 | 0.4030 | 1.6264 | 0.4492 | 0.8500 | 1.1771 | -265.5471 | -229.1290 | -0.7183 | -0.6811 |
| 0.3913 | 0.4191 | 400 | 0.3698 | 1.7637 | 0.3444 | 0.8559 | 1.4194 | -267.6444 | -226.3811 | -0.7164 | -0.6677 |
| 0.3117 | 0.5238 | 500 | 0.3486 | 1.7529 | 0.1705 | 0.8706 | 1.5824 | -271.1227 | -226.5988 | -0.7171 | -0.6770 |
| 0.3219 | 0.6286 | 600 | 0.3346 | 1.7488 | 0.0498 | 0.8765 | 1.6990 | -273.5360 | -226.6806 | -0.7125 | -0.6709 |
| 0.2924 | 0.7334 | 700 | 0.3259 | 1.7948 | 0.0020 | 0.8824 | 1.7929 | -274.4924 | -225.7591 | -0.7103 | -0.6733 |
| 0.3287 | 0.8381 | 800 | 0.3221 | 1.7998 | -0.0221 | 0.8735 | 1.8218 | -274.9728 | -225.6601 | -0.7049 | -0.6610 |
| 0.3149 | 0.9429 | 900 | 0.3215 | 1.7999 | -0.0363 | 0.8824 | 1.8362 | -275.2581 | -225.6584 | -0.7051 | -0.6616 |
### Framework versions
- PEFT 0.12.0
- Transformers 4.44.0
- Pytorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1