sumuks commited on
Commit
945205a
1 Parent(s): 3107fd8

End of training

Browse files
Files changed (1) hide show
  1. README.md +148 -0
README.md ADDED
@@ -0,0 +1,148 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ base_model: microsoft/Phi-3-mini-4k-instruct
3
+ library_name: peft
4
+ license: mit
5
+ tags:
6
+ - axolotl
7
+ - generated_from_trainer
8
+ model-index:
9
+ - name: phi3-nosys-gpt4ominiplans-27k-512rank
10
+ results: []
11
+ ---
12
+
13
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
14
+ should probably proofread and complete it, then remove this comment. -->
15
+
16
+ [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
17
+ <details><summary>See axolotl config</summary>
18
+
19
+ axolotl version: `0.4.1`
20
+ ```yaml
21
+ # model and tokenizer
22
+ base_model: microsoft/Phi-3-mini-4k-instruct # change for model
23
+ trust_remote_code: true
24
+ sequence_len: 2048
25
+
26
+ strict: false
27
+
28
+ model_type: AutoModelForCausalLM
29
+ tokenizer_type: AutoTokenizer
30
+ bf16: auto
31
+ pad_to_sequence_len: true
32
+ save_safetensors: true
33
+
34
+
35
+ datasets:
36
+ - path: verifiers-for-code/sampled_10k_from_27k
37
+ type: completion
38
+ field: text_nosys_phi
39
+ train_on_split: train
40
+
41
+ val_set_size: 0.05
42
+
43
+ # lora
44
+ adapter: lora
45
+ lora_r: 512
46
+ lora_alpha: 32
47
+ lora_dropout: 0.05
48
+ lora_target_linear: true
49
+ lora_modules_to_save:
50
+ - embed_tokens
51
+ - lm_head
52
+ use_rslora: true
53
+
54
+ # logging
55
+ wandb_project: valeris
56
+ wandb_name: phi3-nosys-gpt4ominiplans-27k-512rank
57
+
58
+ output_dir: ./outputs/phi3-nosys-gpt4ominiplans-27k-512rank
59
+
60
+ gradient_accumulation_steps: 2
61
+ gradient_checkpointing: true
62
+ gradient_checkpointing_kwargs:
63
+ use_reentrant: true
64
+ micro_batch_size: 2
65
+ num_epochs: 1
66
+ eval_batch_size: 2
67
+ warmup_ratio: 0.05
68
+ learning_rate: 5e-6
69
+ lr_scheduler: cosine
70
+ optimizer: adamw_torch
71
+
72
+ hub_model_id: verifiers-for-code/phi3-nosys-gpt4ominiplans-27k-512rank
73
+ push_to_hub: true
74
+ hub_strategy: all_checkpoints
75
+ hub_always_push: true
76
+ evals_per_epoch: 8
77
+ saves_per_epoch: 4
78
+ logging_steps: 1
79
+ # eval_table_size: 10
80
+ # eval_max_new_tokens: 512
81
+
82
+ tokens: ["<thinking>", "</thinking>", "<plan>", "</plan>"]
83
+
84
+ special_tokens:
85
+ pad_token: "<|endoftext|>"
86
+
87
+ ```
88
+
89
+ </details><br>
90
+
91
+ # phi3-nosys-gpt4ominiplans-27k-512rank
92
+
93
+ This model is a fine-tuned version of [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) on the None dataset.
94
+ It achieves the following results on the evaluation set:
95
+ - Loss: 0.8553
96
+
97
+ ## Model description
98
+
99
+ More information needed
100
+
101
+ ## Intended uses & limitations
102
+
103
+ More information needed
104
+
105
+ ## Training and evaluation data
106
+
107
+ More information needed
108
+
109
+ ## Training procedure
110
+
111
+ ### Training hyperparameters
112
+
113
+ The following hyperparameters were used during training:
114
+ - learning_rate: 5e-06
115
+ - train_batch_size: 2
116
+ - eval_batch_size: 2
117
+ - seed: 42
118
+ - distributed_type: multi-GPU
119
+ - num_devices: 8
120
+ - gradient_accumulation_steps: 2
121
+ - total_train_batch_size: 32
122
+ - total_eval_batch_size: 16
123
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
124
+ - lr_scheduler_type: cosine
125
+ - lr_scheduler_warmup_steps: 14
126
+ - num_epochs: 1
127
+
128
+ ### Training results
129
+
130
+ | Training Loss | Epoch | Step | Validation Loss |
131
+ |:-------------:|:------:|:----:|:---------------:|
132
+ | 1.0833 | 0.0034 | 1 | 1.0330 |
133
+ | 1.0118 | 0.1279 | 38 | 0.9947 |
134
+ | 0.9884 | 0.2559 | 76 | 0.9393 |
135
+ | 0.9277 | 0.3838 | 114 | 0.8987 |
136
+ | 0.8411 | 0.5118 | 152 | 0.8723 |
137
+ | 0.8863 | 0.6397 | 190 | 0.8590 |
138
+ | 0.8637 | 0.7677 | 228 | 0.8557 |
139
+ | 0.9009 | 0.8956 | 266 | 0.8553 |
140
+
141
+
142
+ ### Framework versions
143
+
144
+ - PEFT 0.11.1
145
+ - Transformers 4.44.0.dev0
146
+ - Pytorch 2.4.0
147
+ - Datasets 2.19.1
148
+ - Tokenizers 0.19.1