pbevan11 committed
Commit 0e38eff
1 Parent(s): e4bae56

End of training

Files changed (2)
  1. README.md +12 -20
  2. adapter_model.bin +1 -1
README.md CHANGED
@@ -67,7 +67,7 @@ wandb_name: test
  gradient_accumulation_steps: 4
  micro_batch_size: 2 # was 16
  eval_batch_size: 2 # was 16
- num_epochs: 4
+ num_epochs: 2
  optimizer: paged_adamw_32bit
  lr_scheduler: cosine
  learning_rate: 0.0002
@@ -104,12 +104,12 @@ special_tokens:

  </details><br>

- [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/sncds/ocr-ft/runs/xvs2hfvk)
+ [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/sncds/ocr-ft/runs/m4qbupk5)
  # llama-3-8b-ocr-correction

  This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on the None dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.2022
+ - Loss: 0.1742

  ## Model description

@@ -137,29 +137,21 @@ The following hyperparameters were used during training:
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_steps: 10
- - num_epochs: 4
+ - num_epochs: 2

  ### Training results

  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:------:|:----:|:---------------:|
  | 0.6611 | 0.0165 | 1 | 0.6229 |
- | 0.3152 | 0.2469 | 15 | 0.2871 |
- | 0.2078 | 0.4938 | 30 | 0.2166 |
- | 0.2219 | 0.7407 | 45 | 0.1920 |
- | 0.193 | 0.9877 | 60 | 0.1819 |
- | 0.1319 | 1.2140 | 75 | 0.1776 |
- | 0.1269 | 1.4609 | 90 | 0.1769 |
- | 0.1408 | 1.7078 | 105 | 0.1713 |
- | 0.1347 | 1.9547 | 120 | 0.1692 |
- | 0.0622 | 2.1811 | 135 | 0.1879 |
- | 0.0674 | 2.4280 | 150 | 0.1868 |
- | 0.0715 | 2.6749 | 165 | 0.1876 |
- | 0.0567 | 2.9218 | 180 | 0.1851 |
- | 0.0445 | 3.1481 | 195 | 0.1928 |
- | 0.0419 | 3.3951 | 210 | 0.2017 |
- | 0.0371 | 3.6420 | 225 | 0.2021 |
- | 0.0382 | 3.8889 | 240 | 0.2022 |
+ | 0.3149 | 0.2469 | 15 | 0.2870 |
+ | 0.2074 | 0.4938 | 30 | 0.2166 |
+ | 0.2211 | 0.7407 | 45 | 0.1937 |
+ | 0.195 | 0.9877 | 60 | 0.1825 |
+ | 0.1411 | 1.2140 | 75 | 0.1787 |
+ | 0.1348 | 1.4609 | 90 | 0.1760 |
+ | 0.1479 | 1.7078 | 105 | 0.1743 |
+ | 0.1413 | 1.9547 | 120 | 0.1742 |


  ### Framework versions
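The card above documents a parameter-efficient fine-tune of Meta-Llama-3-8B for OCR correction, shipped as adapter_model.bin. Below is a minimal sketch of how one might try the updated adapter, assuming it is a standard PEFT adapter and that the repo id is `pbevan11/llama-3-8b-ocr-correction` (inferred from the committer name and card title, not stated in this diff); the prompt format is purely illustrative.

```python
# Minimal sketch, not from the model card: load the base model and apply the adapter.
# Assumptions: the adapter in this repo is a standard PEFT adapter, the repo id is
# "pbevan11/llama-3-8b-ocr-correction", and the prompt below is only illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "meta-llama/Meta-Llama-3-8B"
ADAPTER_ID = "pbevan11/llama-3-8b-ocr-correction"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADAPTER_ID)  # applies the adapter weights

prompt = "Correct the OCR errors in the following text:\n"  # illustrative prompt only
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

bfloat16 with `device_map="auto"` is just a convenient default for an 8B base model; adjust to the available hardware.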
adapter_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:2ff311d8ac6132ef54205aa7b59159911a3cb6b0c383e7e95facb72e3778eb2e
+ oid sha256:c64465bb2211b47808dc809512a591f6ada32a06c95e2e5ae6b3bef6b9622301
  size 167934026
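The adapter_model.bin change above is a Git LFS pointer update: only the sha256 and size are stored in the repo, while the weights themselves live in LFS storage. A minimal sketch for checking a downloaded copy against the new pointer hash (the local path is hypothetical):

```python
# Minimal sketch: verify a downloaded adapter_model.bin against the sha256 in the LFS pointer.
import hashlib

LOCAL_PATH = "adapter_model.bin"  # hypothetical local path after download
EXPECTED_SHA256 = "c64465bb2211b47808dc809512a591f6ada32a06c95e2e5ae6b3bef6b9622301"

digest = hashlib.sha256()
with open(LOCAL_PATH, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
        digest.update(chunk)

print("match" if digest.hexdigest() == EXPECTED_SHA256 else "mismatch")
```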