End of training
Files changed:
- README.md: +12 -20
- adapter_model.bin: +1 -1
README.md
CHANGED

```diff
@@ -67,7 +67,7 @@ wandb_name: test
 gradient_accumulation_steps: 4
 micro_batch_size: 2 # was 16
 eval_batch_size: 2 # was 16
-num_epochs:
+num_epochs: 2
 optimizer: paged_adamw_32bit
 lr_scheduler: cosine
 learning_rate: 0.0002
```
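The two `# was 16` comments record that the per-device batch was cut from 16 to 2, presumably for memory headroom. With gradient accumulation, each optimizer step still sees a modest batch; a back-of-the-envelope sketch using only the values visible in this hunk (whether `gradient_accumulation_steps` also changed is not shown here):

```python
# Effective batch size implied by the config hunk above.
micro_batch_size = 2             # per-device batch; "was 16" per the inline comment
gradient_accumulation_steps = 4  # micro-batches accumulated per optimizer step

effective_batch = micro_batch_size * gradient_accumulation_steps
print(effective_batch)  # 8 sequences contribute to each optimizer step
```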
```diff
@@ -104,12 +104,12 @@ special_tokens:
 
 </details><br>
 
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/sncds/ocr-ft/runs/
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/sncds/ocr-ft/runs/m4qbupk5)
 # llama-3-8b-ocr-correction
 
 This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.1742
 
 ## Model description
 
```
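The card names meta-llama/Meta-Llama-3-8B as the base model; what this repo actually ships is the PEFT adapter (`adapter_model.bin`, updated below). A minimal loading sketch, assuming a PEFT-format adapter repo; the adapter repo id and the prompt format here are guesses inferred from the card title, not anything stated in the diff:

```python
# Sketch: apply the trained adapter to its base model with PEFT.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B"           # base model named in the card
adapter_id = "sncds/llama-3-8b-ocr-correction"   # hypothetical adapter repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, adapter_id)  # loads adapter_model.bin

# Hypothetical usage; the actual prompt template is not documented in this card.
prompt = "Correct the OCR errors in the following text:\n\nTbe qu1ck brown f0x."
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```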
```diff
@@ -137,29 +137,21 @@ The following hyperparameters were used during training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 10
-- num_epochs:
+- num_epochs: 2
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
 | 0.6611 | 0.0165 | 1 | 0.6229 |
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.0622 | 2.1811 | 135 | 0.1879 |
-| 0.0674 | 2.4280 | 150 | 0.1868 |
-| 0.0715 | 2.6749 | 165 | 0.1876 |
-| 0.0567 | 2.9218 | 180 | 0.1851 |
-| 0.0445 | 3.1481 | 195 | 0.1928 |
-| 0.0419 | 3.3951 | 210 | 0.2017 |
-| 0.0371 | 3.6420 | 225 | 0.2021 |
-| 0.0382 | 3.8889 | 240 | 0.2022 |
+| 0.3149 | 0.2469 | 15 | 0.2870 |
+| 0.2074 | 0.4938 | 30 | 0.2166 |
+| 0.2211 | 0.7407 | 45 | 0.1937 |
+| 0.195 | 0.9877 | 60 | 0.1825 |
+| 0.1411 | 1.2140 | 75 | 0.1787 |
+| 0.1348 | 1.4609 | 90 | 0.1760 |
+| 0.1479 | 1.7078 | 105 | 0.1743 |
+| 0.1413 | 1.9547 | 120 | 0.1742 |
 
 
 ### Framework versions
```
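The table swap also motivates the `num_epochs: 2` change: among the legible rows of the old run, validation loss bottomed out at 0.1851 (step 180, epoch 2.9) and drifted up to 0.2022 by step 240 even as training loss kept falling, the usual overfitting signature, while the new two-epoch run ends at 0.1742, the headline eval loss. A quick consistency check on the new schedule, using only numbers in the table above:

```python
# Consistency check (sketch): num_epochs: 2 vs. the new results table.
steps_per_epoch = 120 / 1.9547      # last row: step 120 at epoch 1.9547
print(steps_per_epoch)              # ~61.4 optimizer steps per epoch
print(round(2 * steps_per_epoch))   # ~123 steps for 2 full epochs;
                                    # the last eval (every 15 steps) lands at 120

# With the effective batch of 8 from the config hunk, this implies a
# training set of roughly 8 * 61.4 ≈ 491 examples -- an estimate, not a
# figure stated anywhere in the card.
print(round(8 * steps_per_epoch))
```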
adapter_model.bin
CHANGED

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:c64465bb2211b47808dc809512a591f6ada32a06c95e2e5ae6b3bef6b9622301
 size 167934026
```
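Both sides of this file are Git LFS pointers: the repository stores only the SHA-256 digest and byte size, and this commit swaps in the digest of the newly trained weights. The size (167,934,026 bytes, about 160 MiB) is unchanged, as expected when retraining an adapter of the same shape. A minimal integrity check against the new pointer, assuming the real file has been pulled (e.g. with `git lfs pull`) into the working directory:

```python
# Sketch: verify a pulled adapter_model.bin against the LFS pointer above.
import hashlib
import os

expected_oid = "c64465bb2211b47808dc809512a591f6ada32a06c95e2e5ae6b3bef6b9622301"
expected_size = 167934026

path = "adapter_model.bin"
assert os.path.getsize(path) == expected_size, "size does not match pointer"

h = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
        h.update(chunk)
assert h.hexdigest() == expected_oid, "digest does not match pointer"
print("pointer matches")
```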