bilkultheek committed
Commit cfbfa0f
1 Parent(s): 4ba2f26

End of training

README.md CHANGED
@@ -7,16 +7,18 @@ tags:
  - sft
  - generated_from_trainer
  model-index:
- - name: YaHaHamaraLlama
+ - name: ColdLLamaLite
    results: []
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

- # YaHaHamaraLlama
+ # ColdLLamaLite

  This model is a fine-tuned version of [ahxt/LiteLlama-460M-1T](https://huggingface.co/ahxt/LiteLlama-460M-1T) on the None dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 2.3021

  ## Model description

@@ -36,18 +38,26 @@ More information needed

  The following hyperparameters were used during training:
  - learning_rate: 0.0002
- - train_batch_size: 8
- - eval_batch_size: 8
+ - train_batch_size: 32
+ - eval_batch_size: 32
  - seed: 42
- - gradient_accumulation_steps: 4
- - total_train_batch_size: 32
+ - gradient_accumulation_steps: 8
+ - total_train_batch_size: 256
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: linear
+ - lr_scheduler_type: cosine
  - lr_scheduler_warmup_ratio: 0.03
  - num_epochs: 5

  ### Training results

+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:-----:|:----:|:---------------:|
+ | 4.1436        | 0.8   | 25   | 3.8815          |
+ | 3.6028        | 1.6   | 50   | 3.2639          |
+ | 2.9395        | 2.4   | 75   | 2.5905          |
+ | 2.4548        | 3.2   | 100  | 2.3582          |
+ | 2.337         | 4.0   | 125  | 2.3102          |
+ | 2.3125        | 4.8   | 150  | 2.3024          |

  ### Framework versions
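The updated hyperparameters imply an effective batch size of 32 × 8 = 256 (per-device batch size × gradient accumulation steps), consistent with the `total_train_batch_size` shown in the diff. A minimal sketch of how these values map onto 🤗 Transformers' `TrainingArguments`, assuming a single-GPU run and a placeholder `output_dir` (the training script itself is not part of this commit):

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters introduced by this commit, assuming a
# single-GPU run. Effective batch size: 32 (per device) x 8 (accumulation) = 256.
training_args = TrainingArguments(
    output_dir="ColdLLamaLite",        # placeholder, not taken from the commit
    learning_rate=2e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=8,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=5,
)
```

Given the roughly 156 optimizer steps implied by the results table (step 150 at epoch 4.8), the 0.03 warmup ratio corresponds to only about the first 5 steps before the cosine decay begins.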
runs/Aug02_13-07-01_fastgpuserv/events.out.tfevents.1722602007.fastgpuserv.3714303.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d6890949d3abe45db8d962b80fd220ed72f34e906ba435db853a4ad9a61faf49
+ size 359
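The added `events.out.tfevents.*` file is tracked with Git LFS, so the repository stores only this three-line pointer: `oid` is the SHA-256 of the actual TensorBoard log, and `size` (359 bytes) is the length of that log. After fetching the real file with `git lfs pull`, the logged scalars can be read back with TensorBoard's `EventAccumulator`; a minimal sketch, where the `eval/loss` tag name is an assumption about what the Trainer wrote:

```python
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Load the event log added in this commit (run `git lfs pull` first;
# otherwise the checkout contains only the LFS pointer, not the log).
acc = EventAccumulator(
    "runs/Aug02_13-07-01_fastgpuserv/events.out.tfevents.1722602007.fastgpuserv.3714303.1"
)
acc.Reload()

# List the scalar tags that were actually recorded, then print one of them.
# "eval/loss" is a guess at the Trainer's usual tag; check acc.Tags() first.
print(acc.Tags()["scalars"])
for event in acc.Scalars("eval/loss"):
    print(event.step, event.value)
```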