End of training

Files changed (5)

README.md +29 -29
adapter_model.safetensors +1 -1
emissions.csv +1 -1
runs/Jul17_19-43-13_msc-modeltrain-pod/events.out.tfevents.1721245397.msc-modeltrain-pod.2588.0 +3 -0
training_args.bin +1 -1

README.md CHANGED

@@ -16,7 +16,7 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.5993
 ## Model description
@@ -49,11 +49,11 @@ The following `bitsandbytes` quantization config was used during training:
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- learning_rate: 0.0002
-- train_batch_size: 16
 - eval_batch_size: 8
 - seed: 42
-- gradient_accumulation_steps: 4
 - total_train_batch_size: 64
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
@@ -64,31 +64,31 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| 3.329         | 1.33  | 10   | 1.8003          |
-| 1.296         | 2.67  | 20   | 1.0774          |
-| 0.9489        | 4.0   | 30   | 0.9022          |
-| 0.7167        | 5.33  | 40   | 0.7270          |
-| 0.552         | 6.67  | 50   | 0.7372          |
-| 0.4766        | 8.0   | 60   | 0.7281          |
-| 0.4153        | 9.33  | 70   | 0.7673          |
-| 0.3614        | 10.67 | 80   | 0.8597          |
-| 0.3238        | 12.0  | 90   | 0.8915          |
-| 0.2923        | 13.33 | 100  | 0.9281          |
-| 0.2648        | 14.67 | 110  | 1.0239          |
-| 0.2483        | 16.0  | 120  | 1.0198          |
-| 0.2311        | 17.33 | 130  | 1.1314          |
-| 0.2196        | 18.67 | 140  | 1.2578          |
-| 0.2109        | 20.0  | 150  | 1.3155          |
-| 0.1997        | 21.33 | 160  | 1.2602          |
-| 0.1927        | 22.67 | 170  | 1.4758          |
-| 0.191         | 24.0  | 180  | 1.4080          |
-| 0.1834        | 25.33 | 190  | 1.4783          |
-| 0.1799        | 26.67 | 200  | 1.5217          |
-| 0.1796        | 28.0  | 210  | 1.5525          |
-| 0.1738        | 29.33 | 220  | 1.5714          |
-| 0.1725        | 30.67 | 230  | 1.5953          |
-| 0.1727        | 32.0  | 240  | 1.5980          |
-| 0.172         | 33.33 | 250  | 1.5993          |
 ### Framework versions

 This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.7622
 ## Model description
 ### Training hyperparameters
 The following hyperparameters were used during training:
+- learning_rate: 5e-05
+- train_batch_size: 8
 - eval_batch_size: 8
 - seed: 42
+- gradient_accumulation_steps: 8
 - total_train_batch_size: 64
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
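
The changed hyperparameters leave `total_train_batch_size` at 64: the earlier run used 16 × 4 and this run uses 8 × 8. A minimal sketch of that arithmetic, assuming a single training device (the device count is not stated in this diff) and using a hypothetical helper `effective_batch_size` that is not a library function:

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Number of samples that contribute to one optimizer step.

    num_devices=1 is an assumption; the diff does not say how many
    devices were used.
    """
    return per_device_batch * grad_accum_steps * num_devices


old_total = effective_batch_size(16, 4)  # earlier run: batch 16, accumulate 4
new_total = effective_batch_size(8, 8)   # this run: batch 8, accumulate 8
print(old_total, new_total)
```

Both configurations take the same number of samples per optimizer step, so the learning-rate change from 0.0002 to 5e-05 is the only one of the three changed values that alters the update itself.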
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
+| 3.7647        | 1.36  | 10   | 3.3602          |
+| 2.8           | 2.71  | 20   | 2.0480          |
+| 1.5819        | 4.07  | 30   | 1.2852          |
+| 1.1832        | 5.42  | 40   | 1.1025          |
+| 1.0318        | 6.78  | 50   | 1.0150          |
+| 0.9674        | 8.14  | 60   | 0.9718          |
+| 0.8975        | 9.49  | 70   | 0.9348          |
+| 0.8375        | 10.85 | 80   | 0.8912          |
+| 0.7851        | 12.2  | 90   | 0.8685          |
+| 0.728         | 13.56 | 100  | 0.8443          |
+| 0.6804        | 14.92 | 110  | 0.8038          |
+| 0.6123        | 16.27 | 120  | 0.7684          |
+| 0.5536        | 17.63 | 130  | 0.7314          |
+| 0.4922        | 18.98 | 140  | 0.6943          |
+| 0.4738        | 20.34 | 150  | 0.7095          |
+| 0.4467        | 21.69 | 160  | 0.7344          |
+| 0.4452        | 23.05 | 170  | 0.7397          |
+| 0.4258        | 24.41 | 180  | 0.7332          |
+| 0.4179        | 25.76 | 190  | 0.7436          |
+| 0.4105        | 27.12 | 200  | 0.7373          |
+| 0.4081        | 28.47 | 210  | 0.7596          |
+| 0.4005        | 29.83 | 220  | 0.7552          |
+| 0.4001        | 31.19 | 230  | 0.7652          |
+| 0.393         | 32.54 | 240  | 0.7612          |
+| 0.4016        | 33.9  | 250  | 0.7622          |
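
Read side by side, the two tables differ mainly in where validation loss bottoms out. In the earlier run it reaches 0.7270 at step 40 and then climbs to 1.5993 while training loss keeps falling, which is consistent with overfitting. In this run it reaches 0.6943 at step 140 and ends at 0.7622. A minimal sketch that recovers those minima from the table values, assuming evaluation every 10 steps as shown (`best_step` is a hypothetical helper):

```python
# Validation losses copied from the two tables, one entry per 10 steps.
old_val = [1.8003, 1.0774, 0.9022, 0.7270, 0.7372, 0.7281, 0.7673, 0.8597,
           0.8915, 0.9281, 1.0239, 1.0198, 1.1314, 1.2578, 1.3155, 1.2602,
           1.4758, 1.4080, 1.4783, 1.5217, 1.5525, 1.5714, 1.5953, 1.5980,
           1.5993]
new_val = [3.3602, 2.0480, 1.2852, 1.1025, 1.0150, 0.9718, 0.9348, 0.8912,
           0.8685, 0.8443, 0.8038, 0.7684, 0.7314, 0.6943, 0.7095, 0.7344,
           0.7397, 0.7332, 0.7436, 0.7373, 0.7596, 0.7552, 0.7652, 0.7612,
           0.7622]


def best_step(val_losses, eval_every=10):
    """Return (step, loss) for the lowest validation loss in the list."""
    idx = min(range(len(val_losses)), key=val_losses.__getitem__)
    return (idx + 1) * eval_every, val_losses[idx]


for name, losses in (("earlier run", old_val), ("this run", new_val)):
    step, loss = best_step(losses)
    # Gap between the final reported loss and the best one seen.
    print(f"{name}: best {loss:.4f} at step {step}, "
          f"final {losses[-1]:.4f} (+{losses[-1] - loss:.4f})")
```

In both runs the loss reported at the top of the model card is the final-step value, not the best one in the table.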
 ### Framework versions

adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:cdd34584c6acc5f0d9ac55c6e1190f8f78557a7c4ce7d5d4bea77352db827001
 size 75523312

 version https://git-lfs.github.com/spec/v1
+oid sha256:ff4a36e1a6dd121f9e1a7d350268e382f1c8d02a207ccedd1f6d86a1205890f1
 size 75523312

emissions.csv CHANGED

@@ -1,2 +1,2 @@
 timestamp,experiment_id,project_name,duration,emissions,energy_consumed,country_name,country_iso_code,region,on_cloud,cloud_provider,cloud_region
-2024-07-17T17:04:02,acf495a5-fe7d-4741-820f-f1df91f69def,codecarbon,714.2170021533966,0.0483531364122497,0.07194239682851639,United Kingdom,GBR,scotland,N,,

 timestamp,experiment_id,project_name,duration,emissions,energy_consumed,country_name,country_iso_code,region,on_cloud,cloud_provider,cloud_region
+2024-07-17T20:05:05,a2d5e5ed-35d4-4380-9b5c-d7337e3b0d8b,codecarbon,1308.196786403656,0.08001020888288951,0.11904349179564445,United Kingdom,GBR,scotland,N,,
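
The two `emissions.csv` rows can be compared directly. The sketch below parses both rows with the standard `csv` module and prints how much each measured quantity grew between runs. It assumes CodeCarbon's usual units (seconds for `duration`, kg CO2-equivalent for `emissions`, kWh for `energy_consumed`), which the diff itself does not state. `load_row`, `ratio`, and `intensity` are hypothetical helpers.

```python
import csv
import io

HEADER = ("timestamp,experiment_id,project_name,duration,emissions,"
          "energy_consumed,country_name,country_iso_code,region,on_cloud,"
          "cloud_provider,cloud_region")
OLD_ROW = ("2024-07-17T17:04:02,acf495a5-fe7d-4741-820f-f1df91f69def,"
           "codecarbon,714.2170021533966,0.0483531364122497,"
           "0.07194239682851639,United Kingdom,GBR,scotland,N,,")
NEW_ROW = ("2024-07-17T20:05:05,a2d5e5ed-35d4-4380-9b5c-d7337e3b0d8b,"
           "codecarbon,1308.196786403656,0.08001020888288951,"
           "0.11904349179564445,United Kingdom,GBR,scotland,N,,")


def load_row(row):
    """Parse one data row against the shared header into a dict."""
    return next(csv.DictReader(io.StringIO(f"{HEADER}\n{row}\n")))


old, new = load_row(OLD_ROW), load_row(NEW_ROW)


def ratio(field):
    """How many times larger the new run's value is than the old one's."""
    return float(new[field]) / float(old[field])


def intensity(row):
    """Emissions per unit of energy (kg CO2eq per kWh under the assumed units)."""
    return float(row["emissions"]) / float(row["energy_consumed"])


for field in ("duration", "emissions", "energy_consumed"):
    print(f"{field}: x{ratio(field):.2f}")
print(f"intensity old={intensity(old):.4f} new={intensity(new):.4f}")
```

The emissions-to-energy ratio is essentially the same in both rows, as expected when both runs are attributed to the same region.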

runs/Jul17_19-43-13_msc-modeltrain-pod/events.out.tfevents.1721245397.msc-modeltrain-pod.2588.0 ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:45c6f2c37d9b504e1fd9bb669a6cc59b1b1987e5ec96612b182f1fef8356687d
+size 17467

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ef8e5dc5533306720bffeb0cf13b6ace0ad8000b5f6b5240b681a005a13a301e
 size 4984

 version https://git-lfs.github.com/spec/v1
+oid sha256:c02584efc3e8ec3d6a7e29381306b66072c8cda4d47e71431afa8b88df054da1
 size 4984
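
The binary files in this commit appear only as Git LFS pointer files: three `key value` lines giving the spec version, a SHA-256 object ID, and the size in bytes. A minimal sketch of reading such a pointer, using the two `adapter_model.safetensors` pointers from this diff (`parse_lfs_pointer` is a hypothetical helper, not part of any Git LFS tooling):

```python
def parse_lfs_pointer(text):
    """Split a Git LFS pointer file into version, hash algorithm, digest, size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")  # each line is "key value"
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


OLD_ADAPTER = """version https://git-lfs.github.com/spec/v1
oid sha256:cdd34584c6acc5f0d9ac55c6e1190f8f78557a7c4ce7d5d4bea77352db827001
size 75523312
"""
NEW_ADAPTER = """version https://git-lfs.github.com/spec/v1
oid sha256:ff4a36e1a6dd121f9e1a7d350268e382f1c8d02a207ccedd1f6d86a1205890f1
size 75523312
"""

old_ptr = parse_lfs_pointer(OLD_ADAPTER)
new_ptr = parse_lfs_pointer(NEW_ADAPTER)
print("same size:", old_ptr["size"] == new_ptr["size"])
print("same content:", old_ptr["digest"] == new_ptr["digest"])
```

Here the size is unchanged while the digest differs, so the adapter file has the same byte length but different contents after retraining.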