farpluto committed
Commit 009da04
1 Parent(s): 4c1a98b

End of training

README.md CHANGED
@@ -19,6 +19,8 @@ should probably proofread and complete it, then remove this comment. -->
 # SmolLM-135M-Instruct-Finetune-LoRA
 
 This model is a fine-tuned version of [HuggingFaceTB/SmolLM-135M](https://huggingface.co/HuggingFaceTB/SmolLM-135M) on the generator dataset.
+It achieves the following results on the evaluation set:
+- Loss: 1.4752
 
 ## Model description
 
@@ -49,11 +51,19 @@ The following hyperparameters were used during training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.1
-- training_steps: 1
+- num_epochs: 4
 - mixed_precision_training: Native AMP
 
 ### Training results
 
+| Training Loss | Epoch  | Step | Validation Loss |
+|:-------------:|:------:|:----:|:---------------:|
+| 1.9806        | 0.6173 | 25   | 1.9171          |
+| 1.8272        | 1.2346 | 50   | 1.7049          |
+| 1.6412        | 1.8519 | 75   | 1.5713          |
+| 1.5453        | 2.4691 | 100  | 1.5098          |
+| 1.5006        | 3.0864 | 125  | 1.4822          |
+| 1.4829        | 3.7037 | 150  | 1.4752          |
 
 
 ### Framework versions
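The updated hyperparameter list maps directly onto `transformers.TrainingArguments`. Below is a minimal sketch of a matching configuration on a recent `transformers` release; `output_dir` is a hypothetical name and the learning rate is not visible in this hunk, so it is left at the library default. The eval cadence of 25 steps is inferred from the results table.

```python
from transformers import TrainingArguments

# A sketch of TrainingArguments matching the hyperparameters in the README
# hunk above. output_dir is hypothetical; the learning rate is not shown in
# the diff, so the library default applies.
training_args = TrainingArguments(
    output_dir="SmolLM-135M-Instruct-Finetune-LoRA",  # hypothetical
    num_train_epochs=4,            # new "num_epochs: 4" (was training_steps: 1)
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    adam_beta1=0.9,                # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,             # epsilon=1e-08
    fp16=True,                     # "Native AMP" mixed precision
    eval_strategy="steps",         # results table reports eval every 25 steps
    eval_steps=25,
    logging_steps=25,
)
```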
adapter_config.json CHANGED
@@ -20,13 +20,13 @@
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
-    "up_proj",
-    "k_proj",
-    "o_proj",
     "down_proj",
-    "q_proj",
     "v_proj",
-    "gate_proj"
+    "gate_proj",
+    "k_proj",
+    "o_proj",
+    "up_proj",
+    "q_proj"
   ],
   "task_type": "CAUSAL_LM",
   "use_dora": false,
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4fe7e9521eca7961cd31bd7700cbef38e9ae1b6e91059c464843c6045fc6f83f
+oid sha256:c3b4ca791c9f9c01295edcaef3a43f2136420e2dade49afd94618c13ef947b6b
 size 19593064
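`adapter_model.safetensors` is tracked with Git LFS, so the diff only shows a new content hash; the size is unchanged because the LoRA tensor shapes are the same. A minimal sketch of loading the retrained adapter onto the base model; the adapter repo id is assumed from the committer and model name.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM-135M")
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-135M")

# Repo id assumed from the committer and model name; adjust if the adapter
# lives elsewhere.
model = PeftModel.from_pretrained(base, "farpluto/SmolLM-135M-Instruct-Finetune-LoRA")

inputs = tokenizer("Explain LoRA in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```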
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:2e5ebb42d904b2d2cff340cde81eabbc832efe72d870e31b37eef616aa413661
+oid sha256:f5cba0bfc5734dbb897f1356225cd298207d7c656f42a7fa9bb6c108759ff5ea
 size 5496
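`training_args.bin` is the pickled `TrainingArguments` object the Trainer saves next to each checkpoint; only its hash changed here, with the size steady at 5496 bytes. A sketch of inspecting it locally, assuming a recent PyTorch where `torch.load` defaults to `weights_only=True`.

```python
import torch

# training_args.bin is a pickled TrainingArguments object saved by Trainer.
# Recent PyTorch rejects pickled Python objects by default, hence the
# explicit flag; only do this for files you trust.
args = torch.load("training_args.bin", weights_only=False)
print(args.num_train_epochs)   # 4, per the model card
print(args.lr_scheduler_type)  # cosine
print(args.warmup_ratio)       # 0.1
```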