Text Generation
Transformers
TensorBoard
Safetensors
llama
alignment-handbook
trl
sft
Generated from Trainer
conversational
text-generation-inference
Inference Endpoints
lillian039 committed on
Commit b015d82
1 Parent(s): 4ae10f3

Model save

README.md CHANGED
@@ -1,19 +1,11 @@
 ---
 library_name: transformers
 license: llama3.2
-base_model: meta-llama/Llama-3.2-1B
+base_model: meta-llama/Llama-3.2-1B-Instruct
 tags:
-- alignment-handbook
 - trl
 - sft
 - generated_from_trainer
-- trl
-- sft
-- generated_from_trainer
-datasets:
-- barc0/transduction_angmented_100k-gpt4-description-gpt4omini-code_generated_problems
-- barc0/transduction_angmented_100k_gpt4o-mini_generated_problems
-- barc0/transduction_rearc_dataset_400k
 model-index:
 - name: llama3.2-1b-instruct-fft-transduction-engineer_lr1e-5_epoch4
   results: []
@@ -24,9 +16,9 @@ should probably proofread and complete it, then remove this comment. -->
 
 # llama3.2-1b-instruct-fft-transduction-engineer_lr1e-5_epoch4
 
-This model is a fine-tuned version of [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) on the barc0/transduction_angmented_100k-gpt4-description-gpt4omini-code_generated_problems, the barc0/transduction_angmented_100k_gpt4o-mini_generated_problems and the barc0/transduction_rearc_dataset_400k datasets.
+This model is a fine-tuned version of [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.0363
+- Loss: 0.0409
 
 ## Model description
 
@@ -50,10 +42,10 @@ The following hyperparameters were used during training:
 - eval_batch_size: 8
 - seed: 42
 - distributed_type: multi-GPU
-- num_devices: 4
+- num_devices: 8
 - gradient_accumulation_steps: 2
-- total_train_batch_size: 128
-- total_eval_batch_size: 32
+- total_train_batch_size: 256
+- total_eval_batch_size: 64
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.1
@@ -61,12 +53,12 @@ The following hyperparameters were used during training:
 
 ### Training results
 
-| Training Loss | Epoch | Step | Validation Loss |
-|:-------------:|:------:|:----:|:---------------:|
-| 0.0584 | 0.9998 | 2251 | 0.0631 |
-| 0.0502 | 2.0 | 4503 | 0.0447 |
-| 0.0421 | 2.9998 | 6754 | 0.0367 |
-| 0.0269 | 3.9991 | 9004 | 0.0363 |
+| Training Loss | Epoch | Step | Validation Loss |
+|:-------------:|:-----:|:----:|:---------------:|
+| 0.0618 | 1.0 | 1126 | 0.0657 |
+| 0.0504 | 2.0 | 2252 | 0.0494 |
+| 0.0363 | 3.0 | 3378 | 0.0418 |
+| 0.0238 | 4.0 | 4504 | 0.0409 |
 
 
 ### Framework versions
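The batch-size hyperparameters in the updated card are internally consistent with the new device count. Below is a minimal arithmetic sketch, assuming a per-device train batch size of 16; that line falls outside the rendered hunk, so the value is inferred from the reported totals rather than taken from the diff.

```python
import math

# Assumed (not shown in the diff): per-device train batch size of 16.
num_devices = 8
per_device_train_batch_size = 16   # inferred from the reported totals
per_device_eval_batch_size = 8     # "eval_batch_size: 8" in the card
gradient_accumulation_steps = 2
train_samples = 288129             # from all_results.json / train_results.json

total_train_batch_size = (num_devices * per_device_train_batch_size
                          * gradient_accumulation_steps)
total_eval_batch_size = num_devices * per_device_eval_batch_size

assert total_train_batch_size == 256  # matches "total_train_batch_size: 256"
assert total_eval_batch_size == 64    # matches "total_eval_batch_size: 64"

# Steps per epoch implied by the effective batch size also match the
# training-results table (1126, 2252, 3378, 4504 over four epochs).
steps_per_epoch = math.ceil(train_samples / total_train_batch_size)
assert steps_per_epoch == 1126
assert steps_per_epoch * 4 == 4504
```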
all_results.json CHANGED
@@ -1,14 +1,9 @@
 {
-    "epoch": 3.9991117033089054,
-    "eval_loss": 0.03634560480713844,
-    "eval_runtime": 402.1706,
-    "eval_samples": 15166,
-    "eval_samples_per_second": 37.71,
-    "eval_steps_per_second": 1.179,
-    "total_flos": 1010387024977920.0,
-    "train_loss": 0.05092642861352228,
-    "train_runtime": 98184.5209,
+    "epoch": 4.0,
+    "total_flos": 1010769106173952.0,
+    "train_loss": 0.055861650320550765,
+    "train_runtime": 18499.8374,
     "train_samples": 288129,
-    "train_samples_per_second": 11.738,
-    "train_steps_per_second": 0.092
+    "train_samples_per_second": 62.299,
+    "train_steps_per_second": 0.243
 }
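As a sanity check, the throughput figures in the new all_results.json follow from the reported runtime, assuming samples per second is computed as total samples processed (train_samples × epochs) divided by the wall-clock training time; a quick re-derivation:

```python
# Hypothetical re-derivation of the reported throughput numbers.
train_samples = 288129
epochs = 4.0
train_runtime = 18499.8374   # seconds
total_steps = 4504           # final step in the training-results table

samples_per_second = train_samples * epochs / train_runtime
steps_per_second = total_steps / train_runtime

print(round(samples_per_second, 3))  # 62.299, as reported
print(round(steps_per_second, 3))    # 0.243, as reported
```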
generation_config.json CHANGED
@@ -1,8 +1,11 @@
 {
-  "_from_model_config": true,
   "bos_token_id": 128000,
   "do_sample": true,
-  "eos_token_id": 128001,
+  "eos_token_id": [
+    128001,
+    128008,
+    128009
+  ],
   "temperature": 0.6,
   "top_p": 0.9,
   "transformers_version": "4.45.0.dev0"
train_results.json CHANGED
@@ -1,9 +1,9 @@
 {
-    "epoch": 3.9991117033089054,
-    "total_flos": 1010387024977920.0,
-    "train_loss": 0.05092642861352228,
-    "train_runtime": 98184.5209,
+    "epoch": 4.0,
+    "total_flos": 1010769106173952.0,
+    "train_loss": 0.055861650320550765,
+    "train_runtime": 18499.8374,
     "train_samples": 288129,
-    "train_samples_per_second": 11.738,
-    "train_steps_per_second": 0.092
+    "train_samples_per_second": 62.299,
+    "train_steps_per_second": 0.243
 }
trainer_state.json CHANGED
The diff for this file is too large to render. See raw diff