elinas committed
Commit 9762575
1 Parent(s): 4d559d0

Update README.md

Files changed (1)
  1. README.md +81 -3
README.md CHANGED
@@ -1,3 +1,81 @@
- ---
- license: llama3
- ---
+ ---
+ base_model:
+ - elinas/Llama-3-13B-Instruct
+ library_name: transformers
+ tags:
+ - mergekit
+ - merge
+ license: llama3
+ ---
+ # Llama-3-15B-Instruct-ft
+
+ This is a QLoRA **finetune** of a merge of pre-trained language models created using...
+
+ TODO
+
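+ As a rough usage sketch with `transformers`: the repo id `elinas/Llama-3-15B-Instruct-ft` and the presence of a Llama-3 chat template in the tokenizer are assumptions inferred from the title, not confirmed by this card, so adjust the id to the actual repository.
+
+ ```python
+ # Hedged loading sketch -- the repo id below is inferred from the card title, not confirmed.
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "elinas/Llama-3-15B-Instruct-ft"  # assumed repo id
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id, torch_dtype=torch.bfloat16, device_map="auto"
+ )
+
+ # Assumes the tokenizer ships a Llama-3 style chat template.
+ messages = [{"role": "user", "content": "Summarize QLoRA in one sentence."}]
+ input_ids = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+ output = model.generate(input_ids, max_new_tokens=128)
+ print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
+ ```
+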
+ ## Datasets
+
+ * [Chat-Error/Pure-dove-sharegpt](https://huggingface.co/datasets/Chat-Error/Pure-dove-sharegpt) (see the loading sketch below)
+
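+ The data can be pulled with the `datasets` library; in the minimal sketch below, the `"train"` split name is an assumption rather than something stated in this card.
+
+ ```python
+ # Hedged sketch: inspect the finetuning dataset with the `datasets` library.
+ from datasets import load_dataset
+
+ # The "train" split name is assumed; check the dataset card for the actual splits.
+ ds = load_dataset("Chat-Error/Pure-dove-sharegpt", split="train")
+ print(ds)      # row count and column names
+ print(ds[0])   # a single record (ShareGPT-style conversations, per the dataset name)
+ ```
+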
+ ## Finetuning details
+
+ This is a QLoRA model and all modules were targeted.
+ ```yaml
+ lora_target_modules:
+ - down_proj
+ - o_proj
+ ```
+
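+ For reference, the excerpt above maps onto a `peft` `LoraConfig` roughly as in the sketch below; the rank, alpha, and dropout values are placeholders rather than this run's settings, and since all modules were targeted the real target list was presumably longer than the two projections shown.
+
+ ```python
+ # Hedged sketch of a comparable QLoRA adapter config in peft.
+ # r, lora_alpha, and lora_dropout are placeholders, NOT the values used for this run.
+ from peft import LoraConfig
+
+ lora_config = LoraConfig(
+     r=16,                                    # placeholder rank
+     lora_alpha=32,                           # placeholder scaling factor
+     lora_dropout=0.05,                       # placeholder dropout
+     bias="none",
+     task_type="CAUSAL_LM",
+     target_modules=["down_proj", "o_proj"],  # the projections listed in the excerpt above
+ )
+ ```
+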
+ The following hyperparameters were used during training:
+ ```yaml
+ - learning_rate: 1e-05
+ - train_batch_size: 2
+ - eval_batch_size: 2
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 3
+ - total_train_batch_size: 6
+ - total_eval_batch_size: 6
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 25
+ - num_epochs: 1
+ ```
+
+ The `paged_adamw_8bit` optimizer and DeepSpeed ZeRO 3 were used at an LR of `1e-5` with the cosine scheduler for 1 epoch on 3x RTX 3090s, taking 4h 12m 13s in total.
+
+ Sample packing and padding were disabled, which significantly reduces VRAM consumption at the cost of speed.
+
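+ Outside of axolotl, these settings correspond roughly to the hedged `TrainingArguments` sketch below; the output directory and DeepSpeed config path are placeholders, bf16 is an assumption since the precision is not stated, and axolotl-specific behaviour such as disabled sample packing has no direct flag here.
+
+ ```python
+ # Hedged sketch: approximate transformers TrainingArguments for the run described above.
+ # "outputs/" and "ds_zero3.json" are placeholder paths; bf16 is an assumption.
+ from transformers import TrainingArguments
+
+ training_args = TrainingArguments(
+     output_dir="outputs/",            # placeholder
+     num_train_epochs=1,
+     per_device_train_batch_size=2,    # 2 per GPU x 3 GPUs = total train batch size 6
+     per_device_eval_batch_size=2,
+     learning_rate=1e-5,
+     lr_scheduler_type="cosine",
+     warmup_steps=25,
+     optim="paged_adamw_8bit",         # paged 8-bit AdamW, as described above
+     seed=42,
+     bf16=True,                        # assumed precision; not stated in the card
+     deepspeed="ds_zero3.json",        # placeholder path to a ZeRO 3 config
+ )
+ ```
+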
+ W&B Run Summary
+ ```
+ wandb: Run summary:
+ wandb: eval/loss 0.94497
+ wandb: eval/runtime 276.2864
+ wandb: eval/samples_per_second 1.397
+ wandb: eval/steps_per_second 0.235
+ wandb: total_flos 12246605365248.0
+ wandb: train/epoch 1.0
+ wandb: train/global_step 579
+ wandb: train/grad_norm 0.80411
+ wandb: train/learning_rate 0.0
+ wandb: train/loss 1.085
+ wandb: train_loss 0.8834
+ wandb: train_runtime 9893.1688
+ wandb: train_samples_per_second 0.351
+ wandb: train_steps_per_second 0.059
+ ```
+
+ ### Framework versions
+
+ - PEFT 0.10.0
+ - Transformers 4.40.0.dev0
+ - Pytorch 2.3.0+cu121
+ - Datasets 2.15.0
+ - Tokenizers 0.15.0
+
+ ## Model Evaluation
+
+ TBD
+
+ If you have any questions or comments on the model, feel free to open a discussion in the community tab.
+
+ [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)