Safetensors
English
llama
sound language model
bachvudinh committed on
Commit 0a28bc8 (1 parent: 7c5b2ee)

Upload Llama3.1 with Whisper Tokenizer at step 5000

Files changed (3)
  1. 8B_full.yaml +90 -0
  2. config.json +1 -39
  3. log_1723708612.txt +0 -0
8B_full.yaml ADDED
@@ -0,0 +1,90 @@
+ # Config for multi-device full finetuning in full_finetune_distributed.py
+ # using a Llama3 8B Instruct model
+ #
+ # This config assumes that you've run the following command before launching
+ # this run:
+ # tune download meta-llama/Meta-Llama-3-8B-Instruct --output-dir /tmp/Meta-Llama-3-8B-Instruct --hf-token <HF_TOKEN>
+ #
+ # To launch on 4 devices, run the following command from root:
+ # tune run --nproc_per_node 4 full_finetune_distributed --config llama3/8B_full
+ #
+ # You can add specific overrides through the command line. For example
+ # to override the checkpointer directory while launching training
+ # you can run:
+ # tune run --nproc_per_node 4 full_finetune_distributed --config llama3/8B_full checkpointer.checkpoint_dir=<YOUR_CHECKPOINT_DIR>
+ #
+ # This config works best when the model is being fine-tuned on 2+ GPUs.
+ # Single device full finetuning requires more memory optimizations. It's
+ # best to use 8B_full_single_device.yaml for those cases
+ # Tokenizer
+ tokenizer:
+   _component_: torchtune.models.llama3.llama3_s_tokenizer
+   path: ../model_zoo/tokenizer.model
+   max_seq_len: 1024
+
+ # Dataset
+ dataset:
+   _component_: torchtune.datasets.chat_dataset
+   source: homebrewltd/instruction-speech-whispervq-v2
+   conversation_style: openai
+   max_seq_len: 1024
+   split: train
+   train_on_input: True
+
+ seed: 42
+ shuffle: True
+ # Model Arguments
+ model:
+   _component_: torchtune.models.llama3_1.llama3_1_s_8b
+   # path: model_zoo/Llama3.1_s_8b_init
+ checkpointer:
+   _component_: torchtune.utils.FullModelHFCheckpointerSaveSteps
+   checkpoint_dir: ../model_zoo/llama3.1-s-base-2024-08-17
+   checkpoint_files: [
+     pytorch_model.bin,
+   ]
+   recipe_checkpoint: null
+   output_dir: ../model_zoo/llama3-s-instruct2
+   model_type: LLAMA3
+ resume_from_checkpoint: False
+ save_every_n_steps: 1000
+ max_checkpoints: 3
+ # Fine-tuning arguments
+ batch_size: 8
+ epochs: 5
+ max_steps_per_epoch: null
+ gradient_accumulation_steps: 2
+ compile: False
+ # Optimizer and Scheduler
+ optimizer:
+   _component_: torch.optim.AdamW #change this to use adam_mini: torchtune.modules.optimizer.Adam_mini
+   weight_decay: 0.005
+   lr: 1e-4
+   fused: True
+ lr_scheduler:
+   _component_: torchtune.modules.get_cosine_schedule_with_warmup
+   num_warmup_steps: 80
+
+ loss:
+   _component_: torch.nn.CrossEntropyLoss
+
+ fsdp:
+   cpu_offload: False
+
+ # Training env
+ device: cuda
+ dtype: bf16
+
+ # Memory management
+ enable_activation_checkpointing: True
+ memory_efficient_fsdp_wrap: True
+ ac_mode: 'selective'
+
+
+ # Logging
+ metric_logger:
+   _component_: torchtune.utils.metric_logging.DiskLogger
+   log_dir: ${output_dir}
+ output_dir: ../model_zoo/Llama3-instruct2-log/
+ log_every_n_steps: 1
+ log_peak_memory_stats: False
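To make the fine-tuning arguments in 8B_full.yaml concrete, here is a minimal Python sketch of two quantities they imply: the number of sequences per optimizer step, and the shape of a linear-warmup plus cosine-decay learning-rate schedule. It is a standalone illustration, not torchtune's code. The helper names are hypothetical, the device count of 4 is taken from the launch example in the config's header comment (the actual run may have used a different number), and `num_training_steps` is a placeholder because the diff does not state the total step count.

```python
import math


def effective_batch_size(batch_size: int, grad_accum_steps: int, num_devices: int) -> int:
    """Sequences that contribute to one optimizer step across all devices."""
    return batch_size * grad_accum_steps * num_devices


def cosine_warmup_multiplier(step: int, num_warmup_steps: int, num_training_steps: int) -> float:
    """Learning-rate multiplier: linear warmup from 0 to 1, then half-cosine decay to 0."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    # Values from the config: batch_size=8, gradient_accumulation_steps=2.
    # num_devices=4 is an assumption based on the header comment's launch example.
    print("sequences per optimizer step:", effective_batch_size(8, 2, 4))

    peak_lr = 1e-4              # optimizer.lr in the config
    warmup = 80                 # lr_scheduler.num_warmup_steps in the config
    total_steps = 10_000        # hypothetical; not stated in the diff
    for step in (0, 40, 80, 5_000, 10_000):
        lr = peak_lr * cosine_warmup_multiplier(step, warmup, total_steps)
        print(f"step {step:>6}: lr = {lr:.3e}")
```

With these settings the warmup covers only the first 80 optimizer steps, so the run spends nearly all of its time on the decaying part of the curve.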
config.json CHANGED
@@ -1,39 +1 @@
- {
-   "_name_or_path": "llama3.1-s-base-2024-08-17/",
-   "architectures": [
-     "LlamaForCausalLM"
-   ],
-   "attention_bias": false,
-   "attention_dropout": 0.0,
-   "bos_token_id": 128000,
-   "eos_token_id": [
-     128001,
-     128008,
-     128009
-   ],
-   "hidden_act": "silu",
-   "hidden_size": 4096,
-   "initializer_range": 0.02,
-   "intermediate_size": 14336,
-   "max_position_embeddings": 131072,
-   "mlp_bias": false,
-   "model_type": "llama",
-   "num_attention_heads": 32,
-   "num_hidden_layers": 32,
-   "num_key_value_heads": 8,
-   "pretraining_tp": 1,
-   "rms_norm_eps": 1e-05,
-   "rope_scaling": {
-     "factor": 8.0,
-     "high_freq_factor": 4.0,
-     "low_freq_factor": 1.0,
-     "original_max_position_embeddings": 8192,
-     "rope_type": "llama3"
-   },
-   "rope_theta": 500000.0,
-   "tie_word_embeddings": false,
-   "torch_dtype": "bfloat16",
-   "transformers_version": "4.44.0",
-   "use_cache": true,
-   "vocab_size": 128771
- }
+ {"_name_or_path": "meta-llama/Meta-Llama-3.1-8B-Instruct", "architectures": ["LlamaForCausalLM"], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 128000, "eos_token_id": [128001, 128008, 128009], "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 14336, "max_position_embeddings": 131072, "mlp_bias": false, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 8, "pretraining_tp": 1, "rms_norm_eps": 1e-05, "rope_scaling": {"factor": 8.0, "high_freq_factor": 4.0, "low_freq_factor": 1.0, "original_max_position_embeddings": 8192, "rope_type": "llama3"}, "rope_theta": 500000.0, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.43.1", "use_cache": true, "vocab_size": 128771}
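Because the config.json hunk replaces a pretty-printed file with a single minified line, the textual diff touches every line even though most values are unchanged. A key-by-key comparison is easier to read. The sketch below is one way to do it in Python; `changed_keys` is a hypothetical helper, and the two JSON strings are abbreviated excerpts of the old and new files (the two fields that appear to differ plus two that do not), not the full configs.

```python
import json

_MISSING = object()  # sentinel so a key that is absent differs from a key set to null


def changed_keys(old: dict, new: dict) -> dict:
    """Return {key: (old_value, new_value)} for every top-level key whose value differs."""
    result = {}
    for key in sorted(set(old) | set(new)):
        before = old.get(key, _MISSING)
        after = new.get(key, _MISSING)
        if before != after:
            result[key] = (before, after)
    return result


# Abbreviated excerpts of the removed and added config.json contents.
OLD_JSON = """
{
  "_name_or_path": "llama3.1-s-base-2024-08-17/",
  "torch_dtype": "bfloat16",
  "transformers_version": "4.44.0",
  "vocab_size": 128771
}
"""
NEW_JSON = (
    '{"_name_or_path": "meta-llama/Meta-Llama-3.1-8B-Instruct", '
    '"torch_dtype": "bfloat16", "transformers_version": "4.43.1", '
    '"vocab_size": 128771}'
)

diff = changed_keys(json.loads(OLD_JSON), json.loads(NEW_JSON))

if __name__ == "__main__":
    for key, (before, after) in diff.items():
        print(f"{key}: {before!r} -> {after!r}")
```

Run against the full files, the same comparison should report only `_name_or_path` and `transformers_version`; the architecture fields and `vocab_size` (128771) read the same on both sides of the hunk.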
log_1723708612.txt ADDED
The diff for this file is too large to render. See raw diff