Ali-Forootani committed ed27633 (parent: 78b7550): Update README.md

README.md CHANGED (+224 -2)
---
library_name: transformers
tags: []
---
# Fine-tune Llama 3 with ORPO

ORPO is an exciting new fine-tuning technique that combines the traditional supervised fine-tuning and preference alignment stages into a single process. This reduces the computational resources and time required for training. Moreover, empirical results demonstrate that ORPO outperforms other alignment methods across various model sizes and benchmarks.

In this article, we will fine-tune the new Llama 3 8B model using ORPO with the TRL library.
## ORPO

Instruction tuning and preference alignment are essential techniques for adapting Large Language Models (LLMs) to specific tasks. Traditionally, this involves a multi-stage process: 1/ Supervised Fine-Tuning (SFT) on instructions to adapt the model to the target domain, followed by 2/ preference alignment methods like Reinforcement Learning from Human Feedback (RLHF) or Direct Preference Optimization (DPO) to increase the likelihood of generating preferred responses over rejected ones.

However, researchers have identified a limitation in this approach. While SFT effectively adapts the model to the desired domain, it inadvertently increases the probability of generating undesirable answers alongside preferred ones. This is why a preference alignment stage is necessary to widen the gap between the likelihoods of preferred and rejected outputs.
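Concretely (this is a summary of the paper linked below, not something specific to this repository), ORPO adds an odds-ratio penalty to the usual SFT loss, so a single objective both fits the chosen responses and pushes down the rejected ones:

$$
\mathcal{L}_{ORPO} = \mathbb{E}_{(x,\, y_w,\, y_l)}\big[\mathcal{L}_{SFT} + \lambda \cdot \mathcal{L}_{OR}\big],
\qquad
\mathcal{L}_{OR} = -\log \sigma\!\left(\log \frac{\text{odds}_\theta(y_w \mid x)}{\text{odds}_\theta(y_l \mid x)}\right),
\qquad
\text{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}
$$

Here $y_w$ is the chosen response, $y_l$ the rejected one, and $\lambda$ weights the penalty; the trainer exposes it as `beta` (see the hyperparameters below).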
See the [ORPO paper](https://arxiv.org/abs/2403.07691) for more details.
## Fine-tuning Llama 3 with ORPO

[Llama 3](https://github.com/meta-llama/llama3/tree/main) is the latest family of LLMs developed by Meta. The models were trained on an extensive dataset of 15 trillion tokens (compared to 2T tokens for Llama 2). Two model sizes have been released: a 70-billion-parameter model and a smaller 8-billion-parameter model. The 70B model has already demonstrated impressive performance, scoring 82 on the MMLU benchmark and 81.7 on the HumanEval benchmark.

Llama 3 models also increase the context length to 8,192 tokens (up from 4,096 tokens for Llama 2) and can potentially scale up to 32k with RoPE. Additionally, the models use a new tokenizer with a 128K-token vocabulary, reducing the number of tokens required to encode text by about 15%. This larger vocabulary also explains the bump from 7B to 8B parameters.
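If you want to check the tokenizer claim yourself, a quick comparison is sketched below. It is not part of the original workflow and assumes you have access to the gated `meta-llama/Llama-2-7b-hf` and `meta-llama/Meta-Llama-3-8B` repositories and are logged in with `huggingface-cli login`:

```python
# Rough check (assumption: access to both gated Meta repos on the Hugging Face Hub):
# count how many tokens each tokenizer needs for the same text.
from transformers import AutoTokenizer

text = "ORPO combines supervised fine-tuning and preference alignment into a single training stage."

llama2_tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
llama3_tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

print("Llama 2 tokens:", len(llama2_tokenizer(text)["input_ids"]))
print("Llama 3 tokens:", len(llama3_tokenizer(text)["input_ids"]))
```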
## Required packages

```bash
pip install -U transformers datasets accelerate peft trl bitsandbytes wandb
pip install -qqq flash-attn
pip install -qU transformers accelerate
```
Once the packages are installed, we can import the necessary libraries and log in to W&B (optional):

```python
import gc
import os

import torch
import wandb
from datasets import load_dataset

from peft import LoraConfig, PeftModel, prepare_model_for_kbit_training
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
)
from trl import ORPOConfig, ORPOTrainer, setup_chat_format

# Weights & Biases (https://wandb.ai/aliforootani-UFZ): you need your own API key (wb_token)
wb_token = 'your_wb_token'
wandb.login(key=wb_token)
```
If you have a recent GPU, you should also be able to use the Flash Attention library to replace the default eager attention implementation with a more efficient one.

```python
# Flash Attention 2 requires an Ampere GPU or newer (compute capability >= 8)
if torch.cuda.get_device_capability()[0] >= 8:
    attn_implementation = "flash_attention_2"
    torch_dtype = torch.bfloat16
else:
    attn_implementation = "eager"
    torch_dtype = torch.float16


##################################

import sys
import os

cwd = os.getcwd()
# sys.path.append(cwd + '/my_directory')
sys.path.append(cwd)


def setting_directory(depth):
    """Return the directory `depth` levels above the current working directory."""
    current_dir = os.path.abspath(os.getcwd())
    root_dir = current_dir
    for i in range(depth):
        root_dir = os.path.abspath(os.path.join(root_dir, os.pardir))
        sys.path.append(os.path.dirname(root_dir))
    return root_dir


# I load the model from a local directory
model_path = "/data/bio-eng-llm/llm_repo/mlabonne/OrpoLlama-3-8B"
```
In the following, we load OrpoLlama-3-8B in 4-bit precision thanks to bitsandbytes. We then set the LoRA configuration using PEFT for QLoRA. I'm also using the convenient setup_chat_format() function to modify the model and tokenizer for ChatML support. It automatically applies this chat template, adds special tokens, and resizes the model's embedding layer to match the new vocabulary size.

```python
# QLoRA config: load the model in 4-bit NF4 with double quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch_dtype,
    bnb_4bit_use_double_quant=True,
)

# LoRA config: adapt every attention and MLP projection matrix
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['up_proj', 'down_proj', 'gate_proj', 'k_proj', 'q_proj', 'v_proj', 'o_proj']
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Load model
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    device_map="auto",
    attn_implementation=attn_implementation
)

# Add ChatML support and prepare the quantized model for k-bit (LoRA) training
model, tokenizer = setup_chat_format(model, tokenizer)
model = prepare_model_for_kbit_training(model)
```
Now that the model is ready for training, we can take care of the dataset. We load mlabonne/orpo-dpo-mix-40k and use the apply_chat_template() function to convert the "chosen" and "rejected" columns into the ChatML format. Note that I'm only using 1,000 samples and not the entire dataset, as it would take too long to run.

First, we need to set a few hyperparameters:

- `learning_rate`: ORPO uses very low learning rates compared to traditional SFT or even DPO. The value of 8e-6 comes from the original paper, and roughly corresponds to an SFT learning rate of 1e-5 and a DPO learning rate of 5e-6. I would recommend adjusting it to around 1e-6 for a real fine-tune.
- `beta`: This is the $\lambda$ parameter in the paper, with a default value of 0.1. An appendix of the original paper shows how it was selected with an ablation study.

Other parameters, like `max_length` and the batch size, are set to use as much VRAM as is available (~20 GB in this configuration). Ideally, we would train the model for 3-5 epochs; here we use 20 (see `num_train_epochs` below).

Finally, we can train the model using the ORPOTrainer, which acts as a wrapper.
```python
# Preference dataset described above (replace with a local path if you keep a local copy)
dataset_name = "mlabonne/orpo-dpo-mix-40k"

dataset = load_dataset(dataset_name, split="all")
dataset = dataset.shuffle(seed=42).select(range(1000))


def format_chat_template(row):
    # Render the "chosen" and "rejected" conversations with the ChatML template
    row["chosen"] = tokenizer.apply_chat_template(row["chosen"], tokenize=False)
    row["rejected"] = tokenizer.apply_chat_template(row["rejected"], tokenize=False)
    return row


dataset = dataset.map(
    format_chat_template,
    num_proc=os.cpu_count(),
)
dataset = dataset.train_test_split(test_size=0.01)

epochs = 20

orpo_args = ORPOConfig(
    learning_rate=8e-6,
    beta=0.1,
    lr_scheduler_type="linear",
    max_length=1024,
    max_prompt_length=512,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    optim="paged_adamw_8bit",
    num_train_epochs=epochs,
    evaluation_strategy="steps",
    eval_steps=0.2,
    logging_steps=1,
    warmup_steps=10,
    report_to="wandb",
    output_dir="./results/",
)

trainer = ORPOTrainer(
    model=model,
    args=orpo_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    peft_config=peft_config,
    tokenizer=tokenizer,
)
trainer.train()

# Define the directory where you want to save the model
root_dir = setting_directory(0)
save_dir = os.path.join(root_dir, f"models/fine_tuned_models/OrpoLlama-3-8B_{epochs}e_qa_qa")

# Create the directory if it doesn't exist
os.makedirs(save_dir, exist_ok=True)

# Save the fine-tuned model (LoRA adapter) to the specified directory
trainer.save_model(save_dir)
```
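Optionally, you can free the VRAM used by the 4-bit training run and merge the LoRA adapter into a full-precision copy of the base model. The sketch below follows the usual QLoRA merge workflow rather than anything specific to this repository; it reuses the imports from the blocks above and assumes the `model_path` and `save_dir` variables defined earlier:

```python
# Flush memory from the quantized training run
del trainer, model
gc.collect()
torch.cuda.empty_cache()

# Reload the base model in fp16, re-apply the ChatML format, then merge the adapter
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
model, tokenizer = setup_chat_format(model, tokenizer)

# Attach the trained LoRA weights saved in save_dir and fold them into the base weights
model = PeftModel.from_pretrained(model, save_dir)
model = model.merge_and_unload()
```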
Training the model on these 1,000 samples for 20 epochs took about 22 hours on an NVIDIA A100 80 GB GPU, although, based on the W&B graphs, only about 34 GB of VRAM was used. Let's check the W&B plots:


### Model Description