Unscale FP16 Gradients Help
Hello,
I'm trying to train a model with the Trainer from the Transformers library. I'm using a quantized model with FP16 mixed-precision training (fp16=True), but during training I get the error "ValueError: Attempting to unscale FP16 gradients."
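For context: as far as I understand, this error comes from PyTorch's gradient scaler, which the Trainer uses when fp16=True. The scaler refuses to unscale a gradient that is itself stored in float16, and that happens when the parameters being trained are float16. A quick check is to count the trainable parameters by dtype. The helper name trainable_param_dtypes below is hypothetical, not part of any library.

```python
from collections import Counter

import torch
from torch import nn


def trainable_param_dtypes(model: nn.Module) -> Counter:
    """Count trainable (requires_grad=True) parameter elements per dtype.

    Hypothetical diagnostic helper: a torch.float16 entry means some
    trainable parameter (and therefore its gradient) is fp16, which is
    what the fp16 gradient scaler refuses to unscale.
    """
    counts = Counter()
    for param in model.parameters():
        if param.requires_grad:
            counts[param.dtype] += param.numel()
    return counts
```

Calling print(trainable_param_dtypes(model)) just before trainer.train() should show whether any torch.float16 entries are present.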
Here is my code:
import transformers
from torch.nn import CrossEntropyLoss
from transformers import AutoTokenizer
from datasets import load_dataset
# Define your model and tokenizer (these should already be defined in your code)
MODEL_NAME = "vilsonrodrigues/falcon-7b-instruct-sharded"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token
# Load your data
data = load_dataset('csv', data_files='/content/Sumoquote Training Database.csv')
# Define your tokenizer function
def tokenize_and_format(examples):
    # Here, I'm assuming that the 'User' and 'Prompt' fields in your CSV contain the text you want to model.
    text = [f"{x} {y}" for x, y in zip(examples['User'], examples['Prompt'])]
    tokenized = tokenizer(text, truncation=True, padding='max_length')
    # Format the data for causal language modeling
    tokenized['labels'] = tokenized['input_ids'].copy()
    tokenized['input_ids'] = [ids[:-1] for ids in tokenized['input_ids']]
    tokenized['labels'] = [ids[1:] for ids in tokenized['labels']]
    return tokenized
# Apply the tokenizer function to your data
data = data.map(tokenize_and_format, batched=True)
data.set_format(type='torch', columns=['input_ids', 'labels'])
# Define the training arguments
training_args = transformers.TrainingArguments(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    learning_rate=2e-4,
    fp16=True,
    save_total_limit=3,
    logging_steps=1,
    output_dir="experiments",
    optim="adamw_8bit",
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
)
# Define the callback
class EnsureGradsAreFP32(transformers.TrainerCallback):
    def on_backward_end(self, args, state, control, **kwargs):
        if args.fp16:
            for param in model.parameters():
                if param.grad is not None:
                    param.grad.data = param.grad.data.float()
# Create the Trainer
trainer = transformers.Trainer(
    model=model,
    train_dataset=data['train'],  # Here, I've used the Dataset
    args=training_args,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
    callbacks=[EnsureGradsAreFP32()],
)
# Disable caching
model.config.use_cache = False
# Train the model
trainer.train()
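A side note on the data preparation, separate from the FP16 error: as far as I know, DataCollatorForLanguageModeling with mlm=False rebuilds labels from input_ids for every batch (setting padding-token positions to -100 so the loss ignores them), and Hugging Face causal LM models shift the labels by one position internally when computing the loss. If that holds for the installed version, the manual shift in tokenize_and_format is overwritten by the collator anyway. Also, because pad_token is set to eos_token here, real EOS tokens get masked to -100 as well. The sketch below only mimics that behaviour in plain PyTorch; causal_lm_labels and shifted_lm_loss are hypothetical names, not the library's code.

```python
import torch
import torch.nn.functional as F


def causal_lm_labels(input_ids: torch.Tensor, pad_token_id: int) -> torch.Tensor:
    """Approximate DataCollatorForLanguageModeling(mlm=False):
    copy input_ids and replace padding positions with -100 (ignored by the loss)."""
    labels = input_ids.clone()
    labels[labels == pad_token_id] = -100
    return labels


def shifted_lm_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Approximate the shift a causal LM applies internally:
    the logits at position t are scored against the label at position t + 1."""
    shift_logits = logits[:, :-1, :]  # drop the last position's prediction
    shift_labels = labels[:, 1:]      # drop the first token, which nothing predicts
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=-100,
    )
```

Because the shift happens inside the loss, the labels passed in can simply be an unshifted copy of input_ids.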
Things I've tried:
- Disabling gradient accumulation.
- Changing the optimizer to "adamw_8bit".
- Making sure all gradients are in FP32 before optimizer.step() is called (the EnsureGradsAreFP32 callback above).
- Disabling caching.
Despite these efforts, the problem still persists. Any guidance would be greatly appreciated.
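One thing worth double-checking in the EnsureGradsAreFP32 callback: as far as I know, the Trainer only calls callback methods whose names match its own event names (on_step_end, on_substep_end and so on), and I don't believe on_backward_end is one of them, so the callback may never run at all. The hypothetical helper below lists on_* methods that a callback class defines but its base class does not.

```python
import inspect


def unknown_hooks(callback_cls: type, base_cls: type) -> list[str]:
    """Return on_* methods defined on callback_cls that base_cls lacks.

    Hypothetical helper: a hook name that the base class does not define
    is most likely never invoked by code that dispatches the base API's events.
    """
    known = {name for name in dir(base_cls) if name.startswith("on_")}
    defined = {
        name
        for name, _ in inspect.getmembers(callback_cls, inspect.isfunction)
        if name.startswith("on_")
    }
    return sorted(defined - known)
```

With transformers installed, unknown_hooks(EnsureGradsAreFP32, transformers.TrainerCallback) would be expected to return ['on_backward_end'] if that hook is indeed absent from the installed version.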
How do you define your model?
What is your Transformers version?
As far as I can tell, your code is correct and should work. I recommend opening an issue on GitHub that shows the complete code.
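Since the question of how the model is defined came up: as far as I know, a common cause of this error is that the parameters being trained (for example LoRA adapter weights) end up in float16, e.g. when the model is loaded with torch_dtype=torch.float16 or converted with .half(). For quantized models, peft's prepare_model_for_kbit_training is the usual way to cast the non-quantized weights to float32 before adding adapters. The sketch below only illustrates the underlying idea on plain PyTorch modules; upcast_trainable_to_fp32 is a hypothetical name and this is not peft's implementation.

```python
import torch
from torch import nn


def upcast_trainable_to_fp32(model: nn.Module) -> int:
    """Cast every trainable fp16/bf16 parameter to float32 in place.

    Returns the number of parameter tensors that were cast. Frozen
    (requires_grad=False) parameters keep their original dtype, so
    quantized or frozen base weights are left alone.
    """
    cast = 0
    for param in model.parameters():
        if param.requires_grad and param.dtype in (torch.float16, torch.bfloat16):
            param.data = param.data.float()
            cast += 1
    return cast
```

If used, this would need to run after the model (and any adapters) are set up but before trainer.train(), since the Trainer builds its optimizer from the parameters as they are at that point.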