I am currently fine-tuning the model on TPU with transformers v4.43.0.dev0 and am consistently getting NaN loss and NaN grad norm. I am fine-tuning with LoRA, and the model is loaded in bfloat16.
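Roughly, the setup looks like this (a minimal sketch; the model name and LoRA target modules below are placeholders, not my exact configuration):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Meta-Llama-3-8B"  # placeholder model id

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # weights loaded in bfloat16
)

lora_config = LoraConfig(
    r=16,                                 # placeholder LoRA rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # placeholder target modules
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```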
Can anyone help me fix this? Thank you in advance.
Here is roughly what my training arguments look like; the exact values below are placeholders rather than my real settings:
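```python
from transformers import TrainingArguments

# Placeholder values, not my exact settings.
training_args = TrainingArguments(
    output_dir="./lora-tpu-output",   # placeholder path
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    num_train_epochs=3,
    bf16=True,                        # bfloat16 mixed precision
    max_grad_norm=1.0,                # gradient clipping
    logging_steps=10,
)
```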
Ditto on this; the grad norm is exploding even with gradient clipping enabled.