I am currently fine-tuning the model on TPU with transformers v4.43.0.dev0 and am consistently getting NaN loss and NaN grad norm. I am fine-tuning with LoRA, and the model is loaded in bfloat16.
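Roughly, the setup looks like this (a minimal sketch; the model name and LoRA target modules below are placeholders, not my exact configuration):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Meta-Llama-3-8B"  # placeholder model id

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # weights loaded in bfloat16
)

lora_config = LoraConfig(
    r=16,                                 # placeholder LoRA rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # placeholder target modules
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```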
Can anyone help me fix this? Thank you in advance.
Here is roughly what my training arguments look like; the exact values below are placeholders rather than my real settings:
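```python
from transformers import TrainingArguments

# Placeholder values, not my exact settings.
training_args = TrainingArguments(
    output_dir="./lora-tpu-output",   # placeholder path
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    num_train_epochs=3,
    bf16=True,                        # bfloat16 mixed precision
    max_grad_norm=1.0,                # gradient clipping
    logging_steps=10,
)
```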
Ditto on this; the grad norm is exploding even with gradient clipping enabled.