finetuning args
#8
by lvkaokao - opened
Hi, I've seen the training args in the files and have a question about the hyperparameters:
The training dataset is ~300k samples, epochs=4, per_device_train_batch_size=6, gradient_accumulation_steps=4, and 4 GPU cards, yet the reported global steps are 1204. With those settings the effective batch size is 6 × 4 × 4 = 96, so ~300k samples over 4 epochs should take roughly 12,500 optimizer steps, not 1204. Are the hyperparameters correct?
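Here is a quick sketch of my arithmetic (assuming the usual effective-batch formula; the dataset size is approximate):

```python
# Rough check of the expected number of optimizer (global) steps.
dataset_size = 300_000                 # ~300k training samples (approximate)
num_epochs = 4
per_device_train_batch_size = 6
gradient_accumulation_steps = 4
num_gpus = 4

# Effective batch = per-device batch * accumulation * number of devices
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus  # 96

steps_per_epoch = dataset_size // effective_batch   # ~3125
total_steps = steps_per_epoch * num_epochs          # ~12500

print(effective_batch, steps_per_epoch, total_steps)
```

This gives ~12,500 total steps, an order of magnitude more than the 1204 global steps reported, which is why I suspect one of the listed hyperparameters is off.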
I am trying to reproduce your results, but I find that the ARC and HellaSwag metrics decrease significantly during training.
Hoping for your reply. Thanks!