Is it normal for the loss to fluctuate this much? At what point during the 3 epochs did your loss start to drop and stay stable?

#13

by Aibet - opened Aug 30, 2023

Discussion

Aibet

Aug 30, 2023

•

edited Aug 30, 2023

Here are my training arguments:
--bf16 True
--num_train_epochs 3
--per_device_train_batch_size 1
--per_device_eval_batch_size 1
--gradient_accumulation_steps 32
--tf32 false
--model_max_length 4096
--deepspeed playground/deepspeed_config_s2.json
--gradient_checkpointing true

fireballoon

Owner Sep 6, 2023

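My training was quite stable; the loss decreased steadily throughout.
Is the loss in your plot for a single example, or the average over one batch (32 examples)?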

Aibet

Sep 6, 2023

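I trained on a single dataset only, and the loss fluctuates quite a lot. Could this be related to the mix_data in your fine-tuning code? I couldn't find where the data mixing is actually called. Could you share that part of the code? Thanks.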

fireballoon

Owner Sep 6, 2023

The loss I report is the average over 100 batches, so the curve probably looks smoother. I haven't checked how much the per-batch loss fluctuates.
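To compare curves on equal footing, the per-step losses can be smoothed the same way before plotting. The snippet below is only a minimal sketch: the function name `windowed_mean` is hypothetical, and it assumes a trailing window over the last 100 logged values, which may differ from how the 100-batch average was actually computed in the training script.

```python
from collections import deque


def windowed_mean(losses, window=100):
    """Return the trailing mean of `losses` over at most `window` values.

    Assumption: a trailing (sliding) window. The original training code
    may instead average over non-overlapping blocks of 100 batches.
    """
    buf = deque(maxlen=window)
    total = 0.0
    smoothed = []
    for value in losses:
        if len(buf) == window:
            # The oldest value is about to be evicted by append().
            total -= buf[0]
        buf.append(value)
        total += value
        smoothed.append(total / len(buf))
    return smoothed


if __name__ == "__main__":
    # Toy per-step losses that jump between two levels.
    raw = [2.0, 1.0] * 200
    smooth = windowed_mean(raw, window=100)
    print(max(raw) - min(raw))                  # spread of the raw values
    print(max(smooth[100:]) - min(smooth[100:]))  # spread after smoothing
```

If the smoothed curve trends downward while the raw per-step values jump around, the fluctuation is most likely ordinary per-batch noise rather than a training problem.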
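mix data is used to balance the sampling ratios of the different datasets. Since there is relatively little leetcode data, when training the model (https://huggingface.co/fireballoon/baichuan-vicuna-chinese-7b) I raised the leetcode sampling rate to 2x (i.e., each epoch passes over the leetcode data twice). All other datasets have a sampling rate of 1, i.e., one pass per epoch.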
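As a rough illustration of the sampling-rate idea (not the actual mix_data code from the repo), one epoch of mixed data could be built by repeating each dataset according to its rate and then shuffling. The function `mix_datasets`, the dataset names, and the restriction to integer rates are all assumptions made for this sketch.

```python
import random


def mix_datasets(datasets, rates, seed=0):
    """Build one epoch of examples, repeating each dataset by its rate.

    datasets: dict mapping a dataset name to a list of examples.
    rates:    dict mapping a dataset name to an integer sampling rate;
              names missing from `rates` default to 1 (one pass per epoch).
    """
    mixed = []
    for name, examples in datasets.items():
        rate = rates.get(name, 1)
        # A rate of 2 means the dataset is seen twice per epoch.
        mixed.extend(list(examples) * rate)
    # Shuffle so the repeated examples are spread across the epoch.
    random.Random(seed).shuffle(mixed)
    return mixed


if __name__ == "__main__":
    # Hypothetical toy datasets standing in for the real ones.
    data = {
        "leetcode": ["lc_1", "lc_2"],
        "other": ["o_1", "o_2", "o_3"],
    }
    epoch = mix_datasets(data, rates={"leetcode": 2})
    print(len(epoch))  # 2 * 2 leetcode examples + 3 other examples
```

Fractional rates would need a different approach, such as sampling a subset of each dataset per epoch.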
