args = (
    wandb=False
    prompt_type='chat'
    data_path='instruct.zh.jsonl'
    model_path='llama_7b'
    micro_batch=3
    total_batch=32
    log_steps=100
    eval_steps=0
    save_steps=200
    warmup_ratio=0.01
    test_size=0
    resume_from_checkpoint=None
    ignore_data_skip=False
)
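Note that `total_batch=32` is not divisible by `micro_batch=3`: the training log below reports 10 gradient-accumulation steps and an effective batch of 30, which is what integer division gives (32 // 3 = 10, and 3 × 10 = 30). A minimal sketch of how these arguments plausibly map onto `transformers.TrainingArguments`; the field mapping and `output_dir` are assumptions, not taken from the original script:

```python
from transformers import TrainingArguments

micro_batch = 3   # per-device batch size from the args dump
total_batch = 32  # requested effective batch size
grad_accum = total_batch // micro_batch  # 32 // 3 = 10 -> effective batch 30

training_args = TrainingArguments(
    output_dir="output",                      # assumed; not shown above
    per_device_train_batch_size=micro_batch,
    gradient_accumulation_steps=grad_accum,
    num_train_epochs=3,                       # matches "Num Epochs = 3" below
    warmup_ratio=0.01,
    logging_steps=100,                        # log_steps
    save_steps=200,
    report_to="none",                         # wandb=False
)
```

With `eval_steps=0` and `test_size=0`, no evaluation split is used, so no evaluation-related arguments are needed here.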
>>> trainable params: 19988480 || all params: 6758404096 || trainable%: 0.2957573965106688
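This is the standard LoRA/PEFT bookkeeping line: only the injected adapter weights have `requires_grad=True`, so 19,988,480 of the 6,758,404,096 parameters (about 0.296%) are actually updated. PEFT models print it via `model.print_trainable_parameters()`; a self-contained equivalent looks like this (a hypothetical helper, not from the original script):

```python
import torch.nn as nn

def print_trainable_parameters(model: nn.Module) -> None:
    """Report how many parameters will receive gradients vs. the full model."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable params: {trainable} || all params: {total} "
          f"|| trainable%: {100 * trainable / total}")
```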
***** Running training *****
  Num examples = 51,584
  Num Epochs = 3
  Instantaneous batch size per device = 3
  Total train batch size (w. parallel, distributed & accumulation) = 30
  Gradient Accumulation steps = 10
  Total optimization steps = 4,836
  Number of trainable parameters = 19,988,480

4836/4836 [41:14:59<00:00, 30.71s/it]
{'train_runtime': 148499.2028, 'train_samples_per_second': 0.977, 'train_steps_per_second': 0.033, 'train_loss': 0.7752850797671341, 'epoch': 2.81}
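The final `'epoch': 2.81` is consistent with the batch-size mismatch noted above: the step budget appears to be computed from `total_batch=32` (51584 // 32 × 3 = 4836), while each optimization step actually consumes only 30 examples, so training covers 4836 × 30 / 51584 ≈ 2.81 epochs rather than a full 3. A quick arithmetic check; the derivation of the step count is an inference from the numbers, not taken from the script:

```python
examples, epochs = 51_584, 3
requested_batch, effective_batch = 32, 30  # 30 = micro_batch 3 * accumulation 10

steps = examples // requested_batch * epochs
print(steps)                               # 4836, matches "Total optimization steps"
print(steps * effective_batch / examples)  # ~2.81, matches the reported final epoch
print(148_499.2028 / steps)                # ~30.71 s/step, matches the progress bar
```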