Training Parameters
#13, opened by maveriq
Hi again. Since your work will probably encourage quite a few people to train their own LMs from scratch (I know I am going to), could you share the training hyperparameters so that we can make a fair comparison with your model and results? Specifically, I am looking for information on:
- Optimizer and its parameters (e.g. betas and eps in the case of Adam)
- Learning rate scheduler and its parameters (e.g. type of scheduler, warmup percentage, decay shape)
- Batch size and learning rate
- Number of training steps / tokens seen
- Any training optimizations or libraries (e.g. FairScale, DeepSpeed)
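
To make the request concrete, here is a rough sketch of the config I am hoping to fill in. The class and field names are my own invention (not taken from your code), and every value is left unset because I do not know it; the helper just lists what is still missing.

```python
from dataclasses import dataclass, fields
from typing import List, Optional, Tuple


@dataclass
class TrainingConfig:
    # All names here are hypothetical placeholders, not the authors' actual config keys.
    # Every value defaults to None, meaning "not yet known".
    optimizer: Optional[str] = None                # e.g. "adam" or "adamw"
    betas: Optional[Tuple[float, float]] = None    # Adam beta1, beta2
    eps: Optional[float] = None                    # Adam epsilon
    scheduler: Optional[str] = None                # type of LR scheduler
    warmup_pct: Optional[float] = None             # fraction of steps spent in warmup
    decay_shape: Optional[str] = None              # e.g. "linear" or "cosine"
    batch_size: Optional[int] = None
    learning_rate: Optional[float] = None
    num_steps: Optional[int] = None
    tokens_seen: Optional[int] = None
    training_library: Optional[str] = None         # e.g. "fairscale" or "deepspeed"


def missing_fields(cfg: TrainingConfig) -> List[str]:
    """Return the names of all fields that are still unset (None)."""
    return [f.name for f in fields(cfg) if getattr(cfg, f.name) is None]


if __name__ == "__main__":
    # With nothing filled in, every field is reported as missing.
    print(missing_fields(TrainingConfig()))
```

Once the values are known, constructing `TrainingConfig(...)` with them and checking that `missing_fields` returns an empty list would confirm that a reproduction run has everything specified.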
Thank you.
maveriq changed discussion status to closed