float32 vs bf16
#5 opened by janimo
Why the difference in dtypes between this and the -it model?
Full precision is usually useful for pre-training. For inference, using bfloat16
should be good :)
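For example, something like this (a minimal sketch, assuming the google/recurrentgemma-2b checkpoint id; the -it variant loads the same way) casts the stored weights to bf16 at load time for inference:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/recurrentgemma-2b"  # assumption: base checkpoint id on the Hub

tokenizer = AutoTokenizer.from_pretrained(model_id)
# torch_dtype=torch.bfloat16 casts the stored weights to bf16 as they are loaded,
# roughly halving memory compared to keeping them in float32.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```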
Other models (even regular Gemma) ship bf16 weights for both the base and -it checkpoints, hence my question about the rationale here. Is f32 needed for proper RecurrentGemma fine-tuning?
f32 is not needed for fine-tuning; you can fine-tune with either f32 or bf16.
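As a quick sketch (again assuming the google/recurrentgemma-2b checkpoint id), the f32 checkpoint can be loaded in whichever dtype you want to fine-tune in; the stored dtype doesn't constrain you:

```python
import torch
from transformers import AutoModelForCausalLM

model_id = "google/recurrentgemma-2b"  # assumption: base checkpoint id on the Hub

# The stored f32 weights can be cast to bf16 at load time for fine-tuning...
model_bf16 = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
print(model_bf16.dtype)  # torch.bfloat16

# ...or kept in full precision if you prefer to fine-tune in f32.
model_f32 = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)
print(model_f32.dtype)  # torch.float32
```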