Only 4 GB memory usage for inference but 38 GB for training
#15 opened by hayj
Is it normal that training takes so much more GPU memory than inference, or am I using it incorrectly?
I use an NVIDIA A100.
Yes, this is normal. Besides the model weights, training has to keep gradients, optimizer states, and intermediate activations in GPU memory, and together these are several times larger than the weights alone.
Please refer to https://huggingface.co/docs/transformers/v4.20.1/en/perf_train_gpu_one#anatomy-of-models-memory for more details.
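As a rough back-of-the-envelope illustration of that breakdown, here is a minimal sketch. It assumes full fp32 training with AdamW (4 bytes per parameter for weights, 4 for gradients, 8 for the two optimizer states); the thread does not say which optimizer or precision is actually in use. The function name `estimate_memory_gb` and the parameter count are hypothetical, chosen only for illustration. Activations are left out because they depend on batch size and sequence length, so real training usage will be higher than this estimate.

```python
def estimate_memory_gb(num_params: int, training: bool = True) -> float:
    """Rough GPU memory estimate in GiB, ignoring activations and framework overhead."""
    bytes_per_param = 4  # fp32 model weights
    if training:
        bytes_per_param += 4  # fp32 gradients, one per parameter
        bytes_per_param += 8  # AdamW keeps two fp32 states per parameter (assumed optimizer)
    return num_params * bytes_per_param / 1024**3


# Hypothetical model size, for illustration only.
num_params = 1_000_000_000

inference_gb = estimate_memory_gb(num_params, training=False)
training_gb = estimate_memory_gb(num_params, training=True)

print(f"inference (weights only): {inference_gb:.1f} GiB")
print(f"training (weights + gradients + AdamW states): {training_gb:.1f} GiB")
```

Under these assumptions, training needs about four times the memory of inference before a single activation is stored, so a large gap between the two is expected rather than a sign of misuse.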