CUDA usage is low
When I train Gemma 2, GPU usage is low (0% most of the time). But when I use the same method (LoRA via the PEFT library) to train LLaMA, GPU usage is constantly around 100%. What's the reason?
Hi @Max545 ,
I executed both models on one NVIDIA Tesla A100 GPU. When loading models like `google/gemma-2b` and `meta-llama/Llama-2-7b-hf`, if no device is specified (for example via `device_map="auto"`), the model is loaded into system RAM and runs on the CPU instead of the GPU. If you explicitly place the model on the GPU (e.g. `device_map="cuda"` or `model.to("cuda")`), it will run there and use the GPU's computational power for faster processing. Please refer to the following gist for more details: link to gist.
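As a minimal sketch of what I mean (assuming `transformers` and `accelerate` are installed, and that you have access to the gated checkpoints):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b"  # or "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Without device_map, from_pretrained loads the weights on the CPU
# (system RAM) by default, which shows up as ~0% GPU utilization.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # or device_map="cuda" to pin everything to the GPU
)

# Quick sanity check: this should print something like cuda:0.
print(next(model.parameters()).device)
```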
The difference in GPU usage between Gemma 2 and LLaMA during fine-tuning with LoRA can be attributed to several factors:

- Model architecture: LLaMA's architecture and kernels are well optimized for GPU execution, while Gemma 2 support is newer and may not be as well tuned for GPU-heavy workloads.
- Memory bottlenecks: Inefficient memory management, slow CPU-to-GPU data transfer, or a model that is partly (or fully) resident on the CPU can leave the GPU idle; the diagnostic sketch after this list shows how to check where the weights actually live.
- Framework support: LLaMA has more mature support in the PEFT library and related tooling, which can translate into better GPU utilization than Gemma 2.
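Here is a small, hedged diagnostic sketch (the `peft_model` name is a placeholder for whatever `get_peft_model(...)` returned in your script) that counts parameters per device; a large `cpu` entry would explain the 0% GPU usage you are seeing:

```python
from collections import Counter

def device_summary(model):
    """Count parameter tensors per device type for a quick placement check."""
    return dict(Counter(p.device.type for p in model.parameters()))

# Example usage after wrapping your base model with get_peft_model(...):
# print(device_summary(peft_model))
# e.g. {'cuda': 291} means everything is on the GPU;
#      {'cpu': 291} means training is running on the CPU.
```

You can cross-check the result against `nvidia-smi` while training is running.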
Thank you.