GPU memory for training #5

ChongWang1024 · 2024-01-26T09:15:52Z

Hi,
Thanks for sharing the code of this interesting work.

I am trying to run training on the fastMRI dataset and I get a CUDA out-of-memory error even with a batch size of 1.
My GPU is an NVIDIA A5000, which has 24 GB of memory.

Could you please tell me how much GPU memory is required to train with a batch size of 1?

BTW, I noticed that memory usage gradually increases with each iteration (batch).
Is that normal? Maybe it is related to the code itself in a way I haven't noticed.

Many thanks! Looking forward to your reply.

hellopipu · 2024-01-27T19:35:13Z

Hi @ChongWang1024,

Approximately 26 GB of GPU memory is required for training on the FastMRI knee dataset. You can decrease the feature dimension to accommodate your GPU.

I haven't observed any gradual increase in memory usage on my end. Could you provide more details about this issue?
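One way to gather those details is to record allocated GPU memory once per iteration and check whether it keeps rising after the first few steps. The following is a minimal sketch, assuming PyTorch; the helper names `gpu_mem_mib` and `is_steadily_growing`, the warm-up length, and the tolerance are hypothetical and not part of this repository.

```python
from typing import Sequence


def gpu_mem_mib() -> float:
    """Return currently allocated CUDA tensor memory in MiB.

    Returns 0.0 if PyTorch or CUDA is unavailable.
    """
    try:
        import torch
    except ImportError:
        return 0.0
    if not torch.cuda.is_available():
        return 0.0
    # memory_allocated() reports bytes held by tensors on the current device.
    return torch.cuda.memory_allocated() / (1024 ** 2)


def is_steadily_growing(
    readings: Sequence[float], warmup: int = 5, tol_mib: float = 1.0
) -> bool:
    """Heuristic (assumed thresholds): True if, after `warmup` readings,
    every reading exceeds the previous one by more than `tol_mib`."""
    tail = list(readings)[warmup:]
    if len(tail) < 2:
        return False
    return all(b - a > tol_mib for a, b in zip(tail, tail[1:]))
```

In a training loop, appending `gpu_mem_mib()` to a list after each step and calling `is_steadily_growing` on that list every few hundred iterations would give numbers to attach to a report. The same check can be applied to host-memory readings if those are collected instead.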

hellopipu · 2024-01-28T01:07:09Z

Hi @ChongWang1024,

Please update the code and then add --low_mem to the training command. This lets training run in only ~22 GB of memory without modifying the model.

hellopipu · 2024-02-29T18:06:59Z

A likely cause of the memory leak is the pip build of the h5py package. You can fix it with conda install h5py or pip install h5py==3.3.

reference:
facebookresearch/fastMRI#217
facebookresearch/fastMRI#215
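To see which h5py version an environment actually has before reinstalling, the standard library can report it. This is a minimal sketch; the helper names `installed_version` and `is_3_3_series` are hypothetical, and the check only compares against the 3.3 series suggested above. It cannot tell whether the package came from pip or conda.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "h5py") -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_3_3_series(version: Optional[str]) -> bool:
    """True if the version string is 3.3 or 3.3.x (the pin suggested in this thread)."""
    if version is None:
        return False
    parts = version.split(".")
    return len(parts) >= 2 and parts[0] == "3" and parts[1] == "3"


if __name__ == "__main__":
    version = installed_version()
    print(f"h5py version: {version}, in 3.3 series: {is_3_3_series(version)}")
```

If the reported version is outside the 3.3 series, reinstalling with one of the two commands above is the fix proposed in this thread.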

ChongWang1024 · 2024-03-01T13:55:15Z

Hi,
Thanks for your detailed reply.
I have figured out the problem: it seems I had the wrong versions of pytorch-lightning and h5py.

Many thanks!

hellopipu closed this as completed Jan 28, 2024

hellopipu reopened this Feb 29, 2024

ChongWang1024 closed this as completed Mar 1, 2024
