[Bug] GPU memory explodes when using Conv2D layers in Dict Observations FeatureExtractor #863
Comments
Hello, …
Does the error happen when you collect data, or when the training update is done (during the gradient update)? Why do you use a …
@araffin the error happens at the start of the training. I am using …
@araffin What is curious is the drastic difference in memory usage between using only the Flatten() layer and the two Conv2D layers.
Yes, and you can exclude specific observation keys too (recommended here). Could you please give the output of …
OS: Linux-5.14.18-100.fc33.x86_64-x86_64-with-fedora-33-Thirty_Three #1 SMP Fri Nov 12 17:38:44 UTC 2021
Your math is only taking into account the input, no? I think you should be able to provide a minimal code example to reproduce the issue independent of SB3 (because your issue happens at train time).
@araffin I guess you're right! Although it's strange: the difference between the FeatureExtractor with only the nn.Flatten() layer and the one with the two Conv2D layers should only be a few thousand parameters, which should not have that much of an impact on memory usage. We're talking about a 6 GB difference in memory usage for two Conv2D layers of 16 3x3 kernels each.
Replacing the two nn.Conv2d layers with two nn.Linear layers of 1024 units works like a charm and has no significant impact on memory usage. Those two dense layers should have more parameters than the two Conv2d layers... again pointing to something weird with the nn.Conv2d memory usage.
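A quick back-of-the-envelope check supports @araffin's point above: assuming 3x3 kernels with stride 1 and no padding, 16 filters per layer, and the 10,000-sample batch used in the report's own arithmetic (these are inferences from the thread, not confirmed settings), the activations stored for backpropagation, rather than the parameters, account for roughly the 6 GB difference:

```python
# Back-of-the-envelope activation sizes, assuming 3x3 kernels, stride 1,
# no padding, 16 filters per layer, and a 10,000-sample batch (all taken
# or inferred from this thread -- treat the exact numbers as estimates).
h, w, batch = 51, 101, 10_000

conv1 = 16 * (h - 2) * (w - 2)   # (16, 49, 99) -> 77,616 floats per sample
conv2 = 16 * (h - 4) * (w - 4)   # (16, 47, 97) -> 72,944 floats per sample
conv_bytes = (conv1 + conv2) * 4 * batch
print(f"conv activations:  {conv_bytes / 1e9:.1f} GB")   # ~6.0 GB

dense = 1024 + 1024              # two nn.Linear(..., 1024) activations instead
dense_bytes = dense * 4 * batch
print(f"dense activations: {dense_bytes / 1e9:.2f} GB")  # ~0.08 GB
```

This matches the observed ~6 GB gap far better than the parameter counts do, and also explains why the nn.Linear replacement described above has no noticeable impact.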
Still having some issues with this at #1630. I am not using any Conv2D layer; in fact I am using a GraphTransformer, so I cannot reduce the size of the matrices or the graph, nor flatten them. Has anyone addressed the real underlying issue?
🐛 Bug
I am having issues in SB3 with a CustomFeatureExtractor for a Dict observation space that makes my GPU memory explode. The observation space is composed of a single-channel image (1, 51, 101) and three vectors of dimensions (9,), (1,) and (1,). When I add Conv2D layers for the image in the feature extractor, the GPU memory explodes and I get an OOM error; replacing the Conv2D layers with a simple Flatten() layer works like a charm. With the Conv2D layers for the image, the GPU memory of my 3080 caps off at over 9 GB, while using the Flatten layer instead runs with only 3.2 GB of used memory. How can a couple of 2D convolution layers add over 6 GB of GPU memory usage?

I have done the math for the space taken by the batch of observations in float32: ((101 * 51) + 9 + 1 + 1) * 4 bytes * 10000 = 0.2 GB of memory. The value and policy networks with the feature extractor should only be around 1.3M parameters, which should more than fit in the 10 GB of memory on the 3080.
To Reproduce
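The original reproduction script is not included in this thread. Below is a minimal sketch of the kind of Dict-observation feature extractor described in the report, with the shapes from the description and the 16-filter 3x3 convolutions mentioned in the comments; stride, padding, and the per-key handling are assumptions, not the author's exact code.

```python
import gym
import torch as th
import torch.nn as nn
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor


class CustomCombinedExtractor(BaseFeaturesExtractor):
    """Conv2D branch for the (1, 51, 101) image, flatten for the flat vectors."""

    def __init__(self, observation_space: gym.spaces.Dict):
        # features_dim is a placeholder here; the real value is set below.
        super().__init__(observation_space, features_dim=1)
        extractors = {}
        total_dim = 0
        for key, subspace in observation_space.spaces.items():
            if len(subspace.shape) == 3:
                # Image branch: two Conv2d layers with 16 filters of 3x3 each,
                # as discussed in the thread (stride/padding are assumptions).
                extractors[key] = nn.Sequential(
                    nn.Conv2d(subspace.shape[0], 16, kernel_size=3),
                    nn.ReLU(),
                    nn.Conv2d(16, 16, kernel_size=3),
                    nn.ReLU(),
                    nn.Flatten(),
                )
                with th.no_grad():
                    sample = th.as_tensor(subspace.sample()[None]).float()
                    total_dim += extractors[key](sample).shape[1]
            else:
                # Vector branches: (9,), (1,) and (1,)
                extractors[key] = nn.Flatten()
                total_dim += int(subspace.shape[0])
        self.extractors = nn.ModuleDict(extractors)
        self._features_dim = total_dim

    def forward(self, observations) -> th.Tensor:
        return th.cat(
            [extractor(observations[key]) for key, extractor in self.extractors.items()],
            dim=1,
        )
```

Swapping the image branch for a plain nn.Flatten() gives the configuration the author reports running in about 3.2 GB; a hypothetical training call that plugs this class in via policy_kwargs appears under "Additional context" below.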
Expected behavior
I expect the two Conv2D layers to add only a modest number of parameters.
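That expectation holds as far as parameters go. Assuming 16 filters of 3x3 in each layer (per the comments above) and a single-channel input, the two convolutions add only about 2.5k parameters:

```python
# Parameter count for the two Conv2d layers, assuming 3x3 kernels,
# 16 filters each, and a single-channel input (all taken from the thread).
conv1_params = 16 * 1 * 3 * 3 + 16    # weights + biases = 160
conv2_params = 16 * 16 * 3 * 3 + 16   # weights + biases = 2,320
print(conv1_params + conv2_params)    # 2,480 parameters (~10 KB in float32)
```

The memory pressure therefore has to come from the per-sample activations, as worked out in the comment thread above.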
System Info
GPU is an RTX 3080.
I installed everything using pip in a separate conda environment.
Additional context
I am surprised that the network runs fine with just an nn.Flatten() layer for the image feature extractor, and that adding the two Conv2D layers adds up to 6 GB of memory usage.
I am using a lot of vectorized environments because the real use case involves a CPU-intensive env.
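For context, a training setup along these lines produces a rollout of n_envs * n_steps observations that all pass through the feature extractor; the environment id and the exact n_envs / n_steps / batch_size values below are placeholders, not the author's actual settings.

```python
# Hypothetical training setup: "MyCpuHeavyEnv-v0", n_envs, n_steps and
# batch_size are placeholders. The point is that the rollout buffer holds
# n_envs * n_steps observations (10,000 here, matching the report's math),
# all of which flow through the feature extractor during each update.
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

env = make_vec_env("MyCpuHeavyEnv-v0", n_envs=100)

model = PPO(
    "MultiInputPolicy",
    env,
    n_steps=100,      # 100 envs * 100 steps = 10,000 samples per rollout
    batch_size=250,   # minibatch size used during the gradient update
    policy_kwargs=dict(features_extractor_class=CustomCombinedExtractor),
    verbose=1,
)
model.learn(total_timesteps=100_000)
```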