OOM issue #22

Open
ecilay opened this issue Aug 9, 2018 · 2 comments

Comments

ecilay commented Aug 9, 2018

When I followed the instructions in the Docker setup, it always gives an out-of-memory error, even though I am using an AWS P3 instance, which has a Tesla V100.
Is this expected, or is something wrong in my setup?

My config:
tensorflow-gpu: 1.10.0
keras: 2.0.9

Error from vgg_normalised.py line 38:

OOM when allocating tensor of shape [3] and type float
[[Node: vgg_encoder/preprocess/Const_1 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [3] values: -103.939 -116.779 -123.68>, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

Thanks!

Owner

eridgd commented Aug 9, 2018

The V100 has 16GB of VRAM, so that should certainly be enough (it runs fine on my 4GB GPU). Does the same thing happen if you try running it outside of docker?

The first thing that comes to mind is that something else may be holding onto GPU memory; that's a mistake I make all the time. If you run nvidia-smi and look in the "Memory-Usage" column, does it show 16GB total with only a small portion of that used? You can run this within the container as well: nvidia-docker run --rm wct-tf nvidia-smi
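
The memory check described above can also be scripted. This is a minimal sketch, assuming nvidia-smi is on the PATH; the helper names parse_gpu_memory and query_gpu_memory are illustrative and not part of this repository. It relies on nvidia-smi's --query-gpu CSV output and reports used and total memory per GPU in MiB:

```python
import subprocess

# Ask nvidia-smi for machine-readable output: one CSV line per GPU, values in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse lines such as '0, 312, 16160' into a list of dicts (MiB)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, used, total = [field.strip() for field in line.split(",")]
        gpus.append(
            {"index": int(index), "used_mib": int(used), "total_mib": int(total)}
        )
    return gpus


def query_gpu_memory():
    """Run nvidia-smi and return the parsed per-GPU memory usage."""
    output = subprocess.check_output(QUERY, universal_newlines=True)
    return parse_gpu_memory(output)
```

Calling query_gpu_memory() before launching the script shows whether another process is already holding most of the GPU memory.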

Author

ecilay commented Aug 9, 2018

Hello, thanks for the reply!
Yes, I kept checking nvidia-smi, and it shows the full memory in use, as in the screenshot below from an AWS P2 instance (12 GB GPU). I didn't use Docker; I ran the python command directly on the instance.
I used the command python3 stylize.py --checkpoints models/relu5_1 models/relu4_1 models/relu3_1 models/relu2_1 models/relu1_1 --relu-targets relu5_1 relu4_1 relu3_1 relu2_1 relu1_1 --style-size 512 --alpha 0.8 --out-path static/style.jpg.

[Screenshot: screen shot 2018-08-09 at 4 44 48 pm]
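
A note on the nvidia-smi reading above: by default TensorFlow 1.x reserves nearly all memory on every visible GPU as soon as a session starts, so a card that looks full while the script is running is expected and does not by itself explain the OOM. Below is a minimal sketch of the allow_growth session option, which makes TensorFlow allocate on demand so that nvidia-smi reflects actual use. The helper name make_growth_config is hypothetical, and how stylize.py creates its session is an assumption not confirmed in this thread. The module is passed in as an argument only so the sketch does not import TensorFlow when it is defined.

```python
def make_growth_config(tf):
    """Return a TF 1.x ConfigProto that allocates GPU memory on demand.

    `tf` is the imported TensorFlow 1.x module (e.g. `import tensorflow as tf`).
    """
    config = tf.ConfigProto()
    # Grow the GPU allocation as needed instead of reserving almost all
    # of the card's memory when the session is created.
    config.gpu_options.allow_growth = True
    return config


# Usage sketch (assumes TensorFlow 1.x is installed):
#   import tensorflow as tf
#   sess = tf.Session(config=make_growth_config(tf))
#   # If Keras should use this session:
#   # from keras import backend as K; K.set_session(sess)
```

If the OOM persists with this option and nothing else is using the GPU, the cause is more likely elsewhere in the setup than in raw memory capacity.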
