OpenCL support #44
I will try to debug this and get back to this post in a few days. Unfortunately, I am new to Lua and Torch and this is the first code I have written in this language, so I am still learning. Also, it occurred to me today: could my GPU memory be an issue? I only have a 1GB ATI FirePro 3900 GPU.
@napsternxg GPU memory could indeed be an issue: It seems that the memory requirements are growing exponentially (quadratically?) with the size of the image you are rendering. I am running in CPU mode on a 32GB machine and I start running out of memory at -image_size > 1024.
Good call on GPU memory; 1GB is not enough for the default settings.
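To make the memory point concrete, here is a rough back-of-the-envelope sketch (plain Lua; the 64-plane figure comes from the conv1 layers in the dump further down, everything else is an illustrative assumption). Activation maps alone grow with the square of -image_size, which is why a 1GB card runs out so quickly:

```lua
-- Rough size of a single conv1 activation map (64 planes, float storage),
-- ignoring all the other layers, gradients and weights that also need memory.
local function conv1_activation_mb(image_size)
  local planes = 64                       -- conv1_1 / conv1_2 output planes
  return planes * image_size * image_size * 4 / (1024 * 1024)
end

for _, size in ipairs({128, 256, 512, 1024}) do
  print(string.format("image_size=%4d  ~%.0f MB for one conv1 feature map",
                      size, conv1_activation_mb(size)))
end
```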
I tried running with
Ok, using multiple print statements, I believe I have figured out the issue. I checked the clnn documentation and I see all the layers are implemented. Is there something I am missing? These are the first few lines of my generated file:
require 'nn'
require 'clnn'
local model = {}
table.insert(model, {'conv1_1', nn.SpatialConvolutionMM(3, 64, 3, 3, 1, 1, 1, 1)})
table.insert(model, {'relu1_1', nn.ReLU(true)})
table.insert(model, {'conv1_2', nn.SpatialConvolutionMM(64, 64, 3, 3, 1, 1, 1, 1)})
table.insert(model, {'relu1_2', nn.ReLU(true)})
table.insert(model, {'pool1', nn.SpatialMaxPooling(2, 2, 2, 2, 0, 0):ceil()})
table.insert(model, {'conv2_1', nn.SpatialConvolutionMM(64, 128, 3, 3, 1, 1, 1, 1)})
table.insert(model, {'relu2_1', nn.ReLU(true)})
table.insert(model, {'conv2_2', nn.SpatialConvolutionMM(128, 128, 3, 3, 1, 1, 1, 1)})
table.insert(model, {'relu2_2', nn.ReLU(true)})
table.insert(model, {'pool2', nn.SpatialMaxPooling(2, 2, 2, 2, 0, 0):ceil()})
I'm not really sure what's wrong; here are two random ideas: (1) In the .opencl.lua file maybe you also need to
seems to be the case if I understand correctly: https://github.com/hughperkins/clnn/search?q=ceil @hughperkins could you confirm?
Yes, (Edit: an alternative way to hack this for now, if you don't need the functionality behind
This will monkey-patch SpatialMaxPooling to have this method, although the method won't actually do anything for now. (PS Wow, the pictures of output from the neural-style project on the front-page README.md look awesome :-O ) (Edit 3: by the way, when
then I run it like:
you need to build in debug mode to get line numbers and stuff. I usually do this by editing the
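For reference, the monkey-patch mentioned above (the exact snippet didn't survive the quoting) could look something like the sketch below; this is a reconstruction of the idea, not the code hughperkins actually posted. It gives SpatialMaxPooling a :ceil() method that does nothing, so the generated model file loads, at the cost of the pooling output sizes falling back to floor mode:

```lua
require 'nn'

-- Reconstruction of the no-op hack: define :ceil() so chained calls like
-- nn.SpatialMaxPooling(2, 2, 2, 2, 0, 0):ceil() don't fail, while doing nothing.
function nn.SpatialMaxPooling:ceil()
  return self
end
```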
Thanks @hughperkins. I ran GDB on the file and here is the result.
$ gdb luajit -ex "catch throw"
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from luajit...(no debugging symbols found)...done.
Catchpoint 1 (throw)
(gdb) run neural_style_opencl.lua -style_image examples/inputs/picasso_selfport1907.jpg -content_image examples/inputs/brad_pitt.jpg -gpu 0 -backend 'clnn' -output_image profile.png -image_size 10
Starting program: /home/username/torch/install/bin/luajit neural_style_opencl.lua -style_image examples/inputs/picasso_selfport1907.jpg -content_image examples/inputs/brad_pitt.jpg -gpu 0 -backend 'clnn' -output_image profile.png -image_size 10
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Traceback (most recent call last):
File "/usr/share/gdb/auto-load/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19-gdb.py", line 63, in <module>
from libstdcxx.v6.printers import register_libstdcxx_printers
ImportError: No module named 'libstdcxx'
In Function main
Starting load model
In loadcaffe_load
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:505] Reading dangerously large protocol message. If the message turns out to be larger than 1073741824 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 574671192
Successfully loaded models/VGG_ILSVRC_19_layers.caffemodel
Finished proto to lua
In iteration %d 1
conv1_1: 64 3 3 3
In iteration %d 2
In iteration %d 3
conv1_2: 64 64 3 3
In iteration %d 4
In iteration %d 5
In iteration %d 6
conv2_1: 128 64 3 3
In iteration %d 7
In iteration %d 8
conv2_2: 128 128 3 3
In iteration %d 9
In iteration %d 10
In iteration %d 11
conv3_1: 256 128 3 3
In iteration %d 12
In iteration %d 13
conv3_2: 256 256 3 3
In iteration %d 14
In iteration %d 15
conv3_3: 256 256 3 3
In iteration %d 16
In iteration %d 17
conv3_4: 256 256 3 3
In iteration %d 18
In iteration %d 19
In iteration %d 20
conv4_1: 512 256 3 3
In iteration %d 21
In iteration %d 22
conv4_2: 512 512 3 3
In iteration %d 23
In iteration %d 24
conv4_3: 512 512 3 3
In iteration %d 25
In iteration %d 26
conv4_4: 512 512 3 3
In iteration %d 27
In iteration %d 28
In iteration %d 29
conv5_1: 512 512 3 3
In iteration %d 30
In iteration %d 31
conv5_2: 512 512 3 3
In iteration %d 32
In iteration %d 33
conv5_3: 512 512 3 3
In iteration %d 34
In iteration %d 35
conv5_4: 512 512 3 3
In iteration %d 36
In iteration %d 37
In iteration %d 38
In iteration %d 39
fc6: 1 1 25088 4096
In iteration %d 40
In iteration %d 41
In iteration %d 42
fc7: 1 1 4096 4096
In iteration %d 43
In iteration %d 44
In iteration %d 45
fc8: 1 1 4096 1000
In iteration %d 46
Finished iterations clnn
Finished network setup
Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
Using OpenCL device: Turks
[New Thread 0x7fffb2cc1700 (LWP 5091)]
Catchpoint 1 (exception thrown), 0x00007fffc389b8b0 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
And here is the back trace:
Ok, good. Then, if you do the following you should get the error message, I think:
I suspect, given where it is, and what it's doing, that it might say 'out of memory', ie "CL_MEM_OBJECT_ALLOCATION_FAILURE".
Since using gdb is kind of annoying :-P I've pushed a couple of updates to cltorch that will catch the exception and convert it into a torch error, eg:
It still contains a lot of 'magic messages', but it is mildly more informative than before perhaps? You can update to this version by simply rerunning
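With the exception surfaced as a torch error, you can also trap it from Lua and print something friendlier; a minimal sketch (the tiny layer and input here are placeholders, not the real neural-style network, and it assumes the :cl() conversion helpers from cltorch/clnn that this port uses):

```lua
require 'nn'
require 'cltorch'
require 'clnn'

-- Placeholder layer and input, only to show catching an OpenCL-side failure
-- (e.g. an allocation error) as a Lua error instead of an uncaught C++ exception.
local layer = nn.SpatialConvolutionMM(3, 64, 3, 3, 1, 1, 1, 1):cl()
local input = torch.Tensor(1, 3, 224, 224):uniform():cl()

local ok, err = pcall(function() return layer:forward(input) end)
if not ok then
  print('OpenCL layer failed: ' .. tostring(err))
end
```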
@hughperkins You rock! Thanks for helping out on the OpenCL port - I know almost nothing about OpenCL.
Thanks a lot @hughperkins for looking into this. I have not yet updated cltorch but have run
I will run my code again with the new cltorch and report any other findings.
"Invalid buffer size". Hmmm. It probably means that it is trying to allocate a buffer that is far too big, or one that perhaps won't fit in available memory. It could plausibly mean that the size of the buffer being requested has been corrupted somehow. However, looking at the stack trace you provided earlier, we can see the size in frame 3. For the second point, the maximum GPU buffer alloc size, you might have an executable in
On line 8, you can see 'max memory alloc size', ie the largest buffer you can allocate at once. For my laptop, it is 256MB, which is less than 400MB.
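If you don't have gpuinfo or clinfo handy, it may also be possible to query this from inside torch; a hedged sketch, assuming your cltorch build exposes a getDeviceProperties-style call (this is an assumption, so check your cltorch version):

```lua
require 'cltorch'

-- Assumed API: cltorch.getDeviceCount() / cltorch.getDeviceProperties(i).
-- Field names differ between versions, so just dump whatever the table contains
-- and look for the max memory alloc / global memory entries.
for i = 1, cltorch.getDeviceCount() do
  print('device ' .. i .. ':')
  for k, v in pairs(cltorch.getDeviceProperties(i)) do
    print('  ' .. tostring(k) .. ' = ' .. tostring(v))
  end
end
```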
Here is the output after updating my cltorch package.
Ok, looks like the error message is more informative than the original
If you notice, in the command I am running I am setting the -image_size to 10. Also, I couldn't find gpuinfo in my torch folder.
Hmmmm.... 10? You mean, it's a 10 by 10 image? Edit: for gpuinfo, you might have a system/opencl command
These are the lines where the image transformation is taking place:
local content_image = image.load(params.content_image, 3)
content_image = image.scale(content_image, params.image_size, 'bilinear')
local content_image_caffe = preprocess(content_image):float()
local style_image = image.load(params.style_image, 3)
local style_size = math.ceil(params.style_scale * params.image_size)
style_image = image.scale(style_image, style_size, 'bilinear')
local style_image_caffe = preprocess(style_image):float()
I believe it is re-sizing it to
Hmmm, looks like it is the fully-connected layer that is causing the 400MB alloc:
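The ~400MB figure lines up with the fc6 weight matrix shown in the layer dump above (fc6: 1 1 25088 4096) stored as 4-byte floats:

```lua
-- fc6 weight buffer: 25088 inputs x 4096 outputs x 4 bytes per float
local fc6_bytes = 25088 * 4096 * 4
print(string.format('fc6 weights: ~%.0f MB', fc6_bytes / (1024 * 1024)))  -- ~392 MB
```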
If you modify the conv5_4 layer to have eg 256 output planes, instead of 512, then you can probably reduce the fc6 layer from
I couldn't find the
Try
This is the input from
So, what you can try doing is changing the conv5_4 layer to have 128 output planes, and change fc6 from
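For scale, assuming fc6's 25088 inputs come from the usual VGG layout of 512 planes on a 7x7 grid (an assumption about the model geometry, not something stated in this thread), shrinking the last conv block to 128 planes would shrink fc6 roughly fourfold:

```lua
-- Assumes 25088 = 512 * 7 * 7 (VGG-style 7x7 final feature map before fc6).
local planes, grid = 128, 7
local fc6_in = planes * grid * grid                  -- 6272 inputs instead of 25088
local fc6_bytes = fc6_in * 4096 * 4                  -- 4-byte floats
print(string.format('reduced fc6 weights: ~%.0f MB', fc6_bytes / (1024 * 1024)))  -- ~98 MB
```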
I made the changes and am now getting a new error:
I believe the reason is that the original caffe model has that layer size, which can't be changed. This neural-style code only does inference and does not train the model; hence, I should use the original model. Perhaps @jcjohnson can confirm this.
Hmmm, right, your explanation appears to match the error message. I guess you will need to use a smaller model perhaps. How about this one? https://gist.github.com/mavenlin/d802a5849de39225bcc6
I have pushed my changes to https://github.com/napsternxg/neural-style/tree/opencl and I will try to run with the smaller model. In the meanwhile, would it be possible for you to run my code and see if it works? Maybe the issue is only my GPU memory. It would be great to know if the port works on other OpenCL systems without using any CUDA libraries.
Tried it on my 1GB card. Command line: Got further than @napsternxg managed (so it does seem to be the lack of GPU memory in his case), but then got a 'not implemented':
The line that throws that error: https://github.com/hughperkins/clnn/blob/6f79cd72d4a2434dd55d5e8a365013c632146155/SpatialMaxPooling.cpp#L166
Ok, got it.
I think you can remove accGradParameters. That cuts down on memory slightly. Have a look at hughperkins@d9e4dd4
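For anyone following along, the idea behind that change is that neural-style only back-propagates to the image, never to the weights, so the weight-gradient buffers can be dropped. A rough sketch of the approach (assuming a flat nn.Sequential as the script builds; the actual commit may differ):

```lua
require 'nn'

-- Disable weight-gradient accumulation on every module of a container:
-- neural-style only needs gradients w.r.t. the input image.
local function disable_weight_grads(net)
  local function nop() end
  for _, module in ipairs(net.modules) do
    module.accGradParameters = nop   -- never accumulate dL/dW
    module.gradWeight = nil          -- free the gradient buffers
    module.gradBias = nil
  end
end

-- Example usage on a tiny stand-in network:
local net = nn.Sequential()
net:add(nn.SpatialConvolutionMM(3, 8, 3, 3, 1, 1, 1, 1))
net:add(nn.ReLU(true))
disable_weight_grads(net)
```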
Got it. Changed that and now I am able to run it using multiple content and style layers. And I am getting pretty promising results. If you run it with the following command:
You will get an image like the following: @jcjohnson, what do you think of this output? It was constructed using the
Cool :-)
@napsternxg Looks pretty good to me - better than the results I got with CaffeNet, but not quite as nice as the VGG-19 results. At any rate it looks like the OpenCL port is pretty much working as intended at this point; I'm happy to merge into master if you send me a PR.
@jcjohnson I can send the pull request but it will not work out of the box. @hughperkins has made some changes to the torch code as well as the clnn code for average pooling, which may cause an issue. I will clean up some things on my side and update the code on my repo for now. I think once the clnn issue is fixed we can merge it into your repo.
Sounds good to me.
Hi. I've created a new version of clnn which uses less memory. Comparing with other versions, on my 1GB NVIDIA card:
You need to install branch
th neural_style.lua -style_image examples/inputs/picasso_selfport1907.jpg -content_image examples/inputs/brad_pitt.jpg -gpu 1 -output_image profile.png -image_size 300 -model_file models/vgg_normalised.caffemodel -backend clnn -num_iterations 1000 -save_iter 50 -normalize_gradients -content_weight 50000 -style_weight 90000
Hi codeaudit, this thread is getting a bit crazy long :-P Do you mind opening a new issue in https://github.com/hughperkins/clnn/issues please? Also, please provide the exact commit, branch, and repository that you are running from. It looks like you are using a branch/commit that is importing
It gives me this: Successfully loaded models/nin_imagenet_conv.caffemodel
Strong preference for logging issues at https://github.com/hughperkins/clnn/issues ;-) But... anything to do with prototxt is often a problem with the downloaded model. Can you run
(Edit: hmmm, I suppose logging this at clnn would look a bit strange. I guess logging things into this thread is ok for now)
7fbe66d382c55f742a6f8c6171011eb models/download_models.sh
Hmmm, looks convincing. Will ponder a bit. To avoid spamming everyone subscribed to this thread, I've enabled issue logging at https://github.com/hughperkins/neural-style/issues and logged your issue at hughperkins#1
Update: SpatialAveragePooling ceil mode has now been merged into nn (torch/nn#365) and into clnn master (commit hughperkins/clnn@6e4976c), so we should be good to create a pull request for the OpenCL changes into neural-style now. Who's going to do that?
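With that merged, ceil-mode average pooling should work end to end; a quick sanity check, not taken from the PR itself:

```lua
require 'nn'
require 'clnn'

-- ceil mode rounds the output size up: 2x2 pooling with stride 2 on a 5x5 map
-- gives a 3x3 output instead of the floor-mode 2x2.
local pool = nn.SpatialAveragePooling(2, 2, 2, 2):ceil():cl()
local out = pool:forward(torch.Tensor(1, 5, 5):uniform():cl())
print(out:size())  -- expect 1x3x3
```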
I will send that by tomorrow.
You're great, guys; now I can finally create fake Rembrandts with my Radeon 💃
Cool :-)
:-D
Closes jcjohnson#44. Thanks to @hughperkins and @jcjohnson for all the help. README updated with the new example.
@jcjohnson I have sent the pull request. Ready to merge =)
I tried implementing OpenCL support and the code is at: https://github.com/napsternxg/neural-style/tree/opencl
However, I get the following error when running the code:
I believe the issue is because of SpatialConvolutionMM, which is implemented in the ccn2 module.
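For anyone skimming this thread later: the core of the port is swapping the CUDA conversion for the cltorch/clnn one; a simplified sketch of the idea with a tiny stand-in network (the real script builds the full VGG model via loadcaffe):

```lua
require 'nn'
require 'cltorch'
require 'clnn'

-- Tiny stand-in network; the point is only the backend switch.
local net = nn.Sequential()
net:add(nn.SpatialConvolutionMM(3, 8, 3, 3, 1, 1, 1, 1))
net:add(nn.ReLU(true))

-- The CUDA path would call net:cuda() / img:cuda(); the OpenCL path uses :cl().
net:cl()
local img = torch.Tensor(3, 32, 32):uniform():cl()
print(net:forward(img):size())
```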