jcufft branch status #1
Yes, I would like to see this jcufft implementation integrated into the original project. I was in contact with Daniel Sage, the lead developer of DL2, and he seemed interested in the implementation, but I haven't heard from him for two months now. I am waiting for him to get in touch with me. Note that this implementation performs the FFT/iFFT on the GPU but the actual deconvolution is still done on the CPU, which means that a lot of time is spent transferring data back and forth between CPU and GPU. That being said, I saw a performance improvement using the jcufft wrapper, and I guess that improvement will vary with your image size. A more promising approach is to perform everything on the GPU: FFT, deconvolution, and iFFT. @bnorthan has a very nice implementation that uses that approach, and the speedup is dramatic (one order of magnitude). The setup is still a bit complex to try, but @bnorthan is working on it: https://github.com/imagej/ops-experiments. We are getting closer and closer to an open-source package for real-time deconvolution :-)
Ah awesome! Thanks for the info @hadim, that's super helpful. Specifically, I was asking because I was exploring alternatives to commercial options, and in some (far from exhaustive) testing I noticed that, at least for one dataset, DL2 was giving comparable runtimes and that the GPU-enabled extension you added was a solid 3-4x improvement over the CPU-only FFT. Everything built cleanly for me on the first try too (so kudos for that!), and perhaps needing only jcuda and the usual Nvidia toolkit here will outweigh some extra performance benefit (for us at least) from using something presumably harder to integrate into a build, like YacuDecu. Either way though, @bnorthan, if you have any experience you'd be willing to share on trying to get it all on a GPU, I'd love to hear it. I think we may be straddling a fence on trying to do something like that ourselves, but if there's any way to help or avoid duplicating effort, that would be fantastic. Thanks guys.
Hi @eric-czech, @hadim, I've been working on a wrapper for YacuDecu which also uses the fast Gibson-Lanni implementation by Jizhou Li. Combined, you can get near-real-time performance, but the build is a bit complex, at least for now. If you are interested I can give you further details. The code is here (note the Linux build is currently in a separate branch).
Hey @hadim and @bnorthan, just to close this out I thought I'd mention that, in order to get at least one simple deconvolution algorithm running entirely on a GPU without a crazy build or anything (and that works well across platforms), I ported a vanilla Richardson-Lucy implementation to run on TensorFlow. It turned out to be way faster than anything else I tried, and since TensorFlow checks the Mac, Windows, Linux, GPU/multi-GPU, and easy-to-install boxes, it seems like it could be a good platform for real-time deconvolution. Real-time wasn't the use case at our lab, we just needed something fast and free for big 3D microscopy volumes, but for what it's worth I dropped all that work into this Flowdec repo.
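For readers who haven't met the algorithm mentioned above, here is a minimal sketch of vanilla Richardson-Lucy. It is not the Flowdec or DL2 code: it is a pure-Python, 1D illustration that assumes a circular (periodic) boundary and a PSF normalized to sum to 1, and it uses a direct O(n^2) convolution where a real implementation would use FFTs on 3D volumes. The names `circular_convolve` and `richardson_lucy` are made up for this example.

```python
def circular_convolve(signal, kernel):
    """Circular convolution of two equal-length sequences (direct O(n^2) sum).

    A practical implementation would do this with FFTs; the direct sum
    keeps the sketch dependency-free.
    """
    n = len(signal)
    return [
        sum(signal[j] * kernel[(i - j) % n] for j in range(n))
        for i in range(n)
    ]


def richardson_lucy(observed, psf, iterations=50, eps=1e-12):
    """Vanilla Richardson-Lucy deconvolution (illustrative 1D sketch).

    Assumes: non-negative `observed`, `psf` of the same length that sums
    to 1 and is centered at index 0, and periodic boundaries.
    """
    n = len(observed)
    # Index-reversed PSF: convolving with it applies the adjoint of the blur.
    psf_flipped = [psf[(-k) % n] for k in range(n)]
    # Start from a flat estimate carrying the same total intensity.
    estimate = [sum(observed) / n] * n
    for _ in range(iterations):
        blurred = circular_convolve(estimate, psf)
        # Ratio of the data to the current prediction; eps guards against 0/0.
        ratio = [o / max(b, eps) for o, b in zip(observed, blurred)]
        correction = circular_convolve(ratio, psf_flipped)
        # Multiplicative update keeps the estimate non-negative.
        estimate = [e * c for e, c in zip(estimate, correction)]
    return estimate


# Hypothetical test data: a single bright point blurred by a 3-tap PSF.
truth = [0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 0.0, 0.0]
psf = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]
observed = circular_convolve(truth, psf)
restored = richardson_lucy(observed, psf, iterations=50)
print([round(v, 3) for v in restored])
```

Each iteration costs two convolutions, which is why moving the FFTs (and ideally the whole loop) onto the GPU dominates the runtime discussion in this thread.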
Thank you for keeping us posted. That looks promising!
I would encourage you to post about your work on the ImageJ forum: http://forum.imagej.net/
Will do!
Hey @hadim, I was curious about the state of the jcuda implementation I saw you added here. I tried your code myself as an experiment and it all seemed to work appropriately, but I'd love to get a better sense of whether or not you were trying to get this pushed back to the original project. Did you see anything to indicate that the GPU support wasn't working as expected? Or did it at least work well for your purposes so far?
Thank you for putting this out in the wild!