-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Almost finished #5
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
A good test for the eltype generality of our operations
Now it's possible to construct kernel-tuples with a mix of TS and AbstractArray kernels, and things Just Work.
No actual tests changed
Previously, filter cascades would result in undefined values on the edges of intermediate states. This tracks the edge size at each stage of the cascade to ensure that no invalid values are used.
From an efficiency standpoint it's better to do the TriggsSdika kernels first.
Taken from Images pull request 544
Switches to returning ReshapedVectors from the factored kernels, and adds optimized methods. It's not clear we strictly need all the optimized methods, so some of these may go away sometime.
The performance is very good; at this point the main avenue for improvement is to implement lazy padding.
This means that padding is no longer necessary for pure-FIR filters (and pure-IIR filters). The most complex part of this was getting the indices-handling correct.
Awesome \m/ |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This isn't quite done yet, but I can't resist the opportunity for a status update. It has taken longer than expected, but I decided to take the opportunity to really push the architecture and see how flexible and performant I could make this. It turns out that quite a lot could be done, and the results are impressive if I say so myself 😄. One of the motivations was benchmarking some of @mronian's code, which pointed out that
imgradients
(and really,imfilter
) was the bottleneck. So aside from being an exercise in testing the architecture, it seemed there might be real-world performance benefits to be had.This isn't finished yet, so this is uglier than it will be, but check this out:
Compare to Images:
And compare to a GPU-implementation (note this moves data to-and-from the GPU, but for me
@benchmark
crashes otherwise with aDevice out of memory
error):So if you want the data back on the CPU again for whatever comes next, we're now quite a bit faster in plain julia than a GPU implementation.
Notes:
JULIA_NUM_THREADS=8
. Yes, this is now supporting threads. Single-threaded this exhibits a median of 13ms, so it's still a considerable improvement over what we have in Images. (Spoiler: it uses a fancy cache-efficient tiled implementation, using tricks developed in TiledIteration.)