Split the CUDA CKF into different TUs #742
bbeb4f7 to 30aebf1
My plan to finally get rid of those .ipp files has failed. 🫡 Anyway, updated.
24a9c9e to 4ec0290
Just to make sure there's no hastiness on this one...
Okay, let's go with this general setup at the moment. As we discussed in person, I have some further ideas for how to tweak the code later on even further, but these changes do not make those future changes any more difficult. (The general direction of my idea is very similar.)
device/common/include/traccc/finding/device/apply_interaction.hpp
4ec0290 to 54e7489
Great! I've rebased and incorporated the change requests.
Just a little more to go...
54e7489 to bf45c83
This commit splits the monstrously large CUDA track finding translation unit up into smaller ones, one for each of the kernels. This should speed up compilation times and decrease memory usage. Also groups the payloads for each of the functions into convenient structs, so we don't need to pass 20+ arguments for some of the kernel calls. Does not change the functionality of the code.
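For readers unfamiliar with the pattern, here is a minimal sketch of what "one translation unit per kernel" can look like. It is only an illustration: the file names, the `sketch` namespace, and the `scale_values` and `run_algorithm` functions are all hypothetical and do not come from this PR, and a plain host loop stands in for the CUDA kernel so the snippet builds with any C++ compiler.

```cpp
// Hypothetical sketch: none of these names or files come from the PR itself.
#include <cstddef>
#include <vector>

// --- kernels/scale_values.hpp (hypothetical header) ------------------------
// The algorithm's TU only sees this declaration, so the kernel body is
// compiled once, in its own translation unit.
namespace sketch {
void scale_values(std::vector<float>& values, float factor);
}  // namespace sketch

// --- kernels/scale_values.cu (hypothetical, its own TU) --------------------
// In CUDA this file would hold the __global__ kernel plus its launch code;
// a host loop stands in for it here.
namespace sketch {
void scale_values(std::vector<float>& values, float factor) {
    for (std::size_t i = 0; i < values.size(); ++i) {
        values[i] *= factor;
    }
}
}  // namespace sketch

// --- finding_algorithm.cu (hypothetical caller TU) -------------------------
// The caller needs only the header, so editing one kernel does not force a
// rebuild of every other kernel.
namespace sketch {
std::vector<float> run_algorithm(std::vector<float> input) {
    scale_values(input, 2.0f);
    return input;
}
}  // namespace sketch
```

With this layout the build system can compile each kernel file in parallel, and the peak memory used by a single compiler invocation is bounded by the largest kernel rather than by all of them together.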
Quality Gate passed
Let's get it in finally...
This commit splits the monstrously large CUDA track finding (and some CCL) translation unit up into smaller ones, one for each of the kernels. This should speed up compilation times and decrease memory usage.
Also groups the payloads for each of the functions into convenient structs, so we don't need to pass 20+ arguments for some of the kernel calls.
Does not change the functionality of the code.
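As a rough illustration of the payload idea, the sketch below bundles several arguments into one struct that is passed to the function as a single parameter. The `count_above_payload` struct, its members, and the `count_above` function are hypothetical names invented for this example, not the types added by the PR, and the loop runs on the host in place of a real kernel.

```cpp
// Hypothetical sketch: the struct and function names are illustrative only.
#include <cstddef>
#include <vector>

namespace sketch {

// Groups what would otherwise be four separate arguments.
struct count_above_payload {
    const float* values;       // input buffer
    std::size_t n_values;      // number of elements in the buffer
    float threshold;           // selection cut
    unsigned int* n_selected;  // output: number of values above the cut
};

// A single payload parameter replaces a long argument list. A CUDA kernel
// could take the same struct by value, since it holds only pointers and
// scalars.
void count_above(const count_above_payload& payload) {
    unsigned int count = 0;
    for (std::size_t i = 0; i < payload.n_values; ++i) {
        if (payload.values[i] > payload.threshold) {
            ++count;
        }
    }
    *payload.n_selected = count;
}

}  // namespace sketch
```

Adding a new input then means adding one struct member, and the function signature and its declaration in the header stay unchanged.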