Fuse inner loop kernels in device CKF #695
While not understanding most of the changes in the logic, I like the general direction quite a bit. As long as the code still works after this in the same way as it did before, I'm very much on board. 😉
Reducing the number of kernels is definitely good. I think the CPU finding algorithm will also need to remove `orig_param_id` and `skip_counter` to keep the symmetry, but this can be done later.
Have you checked the performance of this?
device/common/include/traccc/finding/device/impl/find_tracks.ipp
997ee7c to d60e5d7
PR updated.
It's essentially identical:

Before:

After:
The inner loop of the device CKF consists of five kernels: material interaction application, measurement counting, candidate finding, hole writing, and propagation. I believe that the middle three can easily be merged into a single kernel, reducing the amount of work we have to do on the host and simplifying the code a lot. This commit makes that change.
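To illustrate the idea of fusing the middle three steps, here is a minimal CPU-only sketch under a deliberately simplified toy model: each track stands in for one device thread, each "kernel" is emulated by a loop over those threads, and the three steps are reduced to counting the measurements on a surface, applying a chi2 cut to find candidates, and flagging a hole when no candidate survives. All names (`track_result`, `run_unfused`, `run_fused`, the chi2 cut) are hypothetical and are not traccc's API; the sketch also does not model the host-side synchronization between launches that the fusion removes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-track output; NOT traccc's real data model.
struct track_result {
    std::size_t n_measurements = 0;  // measurements on the current surface
    std::size_t n_candidates = 0;    // measurements passing the chi2 cut
    bool hole = false;               // true when no candidate was found
};

// Toy input: for each track ("thread"), the chi2 of each measurement on its surface.
using chi2_table = std::vector<std::vector<double>>;

// --- Unfused: three separate "kernels", each a full pass over all threads. ---
void count_measurements(const chi2_table& chi2, std::vector<track_result>& out) {
    for (std::size_t tid = 0; tid < chi2.size(); ++tid) {
        out[tid].n_measurements = chi2[tid].size();
    }
}

void find_candidates(const chi2_table& chi2, double cut, std::vector<track_result>& out) {
    for (std::size_t tid = 0; tid < chi2.size(); ++tid) {
        std::size_t n = 0;
        // Reads the count written to memory by the previous "kernel".
        for (std::size_t i = 0; i < out[tid].n_measurements; ++i) {
            if (chi2[tid][i] < cut) ++n;
        }
        out[tid].n_candidates = n;
    }
}

void write_holes(std::vector<track_result>& out) {
    for (auto& r : out) {
        r.hole = (r.n_candidates == 0);
    }
}

std::vector<track_result> run_unfused(const chi2_table& chi2, double cut) {
    std::vector<track_result> out(chi2.size());
    count_measurements(chi2, out);    // "launch" 1
    find_candidates(chi2, cut, out);  // "launch" 2
    write_holes(out);                 // "launch" 3
    return out;
}

// --- Fused: one "kernel"; each thread does all three steps in local variables. ---
std::vector<track_result> run_fused(const chi2_table& chi2, double cut) {
    std::vector<track_result> out(chi2.size());
    for (std::size_t tid = 0; tid < chi2.size(); ++tid) {
        track_result r;
        r.n_measurements = chi2[tid].size();
        for (double c : chi2[tid]) {
            if (c < cut) ++r.n_candidates;
        }
        r.hole = (r.n_candidates == 0);
        out[tid] = r;  // single write of the final result per thread
    }
    return out;
}

// Element-wise comparison of the two pipelines' outputs.
bool results_equal(const std::vector<track_result>& a, const std::vector<track_result>& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i].n_measurements != b[i].n_measurements ||
            a[i].n_candidates != b[i].n_candidates || a[i].hole != b[i].hole) {
            return false;
        }
    }
    return true;
}
```

The design point the sketch tries to capture is that the intermediate values (the measurement count and candidate count) never need to leave the thread in the fused version, so the per-step launches, and the host work that sequences them, collapse into one launch while producing the same per-track results.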