-
Notifications
You must be signed in to change notification settings - Fork 314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat/neptune opencl #1397
Feat/neptune opencl #1397
Conversation
17bcf4e
to
1f3a3c7
Compare
8f20e24
to
4121d26
Compare
5fa8c6a
to
d892291
Compare
@cryptonemo @porcuquine what's the status of this? |
I haven't tested it, but the changes are mostly feature related, so it depends entirely on the state of neptune's usage of them. Can revisit after other issues are fixed. |
This fully integrates the neptune OpenCL changes. It does not realize the full expected performance gains because we now hit a bottleneck in how the data is loaded onto the GPU from disk. As @dignifiedquire and I discussed, the likely next step is to release this so users can test to ensure they are able to use the If it takes too long, or I have extra time, I can perform the optimization first — but the above is a logical progression and what we have discussed as the plan of record. I haven't had time to actually investigate the fix, but some discussion has prompted my memory, and I think I know the general shape of the problem. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, will merge after a rebase
d892291
to
abfc6d3
Compare
This is rebased now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good -- clear to merge (squash and merge, preferred) when CI is happy.
This PR adds support for a new feature flag,
gpu2
, which can be used in place ofgpu
. When this flag is active, theopencl
flag of the neptune crate will be used. A new CI test demonstrates how to enable this option and establishes that it works (at least in the CI environment).Because the
opencl
neptune flag makes GPU batch hashing almost twice as fast, we would expect to see that performance realized in GPU column and tree building here. However, the actual gains seen now are much less. My hypothesis is that this is because the parallelism and i/o here were optimized just well enough to not be the bottleneck previously. This hypothesis is partially confirmed by observation that GPU utilization with thegpu2
flag is only a little more than 50% — suggesting that the problem is indeed inability to feed the GPU fast enough.I propose we merge this feature now and encourage users to experiment with the
gpu2
flag in order to ensure it can indeed run successfully on deployed platforms. Performance should still be equivalent to or better than the oldgpu
usage.Once we have established that
gpu2
does not cause harm? That will free us to remove the old behavior and replace it with the new — so that thegpu
flag also activates neptune'sopencl
flag. This will keep the code base simpler and remove one source of complexity when optimizing tree-building in order to exploit the new, better performance of neptuneopencl
.