Solving performance regression issue by caching the kernel_bundle #896
Conversation
@adarshyoga Could you please test this PR to see if you are getting any slowdown?
Are you caching both the LLVM bitcode and the SYCL program generated from SPIR-V?
Please let Adarsh test before merging. Thanks.
Yes @adarshyoga
Force-pushed from a06f8f0 to 691338f
The performance regression in blackscholes no longer happens with this fix. The times below from blackscholes are close to what we were getting with 0.19.0:

ERF: Numba@jit-loop-par | Size: 524288 | MOPS: 490767591.93 | TIME: 0.002137
Force-pushed from 71d8cb1 to e6c9cd1
Force-pushed from d2f007d to 59da1d9
expected = a * c
device = dpctl.SyclDevice(filter_str)
We want to get away from this pattern as it leads to lots of skipped tests when the filter string is not supported. Instead of using the filter string parameter, just set the device to dpctl.select_default_device(). In the future, we will remove all filter string params and control the default device using SYCL environment variables.
kernel = dpex.kernel(check_bool_kernel)
kernel[dpex.Range(a.size)](da, True)
This test can also be done using a scalar and setting the range to 1, in effect launching a SYCL single_task.
Also, you can directly use dpctl constructors and do not need to copy data from NumPy.
@chudur-budur Looks good! Need some minor changes to the test case. Can you also squash your changes into just two commits: one with the caching changes and the other with the test case changes.
- Store exec_queue along with kernel_bundle
- Properly initialize self._kernel_bundle_cache with NullCache()
- Save kernel_bundle with exec_queue as a key
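The commit steps above can be sketched in plain Python. NullCache and the exec_queue key come from the commit messages; every other name and the class structure here are hypothetical stand-ins, not numba-dpex's actual internals:

```python
# Illustrative sketch of the caching scheme described in the commits above.
# Only NullCache and the exec_queue key are taken from the commit messages;
# the rest is a hypothetical stand-in for numba-dpex's real code.

class NullCache:
    """A no-op cache: always misses, never stores."""
    def get(self, key):
        return None

    def put(self, key, value):
        pass


class DictCache:
    """Minimal dict-backed cache standing in for a real cache."""
    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def put(self, key, value):
        self._store[key] = value


class JitKernelSketch:
    def __init__(self, enable_cache=True):
        # Properly initialize the cache, falling back to NullCache().
        self._kernel_bundle_cache = DictCache() if enable_cache else NullCache()

    def _build_bundle(self, exec_queue):
        # Stand-in for the expensive create_program_from_spirv() call.
        return f"bundle-for-{exec_queue}"

    def get_kernel_bundle(self, exec_queue):
        # Look up / save the kernel_bundle with exec_queue as the key,
        # so the bundle is built at most once per queue.
        bundle = self._kernel_bundle_cache.get(exec_queue)
        if bundle is None:
            bundle = self._build_bundle(exec_queue)
            self._kernel_bundle_cache.put(exec_queue, bundle)
        return bundle
```

With this shape, repeated launches on the same queue reuse the stored bundle, while a NullCache-backed instance rebuilds it on every call.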
Force-pushed from 59da1d9 to 3958d13
Thanks! I think it's good to go.
@fcharras Can you please evaluate the PR to see if you see performance improvements in your benchmarks?
After #898 my benchmarks work again and I can properly compare performance before and after #804. This PR definitely fixes a major slowdown that would have made the JIT near unusable for me, so it's a 👍 for merge. With this PR, performance before and after #804 is of the same order of magnitude and the JIT is usable again. But the Python overhead is not as negligible as it used to be before #804, so I would not say it completely fixes #886. As suggested there, maybe the caching instructions can be made a bit more efficient (e.g. attaching the codegen to the JitKernel instances), or any other bottleneck that a profiler can highlight could be addressed. The added overhead I see in our KMeans is about 1 s per run, which is noticeable for small workloads. (edit: NB: a KMeans run is several hundred kernel calls, so the overhead is about 0.005 s per kernel call)
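The per-instance caching idea floated above (attaching the codegen result to the kernel object itself, so each instance compiles at most once) can be sketched as follows. All names here are hypothetical illustrations, not numba-dpex's actual JitKernel API:

```python
import time


class JitKernelSketch:
    """Hypothetical kernel wrapper illustrating per-instance codegen caching;
    this is not numba-dpex's actual JitKernel."""

    def __init__(self, pyfunc):
        self._pyfunc = pyfunc
        self._codegen = None  # compiled artifact cached on the instance

    def _compile(self):
        # Stand-in for an expensive compilation / codegen step.
        time.sleep(0.01)
        return ("compiled", self._pyfunc.__name__)

    def __call__(self, *args):
        # Compile at most once per instance; later calls reuse the attribute,
        # avoiding a dictionary lookup or recompilation on every launch.
        if self._codegen is None:
            self._codegen = self._compile()
        return self._codegen
```

The attribute lookup on `self` is about as cheap as per-call caching gets in Python, which is why attaching the artifact to the instance can shave overhead compared to a shared keyed cache.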
Solving performance regression issue by caching the kernel_bundle 8d0c8ea
This PR might solve issue #886.

In version 0.19.0, create_program_from_spirv() was called only once (see line 483 in compiler.py). In the current main it is called every time __call__() is invoked on JitKernel (see here). Therefore we are now caching the kernel_bundle to avoid the repeated calls to create_program_from_spirv().