
Solving performance regression issue by caching the kernel_bundle #896

Merged (2 commits) on Feb 13, 2023

Conversation

chudur-budur
Collaborator

This PR might solve issue #886.

In version 0.19.0, create_program_from_spirv() was called only once; see line 483 in compiler.py.

In the current main branch, it is invoked every time the __call__() function of JitKernel is executed (see here). This PR therefore caches the kernel_bundle to avoid repeated calls to create_program_from_spirv().
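The caching idea described above can be sketched roughly as follows. This is illustrative Python only; the class and attribute names here (JitKernelSketch, _build_bundle, build_count) are hypothetical stand-ins, not the actual numba-dpex implementation:

```python
# Illustrative sketch (hypothetical names, not the actual numba-dpex
# classes): cache the result of an expensive per-queue build step so it
# runs once per execution queue instead of on every kernel call.
class JitKernelSketch:
    def __init__(self, spirv_module):
        self._spirv = spirv_module
        self._kernel_bundle_cache = {}  # maps exec_queue -> built bundle
        self.build_count = 0  # instrumentation for this sketch only

    def _build_bundle(self, exec_queue):
        # Stand-in for the expensive create_program_from_spirv() call.
        self.build_count += 1
        return ("bundle", exec_queue, self._spirv)

    def __call__(self, exec_queue):
        # Look up a previously built bundle for this queue; build and
        # store it only on a cache miss.
        bundle = self._kernel_bundle_cache.get(exec_queue)
        if bundle is None:
            bundle = self._build_bundle(exec_queue)
            self._kernel_bundle_cache[exec_queue] = bundle
        return bundle
```

With this shape, repeated calls on the same queue reuse the cached bundle, so the expensive build happens once per queue rather than once per call.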

  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • If this PR is a work in progress, are you filing the PR as a draft?

@chudur-budur
Collaborator Author

@adarshyoga Could you please test this PR to see if you are getting any slow down?

@adarshyoga
Collaborator

Are you caching both llvm bit code and sycl program generated from spirv?

Contributor

@mingjie-intel left a comment


Please let Adarsh test before merging. Thanks.

.pre-commit-config.yaml (review comment outdated; resolved)
@chudur-budur
Collaborator Author

Are you caching both llvm bit code and sycl program generated from spirv?

Yes, @adarshyoga.

@adarshyoga
Collaborator

The performance regression in blackscholes no longer occurs with this fix. The times below from blackscholes are close to what we were getting with 0.19.0.


ERF: Numba@jit-loop-par | Size: 524288 | MOPS: 490767591.93 | TIME: 0.002137
Time of first run 0.0021376459999373765
ERF: Numba@jit-loop-par | Size: 1048576 | MOPS: 1192850017.17 | TIME: 0.001758
Time of first run 0.0022624360008194344
ERF: Numba@jit-loop-par | Size: 2097152 | MOPS: 2397245139.53 | TIME: 0.001750
Time of first run 0.002746018000834738
ERF: Numba@jit-loop-par | Size: 4194304 | MOPS: 4262913460.23 | TIME: 0.001968
Time of first run 0.003016001999640139
ERF: Numba@jit-loop-par | Size: 8388608 | MOPS: 6976956836.03 | TIME: 0.002405
Time of first run 0.003975543000706239
ERF: Numba@jit-loop-par | Size: 16777216 | MOPS: 10806267130.05 | TIME: 0.003105
Time of first run 0.006068818000130705
ERF: Numba@jit-loop-par | Size: 33554432 | MOPS: 14378060479.55 | TIME: 0.004667
Time of first run 0.010165161999793781
ERF: Numba@jit-loop-par | Size: 67108864 | MOPS: 16862131587.41 | TIME: 0.007960
Time of first run 0.017497161999926902
ERF: Numba@jit-loop-par | Size: 134217728 | MOPS: 19211915654.84 | TIME: 0.013972
Time of first run 0.03390368500004115
ERF: Numba@jit-loop-par | Size: 268435456 | MOPS: 20912204492.02 | TIME: 0.025673

expected = a * c

device = dpctl.SyclDevice(filter_str)
Collaborator


We want to get away from this pattern, as it leads to lots of skipped tests when the filter string is not supported. Instead of using the filter string parameter, just set the device to dpctl.select_default_device(). In the future, we will remove all filter string params and control the default device using SYCL environment variables.


kernel = dpex.kernel(check_bool_kernel)

kernel[dpex.Range(a.size)](da, True)
Collaborator


This test can also be done using a scalar and setting the range to 1, in effect launching a SYCL single_task.

Collaborator


Also, you can directly use dpctl constructors and do not need to copy data from NumPy.

@diptorupd
Collaborator

@chudur-budur Looks good! Needs some minor changes to the test case. Can you also squash your changes into just two commits: one with the caching changes and the other with the test case changes?

Commits:

  • Store exec_queue along with kernel_bundle
  • Properly initialize self._kernel_bundle_cache with NullCache()
  • Save kernel_bundle with exec_queue as a key
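The commit messages above mention initializing the cache with a NullCache(). A minimal sketch of that null-object pattern (the names NullCache and DictCache are assumed for illustration, not taken from the numba-dpex source):

```python
# Hypothetical sketch of the null-object cache pattern: NullCache has
# the same interface as a real cache but never stores anything, so the
# kernel code can call get()/put() unconditionally with no None checks
# or "is caching enabled?" branches.
class NullCache:
    def get(self, key):
        return None  # always a miss

    def put(self, key, value):
        pass  # intentionally discard


class DictCache:
    """A trivial real cache with the same interface."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value
```

Initializing self._kernel_bundle_cache with a NullCache by default means disabling caching is just a matter of which object is injected; the calling code stays identical either way.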
@chudur-budur
Collaborator Author

@chudur-budur Looks good! Need some minor changes to the test case. Can you also squash your changes into just two commits: one with the caching changes and the other with the test case changes.

Thanks! I think it's good to go.

@diptorupd
Collaborator

@fcharras Can you please evaluate the PR to see if you see performance improvements in your benchmarks?

@fcharras

fcharras commented Feb 13, 2023

After #898 my benchmarks work again and I can properly compare performance before and after #804.

This PR definitely fixes a major slowdown that would have made the JIT nearly unusable for me; it's a 👍 for merge.

After this PR, performance before and after #804 is of the same order of magnitude, and the JIT is usable again.

But the Python overhead is not as negligible as it used to be before #804, so I would not say it completely fixes #886. As suggested there, maybe the caching logic can be made a bit more efficient (e.g. attaching the codegen to the JitKernel instances), or any other bottleneck that a profiler can highlight could be addressed.

The added overhead I see in our KMeans is about 1 s per run; for small workloads it's noticeable.

(Edit: NB: a KMeans run is several hundred kernel calls, so the overhead is about 0.005 s per kernel call.)

@diptorupd diptorupd merged commit 8d0c8ea into IntelPython:main Feb 13, 2023
@diptorupd diptorupd deleted the github-886 branch February 13, 2023 14:32
github-actions bot added a commit that referenced this pull request Feb 13, 2023
Solving performance regression issue by caching the kernel_bundle 8d0c8ea