-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Always use size 16 sub-groups in single work-group radix sort if supported #1833
Conversation
This change is added to avoid a bug where IGC cannot compile SIMD32 kernels with -O0 compilation flags. No performance impact is observed. Signed-off-by: Matthew Michel <matthew.michel@intel.com>
Signed-off-by: Matthew Michel <matthew.michel@intel.com>
…ely" This reverts commit 7572f84. After experimentation with cold cache benchmarks, benefit of up to ~15% was observed requiring size 16 sub-groups for the larger single work-group cases. IGC compiles the kernels with sub-group size 32 here so leaving the requirement is needed to maximize performance.
There is a comment on line 819 of |
Signed-off-by: Matthew Michel <matthew.michel@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but I'd like @MikeDvorskiy to also be able to look at this.
I made a few tweaks to the comment to match the new behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind regarding the changes, if it doesn't make the performance worse. (As written in PR description - it doesn't)
…orted (#1833) * Only use sub-group sizes of 16 in one-wg radix sort This change is added to avoid a bug where IGC cannot compile SIMD32 kernels with -O0 compilation flags. No performance impact is observed. Signed-off-by: Matthew Michel <matthew.michel@intel.com> --------- Signed-off-by: Matthew Michel <matthew.michel@intel.com>
…orted (#1833) * Only use sub-group sizes of 16 in one-wg radix sort This change is added to avoid a bug where IGC cannot compile SIMD32 kernels with -O0 compilation flags. No performance impact is observed. Signed-off-by: Matthew Michel <matthew.michel@intel.com> --------- Signed-off-by: Matthew Michel <matthew.michel@intel.com>
Previously, an IGC workaround has caused kernels compiled with
[[sycl::reqd_sub_group_size(32)]]
to be silently compiled with size 16 sub-groups with-O0
on certain devices. To comply with the SYCL spec, this case will now throw a synchronous exception at JIT time.Single work-group radix sort uses a sub-group sizes of 32 for the smallest inputs and 16 for larger inputs to minimize register pressure. I have experimented with the current mainline version, a version only using sub-group size of 16, and a version removing the requirement altogether. The first two versions performed similarly with no measurable difference, and explicitly requiring a sub-group size of 16 offered up to a ~15% benefit over the version with no requirement at all.
By using only sub-group sizes of 16 in single work-group radix sort, we are able to avoid the IGC issue while not impacting performance.