Enables adding overload to DpexExpKernelTarget and fully inline them into the final module. #1230
Merged
diptorupd force-pushed the experimental/enable_overloads branch 3 times, most recently from 83806ee to c22f8a3 on November 28, 2023 04:41
Unit tests design sketch
diptorupd force-pushed the experimental/enable_overloads branch from 0685c43 to cf04418 on November 28, 2023 21:58
ZzEeKkAa reviewed on Nov 29, 2023
ZzEeKkAa reviewed on Nov 29, 2023
diptorupd force-pushed the experimental/enable_overloads branch from cf04418 to 788053d on December 1, 2023 15:33
- A new target option was added for the DpexKernelTarget target to compile functions using the experimental KernelDispatcher differently based on whether they are "kernels" or "device functions". Kernels have the spir_kernel calling convention, cannot return a value, enforce execution queue equivalence, and are always compiled down to device IR (SPIR-V). Device functions have the spir_func calling convention, do not have the same restrictions on return values and input arguments, and are only compiled to LLVM bitcode.
- A `device_func` decorator was added to the experimental module. The new decorator is roughly equivalent to numba_dpex.func but uses the new KernelDispatcher and the device function compilation mode. The `device_func` decorator is registered to compile overloads in DpexExpKernelTarget.
- In the kernel compilation mode the final LLVM module is now "finalized" before conversion to SPIR-V. During finalization all overload calls are linked into the main (kernel) module and optionally inlined.
- A new target option "inlining threshold" was added to DpexKernelTarget to define how the LLVM inlining passes should optimize the final codegen library. The decorator-level option will supersede any global configuration setting.
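The rules in the commit message above can be summarized in a small pure-Python model. This is only an illustrative sketch, not numba-dpex code: the names `CompilationMode`, `ModeRules`, `check_return_type`, and `resolve_inline_threshold` are hypothetical, and the only facts taken from the commit message are the calling conventions, the return-value restriction, the output formats, and the rule that a decorator-level setting supersedes the global one.

```python
from dataclasses import dataclass
from enum import Enum


class CompilationMode(Enum):
    """Hypothetical labels for the two compilation modes."""
    KERNEL = "kernel"
    DEVICE_FUNC = "device_func"


@dataclass(frozen=True)
class ModeRules:
    calling_convention: str
    may_return_value: bool
    output: str


# Properties transcribed from the commit message; the table is illustrative.
RULES = {
    CompilationMode.KERNEL: ModeRules("spir_kernel", False, "SPIR-V"),
    CompilationMode.DEVICE_FUNC: ModeRules("spir_func", True, "LLVM bitcode"),
}


def check_return_type(mode, return_type):
    """Raise TypeError when a non-void return type is used in kernel mode."""
    if return_type is not None and not RULES[mode].may_return_value:
        raise TypeError(f"{mode.value} functions cannot return a value")


def resolve_inline_threshold(decorator_value, global_value):
    """Prefer the decorator-level setting; fall back to the global one."""
    return global_value if decorator_value is None else decorator_value
```

Comparing against `None` rather than relying on truthiness means a decorator-level value of 0 still overrides the global setting in this model.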
diptorupd force-pushed the experimental/enable_overloads branch from 788053d to f4533b4 on December 4, 2023 01:46
- Addresses review comments on how to properly set the inline_threshold target option
diptorupd force-pushed the experimental/enable_overloads branch from f4533b4 to ca1ef70 on December 7, 2023 00:00
diptorupd force-pushed the experimental/enable_overloads branch from ca1ef70 to 0af4414 on December 7, 2023 04:15
adarshyoga approved these changes on Dec 7, 2023
github-actions bot added a commit that referenced this pull request on Dec 7, 2023:
Enables adding overload to DpexExpKernelTarget and fully inline them into the final module. d1700ff
Have you provided a meaningful PR description?
Add a new device_func decorator to compile DpexKernelTarget overloads
- A generate_device_ir option for DpexTargetDescriptor is introduced. The new option can be used to prevent a module from being compiled to a device IR binary.
- A device_func decorator is added and registered to compile overloads in DpexExpKernelTarget. Functions decorated with device_func are not compiled to device IR.
- The kernel decorator is updated to compile a finalized module to device IR.
Have you added a test, reproducer or referred to an issue with a reproducer?
Have you tested your changes locally for CPU and GPU devices?
Have you made sure that new changes do not introduce compiler warnings?
If this PR is a work in progress, are you filing the PR as a draft?
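The finalization step and the generate_device_ir option from the PR description can be pictured with a toy model. This is a sketch under stated assumptions, not the numba-dpex implementation: `reachable_overloads` and `finalize` are hypothetical helpers, the call graph is a plain dictionary, and "linking" is reduced to collecting names, whereas the real code links LLVM modules. Only the option name `generate_device_ir` and the SPIR-V versus LLVM bitcode distinction come from the PR text.

```python
from collections import deque


def reachable_overloads(entry, call_graph):
    """Return every function reachable from `entry`, in first-visit order."""
    seen = {entry}
    order = []
    queue = deque(call_graph.get(entry, ()))
    while queue:
        name = queue.popleft()
        if name in seen:
            continue  # already linked, or a recursive call back to the entry
        seen.add(name)
        order.append(name)
        queue.extend(call_graph.get(name, ()))
    return order


def finalize(entry, call_graph, generate_device_ir=True):
    """Model of finalization: gather all callees into the entry module.

    With generate_device_ir=False the result stays at the LLVM bitcode
    stage, as described for functions decorated with device_func.
    """
    linked = [entry] + reachable_overloads(entry, call_graph)
    fmt = "SPIR-V" if generate_device_ir else "LLVM bitcode"
    return {"module": entry, "linked": linked, "format": fmt}


# Hypothetical kernel "vecadd" calling two device functions.
graph = {"vecadd": ["scale", "clamp"], "scale": ["clamp"], "clamp": []}
kernel_module = finalize("vecadd", graph)
device_module = finalize("scale", graph, generate_device_ir=False)
```

In this model a device function that several callers share, such as `clamp`, is linked into the kernel module only once.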