[UR][L0] Unify use of large allocation in L0 adapter #1099
31a5796 to 3482cdf
@smaslov-intel : please review
static const bool UseLargeAllocations = [this] {
  const char *UrRet = std::getenv("UR_L0_ALLOW_LARGE_ALLOCATIONS");
  if (!UrRet)
    return (this->isPVC() ? true : false);
Why this isPVC check?
It is not required on PVC.
The PVC Level Zero driver does it by default.
@MichalMrozek : this is so the rest of the adapter has "large allocation behavior".
But PVC already has large allocation behavior by default.
There is no need to do anything special for this device.
There is no need to add a compiler option, and it is quite dangerous to bypass max memory allocation size limits by default.
Using those variables suggests that PVC requires some special handling for large allocations, which in fact is not needed.
Thanks @MichalMrozek. I have changed the code to use the defaults on PVC.
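The outcome of this thread can be sketched roughly as follows: the flag is read from the environment once, and when it is unset the code falls back to one fixed default with no device-specific branch. This is only an illustration, not the adapter's actual code. The helper names flagFromEnv and allowLargeAllocations are hypothetical, and the default of false is an assumption that follows the reviewer's point about not bypassing allocation limits by default.

```cpp
#include <cstdlib>

// Hypothetical helper (not part of the adapter): interpret an environment
// variable value as a boolean flag. An unset variable (nullptr) yields the
// supplied default. Non-numeric text parses as 0 with std::atoi, i.e. false.
inline bool flagFromEnv(const char *Value, bool Default) {
  if (!Value)
    return Default;
  return std::atoi(Value) != 0;
}

// Hypothetical accessor: the variable name comes from the diff hunk above.
// The default is assumed to be false, and there is no isPVC() special case.
inline bool allowLargeAllocations() {
  static const bool Allow =
      flagFromEnv(std::getenv("UR_L0_ALLOW_LARGE_ALLOCATIONS"), false);
  return Allow;
}
```

Keeping the parsing in a separate function that takes the raw string makes the default handling easy to check without touching the process environment.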
// On some Intel GPUs, this influences how kernels are compiled.
// If large allocations (>4GB) are requested, then kernels are
// compiled with stateless access.
// If small allocations (<4GB) are requested, then kernels are
This is not accurate: even if -ze-opt-greater-than-4GB-buffer-required is not specified, kernels may still be compiled in stateless mode.
thanks, I will remove the comment
// If small allocations (<4GB) are requested, then kernels are
// compiled with stateful access, with potential performance
// improvements.
// Some GPUs support only one mode, such us Intel(R) Data Center GPU Max,
Intel(R) Data Center GPU Max supports both stateful and stateless modes.
It is the Level Zero implementation for Intel(R) Data Center GPU Max that allows only stateless mode.
I will fix the comment to indicate that it is the driver that supports only that mode.
@nrspruit: please review.
843dd22 to c36c2bf
Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
Intel(R) GPUs have two modes of operation in terms of allocations: stateful and stateless mode.

Stateful mode optimizes memory accesses through pointer arithmetic. This can be done as long as the allocations used by the kernel are smaller than 4GB.

Stateless mode disables this pointer-arithmetic optimization to allow the kernel to use allocations larger than 4GB.

Currently, the L0 adapter dynamically and automatically requests large allocations from the L0 driver if it detects an allocation size larger than 4GB. This creates a problem if a kernel has previously been compiled for stateful access. Ultimately, the adapter ends up mixing stateful and stateless behavior, which is not a user-friendly experience.

This patch aims to correct this by defining a default behavior. On Intel(R) GPUs previous to Intel(R) Data Center GPU Max, the default behavior is now stateless, meaning all allocation sizes are allowed by default. Users can opt in to stateful mode by setting a new environment variable, UR_L0_USE_OPTIMIZED_32BIT_ACCESS=1.

Addresses: https://stackoverflow.com/questions/75621264/sycl-dot-product-code-gives-wrong-results

Signed-off-by: Jaime Arteaga <jaime.a.arteaga.molina@intel.com>
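To make the default described in this commit message concrete, here is a minimal sketch of how one mode flag could drive both the build options and the allocation-size check. It is not the adapter's implementation. The function names buildOptionsForMode and allocationFitsMode are hypothetical, and treating "4GB" as 4 GiB with a strict "smaller than" bound is an assumption. Only the option -ze-opt-greater-than-4GB-buffer-required comes from the discussion in this PR.

```cpp
#include <cstdint>
#include <string>

// Assumption: "4GB" in the commit message is taken as 4 GiB here.
constexpr std::uint64_t FourGB = 4ull * 1024 * 1024 * 1024;

// Hypothetical helper: in stateless mode (the new default), append the option
// that disables stateful optimizations. In stateful mode (opt-in through
// UR_L0_USE_OPTIMIZED_32BIT_ACCESS=1), leave the user options untouched.
inline std::string buildOptionsForMode(std::string UserOptions,
                                       bool UseOptimized32bitAccess) {
  if (!UseOptimized32bitAccess) {
    if (!UserOptions.empty())
      UserOptions += ' ';
    UserOptions += "-ze-opt-greater-than-4GB-buffer-required";
  }
  return UserOptions;
}

// Hypothetical helper: stateless mode accepts any size in this sketch, while
// stateful mode only accepts allocations smaller than 4GB.
inline bool allocationFitsMode(std::uint64_t Size,
                               bool UseOptimized32bitAccess) {
  return !UseOptimized32bitAccess || Size < FourGB;
}
```

The point of the sketch is that the mode is chosen once, up front, so kernel compilation and allocation checks never disagree. A real adapter would also honor the maximum allocation size reported by the device, which is left out here.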
c36c2bf to 28590a8
// ze-opt-greater-than-4GB-buffer-required to disable
// stateful optimizations and be able to use larger than
// 4GB allocations on these kernels.
if (Context->Devices[0]->useOptimized32bitAccess() == 0) {
Should this be in the Exp function, with the Exp function now containing the Compile code? That way you check the specific device being passed in and not just Context->Devices[0], since you might be on a non-uniform system.
Thanks @nrspruit. Good idea. However, urProgramCompileExp is unimplemented at the moment, so adding an implementation for urProgramCompileExp on top of these changes would make this PR too big. I think it is better to merge this patch first and then add support for urProgramCompileExp, including the functionality from this patch. What do you think?
Sure, that would be fine. A follow-up patch would be a good improvement on this.
+1 from me
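The follow-up idea agreed on in this thread, checking each device that is passed in rather than only Context->Devices[0], could look roughly like the sketch below. It is purely illustrative: SketchDevice and optionsPerDevice are hypothetical stand-ins, not Unified Runtime types or entry points, and nothing here reflects how urProgramCompileExp is or will be implemented.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a device handle. Only the accessor name mirrors
// the one visible in the diff hunk above.
struct SketchDevice {
  bool Optimized32bitAccess;
  bool useOptimized32bitAccess() const { return Optimized32bitAccess; }
};

// Hypothetical helper: compute the build options per device, so a non-uniform
// system gets the stateless option only on the devices that need it.
inline std::vector<std::string>
optionsPerDevice(const std::vector<SketchDevice> &Devices,
                 const std::string &UserOptions) {
  std::vector<std::string> Result;
  Result.reserve(Devices.size());
  for (const SketchDevice &Device : Devices) {
    std::string Options = UserOptions;
    if (!Device.useOptimized32bitAccess()) {
      if (!Options.empty())
        Options += ' ';
      Options += "-ze-opt-greater-than-4GB-buffer-required";
    }
    Result.push_back(Options);
  }
  return Result;
}
```

With two devices in different modes, the first entry keeps the user options as-is and the second gains the stateless option, which is the case a check on Devices[0] alone would miss.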
I have updated the target branch of this PR from the
I'm going to create a combined intel/llvm PR which will include this and a few other PRs so we can get things merged quicker.
[UR][L0] Unify use of large allocation in L0 adapter
Combines the changes of the following Unified Runtime pull requests:
* oneapi-src/unified-runtime#1108
* oneapi-src/unified-runtime#988
* oneapi-src/unified-runtime#1071
* oneapi-src/unified-runtime#916
* oneapi-src/unified-runtime#1099
intel/llvm testing: intel/llvm#11958