-
Notifications
You must be signed in to change notification settings - Fork 751
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][CUDA] Initial CUDA backend support #1091
Conversation
clang/lib/Driver/ToolChains/Cuda.cpp
Outdated
@@ -61,7 +61,7 @@ static CudaVersion ParseCudaVersionFile(llvm::StringRef V) { | |||
return CudaVersion::CUDA_92; | |||
if (Major == 10 && Minor == 0) | |||
return CudaVersion::CUDA_100; | |||
if (Major == 10 && Minor == 1) | |||
if (Major == 10 && (Minor == 1 || Minor == 2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please put some comment with explanations why it is okay to map 10.2
to CUDA_101
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ToT Clang lags behind on released CUDA versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are hoping upstream clang catches up with 10.2 support, this change allows to use it on the meantime. Their release notes and our testing shows 10.2 can work as if 10.1 and there are no regressions. Otherwise, the default installation of many people simply fails to compile any SYCL for CUDA application.
if (Major == 10 && (Minor == 1 || Minor == 2)) | |
// Enable CUDA 10.2 toolkit acting as 10.1 until tip clang catches up | |
if (Major == 10 && (Minor == 1 || Minor == 2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, upstream llvm patch llvm/llvm-project@12fefee (https://reviews.llvm.org/D73231) is a better fix, so we won't need once this lands into the intel/llvm fork
if (!getenv("DISABLE_INFER_AS")) | ||
Builder.defineMacro("__SYCL_ENABLE_INFER_AS__", "1"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In presence of #893, this change is not needed anymore
llvm/CMakeLists.txt
Outdated
@@ -462,6 +462,10 @@ if( LLVM_USE_PERF ) | |||
endif( NOT CMAKE_SYSTEM_NAME MATCHES "Linux" ) | |||
endif( LLVM_USE_PERF ) | |||
|
|||
option(SYCL_BUILD_PI_CUDA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that this should be in llvm/sycl/CMakeLists.txt
. Is there any reason it cannot be put in there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We ifdef in/out SYCL NVPTX support in the clang Driver based on this variable, which is why we put it in llvm/CMakeLists.txt
rather than in llvm/sycl/CMakeLists.txt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name of the option is confusing in this case. PI has nothing to do with the driver.
Why do we need ifdef out SYCL NVPTX support in the clang driver?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to, but it helps prevent users accidentally compiling kernels to the wrong target for the plugins they have available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you able to compile kernels for PTX w/o CUDA toolchain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I so, we could use this for testing SYCL for CUDA compiler using cross-compilation on Intel HW.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe clang will fail to produce PTX kernels without a CUDA toolchain installed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could enable support for the SYCL cuda toolchain if the NVPTX target is added, so it has the same requirements as the CUDA support in llvm? I suggest we do that on a separate patch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note, the code added under ifdef is not covered by CI system, so it might be broken.
It seems to me that in order to enable compilation for NVPTX target we need just a few changes. Am I wrong?
sycl/source/CMakeLists.txt
Outdated
@@ -4,6 +4,7 @@ | |||
#cmake_policy(SET CMP0057 NEW) | |||
#include(AddLLVM) | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sycl/source/cg.cpp
Outdated
} | ||
|
||
} // sycl | ||
} // sycl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} // sycl | |
} // cl |
// | ||
//===----------------------------------------------------------------------===// | ||
|
||
namespace cl { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
namespace cl { | |
#include <CL/sycl/detail/defines.hpp> | |
__SYCL_INLINE namespace cl { |
should fix the build error
[ 2%] Building CXX object tools/sycl/plugins/cuda/CMakeFiles/pi_cuda.dir/pi_cuda.cpp.o
In file included from /data/user/fwyzard/sycl/llvm/sycl/include/CL/sycl/detail/pi.hpp:13,
from /data/user/fwyzard/sycl/llvm/sycl/plugins/cuda/pi_cuda.cpp:10:
/data/user/fwyzard/sycl/llvm/sycl/include/CL/sycl/detail/common.hpp:25:25: error: inline namespace must be specified at initial definition
__SYCL_INLINE namespace cl {
^~
In file included from /data/user/fwyzard/sycl/llvm/sycl/plugins/cuda/pi_cuda.cpp:9:
/data/user/fwyzard/sycl/llvm/sycl/include/CL/sycl/backend/cuda.hpp:9:11: note: 'cl' defined here
namespace cl {
^~
0c22091
to
31fd1bb
Compare
@@ -0,0 +1,479 @@ | |||
//===-- pi_cuda.hpp - CUDA Plugin -----------------------------------------===// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this file to plugins/cuda/ ? It is the better place for it.
sycl/source/context.cpp
Outdated
@@ -23,25 +23,31 @@ | |||
|
|||
__SYCL_INLINE namespace cl { | |||
namespace sycl { | |||
context::context(const async_handler &AsyncHandler) | |||
: context(default_selector().select_device(), AsyncHandler) {} | |||
context::context(const async_handler &AsyncHandler, bool usePrimaryContext) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
context::context(const async_handler &AsyncHandler, bool usePrimaryContext) | |
context::context(const async_handler &AsyncHandler, bool UsePrimaryContext) |
Same applies to the rest of the file.
@@ -190,7 +193,6 @@ add_subdirectory( source ) | |||
# SYCL toolchain builds all components: compiler, libraries, headers, etc. | |||
add_custom_target( sycl-toolchain | |||
DEPENDS ${SYCL_RT_LIBS} | |||
pi_opencl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why removing OpenCL support?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pi_opencl dependency is added to sycl-toolchain in the opencl plugin cmake sycl/plugins/opencl/CMakeLists.txt
in 884459b#diff-edef34e019c51a57812a69f7a111afdcR22
sycl/include/CL/sycl/detail/cg.hpp
Outdated
std::function<void(cl::sycl::interop_handler)> MFunc; | ||
|
||
public: | ||
InteropTask(std::function<void(cl::sycl::interop_handler)> Func) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InteropTask(std::function<void(cl::sycl::interop_handler)> Func) | |
InteropTask(function_class<void(cl::sycl::interop_handler)> Func) |
@@ -48,7 +48,7 @@ class queue_impl { | |||
QueueOrder Order, const property_list &PropList) | |||
: queue_impl(Device, | |||
detail::getSyclObjImpl( | |||
context(createSyclObjFromImpl<device>(Device))), | |||
context(createSyclObjFromImpl<device>(Device), {}, true)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: {}
and true
seem to me some "magic" values. Can you please add comments here?
@@ -77,7 +77,7 @@ class Scheduler { | |||
// releaseHostAccessor is called. | |||
// Returns an event which indicates when these nodes are completed and host | |||
// accessor is ready for using. | |||
EventImplPtr addHostAccessor(Requirement *Req); | |||
EventImplPtr addHostAccessor(Requirement *Req, const bool destructor = false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
EventImplPtr addHostAccessor(Requirement *Req, const bool destructor = false); | |
EventImplPtr addHostAccessor(Requirement *Req, const bool Destructor = false); |
sycl/include/CL/sycl/handler.hpp
Outdated
@@ -771,6 +771,15 @@ class handler { | |||
#endif | |||
} | |||
|
|||
// Similar to single_task, but passed lambda will be executed on host |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Can you please make doxygen comment out of this?
sycl/source/CMakeLists.txt
Outdated
@@ -80,6 +81,7 @@ set(SYCL_SOURCES | |||
"detail/usm/usm_dispatch.cpp" | |||
"detail/usm/usm_impl.cpp" | |||
"detail/util.cpp" | |||
"cg.cpp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The corresponding header file resides in CL/sycl/detail/
. Have you considered putting cg.cpp
to source/detail
?
sycl/source/detail/accessor_impl.cpp
Outdated
@@ -14,8 +14,9 @@ namespace sycl { | |||
namespace detail { | |||
|
|||
AccessorImplHost::~AccessorImplHost() { | |||
if (MBlockedCmd) | |||
if (MBlockedCmd) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK, it's common practice in LLVM not to wrap single line statement in curly brackets.
sycl/include/CL/sycl/context.hpp
Outdated
/// @param useCUDAPrimaryContext is a bool determining whether to use the | ||
/// primary context in the CUDA backend. | ||
explicit context(const async_handler &AsyncHandler = {}, | ||
bool useCUDAPrimaryContext = false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
bool useCUDAPrimaryContext = false); | |
bool UseCUDAPrimaryContext = false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please also split this PR into several commits? Even if you created just two commits (with libclc
changes and the rest), it would be easier to review
Thanks for the feedback, we are working in fixing some lit-testing regressions and rebasing on top of the latest changes. |
@@ -80,6 +80,9 @@ | |||
#cmakedefine01 CLANG_ENABLE_OBJC_REWRITER | |||
#cmakedefine01 CLANG_ENABLE_STATIC_ANALYZER | |||
|
|||
/* Define if we have SYCL PI CUDA support */ | |||
#cmakedefine SYCL_HAVE_PI_CUDA ${SYCL_HAVE_PI_CUDA} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#cmakedefine SYCL_HAVE_PI_CUDA ${SYCL_HAVE_PI_CUDA} | |
#cmakedefine01 SYCL_HAVE_PI_CUDA |
According to the docs it should do the same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need this define? Can we have "SYCL PI CUDA support" unconditionally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we were to have PI CUDA support unconditionally the cuda toolchain will always be required for compilation. We decided to make it optional to allow people who only use the OpenCL plugin to compile the project without a cuda toolchain on their system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we were to have PI CUDA support unconditionally the cuda toolchain will always be required for compilation. We decided to make it optional to allow people who only use the OpenCL plugin to compile the project without a cuda toolchain on their system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to my understanding we need CUDA toolchain to build CUDA plugin only.
Could you clarify why we should require CUDA toolchain to build the driver?
https://llvm.org/docs/CompileCudaWithLLVM.html - doesn't seem to require some custom driver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct, we only need CUDA toolchain for building of the plugin, but we limit the valid SYCL triples in the clang driver based if PI CUDA support is available or not.
https://github.com/intel/llvm/pull/1091/files#diff-beaf25b0cdf8830dd4ea165404b00671R618
static bool isValidSYCLTriple(llvm::Triple T) {
#ifdef SYCL_HAVE_PI_CUDA
// NVPTX is valid for SYCL.
if (T.isNVPTX())
return true;
#endif
// Check for invalid SYCL device triple values.
// Non-SPIR arch.
if (!T.isSPIR())
return false;
// SPIR arch, but has invalid SubArch for AOT.
StringRef A(T.getArchName());
if (T.getSubArch() == llvm::Triple::NoSubArch &&
((T.getArch() == llvm::Triple::spir && !A.equals("spir")) ||
(T.getArch() == llvm::Triple::spir64 && !A.equals("spir64"))))
return false;
return true;
}
We can remove this limitation though and always allow nvptx triples for compilation, regardless of it the CUDA plugin is available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for removing.
// The translator tries to convert these function to the same name as the | ||
// SPIR-V built-in for | ||
// another function but with a different signature which results in an | ||
// attempt to re-define a | ||
// function. | ||
"read_imagei", // conflicts with `read_imagef` | ||
"read_imageui", // conflicts with `read_imagef` | ||
"read_imageh", // conflicts with `read_imagef` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, this should be fixed by KhronosGroup/SPIRV-LLVM-Translator#408
@@ -288,11 +288,31 @@ class OCL20ToSPIRV : public ModulePass, public InstVisitor<OCL20ToSPIRV> { | |||
Module *M; | |||
LLVMContext *Ctx; | |||
unsigned CLVer; /// OpenCL version as major*10+minor | |||
unsigned CLLang; /// OpenCL language, see `spv::SourceLanguage`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please note that this repo contains a copy of KrhonosGroup/SPIRV-LLVM-Translator which is updated from time to time and usually, by simple copy-pasting the whole directory. So, there is a huge chance that these changes will be overwritten occasionally
Please at least extract these changes to a separate commit, so it can be re-applied at each sync, or (which is much better) commit them directly to KhronosGroup/SPIRV-LLVM-Translator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will try to commit these directly to the Khronos repository, but will need the changes here until it is merged on the Khronos repository and pulled back down for the CUDA backend to work.
For now the changes to llvm-spirv are separated into 09b3b2e
ee44fa9
to
5f5fa8c
Compare
The codeplaysoftware:cuda-dev branch fails to build if configured with
The intel:sycl branch does build fine. |
buildbot/configure.py
Outdated
"-DCMAKE_BUILD_TYPE={}".format(args.build_type), | ||
"-DLLVM_ENABLE_ASSERTIONS={}".format(llvm_enable_assertions), | ||
"-DLLVM_TARGETS_TO_BUILD={}".format(llvm_targets_to_build), | ||
"-DLLVM_EXTERNAL_PROJECTS=sycl;llvm-spirv", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opencl-aot project is missed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed for the overall project as well or just for the testing? Its not mentioned on the Getting Started Guide
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Ruyk opencl-aot is an optional tool to enable AOT compilation for SYCL. It is not a requirement for end users, but it is being tested in CI.
buildbot/configure.py
Outdated
icd_loader_lib = '' | ||
icd_loader_lib = os.path.join(args.obj_dir, "OpenCL-ICD-Loader", "build") | ||
llvm_targets_to_build = 'X86' | ||
llvm_enable_projects = 'clang;llvm-spirv;sycl' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opencl-aot project is missed
// | ||
//===----------------------------------------------------------------------===// | ||
|
||
#define __SPIRV_FUNCTION __spirv_ocl_fmod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see definition of __spirv_ocl_fmod
function. Is it a bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There may not be a definition for __spirv_ocl_fmod
yet. We haven't finished implementing all of the builtins yet, but have some internal patches with additional builtins in progress to raise later.
5f5fa8c
to
dedaaec
Compare
We are currently working on fixing the default device selection. The current implementation does not take into account the multiple backends, and causes several lit tests to fail. We should have a fix soon. |
ea6e850
to
683faf0
Compare
We've pushed some more fixes for the CUDA backend but are still finding issues relating to the new plugin interface introduced in #1030. We've found that the new plugin interface is unstable when multiple plugins are available. Though we've fixed some issues related to mismatching devices/platforms/plugins when doing info queries and device selection, we are still finding device selection and some of the plugin internals unstable. As a result we are still seeing some test failures. The most notable is basic_tests/queue.cpp, which visibly fails for us when trying to use the OpenCL or CUDA plugins. We also found a number of tests failing silently by catching exceptions from the sycl runtime, but not erroring after they occur. We've added a new target, check-sycl-cuda, for specifically attempting to test with CUDA devices by passing SYCL_BE=PI_CUDA. We've also added a matching flag to check-sycl to ensure a path to test OpenCL devices. Without SYCL_BE it is nondeterministic which device is chosen by the default_selector used by many tests when multiple plugins are available. We have also been testing without SYCL_BUILD_PI_CUDA enabled or CUDA available on the system with success. For now we can either
To enable building/testing of the CUDA backend you will need to add SYCL_BUILD_PI_CUDA=ON to the buildbot CMake invocation and check-sycl-cuda as a test target to the buildbots. |
That sounds weird. Isn't the
I'd prefer it this way, and work further to stabilize the plugins mechanism (tagging @garimagu) |
44af06f
to
3cf8625
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's test for regressions.
Remove approve to avoid unintentional merge.
Signed-off-by: Alexander Johnston <alexander@codeplay.com>
Synchronise the CUDA backend with the general SYCL changes from #1121. Signed-off-by: Andrea Bocci <andrea.bocci@cern.ch>
Signed-off-by: Alexander Johnston <alexander@codeplay.com>
To ensure that the check-sycl targets test OpenCL devices, pass SYCL_BE=PI_OPENCL. This mirrors the check-sycl-cuda target which passes SYCL_BE=PI_CUDA. Without this it is nondeterministic which device is tested by check-sycl. Signed-off-by: Alexander Johnston <alexander@codeplay.com>
Removes PI_CUDA specific code paths and tests from clang, opting to always enable them. Signed-off-by: Alexander Johnston <alexander@codeplay.com>
Signed-off-by: Alexander Johnston <alexander@codeplay.com>
Fix platform string comparison for CUDA platform detection. Fix device info platform query so that it uses the device's plugin, rather than the GlobalPlugin. Signed-off-by: Alexander Johnston <alexander@codeplay.com>
Signed-off-by: Alexander Johnston <alexander@codeplay.com>
b7ed055
to
cdab838
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unexpected Passing Tests (3):
SYCL :: basic_tests/image.cpp
SYCL :: basic_tests/stream/stream.cpp
SYCL :: hier_par/hier_par_basic.cpp
Let's remove XFAIL
from these tests.
Are these failing on NV HW only?
Fix minor test and build configuration issues introduced in the development of the CUDA backend. Signed-off-by: Alexander Johnston <alexander@codeplay.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Start testing.
@Alexander-Johnston, I'm going to do more thorough code review post-commit. This PR is too big to review in one shot. |
…ages_docs * origin/sycl: (1092 commits) [CI] Add clang-format checker to pre-commit checks (intel#1163) [SYCL][CUDA] Initial CUDA backend support (intel#1091) [USM] Align OpenCL USM extension header with the specification (intel#1162) [SYCL][NFC] Fix unreferenced variable warning (intel#1158) [SYCL] Fix __spirv_GroupBroadcast overloads (intel#1152) [SYCL] Add llvm/Demangle link dependency for llvm-no-spir-kernel (intel#1156) [SYCL] LowerWGScope pass should not be skipped when -O0 is used [SYCL][Doc][USM] Add refactored pointer and device queries to USM spec (intel#1118) [SYCL] Update the kernel parameter rule to is-trivially-copy-construc… (intel#1144) [SYCL] Move internal headers to source dir (intel#1136) [SYCL] Forbid declaration of non-const static variables inside kernels (intel#1141) [SYCL][NFC] Remove idle space (intel#1148) [SYCL] Improve the error mechanism of llvm-no-spir-kernel (intel#1068) [SYCL] Added CTS test config (intel#1063) [SYCL] Implement check-sycl-deploy target (intel#1142) [SYCL] Preserve original message and code of kernel/program build result (intel#1108) [SYCL] Fix LIT after LLVM change in community Translate LLVM's cmpxchg instruction to SPIR-V Add volatile qualifier for atom_ builtins Fix -Wunused-variable warnings ...
cl_mem MemArg = (cl_mem)AllocaCmd->getMemAllocation(); | ||
Plugin.call<PiApiKind::piKernelSetArg>(Kernel, Arg.MIndex, | ||
sizeof(cl_mem), &MemArg); | ||
Plugin.call<PiApiKind::piKernelSetArg>(Kernel, Arg.MIndex, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bader @Alexander-Johnston Was this extra API added by mistake?
Is there a reason for adding it twice?
FYI @rbegam .
MKernelProgramCache.setContextPtr(this); | ||
} | ||
|
||
context_impl::context_impl(const vector_class<cl::sycl::device> Devices, | ||
async_handler AsyncHandler) | ||
async_handler AsyncHandler, bool UseCUDAPrimaryContext) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of this variable? It is unused.
The flag -nostdlib added in "cf4d4e366a2 libclc: Compile with -nostdlib" was moved to the file AddLibclc.cmake, which was added in: "7a9a4251f57 [SYCL][CUDA] Initial CUDA backend support (intel#1091)" CONFLICT (content): Merge conflict in libclc/CMakeLists.txt
PR #1091 has refactor the logic of prepare_builtins into utils/CMakeLists.txt. We don't need to duplicate logic in CMakeLists.txt now. Before we upstream the utils/CMakeLists.txt, we should just remove the duplicate code.
Initial support of CUDA backend bringing roughly 40% of SYCL 1.2.1 CTS conformance on devices that support CUDA 10.2.