Skip to content

Commit

Permalink
[SYCL] Support sycl::kernel_bundle for multi-device scenario
Browse files Browse the repository at this point in the history
This PR includes:
* Changes in the program manager methods to be able to properly
  create/build UR program for multiple devices. So far, we were mostly
  using only the first device in the vector to create/build UR program
  which made UR program unusable on other devices.

* UR tag update brings the version of urProgramCreateWithBinary which allows
  to create UR program from multiple device binaries.

* Our program cache key allowed only a single device. I have changed it
  to contain a set of devices. If UR program is created and built for a
  set of devices then the same UR program is usable whenver we have any
  subset of this set. That's why if we have a program built for a set of
  devices then add all subsets to the cache. Before we were adding a
  record to the cache for each device from the set which is incorrect.
  For example, if someone requests a UR program for {dev2, dev3} from
  the cache then it is expected that this UR progam must be usable to
  submit a kernel to dev3. But we could get a program for
  {dev1, dev2} from the cache which is unusable on dev3.
  • Loading branch information
againull committed Oct 1, 2024
1 parent a04915e commit 080bd06
Show file tree
Hide file tree
Showing 13 changed files with 537 additions and 194 deletions.
10 changes: 2 additions & 8 deletions sycl/cmake/modules/FetchUnifiedRuntime.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -116,14 +116,8 @@ if(SYCL_UR_USE_FETCH_CONTENT)
CACHE PATH "Path to external '${name}' adapter source dir" FORCE)
endfunction()

set(UNIFIED_RUNTIME_REPO "https://github.com/oneapi-src/unified-runtime.git")
# commit 22962057df1b9d538e08088a7b75d9d8e7c29f90 (HEAD, origin/main, origin/HEAD)
# Merge: e824ddc2 f0a1c433
# Author: aarongreig <aaron.greig@codeplay.com>
# Date: Fri Sep 27 16:54:04 2024 +0100
# Merge pull request #2017 from nrspruit/new_sysman_init
# [L0] Use zesInit for SysMan API usage
set(UNIFIED_RUNTIME_TAG 22962057df1b9d538e08088a7b75d9d8e7c29f90)
set(UNIFIED_RUNTIME_REPO "https://github.com/againull/unified-runtime")
set(UNIFIED_RUNTIME_TAG 584869eb05a0e6e208d1cbc60e156f8d21f8d5f2)

set(UMF_BUILD_EXAMPLES OFF CACHE INTERNAL "EXAMPLES")
# Due to the use of dependentloadflag and no installer for UMF and hwloc we need
Expand Down
3 changes: 2 additions & 1 deletion sycl/source/detail/context_impl.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@
#include <sycl/property_list.hpp>

#include <algorithm>
#include <set>

namespace sycl {
inline namespace _V1 {
Expand Down Expand Up @@ -490,7 +491,7 @@ std::optional<ur_program_handle_t> context_impl::getProgramForDevImgs(
auto &Cache = LockedCache.get().Cache;
ur_device_handle_t &DevHandle = getSyclObjImpl(Device)->getHandleRef();
for (std::uintptr_t ImageIDs : ImgIdentifiers) {
auto OuterKey = std::make_pair(ImageIDs, DevHandle);
auto OuterKey = std::make_pair(ImageIDs, std::set{DevHandle});
size_t NProgs = KeyMap.count(OuterKey);
if (NProgs == 0)
continue;
Expand Down
6 changes: 3 additions & 3 deletions sycl/source/detail/helpers.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -65,8 +65,8 @@ retrieveKernelBinary(const QueueImplPtr &Queue, const char *KernelName,
auto DeviceImpl = Queue->getDeviceImplPtr();
auto Device = detail::createSyclObjFromImpl<device>(DeviceImpl);
ur_program_handle_t Program =
detail::ProgramManager::getInstance().createURProgram(**DeviceImage,
Context, Device);
detail::ProgramManager::getInstance().createURProgram(
**DeviceImage, Context, {Device});
return {*DeviceImage, Program};
}

Expand Down Expand Up @@ -94,7 +94,7 @@ retrieveKernelBinary(const QueueImplPtr &Queue, const char *KernelName,
DeviceImage = &detail::ProgramManager::getInstance().getDeviceImage(
KernelName, Context, Device);
Program = detail::ProgramManager::getInstance().createURProgram(
*DeviceImage, Context, Device);
*DeviceImage, Context, {Device});
}
return {DeviceImage, Program};
}
Expand Down
8 changes: 5 additions & 3 deletions sycl/source/detail/kernel_program_cache.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <set>
#include <type_traits>

#include <boost/unordered/unordered_flat_map.hpp>
Expand Down Expand Up @@ -118,9 +119,10 @@ class KernelProgramCache {
* when debugging environment variables are set and we can just ignore them
* since all kernels will have their build options overridden with the same
* string*/
using ProgramCacheKeyT =
std::pair<std::pair<SerializedObj, std::uintptr_t>, ur_device_handle_t>;
using CommonProgramKeyT = std::pair<std::uintptr_t, ur_device_handle_t>;
using ProgramCacheKeyT = std::pair<std::pair<SerializedObj, std::uintptr_t>,
std::set<ur_device_handle_t>>;
using CommonProgramKeyT =
std::pair<std::uintptr_t, std::set<ur_device_handle_t>>;

struct ProgramCache {
::boost::unordered_map<ProgramCacheKeyT, ProgramBuildResultPtr> Cache;
Expand Down
Loading

0 comments on commit 080bd06

Please sign in to comment.