Skip to content

Commit

Permalink
[SYCL] Support sycl::kernel_bundle for multi-device scenario
Browse files Browse the repository at this point in the history
This PR includes:
* Changes in the program manager methods to be able to properly
  create/build UR program for multiple devices. So far, we were mostly
  using only the first device in the vector to create/build UR program
  which made UR program unusable on other devices.

* For SPIRV case we already have all necessary UR fuctions for multi-device case:
  urProgramBuildExp, urProgramLinkExp. For AOT case we need to add new
  function urProgramCreateWithBinaryExp which allows to create UR
  program from multiple device binaries. Hence the UR tag update.

* Our program cache key allowed only a single device. I have changed it
  to contain a set of devices. If UR program is created and built for a
  set of devices then the same UR program is usable whenver we have any
  subset of this set. That's why if we have a program built for a set of
  devices then add all subsets to the cache. Before we were adding a
  record to the cache for each device from the set which is incorrect.
  For example, if someone requests a UR program for {dev2, dev3} from
  the cache then it is expected that this UR progam must be usable to
  submit a kernel to dev3. But we could get a program for
  {dev1, dev2} from the cache which is unusable on dev3.
  • Loading branch information
againull committed Sep 30, 2024
1 parent 8fc9aa5 commit 3cfe9b7
Show file tree
Hide file tree
Showing 15 changed files with 546 additions and 201 deletions.
10 changes: 2 additions & 8 deletions sycl/cmake/modules/FetchUnifiedRuntime.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -116,14 +116,8 @@ if(SYCL_UR_USE_FETCH_CONTENT)
CACHE PATH "Path to external '${name}' adapter source dir" FORCE)
endfunction()

set(UNIFIED_RUNTIME_REPO "https://github.com/oneapi-src/unified-runtime.git")
# commit ce0325da13048af40acd61bd43ef88bafd84c6b3
# Merge: 2ad32681 668c5e5d
# Author: Piotr Balcer <piotr.balcer@intel.com>
# Date: Thu Sep 26 10:51:57 2024 +0200
# Merge pull request #2134 from nrspruit/fix_multi_device_event_driver_in_order_syclos
# [L0] Fix Multi Device Event handling and remove unhandled events from in order wait list
set(UNIFIED_RUNTIME_TAG ce0325da13048af40acd61bd43ef88bafd84c6b3)
set(UNIFIED_RUNTIME_REPO "https://github.com/againull/unified-runtime")
set(UNIFIED_RUNTIME_TAG 2b1cf6ee0bb43e0a94359eda5e163445d691b42a)

set(UMF_BUILD_EXAMPLES OFF CACHE INTERNAL "EXAMPLES")
# Due to the use of dependentloadflag and no installer for UMF and hwloc we need
Expand Down
3 changes: 2 additions & 1 deletion sycl/source/detail/context_impl.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@
#include <sycl/property_list.hpp>

#include <algorithm>
#include <set>

namespace sycl {
inline namespace _V1 {
Expand Down Expand Up @@ -490,7 +491,7 @@ std::optional<ur_program_handle_t> context_impl::getProgramForDevImgs(
auto &Cache = LockedCache.get().Cache;
ur_device_handle_t &DevHandle = getSyclObjImpl(Device)->getHandleRef();
for (std::uintptr_t ImageIDs : ImgIdentifiers) {
auto OuterKey = std::make_pair(ImageIDs, DevHandle);
auto OuterKey = std::make_pair(ImageIDs, std::set{DevHandle});
size_t NProgs = KeyMap.count(OuterKey);
if (NProgs == 0)
continue;
Expand Down
6 changes: 3 additions & 3 deletions sycl/source/detail/helpers.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -65,8 +65,8 @@ retrieveKernelBinary(const QueueImplPtr &Queue, const char *KernelName,
auto DeviceImpl = Queue->getDeviceImplPtr();
auto Device = detail::createSyclObjFromImpl<device>(DeviceImpl);
ur_program_handle_t Program =
detail::ProgramManager::getInstance().createURProgram(**DeviceImage,
Context, Device);
detail::ProgramManager::getInstance().createURProgram(
**DeviceImage, Context, {Device});
return {*DeviceImage, Program};
}

Expand Down Expand Up @@ -94,7 +94,7 @@ retrieveKernelBinary(const QueueImplPtr &Queue, const char *KernelName,
DeviceImage = &detail::ProgramManager::getInstance().getDeviceImage(
KernelName, Context, Device);
Program = detail::ProgramManager::getInstance().createURProgram(
*DeviceImage, Context, Device);
*DeviceImage, Context, {Device});
}
return {DeviceImage, Program};
}
Expand Down
8 changes: 5 additions & 3 deletions sycl/source/detail/kernel_program_cache.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <set>
#include <type_traits>

#include <boost/unordered/unordered_flat_map.hpp>
Expand Down Expand Up @@ -118,9 +119,10 @@ class KernelProgramCache {
* when debugging environment variables are set and we can just ignore them
* since all kernels will have their build options overridden with the same
* string*/
using ProgramCacheKeyT =
std::pair<std::pair<SerializedObj, std::uintptr_t>, ur_device_handle_t>;
using CommonProgramKeyT = std::pair<std::uintptr_t, ur_device_handle_t>;
using ProgramCacheKeyT = std::pair<std::pair<SerializedObj, std::uintptr_t>,
std::set<ur_device_handle_t>>;
using CommonProgramKeyT =
std::pair<std::uintptr_t, std::set<ur_device_handle_t>>;

struct ProgramCache {
::boost::unordered_map<ProgramCacheKeyT, ProgramBuildResultPtr> Cache;
Expand Down
Loading

0 comments on commit 3cfe9b7

Please sign in to comment.