[SYCL] Optional kernel features: implement split based on reqd-work-group-size #8056

dm-vodopyanov · 2023-01-19T17:13:16Z

This patch implements device code split based on the reqd-work-group-size attribute, enables generation of the "reqd_work_group_size" property in the "SYCL/device requirements" property set, and adds reqd-work-group-size support to sycl::is_compatible.

Design:
https://github.com/intel/llvm/blob/sycl/sycl/doc/design/OptionalDeviceFeatures.md#changes-to-the-device-code-split-algorithm
E2E tests: intel/llvm-test-suite#1528

This patch makes these SYCL CTS tests pass:

kernel_bundle (6 tests now pass)
- has_kernel_bundle_core_reqd_work_group_size_dev_and_k_id
- has_kernel_bundle_core_reqd_work_group_size_dev_and_k_name
- has_kernel_bundle_core_reqd_work_group_size_k_id
- has_kernel_bundle_core_reqd_work_group_size_k_name
- sycl_is_compatible
  - for a kernel with [[sycl::reqd_work_group_size(4294967295)]]
  - for CHECK( sycl::is_compatible(builtinKernelIds, device) == (size_t(device.get_info<sycl::info::device::max_work_item_sizes<1>>()) > 4294967295) )
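
The compatibility rule these tests exercise can be modelled in plain C++. This is only a sketch under stated assumptions: `DeviceLimits` and `fitsDevice` are hypothetical names, the per-dimension `<=` comparison is an assumption, and the real runtime check in program_manager.cpp may differ (for example in the boundary comparison, in dimension ordering, or by also checking the total work-group size).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the per-dimension limits a device reports
// (conceptually similar to max_work_item_sizes; not the SYCL API itself).
struct DeviceLimits {
  std::array<std::uint64_t, 3> maxWorkItemSizes;
};

// Assumed rule: a kernel is compatible if it has no required work-group size,
// or if every required dimension fits within the matching device limit.
bool fitsDevice(const std::vector<std::uint64_t> &reqdWGSize,
                const DeviceLimits &dev) {
  if (reqdWGSize.empty())
    return true; // no reqd_work_group_size attribute on the kernel
  if (reqdWGSize.size() > dev.maxWorkItemSizes.size())
    return false; // more than three dimensions is not meaningful here
  for (std::size_t i = 0; i < reqdWGSize.size(); ++i)
    if (reqdWGSize[i] > dev.maxWorkItemSizes[i])
      return false;
  return true;
}
```

With a device limit of 1024 per dimension, a kernel requiring 4294967295 work-items in one dimension is rejected by this model, which is the situation the `sycl_is_compatible` test constructs.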

dm-vodopyanov · 2023-01-19T17:20:13Z

/verify with intel/llvm-test-suite#1528

sycl/source/detail/program_manager/program_manager.cpp

dm-vodopyanov · 2023-01-20T12:29:36Z

llvm/test/tools/sycl-post-link/sycl-esimd-large-grf.ll

-; RUN: FileCheck %s -input-file=%t_0.sym --check-prefixes CHECK-SYCL-SYM
-; RUN: FileCheck %s -input-file=%t_esimd_0.sym --check-prefixes CHECK-ESIMD-SYM
-; RUN: FileCheck %s -input-file=%t_esimd_large_grf_1.sym --check-prefixes CHECK-ESIMD-LargeGRF-SYM
+; RUN: FileCheck %s -input-file=%t_esimd_large_grf_0.ll --check-prefixes CHECK-ESIMD-LargeGRF-IR


It is OK to change this test and sycl-large-grf.ll below: adding the new optional kernel feature reqd-work-group-size to the internal data structures changed the hashes, which in turn affected the name generation logic. This does not affect customers, since they don't use these temporary files directly, and it didn't break anything in the pipeline.
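
To illustrate why the test expectations move, here is a minimal sketch of the effect, not the actual sycl-post-link code. It assumes a hand-written FNV-1a as a stand-in for whatever hash the tool uses; `hashBefore` and `hashAfter` are hypothetical names. Folding one more field into the hash changes the value even for kernels that do not use the new feature, so anything derived from the hash (ordering, output file names) can change too.

```cpp
#include <cstdint>
#include <string>

// FNV-1a, used here only as a deterministic stand-in hash.
std::uint64_t fnv1a(const std::string &s) {
  std::uint64_t h = 14695981039346656037ULL; // FNV offset basis
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ULL; // FNV prime
  }
  return h;
}

// Hypothetical "before" identity: only aspects and the large-grf flag.
std::uint64_t hashBefore(const std::string &aspects, bool largeGRF) {
  return fnv1a("aspects=" + aspects + ";large-grf=" + (largeGRF ? "1" : "0"));
}

// Hypothetical "after" identity: the same fields plus reqd-work-group-size.
std::uint64_t hashAfter(const std::string &aspects, bool largeGRF,
                        const std::string &reqdWGSize) {
  return fnv1a("aspects=" + aspects + ";large-grf=" + (largeGRF ? "1" : "0") +
               ";reqd-wg-size=" + reqdWGSize);
}
```

Even with an empty `reqdWGSize`, `hashAfter` differs from `hashBefore` for the same kernel, which is why tests unrelated to the new feature needed updated file names.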

dm-vodopyanov · 2023-01-20T12:53:56Z

For some reason these tests fail in CI but pass locally. Investigating this.

  SYCL-Unit :: SYCL2020/./SYCL2020Tests/IsCompatible/CPUInvalidReqdWGSize1D
  SYCL-Unit :: SYCL2020/./SYCL2020Tests/IsCompatible/CPUInvalidReqdWGSize2D

Upd: fixed

llvm/test/tools/sycl-post-link/device-code-split/per-reqd-wg-size-split-2.ll

llvm/test/tools/sycl-post-link/device-requirements/reqd-work-group-size.ll

sycl/source/detail/program_manager/program_manager.cpp

AlexeySachkov

sycl-post-link part LGTM

sycl/source/detail/program_manager/program_manager.cpp

dm-vodopyanov · 2023-01-26T14:25:06Z

/verify with intel/llvm-test-suite#1528

steffenlarsen

Runtime changes LGTM!

dm-vodopyanov · 2023-01-26T14:42:45Z

Failed Tests (1):
  SYCL :: dword_atomic_smoke.cpp

Known issue: #8098

dm-vodopyanov · 2023-01-27T11:23:28Z

The SYCL :: USM/copy2d.cpp failure in Jenkins/llvm-test-suite (Windows, Level Zero) is unrelated to this patch and looks flaky: at the same time, the same test passed in Jenkins/llvm-test-suite for intel/llvm-test-suite#1528.
Issue: #8126

#### Intro

This is a refactoring of how we perform device code split in `sycl-post-link`, which is intended to solve several existing issues with the current implementation:

1. increased peak RAM consumption by `sycl-post-link`
2. bad scaling as more and more split "dimensions" are added
3. increased test maintenance cost due to the non-deterministic order (between commits) of output files produced by `sycl-post-link`

#### A bit more context about the issues above:

(1) Increased peak RAM consumption is caused by the fact that we currently preserve **all** splits in memory, even though we can process them one by one and discard each split as soon as it has been stored to disk. This was implemented as a memory consumption optimization in #5021, but it got accidentally reverted in #7302 as an attempt to work around (2).

(2) is pretty much summarized in our source code: https://github.com/intel/llvm/blob/afebb2543ccecb89f83c84b68fba7616bbab89ac/llvm/tools/sycl-post-link/sycl-post-link.cpp#L806-L811

(3) is caused by a bad implementation decision made in #7302: because every split is now identified by a hash, every time you take a new split "dimension"/new feature into account, existing tests end up with different hashes. Just look at how many unrelated tests had to be updated in #7512, #8056 and #8167.

#### Now to the PR itself:

It introduces a new infrastructure for categorizing/grouping kernel functions: instead of using hashes, we now build a string description for each kernel function and then group kernels with the same description string together.

The string description is built by a new entity: it accepts a set of rules, where each rule is a simple function which returns a string for the passed `llvm::Function`. Results of all rules are concatenated together, and rules are invoked in the stable order of their registration.

There is a simple API for building those rules. It provides some predefined rules for the most popular use cases, like turning a function attribute or a piece of metadata into a string descriptor for the function. It is also possible to pass a custom callback to implement more complicated logic.

#### How does this PR help with issues above?

(1) and (2) are fixed in conjunction: `sycl-post-link` was refactored to avoid storing more than one split module at a time, and that is possible because the PR unifies the per-scope and optional-kernel-features splitters into a single generic splitter. The new API for kernel categorization seems to be flexible enough to provide that infrastructure, so the merged splitters still look OK code-wise.

(3) is fixed by using string identifiers instead of hashes, as well as by using a data structure which sorts identifiers.

#### Any other benefits from this PR?

About 50 fewer lines of code to support :)

Extending device code split for more optional features would be even easier than it is now: instead of making several changes in various places around the `UsedOptionalFeatures` structure, it will just be a matter of adding 1-3 lines of code.

Please also note that `UsedOptionalFeatures` contains tons of inconsistencies in its implementation, which will all be gone with this PR: in `operator==` we don't use the hash and instead compare certain fields directly (and we do miss some of them); the `generateModuleName` method skips some of the optional features and ignores them.

Cross-module `device_global` usage checks should now work at all split dimensions (except for ESIMD).

#### Any potential downsides?

With the current `UsedOptionalFeatures` it is possible to embed various information (used aspects, `large-grf` flag, etc.) directly during device code split to avoid re-gathering that information later when we generate properties. With the suggested approach this would be harder to do, because it doesn't seem to fit naturally into the proposed infrastructure: see the changes I made around `large-grf` in this PR.

However, we have never actually implemented this, and re-querying some metadata from a function doesn't seem like a bottleneck, so it should really be a very minor and only theoretical downside.
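
The rule-based categorization described above can be sketched in plain C++. This is an illustrative model under stated assumptions, not the API from the PR: `KernelInfo` is a hypothetical stand-in for `llvm::Function`, and `DescriptorBuilder`, `addAttributeRule` and `groupKernels` are invented names.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::Function: a name plus string attributes.
struct KernelInfo {
  std::string name;
  std::map<std::string, std::string> attrs;
};

// Builds a string description of a kernel by concatenating the results of
// all registered rules, in registration order.
class DescriptorBuilder {
public:
  using Rule = std::function<std::string(const KernelInfo &)>;

  // Custom callback for more complicated logic.
  void addRule(Rule rule) { rules_.push_back(std::move(rule)); }

  // Predefined rule: turn one attribute into "name=value;" (empty if absent).
  void addAttributeRule(const std::string &attr) {
    addRule([attr](const KernelInfo &k) {
      std::string value;
      auto it = k.attrs.find(attr);
      if (it != k.attrs.end())
        value = it->second;
      return attr + "=" + value + ";";
    });
  }

  std::string describe(const KernelInfo &k) const {
    std::string description;
    for (const Rule &rule : rules_)
      description += rule(k);
    return description;
  }

private:
  std::vector<Rule> rules_;
};

// Groups kernel names by description. std::map keeps the keys sorted, so the
// order of groups is deterministic and does not depend on hash values.
std::map<std::string, std::vector<std::string>>
groupKernels(const DescriptorBuilder &builder,
             const std::vector<KernelInfo> &kernels) {
  std::map<std::string, std::vector<std::string>> groups;
  for (const KernelInfo &k : kernels)
    groups[builder.describe(k)].push_back(k.name);
  return groups;
}
```

In this model, supporting another split dimension is one more `addAttributeRule` call. Existing groups keep readable, sorted keys; the keys gain a suffix when a rule is added, but their relative order no longer depends on opaque hash values.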

dm-vodopyanov requested review from a team as code owners January 19, 2023 17:13

dm-vodopyanov requested a review from steffenlarsen January 19, 2023 17:13

dm-vodopyanov marked this pull request as draft January 19, 2023 17:13

dm-vodopyanov added a commit to dm-vodopyanov/llvm-test-suite that referenced this pull request Jan 19, 2023

[SYCL] Add test for sycl::is_compatible supports reqd_work_group_size

08aa9d2

Impl: intel/llvm#8056

dm-vodopyanov mentioned this pull request Jan 19, 2023

[SYCL] Add test for sycl::is_compatible supports reqd_work_group_size intel/llvm-test-suite#1528

Merged

dm-vodopyanov requested a review from a team January 19, 2023 17:31

AlexeySachkov reviewed Jan 20, 2023

Apply CR comments, fix some bugs

96d5ed6

dm-vodopyanov commented Jan 20, 2023

Debug unit tests

b91018d

Debug unit tests

4e77396

Fix unit tests, remove debug lines

f3aa423

Add sycl-post-link test for device requirements prop

7354e98

Add tests for splitting

6bae850

dm-vodopyanov marked this pull request as ready for review January 23, 2023 16:23

dm-vodopyanov requested a review from AlexeySachkov January 23, 2023 16:23

dm-vodopyanov changed the title ~~[SYCL][Draft] Optional kernel features: implement split based on reqd-work-group-size~~ [SYCL] Optional kernel features: implement split based on reqd-work-group-size Jan 23, 2023

AlexeySachkov reviewed Jan 24, 2023

llvm/test/tools/sycl-post-link/device-code-split/per-reqd-wg-size-split-2.ll (outdated, resolved)

llvm/test/tools/sycl-post-link/device-requirements/reqd-work-group-size.ll (outdated, resolved)

Apply CR comments

662172a

dm-vodopyanov requested a review from AlexeySachkov January 24, 2023 12:33

steffenlarsen reviewed Jan 24, 2023

sycl/source/detail/program_manager/program_manager.cpp (resolved)

Small clean code

1291f63

AlexeySachkov approved these changes Jan 24, 2023

Fix a test

0e72663

Apply CR comments

a2b50fa

dm-vodopyanov commented Jan 26, 2023

sycl/source/detail/program_manager/program_manager.cpp (resolved)

dm-vodopyanov requested a review from steffenlarsen January 26, 2023 12:44

steffenlarsen reviewed Jan 26, 2023

sycl/source/detail/program_manager/program_manager.cpp (resolved)

sycl/source/detail/program_manager/program_manager.cpp (outdated, resolved)

Apply CR comments

8147ca4

dm-vodopyanov requested a review from steffenlarsen January 26, 2023 14:08

steffenlarsen approved these changes Jan 26, 2023

dm-vodopyanov mentioned this pull request Jan 27, 2023

[SYCL][Flaky] SYCL :: USM/copy2d.cpp failed on Level Zero Windows on unrelated changes #8126

Open

dm-vodopyanov merged commit 6464785 into intel:sycl Jan 27, 2023

dm-vodopyanov added a commit to intel/llvm-test-suite that referenced this pull request Jan 27, 2023

[SYCL] Add test for sycl::is_compatible supports reqd_work_group_size (…

4c8ccf4

…#1528) Impl: intel/llvm#8056

aelovikov-intel pushed a commit to aelovikov-intel/llvm that referenced this pull request Mar 27, 2023

[SYCL] Add test for sycl::is_compatible supports reqd_work_group_size (…

59bd07a

…intel/llvm-test-suite#1528) Impl: intel#8056

AlexeySachkov mentioned this pull request Mar 28, 2023

[SYCL][NFCI] Refactor device code split implementation once again #8833

Merged
