Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] Optional kernel features: implement split based on reqd-sub-group-size #8167

Conversation

dm-vodopyanov
Copy link
Contributor

@dm-vodopyanov dm-vodopyanov commented Jan 31, 2023

This patch implements device code split based on reqd-sub-group-size attribute, enables generation of "reqd_sub_group_size" property in "SYCL/device requirements" property set, and adds support of reqd_sub_group_size to sycl::is_compatible

Design:
https://github.com/intel/llvm/blob/sycl/sycl/doc/design/OptionalDeviceFeatures.md#changes-to-the-device-code-split-algorithm
E2E tests: intel/llvm-test-suite#1569

…oup-size

This patch implements device code split based on reqd-sub-group-size
attribute, enables generation of "reqd_sub_group_size" property in
"SYCL/device requirements" property set, and adds support of
reqd_sub_group_size to sycl::is_compatible

Design:
https://github.com/intel/llvm/blob/sycl/sycl/doc/design/OptionalDeviceFeatures.md#changes-to-the-device-code-split-algorithm
E2E tests: TBA
; RUN: --implicit-check-not kernel0 --implicit-check-not kernel1
; RUN: FileCheck %s -input-file=%t_2.ll --check-prefixes CHECK-M2-IR \
; RUN: --implicit-check-not kernel0 --implicit-check-not kernel1
; RUN: FileCheck %s -input-file=%t_1.ll --check-prefixes CHECK-M2-IR \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is ok to change this test, per-reqd-wg-size-split-2.ll, sycl-esimd-large-grf.ll and sycl-large-grf.ll below: by adding new optional kernel feature reqd-sub-group-size to internal data structures, re-hashing happened which affected name generation logic. This does not affect customers as customers don't use these temp files directly, and didn't break anything in the pipeline.

@dm-vodopyanov dm-vodopyanov requested a review from a team January 31, 2023 22:15
@dm-vodopyanov dm-vodopyanov temporarily deployed to aws January 31, 2023 22:16 — with GitHub Actions Inactive
sycl/source/detail/program_manager/program_manager.cpp Outdated Show resolved Hide resolved
sycl/source/detail/program_manager/program_manager.cpp Outdated Show resolved Hide resolved
sycl/source/detail/program_manager/program_manager.cpp Outdated Show resolved Hide resolved
Comment on lines -72 to -78
} // namespace detail
} // __SYCL_INLINE_VER_NAMESPACE(_V1)
} // namespace sycl

namespace sycl {
__SYCL_INLINE_VER_NAMESPACE(_V1) {
namespace detail {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Normally, I'd prefer those simplifications to be a separate PR.

sycl/unittests/helpers/PiImage.hpp Outdated Show resolved Hide resolved
@dm-vodopyanov
Copy link
Contributor Author

@v-klochkov no-sycl-esimd-split-shared-func.ll test started to fail because this patch adds device code splitting for reqd_sub_group_size. Can you please advise, how to update this test? Or probably it doesn't make sense now?

FAIL: LLVM :: tools/sycl-post-link/sycl-esimd/no-sycl-esimd-split-shared-func.ll (44963 of 45993)
******************** TEST 'LLVM :: tools/sycl-post-link/sycl-esimd/no-sycl-esimd-split-shared-func.ll' FAILED ********************
Script:
--
: 'RUN: at line 8';   sycl-post-link -lower-esimd -symbols -split=auto -S /__w/llvm/llvm/src/llvm/test/tools/sycl-post-link/sycl-esimd/no-sycl-esimd-split-shared-func.ll -o /__w/llvm/llvm/build/test/tools/sycl-post-link/sycl-esimd/Output/no-sycl-esimd-split-shared-func.ll.tmp.table
: 'RUN: at line 9';   /__w/llvm/llvm/build/bin/FileCheck /__w/llvm/llvm/src/llvm/test/tools/sycl-post-link/sycl-esimd/no-sycl-esimd-split-shared-func.ll -input-file=/__w/llvm/llvm/build/test/tools/sycl-post-link/sycl-esimd/Output/no-sycl-esimd-split-shared-func.ll.tmp.table --check-prefixes CHECK-TABLE
: 'RUN: at line 10';   /__w/llvm/llvm/build/bin/FileCheck /__w/llvm/llvm/src/llvm/test/tools/sycl-post-link/sycl-esimd/no-sycl-esimd-split-shared-func.ll -input-file=/__w/llvm/llvm/build/test/tools/sycl-post-link/sycl-esimd/Output/no-sycl-esimd-split-shared-func.ll.tmp_0.ll
: 'RUN: at line 11';   /__w/llvm/llvm/build/bin/FileCheck /__w/llvm/llvm/src/llvm/test/tools/sycl-post-link/sycl-esimd/no-sycl-esimd-split-shared-func.ll -input-file=/__w/llvm/llvm/build/test/tools/sycl-post-link/sycl-esimd/Output/no-sycl-esimd-split-shared-func.ll.tmp_0.sym --check-prefixes CHECK-SYM
--
Exit Code: 1

Command Output (stderr):
--
/__w/llvm/llvm/src/llvm/test/tools/sycl-post-link/sycl-esimd/no-sycl-esimd-split-shared-func.ll:16:21: error: CHECK-TABLE-EMPTY: is not on the line after the previous match
; CHECK-TABLE-EMPTY:
                    ^
/__w/llvm/llvm/build/test/tools/sycl-post-link/sycl-esimd/Output/no-sycl-esimd-split-shared-func.ll.tmp.table:5:1: note: 'next' match was here

^
/__w/llvm/llvm/build/test/tools/sycl-post-link/sycl-esimd/Output/no-sycl-esimd-split-shared-func.ll.tmp.table:2:330: note: previous match ended here
/__w/llvm/llvm/build/test/tools/sycl-post-link/sycl-esimd/Output/no-sycl-esimd-split-shared-func.ll.tmp_0.ll|/__w/llvm/llvm/build/test/tools/sycl-post-link/sycl-esimd/Output/no-sycl-esimd-split-shared-func.ll.tmp_0.prop|/__w/llvm/llvm/build/test/tools/sycl-post-link/sycl-esimd/Output/no-sycl-esimd-split-shared-func.ll.tmp_0.sym
                                                                                                                                                                                                                                                                                                                                         ^
/__w/llvm/llvm/build/test/tools/sycl-post-link/sycl-esimd/Output/no-sycl-esimd-split-shared-func.ll.tmp.table:3:1: note: non-matching line after previous match is here
/__w/llvm/llvm/build/test/tools/sycl-post-link/sycl-esimd/Output/no-sycl-esimd-split-shared-func.ll.tmp_1.ll|/__w/llvm/llvm/build/test/tools/sycl-post-link/sycl-esimd/Output/no-sycl-esimd-split-shared-func.ll.tmp_1.prop|/__w/llvm/llvm/build/test/tools/sycl-post-link/sycl-esimd/Output/no-sycl-esimd-split-shared-func.ll.tmp_1.sym
^

Input file: /__w/llvm/llvm/build/test/tools/sycl-post-link/sycl-esimd/Output/no-sycl-esimd-split-shared-func.ll.tmp.table
Check file: /__w/llvm/llvm/src/llvm/test/tools/sycl-post-link/sycl-esimd/no-sycl-esimd-split-shared-func.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
          1: [Code|Properties|Symbols] 
          2: /__w/llvm/llvm/build/test/tools/sycl-post-link/sycl-esimd/Output/no-sycl-esimd-split-shared-func.ll.tmp_0.ll|/__w/llvm/llvm/build/test/tools/sycl-post-link/sycl-esimd/Output/no-sycl-esimd-split-shared-func.ll.tmp_0.prop|/__w/llvm/llvm/build/test/tools/sycl-post-link/sycl-esimd/Output/no-sycl-esimd-split-shared-func.ll.tmp_0.sym 
          3: /__w/llvm/llvm/build/test/tools/sycl-post-link/sycl-esimd/Output/no-sycl-esimd-split-shared-func.ll.tmp_1.ll|/__w/llvm/llvm/build/test/tools/sycl-post-link/sycl-esimd/Output/no-sycl-esimd-split-shared-func.ll.tmp_1.prop|/__w/llvm/llvm/build/test/tools/sycl-post-link/sycl-esimd/Output/no-sycl-esimd-split-shared-func.ll.tmp_1.sym 
          4: /__w/llvm/llvm/build/test/tools/sycl-post-link/sycl-esimd/Output/no-sycl-esimd-split-shared-func.ll.tmp_esimd_2.ll|/__w/llvm/llvm/build/test/tools/sycl-post-link/sycl-esimd/Output/no-sycl-esimd-split-shared-func.ll.tmp_esimd_2.prop|/__w/llvm/llvm/build/test/tools/sycl-post-link/sycl-esimd/Output/no-sycl-esimd-split-shared-func.ll.tmp_esimd_2.sym 
          5: 
empty:16     ! error: match on wrong line
>>>>>>

@dm-vodopyanov dm-vodopyanov temporarily deployed to aws January 31, 2023 23:01 — with GitHub Actions Inactive
dm-vodopyanov added a commit to dm-vodopyanov/llvm-test-suite that referenced this pull request Feb 2, 2023
@dm-vodopyanov dm-vodopyanov temporarily deployed to aws February 2, 2023 14:44 — with GitHub Actions Inactive
dm-vodopyanov added a commit to dm-vodopyanov/llvm-test-suite that referenced this pull request Feb 2, 2023
@dm-vodopyanov
Copy link
Contributor Author

/verify with intel/llvm-test-suite#1569

@dm-vodopyanov
Copy link
Contributor Author

/verify with intel/llvm-test-suite#1569

@dm-vodopyanov dm-vodopyanov temporarily deployed to aws February 2, 2023 15:58 — with GitHub Actions Inactive
@v-klochkov
Copy link
Contributor

v-klochkov commented Feb 3, 2023

@v-klochkov no-sycl-esimd-split-shared-func.ll test started to fail because this patch adds device code splitting for reqd_sub_group_size. Can you please advise, how to update this test? Or probably it doesn't make sense now?

I am not sure yet if the test should stay with your fix,
but IMO the problems with this test showed some major concerns to this PR as the result doesn't look correct. with this fix/PR.

WITHOUT THE FIX:
a0) generated 1 output module
b0) SIMD_CALLEE got removed completely as it is alwaysinline
c0) SHARED_F is cloned: original SHARED_F() and new SHARED_F.esimd()
d0) SIMD_CALL_HELPER(func_ptr, 4xfloat) is transformed to SIMD_CALL_HELPER_1(4xfloat)

WITH THE FIX:
a1) Not OK: generated 3 modules - Half of ESIMD functions went to test_0.ll, another half in test_esimd2.ll
In particular, the following function was placed to test_0.ll instead of test_esimd2.ll:
define dso_local spir_kernel void @ESIMD_kernel(i8 addrspace(1)* "VCArgumentIOKind"="0" %ptr) #5 !sycl_explicit_simd !1 !intel_reqd_sub_group_size !2 {

b1) Not OK: SIMD_CALLEE is not removed and left as a dead code in
c1) Ok: SHARED_F is cloned: original SHARED_F() is in test_0.ll and new SHARED_F() is in test_esimd2.ll
d1) Not OK: SIMD_CALL_HELPER(func_ptr, 4xfloat) is transformed to SIMD_CALL_HELPER_1(4xfloat),
but the declarations of SIMD_CALL_HELPER_1 in test_0.ll and test_1.ll lost the attribute "VCFunction".

@dm-vodopyanov
Copy link
Contributor Author

dm-vodopyanov commented Feb 3, 2023

IMO the problems with this test showed some major concerns to this PR as the result doesn't look correct. with this fix/PR.

Device code split based on reqd-sub-group-size is a part of optional kernel features which are part of SYCL 2020. This esimd test uses reqd-sub-group-size attribute, so this is expected that now more than one modules are generated because we do split on reqd-sub-group-size. So I guess test should be re-written or esimd feature should be revised to take into account this SYCL 2020 feature. E.g., regarding the test, removing -split=auto from the test makes test pass as no splitting is performed but I'm not sure if this is correct.

@v-klochkov
Copy link
Contributor

Device code split based on reqd-sub-group-size is a part of optional kernel features which are part of SYCL 2020

The attribute "!sycl_explicit_simd" must have higher priority than "!intel_reqd_sub_group_size 1".
I mean the functions with "!sycl_explicit_simd" must go to test_esimd_2.ll instead of test_0.ll.

It is Ok to separate funcs with "!intel_reqd_sub_group_size 1" to a separate module,
but those func should be SYCL (not ESIMD). All ESIMD funcs need to go to ESIMD module.

The current version of PR generates wrong IR.
In particular, test_0.ll has got ESIMD funcs (with !sycl_explicit_simd attr), but test_0.prop doesn't have "isEsimdImage=1|1".

The PR cloned SHARED_F 2 times instead of 1, which is unnecessary.

Also, why modified compiler moved this func to test_esimd_2.ll:

  define weak_odr dso_local x86_regcallcc <4 x float> @SIMD_CALL_HELPER_1(<4 x float> %simd_args) #3 !sycl_explicit_simd !0 !intel_reqd_sub_group_size !1
  !1 = !{i32 1}

and this one to test_0.ll?

  define dso_local spir_kernel void @ESIMD_kernel(i8 addrspace(1)* "VCArgumentIOKind"="0" %ptr) #5 !sycl_explicit_simd !1 !intel_reqd_sub_group_size !2 {
  !2 = !{i32 1}

Copy link
Contributor

@v-klochkov v-klochkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mark as request-changes to finish discussion on ESIMD part before merging this.

AlexeySachkov added a commit that referenced this pull request Apr 28, 2023
)

#### Intro

This is a refactoring of how we perform device code split in
`sycl-post-link`, which is intended to solve several existing issues
with the current implementation:
1. increased peak RAM consumption by `sycl-post-link`
2. bad scaling with more and more split "dimensions" being added
3. increased tests maintenance cost due to non-deterministic order
(between commits) of output files produced by `sycl-post-link`

#### A bit more context about the issues above:

(1) Increase peak RAM consumption is caused by the fact that we
currently preserve **all** splits in-memory, even though we can process
them on-by-one and discard them as soon as we stored them to a disk.
This was implemented as a memory consumption optimization in #5021, but
it got accidentally reverted in #7302 as an attempt to workaround (2).

(2) is pretty much summarized in our source code:

https://github.com/intel/llvm/blob/afebb2543ccecb89f83c84b68fba7616bbab89ac/llvm/tools/sycl-post-link/sycl-post-link.cpp#L806-L811

(3) is caused by a bad implementation decision made in #7302: because
every split is now identified by a hash, every time you add a new split
"dimension"/new feature to an account, it results in different hashes
for existing tests. Just look how many unrelated tests had to be updated
in #7512, #8056 and #8167

#### Now to the PR itself:

It introduces a new infrastructure for categorizing/grouping kernel
functions: instead of using hashes, we now build a string description
for each kernel function and then group kernels with the same
description string together.

String description is built by a new entity: it accepts a set of rules,
where each rule is a simple function which returns a string for passed
`llvm::Function`. Results of all rules are concatenated together and
rules are invoked in a stable order of their registration.

There is a simple API for building those rules. It provides some
predefined rules for the most popular use cases like turning a function
attribute or a metadata into a string descriptor for the function. There
is also a possibility to pass a custom callback there to implement more
complicated logic.

#### How does this PR help with issues above?

(1) and (2) are fixed in conjunction: `sycl-post-link` was refactored to
avoid storing more than one split module at a time and that is possible
because the PR unifies per-scope and optional-kernel-features splitters
into a single generic splitter. The new API for kernels categorization
seems to be flexible enough to provide that infrastructure so merged
splitters still look OK code-wise.

(3) is caused by using string identifiers instead of hashes as well as
by using a data structure which sorts identifiers.

#### Any other benefits from this PR?

About 50 lines of code less to support :)

Extending device code split for more optional features would be even
easier than it is now: instead of adding several changes to various
places around `UsedOptionalFeatures` structure, it will be just adding a
1-3 lines of code. Please also note that `UsedOptionalFeatures` contains
tons of inconsistencies in its implementation, which will all gone with
this PR: in `operator==` we don't use hash and instead compare certain
fields directly (and we do miss some of them); `generateModuleName`
method skips some of optional features and ignores them.

Cross-module `device_global` usages checks should now work at all split
dimensions (except for ESIMD).

#### Any potential downsides?

With current `UsedOptionalFeatures` there is a possibility to embed
various information (used aspects, `large-grf` flag, etc.) directly
during device code split to avoid re-gathering that information later
when we generate properties. With the suggested approach, it would be
harder to do, because it doesn't seem to naturally fit to the proposed
infrastructure: see changes I did around `large-grf` in this PR.

However, we have never actually implemented this and re-querying some
metadata from function doesn't seem like a bottleneck, so it should
really be a very minor and only theoretical downside.
AlexeySachkov added a commit that referenced this pull request Jun 27, 2023
…oup-size (#9928)

Based off #8167

---------

Co-authored-by: Alexey Sachkov <alexey.sachkov@intel.com>
@AlexeySachkov
Copy link
Contributor

Superseded by #9928

Chenyang-L pushed a commit that referenced this pull request Jul 11, 2023
…oup-size (#9928)

Based off #8167

---------

Co-authored-by: Alexey Sachkov <alexey.sachkov@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants