Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][Doc] Simplify non-uniform group design #14604

Merged

Conversation

Pennycook
Copy link
Contributor

@Pennycook Pennycook commented Jul 17, 2024

This commit simplifies the design of the non-uniform group extension as follows:

  • The _group suffix is removed from user-constructed groups to reduce verbosity.

  • get_*_group() functions are replaced with partitioning functions, to clarify that users are not getting an existing group but rather creating a new one.

  • Fixed-size groups are renamed to "chunks". The previous name did not convey that the group is contiguous as well as having a fixed size.

  • Ballot-groups and opportunistic-groups have been combined into a single group type called a "fragment", focusing on what the groups represent rather than how they were constructed.

  • Synchronization behavior is added to all partitioning functions, to reduce the cognitive burden (of remembering which partitioning functions synchronize).

  • An implicit cast from chunks to fragments is introduced. A fragment is a more general representation than a chunk, and so this conversion is always valid.

  • Split tangle into its own extension. The implementation requirements for tangle are different to other non-uniform groups, which may limit their availability to certain device types. Splitting them out into a separate extension will make it easier to set user expectations and to track implementation status across backends.

This commit simplifies the design of the non-uniform group extension as
follows:

- The _group suffix is removed from user-constructed groups to reduce
  verbosity.

- get_*_group() functions are replaced with partitioning functions, to clarify
  that users are not getting an existing group but rather creating a new one.

- Fixed-size groups are renamed to "chunks". The previous name did not convey
  that the group is contiguous as well as having a fixed size.

- Ballot-groups and opportunistic-groups have been combined into a single group
  type called a "fragment", focusing on what the groups represent rather than
  how they were constructed.

- Synchronization behavior is added to all partitioning functions, to reduce the
  cognitive burden (of remembering which partitioning functions synchronize).

- An implicit cast from chunks to fragments is introduced. A fragment is a more
  general representation than a chunk, and so this conversion is always valid.

Signed-off-by: John Pennycook <john.pennycook@intel.com>
@Pennycook Pennycook added the spec extension All issues/PRs related to extensions specifications label Jul 17, 2024
@Pennycook Pennycook requested a review from steffenlarsen July 17, 2024 17:26
The implementation requirements for tangle are different to other non-uniform
groups, which may limit their availability to certain device types. Splitting
them out into a separate extension will make it easier to set user extensions
and to track implementation status across backends.

Signed-off-by: John Pennycook <john.pennycook@intel.com>
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Signed-off-by: John Pennycook <john.pennycook@intel.com>
@Pennycook Pennycook marked this pull request as ready for review September 19, 2024 08:52
@Pennycook Pennycook requested a review from a team as a code owner September 19, 2024 08:52
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Copy link
Contributor

@AlexeySachkov AlexeySachkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't looked at the tangle doc yet


[source, c++]
----
namespace sycl::ext::oneapi::experimental {

template <typename ParentGroup>
class tangle_group {
template <std::size_t ChunkSize, typename ParentGroup>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably update linear_id_type definition here, see the discussion about recorded inconsistency in #13702. Linear id type is smaller than chunk size, meaning that in certain cases we won't be able to enumerate all work-items in a chunk.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see a few possible fixes here and I'm not sure what is best:

  1. linear_id_type = std::size_t
    Always matches, no problems.

  2. linear_id_type = ParentGroup::linear_id_type
    Sets uint32_t for sub_group and size_t for group. The situation where somebody provides a ChunkSize larger than the ParentGroup would prevent enumeration of all work-items in the chunk, but that situation is already UB.

  3. linear_id_type = /* smallest unsigned integer with log2(ChunkSize) bits */
    Could use std::uint8_t for ChunkSize < 256. Might lead to future ABI issues if we get something like a std::uint4_t in future (but that seems unlikely, given the number of bits in a byte).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I lean towards (2) here, because it makes chunk consistent with its parent group. I.e. if someone had a code for sub-groups, but then introduced some specialization for chunks, types of all IDs they may have in use would still match what APIs return. And this will also be true when we extend chunks to support group.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me. I made the change in 0f2a471.

Copy link
Contributor

@AlexeySachkov AlexeySachkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tangle part review

Signed-off-by: John Pennycook <john.pennycook@intel.com>
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Copy link
Contributor

@AlexeySachkov AlexeySachkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All my feedback has been addressed, I don't have any further comments. That issue about liner_id_type of chunk is not a high-priority one to be resolved immediately, because we don't have a device which supports sub-group sizes bigger than uint32_t and potential issue I've highlighted there can't really be encountered right now.

Signed-off-by: John Pennycook <john.pennycook@intel.com>
@AlexeySachkov
Copy link
Contributor

@intel/dpcpp-specification-reviewers, any further comments before I merge this PR?

@AlexeySachkov
Copy link
Contributor

@intel/dpcpp-specification-reviewers, any further comments before I merge this PR?

I treat lack of response as a no, so merging the PR

@AlexeySachkov AlexeySachkov merged commit 761d45d into intel:sycl Oct 4, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spec extension All issues/PRs related to extensions specifications
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants