-
Notifications
You must be signed in to change notification settings - Fork 752
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][Doc] Simplify non-uniform group design #14604
[SYCL][Doc] Simplify non-uniform group design #14604
Conversation
This commit simplifies the design of the non-uniform group extension as follows: - The _group suffix is removed from user-constructed groups to reduce verbosity. - get_*_group() functions are replaced with partitioning functions, to clarify that users are not getting an existing group but rather creating a new one. - Fixed-size groups are renamed to "chunks". The previous name did not convey that the group is contiguous as well as having a fixed size. - Ballot-groups and opportunistic-groups have been combined into a single group type called a "fragment", focusing on what the groups represent rather than how they were constructed. - Synchronization behavior is added to all partitioning functions, to reduce the cognitive burden (of remembering which partitioning functions synchronize). - An implicit cast from chunks to fragments is introduced. A fragment is a more general representation than a chunk, and so this conversion is always valid. Signed-off-by: John Pennycook <john.pennycook@intel.com>
The implementation requirements for tangle are different to other non-uniform groups, which may limit their availability to certain device types. Splitting them out into a separate extension will make it easier to set user extensions and to track implementation status across backends. Signed-off-by: John Pennycook <john.pennycook@intel.com>
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Signed-off-by: John Pennycook <john.pennycook@intel.com>
sycl/doc/extensions/experimental/sycl_ext_oneapi_non_uniform_groups.asciidoc
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_non_uniform_groups.asciidoc
Outdated
Show resolved
Hide resolved
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Signed-off-by: John Pennycook <john.pennycook@intel.com>
sycl/doc/extensions/experimental/sycl_ext_oneapi_non_uniform_groups.asciidoc
Show resolved
Hide resolved
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Signed-off-by: John Pennycook <john.pennycook@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't looked at the tangle
doc yet
sycl/doc/extensions/experimental/sycl_ext_oneapi_non_uniform_groups.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_non_uniform_groups.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_non_uniform_groups.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_non_uniform_groups.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_non_uniform_groups.asciidoc
Show resolved
Hide resolved
|
||
[source, c++] | ||
---- | ||
namespace sycl::ext::oneapi::experimental { | ||
|
||
template <typename ParentGroup> | ||
class tangle_group { | ||
template <std::size_t ChunkSize, typename ParentGroup> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably update linear_id_type
definition here, see the discussion about recorded inconsistency in #13702. Linear id type is smaller than chunk size, meaning that in certain cases we won't be able to enumerate all work-items in a chunk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see a few possible fixes here and I'm not sure what is best:
-
linear_id_type = std::size_t
Always matches, no problems. -
linear_id_type = ParentGroup::linear_id_type
Setsuint32_t
forsub_group
andsize_t
forgroup
. The situation where somebody provides aChunkSize
larger than theParentGroup
would prevent enumeration of all work-items in the chunk, but that situation is already UB. -
linear_id_type = /* smallest unsigned integer with log2(ChunkSize) bits */
Could usestd::uint8_t
forChunkSize
< 256. Might lead to future ABI issues if we get something like astd::uint4_t
in future (but that seems unlikely, given the number of bits in a byte).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I lean towards (2) here, because it makes chunk
consistent with its parent group. I.e. if someone had a code for sub-groups, but then introduced some specialization for chunks, types of all IDs they may have in use would still match what APIs return. And this will also be true when we extend chunks to support group
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me. I made the change in 0f2a471.
sycl/doc/extensions/experimental/sycl_ext_oneapi_non_uniform_groups.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_non_uniform_groups.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_non_uniform_groups.asciidoc
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_non_uniform_groups.asciidoc
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tangle
part review
sycl/doc/extensions/experimental/sycl_ext_oneapi_tangle.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_tangle.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_tangle.asciidoc
Outdated
Show resolved
Hide resolved
sycl/doc/extensions/experimental/sycl_ext_oneapi_tangle.asciidoc
Outdated
Show resolved
Hide resolved
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Signed-off-by: John Pennycook <john.pennycook@intel.com>
Signed-off-by: John Pennycook <john.pennycook@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All my feedback has been addressed, I don't have any further comments. That issue about liner_id_type
of chunk
is not a high-priority one to be resolved immediately, because we don't have a device which supports sub-group sizes bigger than uint32_t
and potential issue I've highlighted there can't really be encountered right now.
Signed-off-by: John Pennycook <john.pennycook@intel.com>
@intel/dpcpp-specification-reviewers, any further comments before I merge this PR? |
I treat lack of response as a no, so merging the PR |
This commit simplifies the design of the non-uniform group extension as follows:
The _group suffix is removed from user-constructed groups to reduce verbosity.
get_*_group() functions are replaced with partitioning functions, to clarify that users are not getting an existing group but rather creating a new one.
Fixed-size groups are renamed to "chunks". The previous name did not convey that the group is contiguous as well as having a fixed size.
Ballot-groups and opportunistic-groups have been combined into a single group type called a "fragment", focusing on what the groups represent rather than how they were constructed.
Synchronization behavior is added to all partitioning functions, to reduce the cognitive burden (of remembering which partitioning functions synchronize).
An implicit cast from chunks to fragments is introduced. A fragment is a more general representation than a chunk, and so this conversion is always valid.
Split tangle into its own extension. The implementation requirements for tangle are different to other non-uniform groups, which may limit their availability to certain device types. Splitting them out into a separate extension will make it easier to set user expectations and to track implementation status across backends.