Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

initial OpenCL SDK review #1

Open
bashbaug opened this issue Nov 16, 2021 · 1 comment
Open

initial OpenCL SDK review #1

bashbaug opened this issue Nov 16, 2021 · 1 comment

Comments

@bashbaug
Copy link

Hello!

Here are a few brief comments regarding the changes in the reducecpp branch:

  • It looks like C++ 14 is now required whereas only C++ 11 was required previously. It's almost 2022, so that's probably ok, but worth confirming.
  • I didn't have TCLAP so I was unable to build the SDK after cloning. Can we mirror a copy of TCLAP in this repo to simplify building or add TCLAP as a submodule? If not, we need to update the README with the exact steps needed to build, to minimize barriers to entry.
  • Just FYI, I cannot update to the headers, ICD loader, and C++ bindings submodules in this branch either.
  • Several files use the abbreviation "plat" for "platform" and "dev" for "device". I would recommend spelling these out completely, to aid understanding in the SDK.
  • The prevous "add_sample" CMake function enabled testing conditionally for each sample, whereas I believe the latest changes control testing globally. Is this intended? I've found that some samples do not lend themselves well to automated testing, though having testing for all samples is a good goal.
  • If you're looking for another reduction variant, the very simplest textbook reduction usually uses atomics (although it's also usually very slow).
  • The check for subgroup support in "may_use_sub_group_reduce" doesn't quite look right, since some implementations may support subgroups but do not support cl_khr_subgroups due to lack of subgroup independent forward progress. The recommendation in the OpenCL API spec is to check for a non-zero return value from CL_DEVICE_MAX_NUM_SUB_GROUPS to check for subgroup support.
  • Given the purpose of the sample ("demonstrate how to query various extensions applicable in the context of a reduction algorithm") I think it would be worth adding more comments describing what the different queries are doing.
  • Should there be a way to force an alternate implementation (if supported) vs. choosing the default, to compare performance or correctness? For example, could I try the vanilla or subgroup variant even if my device supports work-group collective functions?
@MathiasMagnus
Copy link

Thanks @bashbaug for the throroush review.

  • I bumped to C++14 because I couldn't find a reliable std::apply backport to C++11 and I deferred fixing it. Nothing else is C++14 specific, other than the C++ CLI parser utility which invokes the user-define parse and comprehend functions from an argument pack using std::apply. If anyone can cook up a working backport of it, we can revert to C++11 if need be. Then again, as you said it's almost 2022 and it's a set of modern samples.
  • I'm mildly against self-hosting dependencies and as it turned out, we did some of it for the C samples (portable, non-POSIX command-line arg parsing in C for eg.) because C libraries are even more notoriously missing from package managers, or simply getopt has spoiled everyone and there isn't 1,000 CLI parsers in C available. Note that TCLAP is one apt install libtclap-dev away. Starting to self-host deps means we'd need to self-host SFML, GLEW, GLU, GLM as well...
    • The ROCm libraries we maintain went down this rabbit hole of self-hosting deps but allowing users to provide their pre-built ones too... I am not a big fan of Dependencies.cmake because it's a timesink of limited value. I can understand the sentiment in wanting to provide a solution that can be built in vacuum, but it comes at a price.
    • Generally I'd vote for promoting any package manager and try catering to their users.
  • Updating the headers, etc. are currently wired to our CI, those URLs will change when we upstream. That's where the CLHPP fixes reside for eg.
  • We can fix that.
  • Testing is still enabled on a case-by-case basis (AFAIK). The samples we wrote should execute cleanly with default arguments, but we can discuss if that's universally not feasible.
    • The GUI samples for eg. aren't tested and their testing is disabled even if CI builds them.
  • Selecting from the available algorithms is something I had in mind too (to test the relative perf of built-ins vs. the vanilla implementation for eg.) and was one of the reasons I wanted to add some CLI parsing facility, so that each sample can add their unique set of args such as selecting an algorithm explicitly not based on the sample heuristics.
  • may_use_sub_group_reduce is a pickle. I was thinking hard on that one, as I am generally against bending over backwards for the sake of non-conforming extensions, but team red and green have had years of OpenCL runtime releases where 2.x features were cherry-picked and put into 1.2 base runtimes. cl_khr_subgroups is the prime example of that. I wasn't sure what happens if one starts trying to query device properties that don't exist, because they are in that "1.3" state. Querying for the extension seemed like a safer bet, but you're right that the recommendation was CL_DEVICE_MAX_NUM_SUB_GROUPS . This can certainly be fixed if that's deemed cleaner.
  • Indeed, some docs around that would be a nice addition. We'll add that to our list.
  • Yes (see earlier on CLI args).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants