CTOParallelFor with BoxND / add AnyCTO #4109

Merged: 7 commits into AMReX-Codes:development on Sep 2, 2024

Conversation

@AlexanderSinn (Member) commented on Aug 26, 2024

Summary

This PR adds support for BoxND to CTOParallelFor by adding the AnyCTO function, which can be used to implement compile-time options with any kernel-launching function, such as ParallelFor, ParallelForRNG, launch, etc.

I'm not sure whether AnyCTO is a good name; are there other suggestions?
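
For context, the existing compile-time-option form of ParallelFor that AnyCTO generalizes looks roughly like the following (a minimal sketch; the enum, runtime option, and N are reused from the examples below, and the exact overload is assumed from the current CTOParallelFor interface):

    int A_runtime_option = ...;
    enum A_options : int { A0, A1, A2, A3 };
    // CTO ParallelFor: the compile-time options are tied to ParallelFor itself,
    // so other launchers (ParallelForRNG, launch, ...) cannot be used this way.
    ParallelFor(TypeList<CompileTimeOptions<A0,A1,A2,A3>>{},
        {A_runtime_option},
        N, [=] AMREX_GPU_DEVICE (int i, auto A_control)
        {
            ...
        });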

Additional background

AnyCTO Examples:

    int A_runtime_option = ...;
    int B_runtime_option = ...;
    enum A_options : int { A0, A1, A2, A3 };
    enum B_options : int { B0, B1 };
    AnyCTO(TypeList<CompileTimeOptions<A0,A1,A2,A3>,
                    CompileTimeOptions<B0,B1>>{},
        {A_runtime_option, B_runtime_option},
        [&](auto cto_func){
            ParallelForRNG(N, cto_func);
        },
        [=] AMREX_GPU_DEVICE (int i, const RandomEngine& engine,
                              auto A_control, auto B_control)
        {
            ...
            if constexpr (A_control.value == A0) {
                ...
            } else if constexpr (A_control.value == A1) {
                ...
            } else if constexpr (A_control.value == A2) {
                ...
            } else {
                ...
            }
            if constexpr (A_control.value != A3 && B_control.value == B1) {
                ...
            }
            ...
        }
    );


    constexpr int nthreads_per_block = ...;
    int nblocks = ...;
    AnyCTO(TypeList<CompileTimeOptions<A0,A1,A2,A3>,
                    CompileTimeOptions<B0,B1>>{},
        {A_runtime_option, B_runtime_option},
        [&](auto cto_func){
            launch<nthreads_per_block>(nblocks, Gpu::gpuStream(), cto_func);
        },
        [=] AMREX_GPU_DEVICE (auto A_control, auto B_control){
            ...
        }
    );

Additionally, .GetOptions() can be used to retrieve the compile-time options inside the function that launches the kernel:

    int nthreads_per_block = ...;
    AnyCTO(TypeList<CompileTimeOptions<128,256,512,1024>>{},
        {nthreads_per_block},
        [&](auto cto_func){
            constexpr std::array<int, 1> ctos = cto_func.GetOptions();
            constexpr int c_nthreads_per_block = ctos[0];
            ParallelFor<c_nthreads_per_block>(N, cto_func);
        },
        [=] AMREX_GPU_DEVICE (int i, auto){
            ...
        }
    );


    BoxND<6> box6D = ...;
    int dims_needed = ...;
    AnyCTO(TypeList<CompileTimeOptions<1,2,3,4,5,6>>{},
        {dims_needed},
        [&](auto cto_func){
            constexpr std::array<int, 1> ctos = cto_func.GetOptions();
            constexpr int c_dims_needed = ctos[0];
            const auto box = BoxShrink<c_dims_needed>(box6D);
            ParallelFor(box, cto_func);
        },
        [=] AMREX_GPU_DEVICE (auto intvect, auto) -> decltype(void(intvect.size())) {
            ...
        }
    );

Checklist

The proposed changes:

  • fix a bug or incorrect behavior in AMReX
  • add new capabilities to AMReX
  • changes answers in the test suite to more than roundoff level
  • are likely to significantly affect the results of downstream AMReX users
  • include documentation in the code and/or rst files, if appropriate

@AlexanderSinn (Member, Author) commented:
One thing I noticed is that some use cases (e.g. a reduce or scan) can need multiple lambdas with CTOs. I think that to add this capability, all the lambdas would need to be put into a GpuTuple temporarily.
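
A purely hypothetical sketch of what that could look like (nothing below is part of this PR; the GpuTuple-of-lambdas overload, the f_first/f_second names, and the launcher body are made up for illustration):

    // Hypothetical, not implemented: AnyCTO accepting several kernel lambdas
    // bundled in a GpuTuple so that, e.g., both lambdas of a scan are
    // specialized on the same compile-time options.
    AnyCTO(TypeList<CompileTimeOptions<A0,A1>>{},
        {A_runtime_option},
        [&](auto cto_funcs){
            auto const& f_first  = amrex::get<0>(cto_funcs);
            auto const& f_second = amrex::get<1>(cto_funcs);
            ...
        },
        makeTuple(
            [=] AMREX_GPU_DEVICE (int i, auto A_control) { ... },
            [=] AMREX_GPU_DEVICE (int i, auto A_control) { ... }
        )
    );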

@WeiqunZhang merged commit de4dc97 into AMReX-Codes:development on Sep 2, 2024. 72 checks passed.