CTOParallelFor with BoxND / add AnyCTO #4109

Merged: 7 commits into AMReX-Codes:development on Sep 2, 2024

Conversation

@AlexanderSinn (Member) commented on Aug 26, 2024

Summary

This PR adds support for BoxND to CTOParallelFor by adding the AnyCTO function, which can be used to implement compile-time options with any kernel-launching function, such as ParallelFor, ParallelForRNG, launch, etc.

I'm not sure whether AnyCTO is a good name; are there other suggestions?
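
For context, the existing compile-time-option form of ParallelFor that AnyCTO generalizes looks roughly like the following (a minimal sketch; the enum, runtime option, and N are reused from the examples below, and the exact overload is assumed from the current CTOParallelFor interface):

    int A_runtime_option = ...;
    enum A_options : int { A0, A1, A2, A3 };
    // CTO ParallelFor: the compile-time options are tied to ParallelFor itself,
    // so other launchers (ParallelForRNG, launch, ...) cannot be used this way.
    ParallelFor(TypeList<CompileTimeOptions<A0,A1,A2,A3>>{},
        {A_runtime_option},
        N, [=] AMREX_GPU_DEVICE (int i, auto A_control)
        {
            ...
        });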

Additional background

AnyCTO Examples:

    int A_runtime_option = ...;
    int B_runtime_option = ...;
    enum A_options : int { A0, A1, A2, A3 };
    enum B_options : int { B0, B1 };
    AnyCTO(TypeList<CompileTimeOptions<A0,A1,A2,A3>,
                    CompileTimeOptions<B0,B1>>{},
        {A_runtime_option, B_runtime_option},
        [&](auto cto_func){
            ParallelForRNG(N, cto_func);
        },
        [=] AMREX_GPU_DEVICE (int i, const RandomEngine& engine,
                              auto A_control, auto B_control)
        {
            ...
            if constexpr (A_control.value == A0) {
                ...
            } else if constexpr (A_control.value == A1) {
                ...
            } else if constexpr (A_control.value == A2) {
                ...
            } else {
                ...
            }
            if constexpr (A_control.value != A3 && B_control.value == B1) {
                ...
            }
            ...
        }
    );


    constexpr int nthreads_per_block = ...;
    int nblocks = ...;
    AnyCTO(TypeList<CompileTimeOptions<A0,A1,A2,A3>,
                    CompileTimeOptions<B0,B1>>{},
        {A_runtime_option, B_runtime_option},
        [&](auto cto_func){
            launch<nthreads_per_block>(nblocks, Gpu::gpuStream(), cto_func);
        },
        [=] AMREX_GPU_DEVICE (auto A_control, auto B_control){
            ...
        }
    );

Additionally, .GetOptions() can be used to retrieve the compile-time options inside the function that launches the kernel:

    int nthreads_per_block = ...;
    AnyCTO(TypeList<CompileTimeOptions<128,256,512,1024>>{},
        {nthreads_per_block},
        [&](auto cto_func){
            constexpr std::array<int, 1> ctos = cto_func.GetOptions();
            constexpr int c_nthreads_per_block = ctos[0];
            ParallelFor<c_nthreads_per_block>(N, cto_func);
        },
        [=] AMREX_GPU_DEVICE (int i, auto){
            ...
        }
    );


    BoxND<6> box6D = ...;
    int dims_needed = ...;
    AnyCTO(TypeList<CompileTimeOptions<1,2,3,4,5,6>>{},
        {dims_needed},
        [&](auto cto_func){
            constexpr std::array<int, 1> ctos = cto_func.GetOptions();
            constexpr int c_dims_needed = ctos[0];
            const auto box = BoxShrink<c_dims_needed>(box6D);
            ParallelFor(box, cto_func);
        },
        [=] AMREX_GPU_DEVICE (auto intvect, auto) -> decltype(void(intvect.size())) {
            ...
        }
    );

Checklist

The proposed changes:

  • fix a bug or incorrect behavior in AMReX
  • add new capabilities to AMReX
  • changes answers in the test suite to more than roundoff level
  • are likely to significantly affect the results of downstream AMReX users
  • include documentation in the code and/or rst files, if appropriate

@AlexanderSinn (Member, Author) commented:
One thing I noticed is that some use cases (e.g. a reduce or scan) can need multiple lambdas with CTOs. I think that to add this capability, all the lambdas would need to be put into a GpuTuple temporarily.
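
A purely hypothetical sketch of what that could look like (nothing below is part of this PR; the GpuTuple-of-lambdas overload, the f_first/f_second names, and the launcher body are made up for illustration):

    // Hypothetical, not implemented: AnyCTO accepting several kernel lambdas
    // bundled in a GpuTuple so that, e.g., both lambdas of a scan are
    // specialized on the same compile-time options.
    AnyCTO(TypeList<CompileTimeOptions<A0,A1>>{},
        {A_runtime_option},
        [&](auto cto_funcs){
            auto const& f_first  = amrex::get<0>(cto_funcs);
            auto const& f_second = amrex::get<1>(cto_funcs);
            ...
        },
        makeTuple(
            [=] AMREX_GPU_DEVICE (int i, auto A_control) { ... },
            [=] AMREX_GPU_DEVICE (int i, auto A_control) { ... }
        )
    );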

@WeiqunZhang merged commit de4dc97 into AMReX-Codes:development on Sep 2, 2024. 72 checks passed.