This issue proposes a way of organizing the code to improve GPU execution (and possibly CPU execution as well). For now the idea works only in theory; the best way to apply it still needs to be worked out. In any case, it is worth considering for the C++ 2.0 implementation.
The idea is to parallelize not only the reactions but also the species loop inside the rxn_gpu_arrhenius_calc_deriv_contrib type functions. Each of these functions contains a loop that iterates over all the species present in the reaction to calculate the rate for each species. In theory, this loop can be parallelized on the GPU at essentially no extra cost, since nearly as many threads as we want are available.
The drawback is that the data must be restructured and the functions will look quite different from the current ones. On the plus side, the change can be tested first in the GPU version of the code.
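To make the restructuring concrete, here is a minimal sketch of one possible flattened layout, written as plain C++ so it can be run on the CPU. Everything in it is an assumption for illustration: FlatReactions, deriv_contrib_one_pair and calc_deriv are hypothetical names, not part of the existing code, and the per-reaction rate is assumed to be computed beforehand. Each (reaction, species) pair becomes one work item, and the loop in calc_deriv stands in for a GPU launch with one thread per pair.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flattened layout: one entry per (reaction, species) pair,
// so the species loop no longer lives inside each reaction function.
struct FlatReactions {
    std::vector<std::size_t> pair_rxn;      // reaction index of each pair
    std::vector<std::size_t> pair_species;  // species index of each pair
    std::vector<double> pair_sign;          // -1 for a reactant, +yield for a product
    std::vector<double> rate;               // one precomputed rate per reaction
};

// Work done by one hypothetical GPU thread: exactly one (reaction, species) pair.
inline void deriv_contrib_one_pair(const FlatReactions& r, std::size_t tid,
                                   std::vector<double>& deriv) {
    // Several pairs can target the same species, so on a real GPU
    // this update would have to be an atomic add.
    deriv[r.pair_species[tid]] += r.pair_sign[tid] * r.rate[r.pair_rxn[tid]];
}

// CPU stand-in for the kernel launch: the loop index plays the role of the thread id.
inline std::vector<double> calc_deriv(const FlatReactions& r, std::size_t n_species) {
    assert(r.pair_rxn.size() == r.pair_species.size());
    assert(r.pair_rxn.size() == r.pair_sign.size());
    std::vector<double> deriv(n_species, 0.0);
    for (std::size_t tid = 0; tid < r.pair_rxn.size(); ++tid) {
        deriv_contrib_one_pair(r, tid, deriv);
    }
    return deriv;
}
```

With this layout the number of work items is the total number of (reaction, species) pairs rather than the number of reactions, which is what lets the species loop be spread over threads. The price is the atomic update noted in the comment, so whether it is really free would need to be measured.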
This optimization would make the data accessible from a higher level (the GPU interface), because the loop would move from the RXN files into that interface, which simplifies data handling. It would also open the door to further optimizations, such as choosing which code path to run depending on the input data (for example, using the CPU instead of the GPU when there are only a few reactions to compute and the GPU otherwise, or at least advising the user).
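As a rough illustration of the input-dependent dispatch, the choice could be made in one place once the work size is visible at the interface level. This is only a sketch under assumptions: Backend and choose_backend are hypothetical names, and the threshold is a placeholder that would have to be found by benchmarking on the target hardware.

```cpp
#include <cassert>
#include <cstddef>

enum class Backend { Cpu, Gpu };

// Hypothetical selector: small workloads stay on the CPU, larger ones go to the GPU.
// gpu_threshold is an assumed tuning parameter, not a measured value.
inline Backend choose_backend(std::size_t n_work_items, std::size_t gpu_threshold) {
    assert(gpu_threshold > 0);
    return n_work_items < gpu_threshold ? Backend::Cpu : Backend::Gpu;
}
```

The same check could instead only print a warning to the user when the workload looks too small for the GPU, leaving the final choice to them.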