Improve RXN vectorization: Reorder and swap loops type_reactions and n_cells #116
Comments
@mattldawson What do you think?
@cguzman95 - I think this is a good idea for optimization, but I think there will be several incompatibilities with the current model structure that we'll have to work through before implementing something like this. One of these I think I've mentioned before is that the reason we loop on cells first is that for each cell the matrices |
But we should talk about this and come up with a design that works and uses a structure like the one you're proposing, if that will make things faster. I would prefer, though, that we hold off on the development until after the first model papers are done. Then we can work on implementing this optimization for your paper. We could actually think about doing this as part of the port to C++. What do you think?
What do you need for the paper version? I mention this because I have to adapt the rxn code and structures to the GPU version, and I don't want to follow the current structure too closely if we will change it soon. Is it fine with you if I make only the changes necessary to get the GPU part working and adapt the structure after we finish the development? (I mean things like using the same variable names, passing the same variables to the functions, and similar things.)
I think that would be fine - could you come up with some pseudo-code for the I just want to avoid you doing a lot of work on something that will have to be changed significantly to work with all the model elements. For example, when you initially put the loops over cells in the Another thing to remember is that optimizing a simple mechanism of just gas-phase reactions is great, but in practice the mechanism will be more complex, with some reactions taking a lot more time to calculate than others, and will include sub model calculations that will be much fewer in number but require a lot more computation than the reactions. So, as part of the design process, it might be worthwhile to take a step back and really think about realistic applications and how these will be affected by the optimizations. |
It might be worth thinking about moving to c++ first, to make this easier. If you look at the |
I think moving to C++ first will complicate things. It would be better to first build a structure similar to classes and then port it to C++.
ok, after an offline discussion with @cguzman95 - he will start a test branch to test some different structures for calculating the derivatives and Jacobians using GPUs. But, until we come up with a fully thought-out plan for making this work for all the model elements, not just some reactions, we won't start development on code we plan to push to |
Yes, as we discussed, since I need to adapt the GPU structure and code anyway, I will probably try new structures there. It will be a function like set_data_gpu that converts the data to the best-performing structure (the function already exists; it modifies rxn to improve only the GPU case, and it also has a half-finished reordering of the reactions in rxn that could help on the CPU too).
The current state of chem_mod calc_deriv and calc_jac follows this loop order:
For n_cells:
    ....
    For n_rxn:
        Update rxn variables (rxn_env_idx, num_react...)
        ...
    end
end
With the multi-cell implementation, we expect to have many more cells than reactions (typically 10,000 cells versus 200 reactions). In the reactions loop, we need to load reaction-specific variables such as rxn_env_idx, which can take a different value on each iteration (for example, 1, 1, 2, 1, 2, ... depending on the reaction type and order). The sizes of int_data and float_data also change from one reaction to the next. This blocks vectorization and is one of the reasons the MONARCH_1 test runs significantly faster than cb05.
A partial solution is to order the reaction data (all the reactions of one type first, then the next type, and so on). However, if the mechanism has few reactions spread over several reaction types, the vectorization failures at each change of type will still be repeated once for every cell.
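The reordering described above can be sketched as a stable sort of the reaction list by type. This is only an illustration: `rxn_entry_t`, its fields, and `sort_rxns_by_type` are hypothetical names, and the real model packs reaction data in int_data and float_data rather than in a struct like this.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reaction record used only for this sketch. */
typedef struct {
  int type; /* reaction type id (hypothetical encoding) */
  int id;   /* original position of the reaction in the mechanism */
} rxn_entry_t;

/* Order by type first; break ties by original position so the
 * relative order within one type is preserved. */
static int cmp_rxn_type(const void *a, const void *b) {
  const rxn_entry_t *ra = (const rxn_entry_t *)a;
  const rxn_entry_t *rb = (const rxn_entry_t *)b;
  if (ra->type != rb->type) return (ra->type > rb->type) - (ra->type < rb->type);
  return (ra->id > rb->id) - (ra->id < rb->id);
}

/* Group all reactions of the same type into contiguous runs. */
void sort_rxns_by_type(rxn_entry_t *rxns, size_t n_rxn) {
  assert(rxns != NULL || n_rxn == 0);
  qsort(rxns, n_rxn, sizeof *rxns, cmp_rxn_type);
}
```

After sorting, the type changes only once per group instead of potentially on every iteration, but with the cell loop outermost those changes are still paid once per cell.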
The best solution, in addition to ordering the RXN structure, is to compute all the reactions of one type first, then the next type, and so on. This means selecting the reaction type first and then computing it for all the cells (in other words, swapping the n_rxn and n_cells loops).
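A minimal sketch of the swapped loop order, assuming a single first-order reaction type (rate = k * [reactant]) and a species-major layout in which the values of one species for all cells are contiguous. The names `rxn_t` and `calc_deriv_rxn_outer` and the array layouts are assumptions made for this example, not the existing calc_deriv interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical first-order reaction: react -> prod. */
typedef struct {
  int react; /* species index of the reactant */
  int prod;  /* species index of the product */
} rxn_t;

/* Assumed layouts (not the model's current ones):
 *   rate_const[i * n_cells + c]  rate constant of reaction i in cell c
 *   state[s * n_cells + c]       concentration of species s in cell c
 *   deriv[s * n_cells + c]       accumulated d[s]/dt in cell c          */
void calc_deriv_rxn_outer(const rxn_t *rxns, size_t n_rxn, size_t n_cells,
                          const double *rate_const, const double *state,
                          double *deriv) {
  assert(rxns && rate_const && state && deriv);
  for (size_t i = 0; i < n_rxn; ++i) {
    /* Reaction-specific data is loaded once per reaction ... */
    const double *k = rate_const + i * n_cells;
    const double *conc = state + (size_t)rxns[i].react * n_cells;
    double *d_react = deriv + (size_t)rxns[i].react * n_cells;
    double *d_prod = deriv + (size_t)rxns[i].prod * n_cells;
    /* ... and the inner loop over cells runs over contiguous memory. */
    for (size_t c = 0; c < n_cells; ++c) {
      double rate = k[c] * conc[c];
      d_react[c] -= rate;
      d_prod[c] += rate;
    }
  }
}
```

The inner loop has a fixed trip count and unit stride, which is the shape compilers usually vectorize most readily. Whether this layout fits the rest of the model (sub models, per-cell Jacobian structure) is a separate question.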
This will also simplify some variables. For example, we no longer need the rxn_env_idx array, since we will compute the same reaction for all 10,000 cells using the same rxn_env_idx (correct me if I'm wrong and a reaction type can have different rxn_env_idx values, or in other words a different number of RATE_CONSTANTS). So we only need to store one N_RATE_CONSTANTS value for each reaction type (it could be defined in each rxn file, or it could be an array defined once in the common.h file).
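As a rough sketch of the array option, assuming every reaction of a given type uses the same number of rate constants (the assumption questioned above): the type names, the values in the table, and `env_offset_for_type` are placeholders invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reaction type ids. */
enum { RXN_TYPE_A = 0, RXN_TYPE_B = 1, RXN_TYPE_C = 2, N_RXN_TYPES = 3 };

/* Placeholder values: rate constants needed by one reaction of each type. */
static const int N_RATE_CONSTANTS[N_RXN_TYPES] = {1, 2, 1};

/* Offset of the first rate constant belonging to `type`, computed from
 * the number of reactions of each type instead of a stored per-reaction
 * index array. Assumes reactions are grouped by type. */
size_t env_offset_for_type(int type, const size_t *n_rxn_of_type) {
  assert(type >= 0 && type < N_RXN_TYPES);
  size_t offset = 0;
  for (int t = 0; t < type; ++t)
    offset += n_rxn_of_type[t] * (size_t)N_RATE_CONSTANTS[t];
  return offset;
}
```

With reactions grouped by type, the offset is computed once per type rather than loaded on every reaction iteration.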
This would also be consistent with a class-based structure for a future C++ implementation.