Update submodules for extra performance (amrex, amrex-hydro) #1146

Merged
merged 1 commit into from
Jul 22, 2024


@marchdf marchdf commented Jul 22, 2024

Summary

Update amrex and hydro submodules.

This gives a big performance boost for our very large input files, thanks to the ParmParse refactor in amrex. For example, for a case with O(100) refinement zones around turbines:

Development amrex: amr-wind::RefineCriteriaManager::initialize (where the ParmParse calls happen) is slow

Time spent in InitData():    267.2245975
Time spent in Evolve():      15.92839264


TinyProfiler total time across processes [min...avg...max]: 283.2 ... 283.2 ... 283.2

-----------------------------------------------------------------------------------------------------------------
Name                                                              NCalls  Excl. Min  Excl. Avg  Excl. Max   Max %
-----------------------------------------------------------------------------------------------------------------
amr-wind::RefineCriteriaManager::initialize                            1       76.2      114.3      123.8  43.71%
DistributionMapping::LeastUsedCPUs()                                   5   0.001156      9.399      48.07  16.97%
FabArray::ParallelCopy_finish()                                     2143       5.03      12.48      38.56  13.61%
FillBoundary_finish()                                              19340      10.57      17.86      37.98  13.41%
MLNodeLaplacian::Fsmooth()                                          1744      23.14       23.7      24.04   8.49%
amr-wind::godunov::compute_fluxes                                   1092      6.334      14.28      19.71   6.96%
FillBoundary_nowait()                                              19340      7.097      13.26      15.91   5.62%
MLPoisson::Fsmooth()                                                3328      4.796      8.599      10.42   3.68%
amrex::Copy()                                                      14053      4.366      7.875       9.27   3.27%
FabArray::setVal()                                                  2149      3.892      7.017      8.112   2.86%
MLABecLaplacian::Fsmooth()                                           288      2.452      4.981       5.85   2.07%
MLTensorOp::apply()                                                  135      2.529      3.632      4.946   1.75%
MLABecLaplacian::Fapply()                                            171      2.028       3.85      4.513   1.59%
FabArray::Xpay()                                                    7168      1.618       3.29      3.887   1.37%
amrex::Add()                                                         419       1.23      2.807      3.371   1.19%
amr-wind::godunov::predict_weno                                     8082      3.065      3.121      3.215   1.14%
amr-wind::incflo::ApplyProjection                                     10      1.806      2.754      3.068   1.08%
MLNodeLinOp::applyBC()                                              8222     0.9706      1.719      2.792   0.99%

First refactor (AMReX-Codes/amrex#4031): amr-wind::RefineCriteriaManager::initialize is about 10X faster (Excl. Max drops from 123.8 to 12.83)

Time spent in InitData():    200.5347159
Time spent in Evolve():      15.9474072


TinyProfiler total time across processes [min...avg...max]: 238.3 ... 238.3 ... 238.3

-----------------------------------------------------------------------------------------------------------------
Name                                                              NCalls  Excl. Min  Excl. Avg  Excl. Max   Max %
-----------------------------------------------------------------------------------------------------------------
FabArray::ParallelCopy_finish()                                     2143      7.752      35.56         57  23.92%
amr-wind::godunov::compute_fluxes                                   1092      11.36      35.79      45.48  19.09%
FillBoundary_finish()                                              19340      11.17      18.28      38.39  16.11%
MultiFab::Multiply()                                               12730     0.4379      2.323      25.61  10.75%
amr-wind::ICNS-Godunov::compute_advection_term                         9     0.4147      3.084       25.6  10.75%
MLNodeLaplacian::Fsmooth()                                          1744      23.17      23.72      24.08  10.10%
FabArray::ParallelCopy_nowait()                                     2143    0.05451     0.4348      22.45   9.42%
amr-wind::main()                                                       1      21.85      21.99      22.09   9.27%
FillBoundary_nowait()                                              19340      7.107      13.28      16.47   6.91%
amr-wind::RefineCriteriaManager::initialize                            1      4.403      9.977      12.83   5.39%
MLPoisson::Fsmooth()                                                3328      4.797      8.596      10.48   4.40%
amrex::Copy()                                                      14053      4.362      7.879      9.313   3.91%
DistributionMapping::LeastUsedCPUs()                                   5  0.0005778      2.894      8.646   3.63%
FabArray::setVal()                                                  2149      3.892      6.962      8.072   3.39%
MLABecLaplacian::Fsmooth()                                           288      2.355      4.999      5.827   2.45%
MLTensorOp::apply()                                                  135       2.54      3.658      4.944   2.07%
MLABecLaplacian::Fapply()                                            171      1.974      3.855      4.517   1.90%
FabArray::Xpay()                                                    7168      1.628       3.29      3.984   1.67%
amrex::Add()                                                         419      1.203      2.757      3.325   1.40%
amr-wind::godunov::predict_weno                                     8082      3.067      3.123      3.219   1.35%
MLNodeLinOp::applyBC()                                              8222     0.9782      1.706       2.77   1.16%

Second refactor (AMReX-Codes/amrex#4035): amr-wind::RefineCriteriaManager::initialize disappears from the listing entirely (folded into Other in the profiling output). Also known as ♾️ speedup

Time spent in InitData():    140.108108
Time spent in Evolve():      15.80087798


TinyProfiler total time across processes [min...avg...max]: 155.9 ... 155.9 ... 155.9

-----------------------------------------------------------------------------------------------------------------
Name                                                              NCalls  Excl. Min  Excl. Avg  Excl. Max   Max %
-----------------------------------------------------------------------------------------------------------------
FillBoundary_finish()                                              19340      11.17       18.1      38.22  24.51%
FabArray::ParallelCopy_finish()                                     2143       6.09      13.41      34.61  22.20%
MLNodeLaplacian::Fsmooth()                                          1744      23.14      23.71      24.06  15.43%
amr-wind::godunov::compute_fluxes                                   1092       9.73      15.69      21.13  13.55%
FillBoundary_nowait()                                              19340      7.141       13.3      16.24  10.42%
MLPoisson::Fsmooth()                                                3328      4.852      8.615      10.41   6.68%
amrex::Copy()                                                      14053      4.367      7.883      9.422   6.04%
FabArray::setVal()                                                  2149      3.874      6.988       8.08   5.18%
MLABecLaplacian::Fsmooth()                                           288      2.326      4.984      5.863   3.76%
MLTensorOp::apply()                                                  135      2.558      3.641      4.951   3.18%
MLABecLaplacian::Fapply()                                            171      1.964      3.855      4.484   2.88%
FabArray::Xpay()                                                    7168        1.6      3.281      3.903   2.50%
amrex::Add()                                                         419      1.228      2.846      3.456   2.22%
amr-wind::godunov::predict_weno                                     8082      3.065      3.128      3.334   2.14%
MLNodeLinOp::applyBC()                                              8222     0.9699      1.708      2.756   1.77%
MLNodeLaplacian::Fapply()                                           6528      1.448      1.611      1.929   1.24%
amr-wind::ICNS-Godunov::compute_source_term                           18      0.787      1.355      1.766   1.13%
MLCellLinOp::applyBC()                                              9154     0.5692     0.9041      1.424   0.91%

Thanks to @WeiqunZhang for this!
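
As a quick sanity check on the timings above, here is a minimal Python sketch that recomputes the speedups from the reported numbers. It is not part of amr-wind or amrex; the dictionary keys and the `speedup` helper are illustrative names, and the values are copied by hand from the three profiles.

```python
# Timings in seconds, copied from the three TinyProfiler runs above.
# Keys such as "refine_init_max" are illustrative names, not amr-wind identifiers.
runs = {
    "development": {"init_data": 267.2245975, "total": 283.2, "refine_init_max": 123.8},
    "amrex#4031": {"init_data": 200.5347159, "total": 238.3, "refine_init_max": 12.83},
    # After amrex#4035 the routine no longer appears in the listing (folded into Other).
    "amrex#4035": {"init_data": 140.108108, "total": 155.9, "refine_init_max": None},
}


def speedup(before, after):
    """Return how many times faster `after` is than `before`."""
    return before / after


baseline = runs["development"]
summary = {}
for name, timings in runs.items():
    entry = {
        "init_data": speedup(baseline["init_data"], timings["init_data"]),
        "total": speedup(baseline["total"], timings["total"]),
    }
    # Only compute the per-routine speedup when the routine is still reported.
    if timings["refine_init_max"] is not None:
        entry["refine_init_max"] = speedup(
            baseline["refine_init_max"], timings["refine_init_max"]
        )
    summary[name] = entry

for name, entry in summary.items():
    print(name, {key: round(value, 2) for key, value in entry.items()})
```

With these inputs, InitData() ends up roughly 1.9X faster after both refactors, and the Excl. Max of RefineCriteriaManager::initialize drops by roughly 9.6X after the first one. Evolve() is essentially unchanged, as expected for a change that only affects input parsing.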

Pull request type

Please check the type of change introduced:

  • Bugfix
  • Feature
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • Documentation content changes
  • Other (please describe): submodule update, performance

Checklist

The following is included:

  • new unit-test(s)
  • new regression test(s)
  • documentation for new capability

This PR was tested by running:

  • the unit tests
    • on GPU
    • on CPU
  • the regression tests
    • on GPU
    • on CPU

Additional background

Closes #1145
Closes #1144

@marchdf marchdf changed the title from "Update submodules (amrex, amrex-hydro)" to "Update submodules for extra performance (amrex, amrex-hydro)" Jul 22, 2024
@marchdf marchdf marked this pull request as ready for review July 22, 2024 15:32
@jrood-nrel jrood-nrel left a comment

I like performance gains.

@marchdf marchdf merged commit ee35336 into Exawind:main Jul 22, 2024
13 checks passed
@marchdf marchdf deleted the update-submods branch July 22, 2024 16:23