
Taal: AMR allocations #228

Merged (16 commits into trixi-framework:dev, Oct 26, 2020)


@ranocha (Member) commented on Oct 15, 2020

I've used ideas proposed in #205 to significantly reduce the memory allocated by AMR; @gregorgassner nerd-sniped me on Slack...

[Image: nerd sniping]
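This thread does not spell out which ideas from #205 were used. One common way to cut allocations when the number of elements changes at every AMR step is to keep a single flat backing buffer that is reallocated only when it has to grow, and to shrink by just lowering the logical element count. The Python sketch below illustrates that general idea under stated assumptions; the class `GrowOnlyElementStore`, the doubling policy, and the element counts are hypothetical and are not taken from Trixi.jl.

```python
class GrowOnlyElementStore:
    """Hypothetical per-element storage backed by one flat list.

    Assumption for illustration: every element carries the same number of
    values, and the flat buffer is only reallocated when the requested
    element count exceeds the current capacity. Shrinking keeps the buffer.
    """

    def __init__(self, values_per_element, n_elements):
        self.values_per_element = values_per_element
        self.n_elements = n_elements
        self._buffer = [0.0] * (values_per_element * n_elements)
        self.allocations = 1  # how many times a new buffer was created

    def resize(self, n_elements):
        needed = self.values_per_element * n_elements
        if needed > len(self._buffer):
            # Over-allocate (doubling) so later small refinements reuse storage.
            new_buffer = [0.0] * max(needed, 2 * len(self._buffer))
            new_buffer[: len(self._buffer)] = self._buffer  # keep existing data
            self._buffer = new_buffer
            self.allocations += 1
        # Shrinking (or growing within capacity) only changes the logical size.
        self.n_elements = n_elements

    def _index(self, element, value):
        if not (0 <= element < self.n_elements
                and 0 <= value < self.values_per_element):
            raise IndexError("element or value index out of range")
        return element * self.values_per_element + value

    def __getitem__(self, key):
        element, value = key
        return self._buffer[self._index(element, value)]

    def __setitem__(self, key, x):
        element, value = key
        self._buffer[self._index(element, value)] = x


# Hypothetical element counts over a few refine/coarsen cycles.
store = GrowOnlyElementStore(values_per_element=4, n_elements=16)
store[0, 0] = 1.5
for n in [28, 22, 28, 16, 28]:
    store.resize(n)
print(store.allocations)  # 2: only the first growth to 28 elements allocates
print(store[0, 0])        # 1.5: data of retained elements survives resizing
```

Allocating fresh storage on every resize would instead create a new buffer at each of the five steps; the trade-off of the grow-only approach is that memory from the largest mesh seen so far stays reserved.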

For example, I get the following results for trixi_include("examples/2d/elixir_euler_blast_wave_shockcapturing_amr.jl") after running it once to remove all compilation overhead.

On dev, AMR allocates 5.34 GiB and takes 2.46 s.

--------------------------------------------------------------------------------
Trixi simulation run finished.    Final time: 12.5    Time steps: 1598
--------------------------------------------------------------------------------

 --------------------------------------------------------------------------------
             Trixi.jl                    Time                   Allocations
                                 ----------------------   -----------------------
        Tot / % measured:             40.3s / 98.0%           5.54GiB / 100%

 Section                 ncalls     time   %tot     avg     alloc   %tot      avg
 --------------------------------------------------------------------------------
 rhs!                     7.99k    36.4s  92.2%  4.55ms   78.9MiB  1.40%  10.1KiB
   volume integral        7.99k    29.9s  75.9%  3.74ms   21.7MiB  0.38%  2.79KiB
     blended DG-FV        7.99k    21.1s  53.5%  2.64ms   7.19MiB  0.13%     944B
     pure DG              7.99k    7.06s  17.9%   883μs   7.07MiB  0.13%     928B
     blending factors     7.99k    1.65s  4.18%   206μs   7.44MiB  0.13%     976B
   interface flux         7.99k    3.77s  9.56%   472μs   6.58MiB  0.12%     864B
   mortar flux            7.99k    1.04s  2.64%   130μs   11.2MiB  0.20%  1.44KiB
   surface integral       7.99k    443ms  1.12%  55.5μs   6.95MiB  0.12%     912B
   prolong2interfaces     7.99k    415ms  1.05%  52.0μs   6.58MiB  0.12%     864B
   prolong2mortars        7.99k    400ms  1.01%  50.0μs   11.2MiB  0.20%  1.44KiB
   Jacobian               7.99k    182ms  0.46%  22.8μs   7.07MiB  0.13%     928B
   reset ∂u/∂t            7.99k    166ms  0.42%  20.8μs     0.00B  0.00%    0.00B
   prolong2boundaries     7.99k   22.8ms  0.06%  2.85μs   6.58MiB  0.12%     864B
   boundary flux          7.99k    623μs  0.00%  78.0ns     0.00B  0.00%    0.00B
   source terms           7.99k    590μs  0.00%  73.8ns     0.00B  0.00%    0.00B
 AMR                        319    2.46s  6.24%  7.72ms   5.34GiB  96.8%  17.2MiB
   refine                   319    1.20s  3.04%  3.75ms   2.71GiB  49.1%  8.70MiB
     mesh                   311    738ms  1.87%  2.37ms    191MiB  3.39%   630KiB
     solver                 311    459ms  1.16%  1.47ms   2.52GiB  45.7%  8.31MiB
   coarsen                  319    1.18s  3.00%  3.70ms   2.61GiB  47.3%  8.39MiB
     mesh                   319    741ms  1.88%  2.32ms   2.29MiB  0.04%  7.35KiB
     solver                 319    396ms  1.00%  1.24ms   2.29GiB  41.4%  7.35MiB
   indicator                319   70.2ms  0.18%   220μs    628KiB  0.01%  1.97KiB
 calculate dt             1.60k    321ms  0.81%   200μs     0.00B  0.00%    0.00B
 analyze solution            17    219ms  0.55%  12.9ms   51.2MiB  0.90%  3.01MiB
 I/O                         18   59.5ms  0.15%  3.31ms   53.1MiB  0.94%  2.95MiB
 initial condition AMR        1   3.03ms  0.01%  3.03ms    210KiB  0.00%   210KiB
   AMR                        1    558μs  0.00%   558μs    208KiB  0.00%   208KiB
     indicator                1    500μs  0.00%   500μs   66.1KiB  0.00%  66.1KiB
     refine                   1    605ns  0.00%   605ns     80.0B  0.00%    80.0B
     coarsen                  1    518ns  0.00%   518ns     80.0B  0.00%    80.0B
 --------------------------------------------------------------------------------

With this PR, AMR allocates 1.25 GiB and takes 1.84 s.

--------------------------------------------------------------------------------
Trixi simulation run finished.    Final time: 12.5    Time steps: 1598
--------------------------------------------------------------------------------

 --------------------------------------------------------------------------------
             Trixi.jl                    Time                   Allocations      
                                 ----------------------   -----------------------
        Tot / % measured:             39.3s / 98.0%           1.44GiB / 98.9%    

 Section                 ncalls     time   %tot     avg     alloc   %tot      avg
 --------------------------------------------------------------------------------
 rhs!                     7.99k    36.0s  93.6%  4.51ms   78.9MiB  5.40%  10.1KiB
   volume integral        7.99k    29.6s  77.0%  3.71ms   21.7MiB  1.49%  2.79KiB
     blended DG-FV        7.99k    20.9s  54.2%  2.61ms   7.19MiB  0.49%     944B
     pure DG              7.99k    6.99s  18.2%   875μs   7.07MiB  0.48%     928B
     blending factors     7.99k    1.65s  4.28%   206μs   7.44MiB  0.51%     976B
   interface flux         7.99k    3.75s  9.74%   469μs   6.58MiB  0.45%     864B
   mortar flux            7.99k    1.03s  2.68%   129μs   11.2MiB  0.77%  1.44KiB
   surface integral       7.99k    446ms  1.16%  55.8μs   6.95MiB  0.48%     912B
   prolong2interfaces     7.99k    399ms  1.04%  49.9μs   6.58MiB  0.45%     864B
   prolong2mortars        7.99k    396ms  1.03%  49.5μs   11.2MiB  0.77%  1.44KiB
   Jacobian               7.99k    180ms  0.47%  22.6μs   7.07MiB  0.48%     928B
   reset ∂u/∂t            7.99k    161ms  0.42%  20.2μs     0.00B  0.00%    0.00B
   prolong2boundaries     7.99k   21.6ms  0.06%  2.71μs   6.58MiB  0.45%     864B
   boundary flux          7.99k    578μs  0.00%  72.4ns     0.00B  0.00%    0.00B
   source terms           7.99k    393μs  0.00%  49.1ns     0.00B  0.00%    0.00B
 AMR                        319    1.84s  4.78%  5.76ms   1.25GiB  87.5%  4.01MiB
   coarsen                  319    892ms  2.32%  2.80ms    729MiB  49.8%  2.28MiB
     mesh                   319    634ms  1.65%  1.99ms   2.29MiB  0.16%  7.35KiB
     solver                 319    220ms  0.57%   690μs    395MiB  27.0%  1.24MiB
   refine                   319    866ms  2.25%  2.71ms    539MiB  36.8%  1.69MiB
     mesh                   311    632ms  1.64%  2.03ms    191MiB  13.1%   630KiB
     solver                 311    233ms  0.61%   751μs    347MiB  23.7%  1.12MiB
   indicator                319   70.7ms  0.18%   222μs    628KiB  0.04%  1.97KiB
 calculate dt             1.60k    319ms  0.83%   199μs     0.00B  0.00%    0.00B
 analyze solution            17    225ms  0.58%  13.2ms   51.2MiB  3.50%  3.01MiB
 I/O                         18   70.5ms  0.18%  3.92ms   53.1MiB  3.63%  2.95MiB
 initial condition AMR        1   3.01ms  0.01%  3.01ms    210KiB  0.01%   210KiB
   AMR                        1    556μs  0.00%   556μs    208KiB  0.01%   208KiB
     indicator                1    493μs  0.00%   493μs   66.1KiB  0.00%  66.1KiB
     refine                   1    777ns  0.00%   777ns     80.0B  0.00%    80.0B
     coarsen                  1    311ns  0.00%   311ns     80.0B  0.00%    80.0B
 --------------------------------------------------------------------------------
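A quick back-of-the-envelope check of the headline numbers, using only the AMR rows of the two tables above (319 calls in each run) and 1 GiB = 1024 MiB. This is plain arithmetic on the reported, already rounded values, not part of the PR itself.

```python
# Numbers copied from the "AMR" rows of the two tables above.
ncalls = 319                              # AMR calls in both runs
alloc_gib = {"dev": 5.34, "PR": 1.25}     # total AMR allocations in GiB
time_s = {"dev": 2.46, "PR": 1.84}        # total AMR time in seconds

alloc_reduction = 1 - alloc_gib["PR"] / alloc_gib["dev"]
time_reduction = 1 - time_s["PR"] / time_s["dev"]
# Average allocation per AMR call in MiB (1 GiB = 1024 MiB).
avg_alloc_mib = {k: v * 1024 / ncalls for k, v in alloc_gib.items()}

print(f"AMR allocations: {alloc_reduction:.1%} lower")  # 76.6% lower
print(f"AMR time:        {time_reduction:.1%} lower")   # 25.2% lower
print({k: round(v, 2) for k, v in avg_alloc_mib.items()})  # {'dev': 17.14, 'PR': 4.01}
```

The per-call average for dev comes out as 17.14 MiB rather than the reported 17.2 MiB because the 5.34 GiB total is itself rounded; the 4.01 MiB for this PR matches the table.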

I think there are still some possibilities to reduce allocations and speed up AMR further, cf. #161.

@ranocha requested a review from @sloede on October 15, 2020, 07:40
@ranocha closed this on Oct 15, 2020
@ranocha reopened this on Oct 15, 2020
@codecov bot commented on Oct 15, 2020

Codecov Report

Merging #228 into dev will increase coverage by 0.06%.
The diff coverage is 94.11%.


@@            Coverage Diff             @@
##              dev     #228      +/-   ##
==========================================
+ Coverage   89.59%   89.66%   +0.06%     
==========================================
  Files          60       60              
  Lines       10485    10555      +70     
==========================================
+ Hits         9394     9464      +70     
  Misses       1091     1091              
Impacted Files Coverage Δ
src/solvers/dg/1d/dg.jl 92.82% <50.00%> (ø)
src/solvers/dg/2d/dg.jl 93.18% <62.50%> (ø)
src/solvers/dg/2d/containers.jl 92.48% <95.00%> (+2.18%) ⬆️
src/solvers/dg/1d/containers.jl 92.77% <95.08%> (+2.60%) ⬆️
src/callbacks/amr.jl 87.50% <100.00%> (+0.35%) ⬆️
src/callbacks/amr_dg1d.jl 95.04% <100.00%> (+0.25%) ⬆️
src/callbacks/amr_dg2d.jl 95.30% <100.00%> (+0.26%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 58ac9bd...6a09170.

@sloede (Member) left a comment

This generally looks good to me, and great work on reducing allocations 💪 It would be good to add more comments (especially so that new users better understand what's going on), but then it can be merged.

src/solvers/dg/1d/containers.jl (resolved)
src/solvers/dg/1d/containers.jl (resolved)
@sloede mentioned this pull request on Oct 20, 2020 (45 tasks)
@ranocha requested a review from @sloede on October 24, 2020, 06:58
src/solvers/dg/1d/containers.jl (outdated, resolved)
@ranocha requested a review from @sloede on October 26, 2020, 04:32
@ranocha merged commit e098af0 into trixi-framework:dev on Oct 26, 2020
2 participants