Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inlining everything #183

Merged
merged 1 commit into from
Jan 23, 2024
Merged

Inlining everything #183

merged 1 commit into from
Jan 23, 2024

Conversation

charleskawczynski
Copy link
Member

This PR attempts to systematically inline all thermo functions, to see the impact of performance.

Copy link

codecov bot commented Jan 19, 2024

Codecov Report

Attention: 15 lines in your changes are missing coverage. Please review.

Comparison is base (805d172) 92.91% compared to head (ecfb71d) 92.96%.

Files Patch % Lines
src/config_numerical_method.jl 65.00% 7 Missing ⚠️
src/relations.jl 96.57% 6 Missing ⚠️
src/states.jl 94.28% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #183      +/-   ##
==========================================
+ Coverage   92.91%   92.96%   +0.04%     
==========================================
  Files          10       10              
  Lines        1144     1152       +8     
==========================================
+ Hits         1063     1071       +8     
  Misses         81       81              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@charleskawczynski
Copy link
Member Author

charleskawczynski commented Jan 22, 2024

I took a look at the buoyancy kernel, and I'm seeing a speedup on this branch (on the CPU):

Main branch

******************* thermo_kernel
BenchmarkTools.Trial: 236 samples with 1 evaluation.
 Range (min  max):  20.429 ms   28.884 ms  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     20.910 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   21.210 ms ± 876.287 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

     ▄▃█▆▁                                                      
  ▃▄███████▅▄▃▄▅▂▃▄▃▁▂▂▁▂▃▃▄▃▄▂▂▃▂▃▃▃▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂ ▃
  20.4 ms         Histogram: frequency by time         24.8 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.
******************* buoyancy_gradients_kernel
BenchmarkTools.Trial: 27 samples with 1 evaluation.
 Range (min  max):  180.741 ms  196.250 ms  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     183.876 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   185.915 ms ±   4.569 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▃    ▃   ██▃                                                  
  ▇█▁▁▇▁█▁▁▁███▇▇▇▇▁▁▇▁▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇▁▇▁▁▇▁▇▁▁▁▁▁▁▁▇▁▁▁▁▁▇▁▇ ▁
  181 ms           Histogram: frequency by time          196 ms <

 Memory estimate: 16 bytes, allocs estimate: 1.

This branch:

******************* thermo_kernel
BenchmarkTools.Trial: 368 samples with 1 evaluation.
 Range (min  max):  13.501 ms   14.498 ms  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     13.544 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   13.592 ms ± 143.209 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁██▇▆▅▆▄▃▃▂▁                                                  
  ████████████▇▆▁▁▄▄▁▁▁▆▁▇▄▁▆▁▁▆▄▁▁▁▁▄▄▁▄▁▄▄▁▁▁▄▁▁▁▁▁▁▄▆▆▇▆▄▄▆ ▇
  13.5 ms       Histogram: log(frequency) by time      14.1 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.
******************* buoyancy_gradients_kernel
BenchmarkTools.Trial: 36 samples with 1 evaluation.
 Range (min  max):  138.371 ms  142.465 ms  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     139.539 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   139.659 ms ± 890.868 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

               ▂   █▂  ▂                                         
  ▅▁▅▁▅▅█▅▅▁▅▅▁█▅▅▅██▁▁█▁▅▅▁▅▅▁▁▁▁▁▁▁▅▁▅▁▁▁▁▁▁▁▁▅▁▁▁▅▁▁▁▁▁▁▁▁▁▅ ▁
  138 ms           Histogram: frequency by time          142 ms <

 Memory estimate: 16 bytes, allocs estimate: 1.

I'm not sure if the input parameters are performing saturation adjustment, but it doesn't matter in the case of the buoyancy gradients kernel, since SA is never called regardless of what the inputs are.

cc @glwagner, @szy21. I'm comfortable moving forward with this.

Interestingly, I didn't realize this until now, it looks like the buoyancy kernel is allocating 🤔

@trontrytel
Copy link
Member

This looks great! So maybe it will speed up the quadratures also...

@charleskawczynski
Copy link
Member Author

This looks great! So maybe it will speed up the quadratures also...

Yeah, I was thinking that, too

@glwagner
Copy link
Member

Good news.

@charleskawczynski
Copy link
Member Author

Also, from the updated benchmark, which I think may reflect the impact better for the CPU (we won't see differences on the gpu where we use always_inline = true):

main

BenchmarkTools.Trial: 1273 samples with 1 evaluation.
 Range (min  max):  3.875 ms   4.541 ms  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     3.881 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.928 ms ± 93.551 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%
  █▃▁ ▃▃▁                ▃
  ███▆███▇████▆▆▅▇▆▆▃▅▅▄▇█▇█▅▅▇▆▆▅▅▄▁▁▄▄▁▃▄▃▃▄▅▁▁▁▁▄▁▃▅▃▁▁▁▄ ▇
  3.87 ms      Histogram: log(frequency) by time     4.34 ms <
 Memory estimate: 544 bytes, allocs estimate: 10.
BenchmarkTools.Trial: 1390 samples with 1 evaluation.
 Range (min  max):  3.563 ms   4.238 ms  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     3.569 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.596 ms ± 74.865 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%
  █▂  ▄▃▁
  ██▇████▇█▇▇█▇▄▆▆▇▆▅▅▆▇▄▇▅▄▅▃▅▃▃▅▁▁▃▃▁▁▁▃▁▃▁▁▃▁▃▁▁▁▃▁▃▁▁▄▁▄ ▇
  3.56 ms      Histogram: log(frequency) by time     4.03 ms <
 Memory estimate: 0 bytes, allocs estimate: 0.

This branch:

BenchmarkTools.Trial: 1690 samples with 1 evaluation.
 Range (min  max):  2.874 ms   3.264 ms  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     2.965 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.957 ms ± 40.091 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%
                    ▂▄          █▁▃
  ▂▂▂▁▂▄▆▃▃▄▄▄▄▅▄▃▆▄██▆▃▃▃▂▃▅▄▂▅███▄▃▃▃▄▄▇█▃▅▆▄▅▃▂▃▂▂▂▁▁▁▁▁▁ ▃
  2.87 ms        Histogram: frequency by time        3.05 ms <
 Memory estimate: 544 bytes, allocs estimate: 10.
BenchmarkTools.Trial: 1817 samples with 1 evaluation.
 Range (min  max):  2.626 ms   3.289 ms  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     2.735 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.750 ms ± 69.871 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%
            ▄  ▃▄█▅▁ ▁▁▅  
  ▁▂▃▂▂▅▄▅▅▇██▇██████████▆▅▅▇▇▅▃▅▆▆▅▃▄▅▄▅▅▃▂▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃
  2.63 ms        Histogram: frequency by time        2.94 ms <
 Memory estimate: 0 bytes, allocs estimate: 0.

@charleskawczynski charleskawczynski merged commit 99dce77 into main Jan 23, 2024
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Launch Buildkite Add label to launch Buildkite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants