SplittingPool: CPU allocations #117

Closed
maleadt opened this issue Oct 1, 2019 · 0 comments
Labels
cuda array Stuff about CuArray. performance How fast can we go?

maleadt commented Oct 1, 2019

Looks like the new pool performs quite a lot of CPU allocations, even though I went through the code and made it type-stable:

[ Info: Epoch 2
 95.020749 seconds (223.41 M CPU allocations: 11.091 GiB, 7.04% gc time) (1.56 M GPU allocations: 2.084 TiB, 23.15% gc time of which 0.88% spent allocating)
 ────────────────────────────────────────────────────────────────────────────
                                     Time                   Allocations      
                             ──────────────────────   ───────────────────────
      Tot / % measured:           95.0s / 22.7%           11.1GiB / 24.7%    

 Section             ncalls     time   %tot     avg     alloc   %tot      avg
 ────────────────────────────────────────────────────────────────────────────
 pooled alloc         1.56M    21.6s   100%  13.8μs   2.74GiB  100%   1.83KiB
   1.1a repopulate        1   2.44ms  0.01%  2.44ms   2.08MiB  0.07%  2.08MiB
   1.1b compact           1   7.71ms  0.04%  7.71ms   1.63MiB  0.06%  1.63MiB
   1.2 scan           1.56M    1.46s  6.76%   933ns    408MiB  14.6%     273B
   1.3 alloc            256    347ms  1.61%  1.36ms   12.8KiB  0.00%    51.3B
   1.4a reclaim         747    120ms  0.56%   160μs   59.9KiB  0.00%    82.1B
   1.4b alloc           747    842ms  3.90%  1.13ms   23.4KiB  0.00%    32.0B
   1.5a reclaim         249    104ms  0.48%   418μs   58.4KiB  0.00%     240B
   1.5b alloc           249    271ms  1.25%  1.09ms   7.78KiB  0.00%    32.0B
   2.0 gc(false)        249    10.7s  49.6%  43.0ms    481MiB  17.2%  1.93MiB
     pooled free      1.56M    586ms  2.71%   376ns     0.00B  0.00%    0.00B
   2.1a repopulate      249    544ms  2.52%  2.19ms    568MiB  20.3%  2.28MiB
   2.1b compact         249    1.96s  9.06%  7.86ms    437MiB  15.6%  1.75MiB
   2.2 scan             249    693μs  0.00%  2.78μs    193KiB  0.01%     796B
 pooled free          5.70k   2.58ms  0.01%   452ns     0.00B  0.00%    0.00B
 ────────────────────────────────────────────────────────────────────────────
 ──────────────────────────────────────────────────────────────────
                           Time                   Allocations      
                   ──────────────────────   ───────────────────────
 Tot / % measured:      95.0s / 1.68%           11.1GiB / 0.21%    

 Section   ncalls     time   %tot     avg     alloc   %tot      avg
 ──────────────────────────────────────────────────────────────────
 alloc       391k    1.53s  95.6%  3.91μs   23.8MiB  100%     63.9B
 free        390k   70.5ms  4.40%   181ns      192B  0.00%    0.00B
 ──────────────────────────────────────────────────────────────────

vs

[ Info: Epoch 2
 89.118641 seconds (177.10 M CPU allocations: 8.993 GiB, 6.44% gc time) (1.56 M GPU allocations: 2.084 TiB, 13.51% gc time of which 1.48% spent allocating)
 ──────────────────────────────────────────────────────────────────────────
                                   Time                   Allocations      
                           ──────────────────────   ───────────────────────
     Tot / % measured:          89.1s / 13.2%           8.99GiB / 7.45%    

 Section           ncalls     time   %tot     avg     alloc   %tot      avg
 ──────────────────────────────────────────────────────────────────────────
 background task        1   20.3ms  0.17%  20.3ms    226KiB  0.03%   226KiB
   pooled free        626    272μs  0.00%   435ns   29.3KiB  0.00%    48.0B
   reclaim              1   1.22μs  0.00%  1.22μs     0.00B  0.00%    0.00B
   scan                 1    717ns  0.00%   717ns      336B  0.00%     336B
 pooled alloc       1.56M    11.8s   100%  7.53μs    686MiB  100%      460B
   1. try alloc       294    447ms  3.79%  1.52ms   13.8KiB  0.00%    48.0B
   2. gc(false)       294    9.62s  81.6%  32.7ms    555MiB  80.9%  1.89MiB
     pooled free    1.56M    527ms  4.47%   337ns   71.6MiB  10.4%    48.0B
 ──────────────────────────────────────────────────────────────────────────
 ──────────────────────────────────────────────────────────────────
                           Time                   Allocations      
                   ──────────────────────   ───────────────────────
 Tot / % measured:      89.1s / 0.66%           8.99GiB / 0.26%    

 Section   ncalls     time   %tot     avg     alloc   %tot      avg
 ──────────────────────────────────────────────────────────────────
 alloc       390k    509ms  86.0%  1.30μs   23.8MiB  100%     64.0B
 free        390k   82.8ms  14.0%   212ns     0.00B  0.00%    0.00B
 ──────────────────────────────────────────────────────────────────
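
Type stability alone does not rule out heap allocations, which may be part of what the numbers above show. A minimal sketch of checking this in isolation with `@allocated` from Base follows. The `Block` and `IBlock` structs and the `split_*!` and `measure` helpers are hypothetical stand-ins invented for illustration, not the actual pool code:

```julia
# Hypothetical stand-ins for pool bookkeeping; NOT the real SplittingPool types.
mutable struct Block      # mutable: each instance normally lives on the heap
    offset::Int
    size::Int
end

struct IBlock             # immutable isbits struct: stored inline in a Vector
    offset::Int
    size::Int
end

# Both functions are type-stable; they differ only in the element type they create.
function split_mutable!(blocks::Vector{Block}, sz::Int)
    push!(blocks, Block(0, sz))
    return length(blocks)
end

function split_immutable!(blocks::Vector{IBlock}, sz::Int)
    push!(blocks, IBlock(0, sz))
    return length(blocks)
end

# Returns the number of bytes allocated on the CPU by n calls of f.
function measure(f, blocks, n)
    sizehint!(blocks, n + 1)   # reserve capacity so vector growth is not measured
    f(blocks, 1)               # warm-up call so compilation is not measured
    return @allocated for i in 1:n
        f(blocks, i)
    end
end

println("mutable:   ", measure(split_mutable!, Block[], 1000), " bytes")
println("immutable: ", measure(split_immutable!, IBlock[], 1000), " bytes")
```

If the mutable variant reports noticeably more bytes, the allocations come from object construction rather than from type instability, so the fix would be about data layout (immutable structs, preallocated containers) rather than inference.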
@maleadt maleadt transferred this issue from JuliaGPU/CuArrays.jl May 27, 2020
@maleadt maleadt added cuda array Stuff about CuArray. performance How fast can we go? labels May 27, 2020
@maleadt maleadt closed this as completed Aug 25, 2020