More scalable pool allocator #50137

Merged
merged 1 commit into from
Jun 25, 2023

@d-netto d-netto commented Jun 12, 2023

Still very experimental and depends on #49644. Mostly for later comparisons with #48969 (e.g. if concurrent sweeping would perform better on a potentially less contended pool allocator).

Seems to perform better than master on some multithreaded allocation benchmarks (e.g. tree_mutable.jl), but a lot more benchmarking is needed:

  • master:
../julia-master/julia run_benchmarks.jl multithreaded binary_tree tree_mutable -n5 -t16 --gcthreads=1
bench = "tree_mutable.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       7058 │    4033 │      2414 │       1619 │          350 │              1518 │     1631 │         57 │
│  median │       7535 │    4302 │      2484 │       1772 │          384 │              1820 │     1642 │         58 │
│ maximum │       7805 │    4848 │      2710 │       2369 │          415 │              1868 │     1667 │         62 │
│   stdev │        298 │     324 │       121 │        300 │           24 │               161 │       15 │          2 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
  • PR:
../julia-allocator/julia run_benchmarks.jl multithreaded binary_tree tree_mutable -n5 -t16 --gcthreads=1
bench = "tree_mutable.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       6344 │    3894 │      1967 │       1922 │          356 │              1487 │     1950 │         59 │
│  median │       6486 │    4021 │      2058 │       2029 │          359 │              1645 │     1971 │         62 │
│ maximum │       7225 │    4321 │      2230 │       2232 │          413 │              1854 │     1982 │         63 │
│   stdev │        368 │     189 │       107 │        120 │           24 │               146 │       13 │          1 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
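
To make the comparison above easier to read, here is a small sketch that turns the median rows of the two `tree_mutable.jl` tables into relative changes. The helper name `pct_change` and the choice of columns are illustrative assumptions, not part of the benchmark harness; the numbers are copied from the tables.

```python
# Median values (ms for times, MB for max heap) copied from the two
# tree_mutable.jl tables above (-t16, --gcthreads=1).
master = {"total time": 7535, "gc time": 4302, "mark time": 2484,
          "sweep time": 1772, "max heap": 1642}
pr = {"total time": 6486, "gc time": 4021, "mark time": 2058,
      "sweep time": 2029, "max heap": 1971}


def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


# Negative values mean the PR median is lower than the master median.
changes = {name: pct_change(master[name], pr[name]) for name in master}

for name, change in changes.items():
    print(f"{name:>10}: {change:+.1f}%")
```

On these medians, total time is roughly 14% lower on the PR while max heap is roughly 20% higher, and sweep time goes up rather than down.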

@d-netto d-netto marked this pull request as draft June 12, 2023 00:39
@d-netto d-netto requested review from gbaraldi and vchuravy June 12, 2023 00:39
@d-netto d-netto added the GC Garbage collector label Jun 12, 2023
Review thread on src/gc.h (outdated, resolved)
@d-netto d-netto force-pushed the dcn/allocator branch 6 times, most recently from b01d688 to 369ced8 Compare June 14, 2023 00:48
@oscardssmith
Member

With all the changes since the beginning, are the benchmarks still the same?

@d-netto d-netto force-pushed the dcn/allocator branch 2 times, most recently from 65a2b4d to 2983c05 Compare June 14, 2023 20:41
@d-netto
Member Author

d-netto commented Jun 14, 2023

On 2983c05:

  • master:
../julia-master/julia run_benchmarks.jl multithreaded binary_tree -n10 -t32 --gcthreads=1
bench = "tree_immutable.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       4392 │    2295 │      1280 │        868 │          183 │              1884 │      828 │         52 │
│  median │       4580 │    2435 │      1435 │       1058 │          203 │              2183 │      849 │         54 │
│ maximum │       4778 │    2624 │      1502 │       1138 │          212 │              2256 │      893 │         55 │
│   stdev │        128 │      99 │        66 │         82 │           10 │               111 │       20 │          1 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
bench = "tree_mutable.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       7178 │    4143 │      2173 │       1939 │          403 │              1734 │     1802 │         58 │
│  median │       7512 │    4377 │      2229 │       2142 │          431 │              1852 │     1872 │         59 │
│ maximum │       7968 │    4903 │      2376 │       2661 │          468 │              2078 │     2007 │         62 │
│   stdev │        316 │     292 │        67 │        265 │           21 │               102 │       64 │          2 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
  • PR:
../julia-allocator/julia run_benchmarks.jl multithreaded binary_tree -n10 -t32 --gcthreads=1
bench = "tree_immutable.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       3890 │    2215 │      1059 │       1091 │          198 │              1457 │      947 │         57 │
│  median │       3980 │    2382 │      1176 │       1206 │          227 │              1591 │      968 │         60 │
│ maximum │       4258 │    2615 │      1219 │       1435 │          245 │              2832 │      981 │         61 │
│   stdev │        114 │     113 │        48 │        118 │           14 │               411 │        9 │          1 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
bench = "tree_mutable.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       6526 │    4093 │      1983 │       2037 │          412 │              1428 │     2109 │         63 │
│  median │       6919 │    4439 │      2053 │       2387 │          465 │              1607 │     2127 │         64 │
│ maximum │       7981 │    5336 │      2456 │       3020 │          489 │              3584 │     2142 │         67 │
│   stdev │        518 │     402 │       151 │        275 │           25 │               647 │       11 │          1 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
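
The tables above are plain text, so comparing them by hand is error-prone. A minimal sketch, assuming the box-drawn row layout shown above, of parsing one data row into named columns and taking the PR minus master difference. `parse_row` and `COLUMNS` are hypothetical names introduced here, not part of `run_benchmarks.jl`.

```python
# Column names in the order they appear in the table header above.
COLUMNS = ["total time", "gc time", "mark time", "sweep time",
           "max GC pause", "time to safepoint", "max heap", "percent gc"]


def parse_row(line):
    """Split one box-drawn data row into (label, {column: int})."""
    cells = [cell.strip() for cell in line.strip().strip("│").split("│")]
    return cells[0], dict(zip(COLUMNS, map(int, cells[1:])))


# Median rows for tree_immutable.jl at 2983c05, copied from the tables above.
master_line = "│  median │       4580 │    2435 │      1435 │       1058 │          203 │              2183 │      849 │         54 │"
pr_line = "│  median │       3980 │    2382 │      1176 │       1206 │          227 │              1591 │      968 │         60 │"

label, master = parse_row(master_line)
_, pr = parse_row(pr_line)

# Absolute difference PR - master per column (negative: the PR is lower).
delta = {name: pr[name] - master[name] for name in COLUMNS}
print(label, delta)
```

For this row the median total time drops by 600 ms, while sweep time and max heap both increase.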

@oscardssmith
Member

that looks pretty good!

Review thread on src/gc.h (outdated, resolved)
@d-netto d-netto marked this pull request as ready for review June 15, 2023 18:46
@d-netto
Member Author

d-netto commented Jun 15, 2023

Seeing some regressions with this PR in the single-threaded benchmarks. Marking as draft until they are investigated further.

@d-netto d-netto marked this pull request as draft June 15, 2023 22:46
@d-netto
Member Author

d-netto commented Jun 17, 2023

Benchmarks look fine on the latest commit (the goal of this PR is to be performance-neutral on the serial benchmarks and to provide better scalability when multiple threads are allocating).

Serial benchmarks (master)
../julia-master/julia run_benchmarks.jl serial all -n10
category = "TimeZones"
bench = "TimeZones.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       1369 │     648 │       643 │          5 │          326 │                51 │     5099 │         47 │
│  median │       1384 │     663 │       657 │          5 │          333 │                61 │     5099 │         48 │
│ maximum │       2108 │    1268 │      1261 │         18 │          692 │                87 │     5099 │         60 │
│   stdev │        246 │     212 │       210 │          4 │          141 │                10 │        0 │          5 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
category = "append"
bench = "append.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       1984 │      42 │        23 │         19 │           59 │               314 │     1488 │          2 │
│  median │       1998 │      43 │        23 │         20 │           59 │               373 │     1488 │          2 │
│ maximum │       2098 │      60 │        29 │         32 │           89 │               407 │     1488 │          3 │
│   stdev │         35 │       7 │         2 │          5 │           12 │                30 │        0 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
category = "bigint"
bench = "pollard.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       1009 │     211 │       191 │         20 │           59 │               123 │      127 │         21 │
│  median │       1020 │     214 │       194 │         20 │           59 │               137 │      127 │         21 │
│ maximum │       1046 │     234 │       212 │         23 │           87 │               157 │      127 │         22 │
│   stdev │         11 │       7 │         6 │          1 │            9 │                 9 │        0 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
category = "linked"
bench = "list.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       4831 │    3193 │      2913 │        277 │         1056 │               224 │     2630 │         65 │
│  median │       4844 │    3209 │      2926 │        286 │         1060 │               240 │     2631 │         66 │
│ maximum │       5489 │    3592 │      3288 │        319 │         1307 │               278 │     2631 │         67 │
│   stdev │        203 │     122 │       115 │         13 │           78 │                18 │        0 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
bench = "tree.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       4525 │     588 │       578 │         10 │          391 │                43 │      129 │         13 │
│  median │       4564 │     602 │       589 │         12 │          394 │                73 │      131 │         13 │
│ maximum │       4860 │     649 │       629 │         21 │          421 │                77 │      131 │         13 │
│   stdev │        144 │      24 │        20 │          5 │           11 │                12 │        1 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
category = "obj_arrays"
bench = "many_refs.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       2442 │    2052 │      1920 │        131 │          715 │                19 │      869 │         83 │
│  median │       2460 │    2067 │      1931 │        132 │          719 │                26 │      869 │         84 │
│ maximum │       2968 │    2475 │      2317 │        158 │          843 │                36 │      870 │         85 │
│   stdev │        168 │     137 │       129 │          9 │           39 │                 5 │        0 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
bench = "single_ref.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │        864 │     534 │       512 │         22 │          173 │                25 │      869 │         62 │
│  median │        875 │     541 │       518 │         22 │          176 │                39 │      869 │         62 │
│ maximum │       1096 │     697 │       670 │         26 │          290 │                44 │      869 │         64 │
│   stdev │         71 │      50 │        49 │          1 │           36 │                 7 │        0 │          1 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
category = "strings"
bench = "strings.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │      23712 │    3470 │      3093 │        376 │           62 │               890 │      581 │         14 │
│  median │      24002 │    3491 │      3112 │        380 │           74 │              1279 │      581 │         15 │
│ maximum │      24757 │    3705 │      3291 │        421 │           87 │              1481 │      581 │         15 │
│   stdev │        325 │      82 │        67 │         16 │           10 │               177 │        0 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘

Serial benchmarks (PR)
../julia-allocator/julia run_benchmarks.jl serial all -n10
category = "TimeZones"
bench = "TimeZones.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       1399 │     650 │       644 │          5 │          324 │                60 │     5099 │         43 │
│  median │       1427 │     672 │       664 │          6 │          333 │                66 │     5099 │         47 │
│ maximum │       2083 │     989 │       985 │         19 │          649 │                90 │     5099 │         55 │
│   stdev │        227 │     124 │       123 │          4 │          113 │                 9 │        0 │          4 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
category = "append"
bench = "append.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       1952 │      44 │        24 │         20 │           60 │               330 │     1488 │          2 │
│  median │       2034 │      45 │        24 │         21 │           60 │               374 │     1488 │          2 │
│ maximum │       2282 │      62 │        33 │         31 │           91 │               458 │     1488 │          3 │
│   stdev │         92 │       7 │         3 │          4 │           14 │                44 │        0 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
category = "bigint"
bench = "pollard.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       1010 │     208 │       187 │         21 │           60 │               122 │      127 │         20 │
│  median │       1024 │     211 │       190 │         21 │           60 │               138 │      127 │         21 │
│ maximum │       1101 │     241 │       214 │         26 │           96 │               171 │      127 │         22 │
│   stdev │         27 │      11 │         9 │          2 │           14 │                13 │        0 │          1 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
category = "linked"
bench = "list.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       4868 │    3232 │      2954 │        276 │         1075 │               165 │     2630 │         65 │
│  median │       4897 │    3261 │      2976 │        283 │         1077 │               244 │     2631 │         67 │
│ maximum │       5342 │    3462 │      3160 │        365 │         1117 │               294 │     2631 │         67 │
│   stdev │        143 │      71 │        64 │         27 │           12 │                37 │        1 │          1 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
bench = "tree.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       4583 │     625 │       615 │          9 │          425 │                53 │      130 │         14 │
│  median │       4625 │     629 │       619 │         10 │          427 │                76 │      131 │         14 │
│ maximum │       4757 │     647 │       634 │         13 │          429 │                86 │      131 │         14 │
│   stdev │         47 │       6 │         5 │          1 │            1 │                11 │        0 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
category = "obj_arrays"
bench = "many_refs.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       2455 │    2061 │      1931 │        130 │          720 │                25 │      869 │         84 │
│  median │       2471 │    2074 │      1942 │        131 │          723 │                30 │      869 │         84 │
│ maximum │       2802 │    2355 │      2215 │        146 │          853 │                33 │      869 │         85 │
│   stdev │        125 │     110 │       105 │          5 │           46 │                 2 │        0 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
bench = "single_ref.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │        869 │     527 │       505 │         21 │          171 │                26 │      869 │         51 │
│  median │        879 │     537 │       515 │         22 │          175 │                38 │      869 │         61 │
│ maximum │       1140 │     680 │       654 │         26 │          273 │                42 │      869 │         62 │
│   stdev │        108 │      46 │        45 │          2 │           31 │                 6 │        0 │          3 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
category = "strings"
bench = "strings.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │      23607 │    3500 │      3109 │        390 │           63 │               963 │      581 │         15 │
│  median │      24122 │    3553 │      3159 │        393 │           76 │              1319 │      581 │         15 │
│ maximum │      24898 │    3705 │      3273 │        432 │           91 │              1453 │      581 │         15 │
│   stdev │        323 │      58 │        46 │         13 │           11 │               169 │        0 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
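
One rough way to read "performance-neutral" from the serial tables above is to express each shift in median total time in units of master's run-to-run standard deviation. This is only a sketch: with 10 runs per benchmark it is a crude noise-relative view, not a statistical test, and `shift_in_stdevs` is a name made up for this example. The numbers are copied from the median and stdev rows above.

```python
# (master median, master stdev, PR median) of total time in ms,
# copied from the serial benchmark tables above.
serial_total_ms = {
    "TimeZones.jl": (1384, 246, 1427),
    "append.jl": (1998, 35, 2034),
    "pollard.jl": (1020, 11, 1024),
    "list.jl": (4844, 203, 4897),
    "tree.jl": (4564, 144, 4625),
    "many_refs.jl": (2460, 168, 2471),
    "single_ref.jl": (875, 71, 879),
    "strings.jl": (24002, 325, 24122),
}


def shift_in_stdevs(master_median, master_stdev, pr_median):
    """Median shift (PR - master) divided by master's standard deviation."""
    return (pr_median - master_median) / master_stdev


shifts = {bench: shift_in_stdevs(*row) for bench, row in serial_total_ms.items()}
largest = max(shifts, key=shifts.get)

for bench, shift in sorted(shifts.items(), key=lambda item: -item[1]):
    print(f"{bench:>14}: {shift:+.2f} stdev")
```

Every PR median is slightly higher than master's, but the largest shift (`append.jl`) is about one master standard deviation and the rest are under half of one.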

Multithreaded benchmarks (master)
../julia-master/julia run_benchmarks.jl multithreaded binary_tree -n10 -t32 --gcthreads=1
bench = "tree_immutable.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       4370 │    2455 │      1310 │       1093 │          185 │              1706 │      798 │         55 │
│  median │       4624 │    2565 │      1386 │       1166 │          208 │              1948 │      914 │         57 │
│ maximum │       4854 │    2809 │      1474 │       1409 │          235 │              1998 │     1005 │         58 │
│   stdev │        162 │     127 │        52 │        126 │           15 │                97 │       60 │          1 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
bench = "tree_mutable.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       7251 │    4282 │      2150 │       2092 │          400 │              1575 │     1806 │         58 │
│  median │       7776 │    4644 │      2327 │       2313 │          416 │              1844 │     1905 │         60 │
│ maximum │       8286 │    5207 │      2442 │       2824 │          450 │              2543 │     1964 │         63 │
│   stdev │        361 │     296 │       104 │        222 │           15 │               275 │       45 │          1 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘

Multithreaded benchmarks (PR)
../julia-allocator/julia run_benchmarks.jl multithreaded binary_tree -n10 -t32 --gcthreads=1
bench = "tree_immutable.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       3963 │    2355 │      1089 │       1216 │          200 │              1450 │      912 │         58 │
│  median │       4203 │    2467 │      1193 │       1303 │          222 │              1523 │      954 │         59 │
│ maximum │       4566 │    2779 │      1225 │       1575 │          269 │              7082 │      969 │         63 │
│   stdev │        163 │     129 │        42 │        125 │           20 │              1750 │       18 │          1 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
bench = "tree_mutable.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │       6814 │    4322 │      1930 │       2385 │          432 │              1300 │     2077 │         63 │
│  median │       7081 │    4644 │      2021 │       2634 │          466 │              1449 │     2120 │         66 │
│ maximum │       7752 │    5257 │      2082 │       3197 │          487 │              1641 │     2142 │         68 │
│   stdev │        278 │     264 │        54 │        237 │           17 │                85 │       18 │          1 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
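
The multithreaded tables above show percent gc going up on the PR even where gc time is flat. A sketch of the arithmetic behind that, assuming total time is simply GC time plus non-GC time and using the medians as stand-ins for a typical run (a ratio of medians need not equal the table's median percent gc). The helper names are illustrative.

```python
# (total time, gc time) medians in ms, copied from the 32-thread tables above.
medians = {
    "tree_immutable.jl": {"master": (4624, 2565), "pr": (4203, 2467)},
    "tree_mutable.jl": {"master": (7776, 4644), "pr": (7081, 4644)},
}


def non_gc_ms(total_ms, gc_ms):
    """Time spent outside the GC, assuming total = GC + non-GC."""
    return total_ms - gc_ms


def gc_share(total_ms, gc_ms):
    """GC time as a percentage of total time."""
    return 100.0 * gc_ms / total_ms


for bench, runs in medians.items():
    before = non_gc_ms(*runs["master"])
    after = non_gc_ms(*runs["pr"])
    print(f"{bench}: non-GC {before} -> {after} ms, "
          f"GC share {gc_share(*runs['master']):.1f}% -> {gc_share(*runs['pr']):.1f}%")
```

For `tree_mutable.jl` the median gc time is identical on both builds (4644 ms), so the lower total time comes entirely from non-GC time (3132 ms to 2437 ms), which is also why the GC share rises.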

Machine information
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         43 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  128
  On-line CPU(s) list:   0-127
Vendor ID:               AuthenticAMD
  Model name:            AMD EPYC 7502 32-Core Processor
    CPU family:          23
    Model:               49
    Thread(s) per core:  2
    Core(s) per socket:  32
    Socket(s):           2
    Stepping:            0
    Frequency boost:     enabled
    CPU max MHz:         2500.0000
    CPU min MHz:         1500.0000
    BogoMIPS:            4999.53
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es
Virtualization features: 
  Virtualization:        AMD-V
Caches (sum of all):     
  L1d:                   2 MiB (64 instances)
  L1i:                   2 MiB (64 instances)
  L2:                    32 MiB (64 instances)
  L3:                    256 MiB (16 instances)
NUMA:                    
  NUMA node(s):          2
  NUMA node0 CPU(s):     0-31,64-95
  NUMA node1 CPU(s):     32-63,96-127
Vulnerabilities:         
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Mitigation; untrained return thunk; SMT enabled with STIBP protection
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl and seccomp
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, IBPB conditional, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected

@d-netto d-netto marked this pull request as ready for review June 17, 2023 15:40
@d-netto d-netto changed the title Less contended pool allocator More scalable pool allocator Jun 17, 2023
@d-netto d-netto added this to the 1.10 milestone Jun 17, 2023
@d-netto d-netto added the performance Must go faster label Jun 17, 2023
@d-netto d-netto force-pushed the dcn/allocator branch 2 times, most recently from 4fe05fa to acb4110 Compare June 17, 2023 22:51
@vchuravy
Member

@nanosoldier runtests(configuration = (buildflags=["LLVM_ASSERTIONS=1", "FORCE_ASSERTIONS=1"],), vs_configuration = (buildflags = ["LLVM_ASSERTIONS=1", "FORCE_ASSERTIONS=1"],))

@nanosoldier
Collaborator

Your package evaluation job has completed - possible new issues were detected.
A full report can be found here.

@vchuravy vchuravy merged commit 5939e2d into JuliaLang:master Jun 25, 2023
@gbaraldi
Member

This made the gcext tests fail. They are allowed to fail, but it might deserve a look (the tests might just be making some wrong assumptions).

@vchuravy
Member

Yeah it's on my to-do list

@d-netto
Member Author

d-netto commented Jun 26, 2023

Also taking a look at this.

d-netto added a commit that referenced this pull request Jun 30, 2023
…ect pool freelist

* addresses follow-up comments from #50137, particularly #50137 (comment)
d-netto added a commit to RelationalAI/julia that referenced this pull request Aug 6, 2023
kpamnany pushed a commit to RelationalAI/julia that referenced this pull request Oct 19, 2023
DelveCI pushed a commit to RelationalAI/julia that referenced this pull request Oct 20, 2023
kpamnany pushed a commit to RelationalAI/julia that referenced this pull request Oct 21, 2023
DelveCI pushed a commit to RelationalAI/julia that referenced this pull request Oct 23, 2023
DelveCI pushed a commit to RelationalAI/julia that referenced this pull request Nov 1, 2023
DelveCI pushed a commit to RelationalAI/julia that referenced this pull request Nov 2, 2023
DelveCI pushed a commit to RelationalAI/julia that referenced this pull request Nov 7, 2023
DelveCI pushed a commit to RelationalAI/julia that referenced this pull request Nov 10, 2023
DelveCI pushed a commit to RelationalAI/julia that referenced this pull request Nov 14, 2023
DelveCI pushed a commit to RelationalAI/julia that referenced this pull request Nov 15, 2023
Labels
GC Garbage collector performance Must go faster
6 participants