
DATAS tuning changes for small HCs #100390

Merged: 1 commit, merged Apr 3, 2024


@Maoni0 (Member) commented Mar 28, 2024

Changes included:

  • Currently we have a very simplistic formula for adapting to the size, and it basically makes all the ASP.NET benchmarks with a low survival rate adjust to the minimum gen0 budget of 2.5 MB. While those run fine with such a small budget on a 28-core machine, this doesn't work if we limit the heap count to a small number, e.g., 4: the % time in GC gets very high, with some benchmarks spending 20% to 40% of their time in GC, which is obviously not desirable. I reworked this so it actually adapts to the size. Here's a chart for different gen2 sizes:
[image: chart of the reworked gen0 budget for different gen2 sizes]

We then take the min of this and the gen0 budget calculated without DATAS.

  • The formula I previously had for determining a new HC (heap count) did not handle small HCs well, so I adjusted that too.

  • Got rid of the adjustment to cache size in gc1 for DATAS; it just makes things unpredictable, especially for small workloads.
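To make the shape of the first bullet concrete (a gen0 budget that grows with gen2 size, combined with the non-DATAS budget by taking the min), here is a minimal sketch. The PR's actual formula is only shown in the chart, so the linear factor `GEN2_FRACTION`, the choice to keep the 2.5 MB value as a floor, and the name `datas_gen0_budget_mb` are all hypothetical.

```python
# Hypothetical sketch only: GEN2_FRACTION and the linear shape are assumptions,
# not the formula from the PR (which is shown only as a chart).
MIN_GEN0_BUDGET_MB = 2.5   # minimum gen0 budget mentioned in the PR
GEN2_FRACTION = 1 / 8      # hypothetical: how fast the budget grows with gen2 size


def datas_gen0_budget_mb(gen2_size_mb: float, non_datas_budget_mb: float) -> float:
    """Return a gen0 budget (MB) that adapts to the gen2 size."""
    # Size-adaptive term; assumed here never to drop below the minimum budget.
    size_based = max(MIN_GEN0_BUDGET_MB, gen2_size_mb * GEN2_FRACTION)
    # As described in the PR: take the min of this and the non-DATAS budget.
    return min(size_based, non_datas_budget_mb)


if __name__ == "__main__":
    for gen2 in (10, 400, 4000):
        print(gen2, datas_gen0_budget_mb(gen2, non_datas_budget_mb=256))
```

In this sketch a small gen2 keeps the budget at the minimum, a larger gen2 raises it proportionally, and the budget computed without DATAS acts as the upper cap.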

I will add data for the final build soon. There's more work to do, which I will address in future PRs:

  • Do some refactoring: I've been adding a few tuning mechanisms of the same nature, so it'd be nice to have a utility class for them.
  • Make setting the min/max gen0 budget for DATAS part of the static data calculation.
  • Make the gen0 budget computation take conserve_mem_setting into consideration.
  • Get rid of more of the gen0 budget adjustments that make things volatile.
  • We do have time-based tuning for WKS GC but not for SVR; we should bring that to DATAS to make sure we don't go for a very long time without doing a higher-generation GC (especially a gen2 GC).
  • There are also a couple of things related to small HC tuning that I need to fix, but I think this PR is in a good place.


Tagging subscribers to this area: @dotnet/gc

@Maoni0 (Member, Author) commented Apr 2, 2024

This obviously increases the memory footprint, but it also increases RPS. Here is a run with max HC set to 4:

| bench | b_max_mem | max_mem_% | b_p50_mem | p50_mem_% | b_rps | rps_% |
|---|---:|---:|---:|---:|---:|---:|
| Stage1GrpcServerGC | 70.75 | 73.50 | 59.00 | 48.73 | 518.28 | 79.17 |
| Stage1Grpc | 69.00 | 60.51 | 59.50 | 48.32 | 524.22 | 77.56 |
| FortunesPlatformEF | 87.00 | 84.48 | 76.75 | 37.46 | 191.91 | 59.39 |
| FortunesEf | 58.00 | 93.97 | 55.50 | 40.99 | 137.16 | 40.50 |
| FortunesDapper | 47.50 | 52.63 | 43.50 | 50.00 | 139.56 | 38.75 |
| PlaintextMvc | 43.50 | 9.77 | 41.00 | 11.59 | 1671.11 | 25.77 |
| Stage1TrimR2RSingleFile | 29.50 | 51.69 | 28.25 | 49.56 | 599.50 | 25.19 |
| Stage1 | 34.25 | 50.36 | 31.75 | 32.28 | 601.09 | 25.08 |
| CachingPlatform | 45.75 | 54.64 | 44.00 | 30.68 | 342.20 | 19.99 |
| Stage2 | 62.00 | 38.31 | 59.00 | 24.58 | 168.18 | 18.03 |
| ConnectionCloseHttpSys | 25.25 | 76.24 | 23.50 | 48.94 | 81.11 | 15.66 |
| SingleQueryPlatform | 86.25 | 33.62 | 58.00 | 36.21 | 375.61 | 15.12 |
| JsonMvc | 44.25 | 32.20 | 42.75 | 25.73 | 457.95 | 12.69 |
| PlaintextWithParametersNoFilter | 44.00 | 21.59 | 43.00 | 17.44 | 2910.62 | 10.86 |
| UpdatesPlatform | 69.50 | 100.00 | 59.50 | 100.00 | 25.29 | 10.76 |
| MultipleQueriesPlatform | 63.75 | 100.78 | 61.25 | 93.88 | 40.96 | 7.65 |
| PlaintextNoParametersEmptyFilter | 35.25 | 28.37 | 33.75 | 15.56 | 3806.39 | 6.46 |
| JsonHttps | 51.25 | 43.41 | 47.00 | 15.43 | 653.70 | -1.73 |
JsonHttps 51.25 43.41 47.00 15.43 653.70 -1.73

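For example, here are the GC stats for CachingPlatform and ConnectionCloseHttpSys, with four baseline runs and four runs with the fix each: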

| run | benchmark | gen0 | pause | gen1 | pause | ngc2 | pause | bgc | pause | allocMB | alloc/gc | pct in GC |
|---|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| baseline_0 | CachingPlatform | 3282 | 1.7 | 8 | 3.07 | 1 | 5.49 | 2 | 1.87 | 9371.31 | 2.85 | 14.83 |
| baseline_1 | CachingPlatform | 3539 | 1.9 | 5 | 3.73 | 2 | 4.6 | 1 | 1.61 | 8838.93 | 2.49 | 18.13 |
| baseline_2 | CachingPlatform | 3572 | 1.84 | 6 | 3.58 | 1 | 5.51 | 2 | 1.82 | 8943.07 | 2.5 | 17.7 |
| baseline_3 | CachingPlatform | 3629 | 1.76 | 6 | 3.49 | 1 | 5.44 | 2 | 1.89 | 9083.47 | 2.5 | 15.94 |
| fix_0 | CachingPlatform | 1348 | 1.29 | 5 | 4.28 | 1 | 5.6 | 2 | 1.32 | 10911.95 | 8.05 | 4.66 |
| fix_1 | CachingPlatform | 1283 | 1.23 | 5 | 4.21 | 1 | 5.83 | 2 | 1.39 | 10984.42 | 8.51 | 4.3 |
| fix_2 | CachingPlatform | 1346 | 1.27 | 6 | 3.93 | 1 | 5.43 | 2 | 1.4 | 10842.49 | 8 | 4.38 |
| fix_3 | CachingPlatform | 1397 | 1.22 | 6 | 3.92 | 1 | 5.34 | 2 | 1.93 | 10797 | 7.68 | 4.55 |

| run | benchmark | gen0 | pause | gen1 | pause | ngc2 | pause | bgc | pause | allocMB | alloc/gc | pct in GC |
|---|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| baseline_0 | ConnectionCloseHttpSys | 2463 | 1.55 | 1228 | 1.15 | 1 | 3.82 | 0 | 0 | 9289.57 | 2.52 | 17.33 |
| baseline_1 | ConnectionCloseHttpSys | 2471 | 1.51 | 1230 | 1.11 | 1 | 4.13 | 0 | 0 | 9316.42 | 2.52 | 16.9 |
| baseline_2 | ConnectionCloseHttpSys | 2481 | 1.52 | 1238 | 1.09 | 1 | 4.33 | 0 | 0 | 9364.03 | 2.52 | 15.92 |
| baseline_3 | ConnectionCloseHttpSys | 2459 | 1.55 | 1228 | 1.15 | 1 | 4.43 | 0 | 0 | 9281.21 | 2.52 | 17.29 |
| fix_0 | ConnectionCloseHttpSys | 580 | 1.6 | 309 | 1.21 | 1 | 3.08 | 0 | 0 | 10759.68 | 12.09 | 4.31 |
| fix_1 | ConnectionCloseHttpSys | 557 | 1.58 | 297 | 1.21 | 0 | 0 | 1 | 0.49 | 10813.69 | 12.65 | 4.11 |
| fix_2 | ConnectionCloseHttpSys | 562 | 1.59 | 287 | 1.2 | 0 | 0 | 1 | 0.45 | 10814.27 | 12.72 | 4.1 |
| fix_3 | ConnectionCloseHttpSys | 556 | 1.64 | 286 | 1.21 | 0 | 0 | 1 | 0.68 | 10810.13 | 12.82 | 4.17 |
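The `alloc/gc` column is consistent with `allocMB` divided by the total number of GCs (gen0 + gen1 + ngc2 + bgc counts); recomputing it for a few rows reproduces the reported values. A quick check (the function name `alloc_per_gc` is ours, not from the PR):

```python
def alloc_per_gc(gen0: int, gen1: int, ngc2: int, bgc: int, alloc_mb: float) -> float:
    """MB allocated per GC, counting every GC of any kind once."""
    total_gcs = gen0 + gen1 + ngc2 + bgc
    return alloc_mb / total_gcs


# (gen0, gen1, ngc2, bgc, allocMB, reported alloc/gc) copied from the tables
rows = {
    "baseline_0 CachingPlatform": (3282, 8, 1, 2, 9371.31, 2.85),
    "fix_0 CachingPlatform": (1348, 5, 1, 2, 10911.95, 8.05),
    "baseline_0 ConnectionCloseHttpSys": (2463, 1228, 1, 0, 9289.57, 2.52),
    "fix_0 ConnectionCloseHttpSys": (580, 309, 1, 0, 10759.68, 12.09),
}

for name, (g0, g1, n2, b, alloc, reported) in rows.items():
    computed = alloc_per_gc(g0, g1, n2, b, alloc)
    print(f"{name}: computed {computed:.2f}, reported {reported}")
```

Seen this way, the fix roughly triples to quintuples the amount allocated between GCs, which is where the drop in % time in GC comes from.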

Comparison with default SVR GC, also with max HC set to 4:

[image: comparison with default SVR GC, max HC set to 4]

Without setting a small max HC, this does increase the memory footprint a bit, while RPS stays mostly the same (within ±5% of baseline). The footprint is still much lower than with default SVR GC:

[image: comparison with default SVR GC without a small max HC]

I adjusted the formula for determining a new HC and changed how we calculate the gen0 budget based on gen2 size.
@Maoni0 force-pushed the small_hc_checkin branch from e0f7ca2 to e760bf0 on April 2, 2024 06:29
@Maoni0 merged commit 89bd910 into dotnet:main on Apr 3, 2024
92 checks passed
matouskozak pushed a commit to matouskozak/runtime that referenced this pull request Apr 30, 2024
github-actions (bot) locked and limited conversation to collaborators on May 4, 2024