
Fixed amount of work for rb_tree.jl #61

Merged 3 commits from dcn/rb_tree into main on May 8, 2023
Conversation

@d-netto (Collaborator) commented Apr 26, 2023

I believe the previous version was imposing a time cutoff for the benchmark.

This PR eliminates the time cutoff and instead fixes the iteration count.

N=10_000_000 was chosen in order to make the benchmark last ~1min on my machine:

┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │      84248 │   11706 │     11431 │        276 │         3330 │                 9 │     1010 │         14 │
│  median │      84248 │   11706 │     11431 │        276 │         3330 │                 9 │     1010 │         14 │
│ maximum │      84248 │   11706 │     11431 │        276 │         3330 │                 9 │     1010 │         14 │
│   stdev │        NaN │     NaN │       NaN │        NaN │          NaN │               NaN │      NaN │        NaN │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
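
For context, a minimal sketch of the shape of the change; `work!` is a hypothetical stand-in for one benchmark step (the real rb_tree.jl presumably inserts and deletes red-black tree nodes):

```julia
# Hypothetical stand-in for one allocating benchmark step.
work!(acc) = push!(acc, rand(8))

acc = Vector{Float64}[]

# Before: a time cutoff, so the amount of work varies with machine speed.
const TIME_LIMIT = 60.0
t0 = time()
while time() - t0 < TIME_LIMIT
    work!(acc)
end

# After: a fixed iteration count, so every run performs the same work.
const N = 10_000_000
empty!(acc)
for _ in 1:N
    work!(acc)
end
```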

@d-netto requested a review from vchuravy on April 26, 2023 at 23:12
@d-netto (Collaborator, Author) commented Apr 26, 2023

@kpamnany: could you take a look as well? Can't assign you as a reviewer for some reason.

@kpamnany (Contributor) left a comment

This really needs to run at least 10 minutes to expose the extent of the pause time increase. That's why it's in slow/. Suggest finding an N corresponding to that amount of time.

@d-netto (Collaborator, Author) commented Apr 26, 2023

N = 20_000_000 makes the benchmark last a bit more than 10min on my machine.

@vchuravy (Member) commented

> This really needs to run at least 10 minutes to expose the extent of the pause time increase.

I am not sure there is useful information in there after the first ~60s

elapsed=10.000182151794434s, 2358200 current points, 2358200 total, 235815.0 per second
elapsed=20.000406980514526s, 3896300 current points, 3896300 total, 194811.0 per second
elapsed=30.000055074691772s, 5569000 current points, 5569000 total, 185632.0 per second
elapsed=40.00019717216492s, 7022500 current points, 7022500 total, 175561.0 per second
elapsed=50.00018310546875s, 9029300 current points, 9029300 total, 180585.0 per second
elapsed=64.95979309082031s, 10940500 current points, 10940500 total, 168419.0 per second
...
elapsed=610.0001201629639s, 87594200 current points, 87594200 total, 143597.0 per second
elapsed=620.0000710487366s, 89046400 current points, 89046400 total, 143623.0 per second
elapsed=630.0003559589386s, 90528100 current points, 90528100 total, 143695.0 per second
elapsed=640.0005731582642s, 92004200 current points, 92004200 total, 143756.0 per second
elapsed=650.000018119812s, 93471400 current points, 93471400 total, 143802.0 per second
elapsed=660.0004801750183s, 94934100 current points, 94934100 total, 143839.0 per second
elapsed=670.000039100647s, 96403900 current points, 96403900 total, 143886.0 per second
elapsed=680.0004360675812s, 97875400 current points, 97875400 total, 143934.0 per second
elapsed=690.0000729560852s, 99332400 current points, 99332400 total, 143959.0 per second

For a useful benchmark, the absolute length of GC pauses is not that interesting to me.
Here I can see that the points processed per second decrease over time, and with instrumentation I could perhaps confirm that the reason is GC taking up a larger and larger percentage of each time step, e.g. that GC performance scales worse with the number of live items.
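
A minimal sketch of what that instrumentation could look like, using the cumulative GC time Julia tracks in `Base.gc_num()` (the allocating loop body is a hypothetical stand-in for the benchmark's work):

```julia
# Report the share of each 10 s interval spent in GC.
function report_gc_share(nintervals)
    data = Vector{Float64}[]
    for i in 1:nintervals
        gc0 = Base.gc_num().total_time    # cumulative GC time so far, in ns
        t0  = time_ns()
        while time_ns() - t0 < 10 * 10^9  # one 10 s interval of stand-in work
            push!(data, rand(100))
        end
        gc_pct = 100 * (Base.gc_num().total_time - gc0) / (time_ns() - t0)
        println("interval $i: GC was $(round(gc_pct; digits=1))% of wall time")
    end
end

report_gc_share(6)
```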

For a good benchmark my rules are:

  1. Is it representative of a particular workload?
  2. Is the shape of the heap useful?
  3. Is it targeted? (Some of the benchmarks here seem silly, but they target specific aspects of the GC subsystem.)
  4. Can I use it to improve the GC?
  5. Is the workload dependent on heap size?

A benchmark that needs 10 minutes to run exceeds the threshold of "Can I use it to improve the GC".

So the question is: what is the goal of the benchmark?
Is it to be representative (then it doesn't need to run 10 minutes)?
Does it show behavior only exhibited by a long-running workload?
Does it demonstrate a particular scenario when we reach a certain amount of memory used?

@vchuravy (Member) commented

Maybe the answer is that we need two versions of this benchmark (and of the other slow benchmarks):

a version fast enough to be used for tuning the GC, and another that is more of a verification of the GC.
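
As a hedged sketch of how a single script could serve both roles, the node count could be taken from the environment (the variable name `GC_BENCH_N` is hypothetical):

```julia
# Default to the fast "tuning" size; a slow verification run overrides it,
# e.g.:  GC_BENCH_N=50000000 julia rb_tree.jl
const N = parse(Int, get(ENV, "GC_BENCH_N", "10000000"))
@info "running rb_tree benchmark with N = $N"
```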

@kpamnany (Contributor) commented

> For a useful benchmark, the absolute length of GC pauses is not that interesting to me.

The length of a GC pause is a "feature" of GC behavior. Knowing how big it can get is absolutely an interesting datum. G1, Shenandoah, LXR, etc. all make claims about maximum pause time. What's the maximum pause time for our GC? If we don't have an answer to that question, we can't make a meaningful comparison with another GC.
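
For reference, recent Julia versions (1.8 and later) can log each collection's pause time directly, which is one way to extract a maximum-pause figure from a benchmark run:

```julia
# Emits one line per collection to stderr, including its pause time;
# the maximum pause can then be read off the log after a run.
GC.enable_logging(true)
```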

@kpamnany (Contributor) commented

Also worth noting is that this is not a "torture" test. It is designed to be representative of a real workload. That it takes a long time to create this heap shape is practically irrelevant to the point, which is to show GC behavior for this size and shape of heap.

I understand that it becomes inconvenient when the runtime is long, but the solution is not to discard or break the benchmark; perhaps it is to find a way to quickly recreate the heap shape at the 10-minute mark (see the sketch below), which is probably beyond scope for us.

I'm not sure there is much value to the benchmark if it runs for a minute or two -- at that point it isn't too different from the tree benchmarks?
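
A hedged sketch of what such a quick recreate could look like, assuming the tree can round-trip through the stdlib `Serialization` (the `build_tree` helper and file name are hypothetical; note this restores the object graph, not the exact physical heap layout, so GC behavior may still differ):

```julia
using Serialization

# Stand-in for the real red-black tree builder in rb_tree.jl.
build_tree(n) = [rand(8) for _ in 1:n]

const CHECKPOINT = "rb_tree_checkpoint.jls"

# Restore the 10-minute heap shape from disk if we have it; otherwise
# build it once and checkpoint it for future runs.
tree = if isfile(CHECKPOINT)
    deserialize(CHECKPOINT)
else
    t = build_tree(50_000_000)
    serialize(CHECKPOINT, t)
    t
end
```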

@d-netto (Collaborator, Author) commented May 8, 2023

I ran this benchmark on a small aarch64 laptop and also on a large x86_64 server. On both of them, 50M nodes was enough to make the benchmark last for more than 10 minutes and to cause pathological (more than 20 seconds) GC pauses, which should be more than enough to exercise the desired behavior.

If there are no objections, I'll be merging this by tomorrow night.

@vchuravy vchuravy merged commit ff744ce into main May 8, 2023
@vchuravy vchuravy deleted the dcn/rb_tree branch May 8, 2023 13:45