
Add recording distribution of outputBuffer utilization #13463

Conversation

@radek-kondziolka (Contributor) commented Aug 2, 2022

This pull request adds the exposure of the distribution of outputBuffer's utilization as the outputBuffers.utilization parameter in the operator stats.
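As a rough illustration of what "recording the distribution of utilization" means here, the following is a simplified, hypothetical stand-in for the TDigest-based recording in the PR (the class and method names are mine, and a sorted list replaces the real digest): each buffer event records bufferedBytes / maxBufferBytes as a sample, and percentiles are read from the accumulated samples.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class BufferUtilizationDistribution {
    private final long maxBufferBytes;
    private final List<Double> samples = new ArrayList<>();

    BufferUtilizationDistribution(long maxBufferBytes) {
        this.maxBufferBytes = maxBufferBytes;
    }

    // Called whenever the buffered byte count changes.
    synchronized void record(long bufferedBytes) {
        samples.add((double) bufferedBytes / maxBufferBytes);
    }

    // quantile in [0, 1], e.g. 0.5 for p50. Nearest-rank over the raw
    // samples, unlike a real TDigest, which interpolates over centroids.
    synchronized double valueAt(double quantile) {
        List<Double> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);
        int index = (int) Math.min(sorted.size() - 1, Math.floor(quantile * sorted.size()));
        return sorted.get(index);
    }
}
```

The real implementation summarizes samples in a digest precisely so that the per-event recording cost and memory stay bounded, which a raw sample list does not.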

@skrzypo987 (Member) left a comment

It appears that a lot of code is duplicated across the *OutputBuffer classes, including the new code introduced in this commit. Please check whether some of it can be deduplicated.

@radek-kondziolka radek-kondziolka force-pushed the rk/add_buffer_utilization_distribution branch 2 times, most recently from 7b6875c to cd44061 Compare August 4, 2022 07:02
@radek-kondziolka (Contributor, Author) commented Aug 4, 2022

@skrzypo987, that's true. However, this pull request only adds measurement of the output buffer utilization: a field to store the state, updates to that state, and its exposure in the stats. I think that is not too much code, and it is very readable.

Obviously, we could add a superclass, but in my view it is not a good idea here to extract an abstract class only for stats. Sometimes it is better to keep duplicated code for the sake of readability and reduced complexity, at least in my opinion.
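For illustration only, the deduplication alternative being debated could also avoid an abstract superclass entirely: each *OutputBuffer implementation could hold a small stats recorder by composition, so the recording code lives in one place. This is a sketch with hypothetical names, not the PR's actual code.

```java
// Shared recorder, owned by each buffer implementation via composition.
class BufferStats {
    private long samples;
    private double sum;

    void recordUtilization(double utilization) {
        samples++;
        sum += utilization;
    }

    double averageUtilization() {
        return samples == 0 ? 0 : sum / samples;
    }
}

// One of several buffer implementations delegating to the shared recorder.
class PartitionedBuffer {
    private final BufferStats stats = new BufferStats();

    void onEnqueue(long bufferedBytes, long maxBytes) {
        stats.recordUtilization((double) bufferedBytes / maxBytes);
    }

    BufferStats stats() {
        return stats;
    }
}
```

Composition like this keeps each buffer class free to evolve independently while still sharing the recording logic, which is the usual trade-off against both inheritance and duplication.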

@radek-kondziolka radek-kondziolka force-pushed the rk/add_buffer_utilization_distribution branch 4 times, most recently from 6ed60bd to d41bf05 Compare August 11, 2022 08:19
@skrzypo987 (Member) left a comment

lgtm

@sopel39 (Member) left a comment

Have you manually checked that stats are there?

@radek-kondziolka (Contributor, Author) commented Aug 11, 2022

Have you manually checked that stats are there?

@sopel39, it was hard for me to see the difference. I've run an insert into ... select ... query with two configurations:

  • sink.max-buffer-size=1024
    "min" : 0.0,
    "max" : 1.0587518122047186,
    "p01" : 0.0,
    "p05" : 0.3941453357877656,
    "p10" : 0.8341365753791842,
    "p25" : 1.007034594562806,
    "p50" : 1.021537570494948,
    "p75" : 1.036792696828272,
    "p90" : 1.0482016705239794,
    "p95" : 1.0556658666995862,
    "p99" : 1.0582062227524862,
    "total" : 1595430429266,
  • sink.max-buffer-size=8192MB
    "min" : 0.0,
    "max" : 1.0073409504257143,
    "p01" : 0.0,
    "p05" : 0.05224226690431327,
    "p10" : 0.12749997785610248,
    "p25" : 0.3531256313305722,
    "p50" : 0.7062936963112328,
    "p75" : 0.9783176548063535,
    "p90" : 1.0037902268993106,
    "p95" : 1.0052850667382625,
    "p99" : 1.007157468224605,
    "total" : 1653018955249,

We can see the difference between the smaller and the bigger max-buffer-size (in the lower percentiles).

Before we merge, I need to wait for the benchmark results.
However, for a smaller sink.max-buffer-size (like the default 32MB) I was observing utilization of ~200%. Maybe such overutilization is possible and even expected in some scenarios?
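One plausible explanation for utilization above 100% (a hedged sketch, not the actual Trino code): if a buffer accepts whole pages as long as it is not yet full, buffered bytes can overshoot the limit by up to one page, and with a small max-buffer-size a single large page overshoots by a large fraction. The class and method names below are hypothetical.

```java
class OvershootingBuffer {
    private final long maxBufferBytes;
    private long bufferedBytes;

    OvershootingBuffer(long maxBufferBytes) {
        this.maxBufferBytes = maxBufferBytes;
    }

    boolean tryAddPage(long pageBytes) {
        if (bufferedBytes >= maxBufferBytes) {
            return false; // full: the producer must wait for a consumer
        }
        bufferedBytes += pageBytes; // the whole page is accepted, even if it overshoots
        return true;
    }

    double utilization() {
        return (double) bufferedBytes / maxBufferBytes;
    }
}
```

Under this model a 32MB buffer that is one byte short of full can still accept a 32MB page, momentarily reporting close to 200% utilization.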

@radek-kondziolka radek-kondziolka force-pushed the rk/add_buffer_utilization_distribution branch 2 times, most recently from 2ae0830 to d938757 Compare August 11, 2022 13:28
@sopel39 (Member) left a comment

lgtm % comments

@radek-kondziolka radek-kondziolka force-pushed the rk/add_buffer_utilization_distribution branch from d938757 to 5b9bece Compare August 19, 2022 07:49
@radek-kondziolka (Contributor, Author):

@sopel39, all your comments have been addressed. I'm waiting for the macrobenchmark results.

@radek-kondziolka radek-kondziolka force-pushed the rk/add_buffer_utilization_distribution branch 3 times, most recently from ee6a954 to 1a37705 Compare August 19, 2022 12:47
@radek-kondziolka radek-kondziolka force-pushed the rk/add_buffer_utilization_distribution branch from 1a37705 to 7fbd00c Compare August 23, 2022 07:56
@radek-kondziolka radek-kondziolka force-pushed the rk/add_buffer_utilization_distribution branch from afd266c to 787127d Compare August 31, 2022 12:38
@sopel39 (Member) left a comment

lgtm % comments.

@sopel39 (Member) commented Aug 31, 2022

CI is failing

@radek-kondziolka radek-kondziolka force-pushed the rk/add_buffer_utilization_distribution branch 2 times, most recently from f1d033d to e35f0ed Compare September 2, 2022 06:59
@sopel39 (Member) left a comment

lgtm % comments.

  1. How do you extract histogram for query?
  2. Is lock congestion not a problem (how fast is TDigest.copyOf(bufferUtilization))?

@radek-kondziolka (Contributor, Author) commented Sep 5, 2022

@sopel39 ,

How do you extract histogram for query?

Just by downloading the JSON with the query stats

Is lock congestion not a problem (how fast is TDigest.copyOf(bufferUtilization))?

It does not look like it (based on a JFR profile)
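The pattern under discussion can be sketched like this (hypothetical names; a plain list stands in for the TDigest): writers record samples under a short lock, and stats readers copy the state under the same lock, the way TDigest.copyOf(bufferUtilization) is used, so serialization of the snapshot happens outside the critical section.

```java
import java.util.ArrayList;
import java.util.List;

class UtilizationStats {
    private final List<Double> digest = new ArrayList<>(); // stand-in for a TDigest

    // Writer path: short critical section per recorded sample.
    synchronized void add(double utilization) {
        digest.add(utilization);
    }

    // Reader path: copy under the lock (analogous to TDigest.copyOf),
    // then compute percentiles / serialize the copy without holding it.
    synchronized List<Double> snapshot() {
        return new ArrayList<>(digest);
    }
}
```

Keeping only the copy inside the lock is what makes congestion a question of how fast the copy itself is, which is exactly what the review comment asks about.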

@radek-kondziolka radek-kondziolka force-pushed the rk/add_buffer_utilization_distribution branch from e35f0ed to 4b20fbe Compare September 5, 2022 09:57
@sopel39 (Member) left a comment

lgtm % comment

@radek-kondziolka radek-kondziolka force-pushed the rk/add_buffer_utilization_distribution branch 2 times, most recently from d5e5c56 to d9b5d25 Compare September 6, 2022 12:47
@sopel39 (Member) left a comment

Mind the compilation errors

@radek-kondziolka radek-kondziolka force-pushed the rk/add_buffer_utilization_distribution branch 2 times, most recently from a413907 to 7d495ef Compare September 7, 2022 09:58
Add outputBufferUtilization field to the OutputBufferInfo to
expose distribution of output buffer utilization.
@radek-kondziolka radek-kondziolka force-pushed the rk/add_buffer_utilization_distribution branch from 7d495ef to c3fe23f Compare September 7, 2022 12:04
@sopel39 (Member) commented Sep 7, 2022

Could you paste example utilization report for query where source stage is/isn't a bottleneck?

@sopel39 sopel39 merged commit df1dde1 into trinodb:master Sep 7, 2022
@github-actions github-actions bot added this to the 395 milestone Sep 7, 2022
@radek-kondziolka (Contributor, Author):

Could you paste example utilization report for query where source stage is/isn't a bottleneck?

@sopel39, shared offline
