Increase node scheduler config parameters #15579
Conversation
@pettyjamesm this PR increases the unacknowledged splits limit to 2000. I think it's fine, wdyt?
I'm a little skeptical that raising the value to 2,000 is necessary or actually safe to pick up as the default based on the results of running the TPCH/TPCDS benchmarks alone. These configuration properties are probably rarely configured away from their defaults in most deployments, so we want to be a little cautious about picking overly aggressive defaults. In particular, increasing `max-unacknowledged-splits-per-task` means larger task update request payloads and more memory pressure on the coordinator. I have a couple of recommendations about how you might want to proceed.
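For reference, these are coordinator settings. A minimal sketch of the relevant `config.properties` entries, assuming the `node-scheduler.`-prefixed property names used by Trino's node scheduler (the prefix is not spelled out in this thread):

```properties
# Values proposed in this PR; per the discussion, the previous default for
# max-unacknowledged-splits-per-task was 500.
node-scheduler.max-unacknowledged-splits-per-task=2000
node-scheduler.max-adjusted-pending-splits-per-task=2000

# Related per-node limit mentioned below, unchanged at its default of 100.
node-scheduler.max-splits-per-node=100
```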
What was the error that you've actually been observing? I'm particularly interested in why 500 was chosen as the default for `max-unacknowledged-splits-per-task`.
It's not really about TPCH/TPCDS, but rather about workloads where workers process their split queues faster than the coordinator assigns new splits.
Depends on how large the payloads get. Potentially you could trigger request timeouts because of the amount of time the worker spends parsing the task update payload JSON, but before that point you'll see high allocation and GC activity on the coordinator. In Presto they added a hard limit of 16MB on the task update body and fail the task immediately, to avoid some of the issues they saw at the time.
The value was chosen before the existence of the […]
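The Presto behavior described above amounts to a guard on the serialized request body. A hypothetical Java sketch of that kind of check; `TaskUpdateGuard`, `serializeOrFail`, and the failure path are illustrative assumptions, not the actual Presto/Trino API:

```java
import java.nio.charset.StandardCharsets;

class TaskUpdateGuard {
    // Hard cap on the serialized task update body, mirroring the 16MB limit
    // described above (illustrative constant, not the real code).
    private static final int MAX_TASK_UPDATE_BYTES = 16 * 1024 * 1024;

    // Returns the serialized body if it is under the limit; otherwise fails
    // the task immediately instead of sending an oversized request.
    byte[] serializeOrFail(String taskId, String taskUpdateJson) {
        byte[] body = taskUpdateJson.getBytes(StandardCharsets.UTF_8);
        if (body.length > MAX_TASK_UPDATE_BYTES) {
            throw new IllegalStateException("Task update for " + taskId + " is "
                    + body.length + " bytes, over the " + MAX_TASK_UPDATE_BYTES + " byte limit");
        }
        return body;
    }
}
```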
Quick local test:
old: 13.8509 ±0.7932

Test:
old: 10.0148 ±0.6281
* **Type:** :ref:`prop-type-integer`
* **Default value:** ``2000``

Maximum number of splits that are either queued on the coordinator, but not yet sent or confirmed to have been received by the worker.
Does increasing this value have any significant implications for memory usage on the coordinator? I assume this change implies that every table scan can now queue up 4x more splits on the coordinator.
This is limited by `max-splits-per-node`, which still defaults to 100. The higher number is only used when we adjust the queue size because splits are being processed faster than they are assigned. That adjustment is triggered only if the node processed its whole queue between calls to `computeAssignment`; otherwise this change does not impact query scheduling.
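Put differently, the adjusted queue target only grows when a worker fully drains its queue between scheduling passes. A rough Java sketch of that logic as described above; the class, method names, and the doubling policy are illustrative assumptions, not the actual scheduler code:

```java
class PendingSplitsTarget {
    private final int minPendingSplitsPerTask;          // stand-in for min-pending-splits-per-task
    private final int maxAdjustedPendingSplitsPerTask;  // e.g. 2000 with this PR's defaults
    private int adjustedTarget;

    PendingSplitsTarget(int minPendingSplitsPerTask, int maxAdjustedPendingSplitsPerTask) {
        this.minPendingSplitsPerTask = minPendingSplitsPerTask;
        this.maxAdjustedPendingSplitsPerTask = maxAdjustedPendingSplitsPerTask;
        this.adjustedTarget = minPendingSplitsPerTask;
    }

    // Called once per scheduling pass (i.e. per computeAssignment call). The
    // target grows only when the worker drained its queue since the previous
    // pass, meaning splits were processed faster than they were assigned.
    void onSchedulingPass(boolean queueFullyProcessed) {
        if (queueFullyProcessed) {
            adjustedTarget = Math.min(adjustedTarget * 2, maxAdjustedPendingSplitsPerTask);
        }
    }

    int target() {
        return Math.max(adjustedTarget, minPendingSplitsPerTask);
    }
}
```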
Force-pushed from f472f97 to 8b0d84e
The concerns that @pettyjamesm mentioned can be mitigated by #15721.
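For context, the mitigation in #15721 (adaptive task update request size) can be pictured as capping how much of the queued split backlog goes into a single update request and adjusting that cap based on how previous requests fared. A hypothetical Java sketch under those assumptions, not the actual implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class AdaptiveSplitBatcher {
    private final int maxBatchSize;  // upper bound, e.g. max-unacknowledged-splits-per-task
    private int batchSize;           // current number of splits sent per update request

    AdaptiveSplitBatcher(int initialBatchSize, int maxBatchSize) {
        this.batchSize = initialBatchSize;
        this.maxBatchSize = maxBatchSize;
    }

    // Take at most batchSize splits for the next task update request,
    // leaving the remainder queued on the coordinator.
    List<String> nextBatch(Queue<String> queuedSplits) {
        List<String> batch = new ArrayList<>();
        while (batch.size() < batchSize && !queuedSplits.isEmpty()) {
            batch.add(queuedSplits.poll());
        }
        return batch;
    }

    // Grow gently while requests succeed; back off hard when a request was
    // too large (e.g. it timed out or exceeded a payload limit).
    void onRequestResult(boolean requestTooLarge) {
        batchSize = requestTooLarge
                ? Math.max(1, batchSize / 2)
                : Math.min(maxBatchSize, batchSize + Math.max(1, batchSize / 2));
    }
}
```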
@pettyjamesm Adaptive request size was merged; do you think increasing the configuration values is OK now?
I'm OK with increasing the value after the adaptive task update request PR, so now I suppose the question is whether 2,000 is appropriate compared to, say, 1,000. Is there still a noticeable difference between 1,000 and 2,000 for the test queries in your environment? How about 1,500? If so, then sure, 2,000 works for me. If not, then I might suggest erring on the side of a more conservative increase, since there will still be some GC overhead on the coordinator for those unacknowledged splits, and there's some risk that this could increase processing-time skew between workers when some splits are significantly more expensive than others.
A few configuration options were tested: `max-splits-per-node`, `max-adjusted-pending-splits-per-task`, and `max-unacknowledged-splits-per-task` were swept from 1000 up to 4000 in steps of 500 (and others, like `min-pending-splits-per-task`, over different ranges) against a set of simple queries. The configurations with the best outcomes were then run against the TPC-H, TPC-DS, and concurrency test suites. The values in this PR came out as the better overall setup.
Still, there is […]
Description
Change defaults of `max-unacknowledged-splits-per-task` and `max-adjusted-pending-splits-per-task` to 2000 for better overall performance.

`SELECT count(*) FROM hive.test.lineitem_parquet_64_group`
old: 6.9304 s
new: 5.8872 s

`SELECT count(orderkey) FROM hive.test.lineitem_parquet_64_group`
old: 9.0065 s
new: 8.0764 s

`SELECT count(orderkey), count(suppkey) FROM hive.test.lineitem_parquet_64_group`
old: 22.1207 s
new: 21.2324 s
Concurrency benchmark (queries per hour):
old: 7269
new: 7698
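For reference, those figures work out to roughly 15%, 10%, and 4% reductions in query time for the three queries above, and about a 5.9% increase in concurrent throughput.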
Additional context and related issues
Release notes
(x) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text: