
Inefficient scheduling of simple, quick-to-process splits on workers #15146

Closed
Dith3r opened this issue Nov 22, 2022 · 0 comments · Fixed by #15168

Dith3r commented Nov 22, 2022

The scheduler assigns splits in batches and checks their state at an interval for updates. For simple assignments, a worker can process its splits faster than it receives new ones, which leaves it idle between checks. One way to mitigate this is to increase max-splits-per-node so that more work is assigned to each node, but this causes regressions for more complex queries in benchmarks. A better approach is to detect when a node has processed all of its assigned splits within a check interval and adjust how many splits are assigned to that node for a single stage.
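A minimal simulation of the proposed adjustment is shown here. It assumes a per-node, per-stage limit that doubles (up to a cap) whenever a node finishes every split it had queued within one check interval, and stays unchanged while the node still has a backlog. `NodeStageState`, `schedule_interval`, `intervals_needed`, the doubling policy, and the cap are hypothetical illustrations. They are not Trino's scheduler code or the change made in #15168.

```python
from dataclasses import dataclass


@dataclass
class NodeStageState:
    """Hypothetical per-node, per-stage scheduling state (not Trino's actual class)."""
    limit: int        # max splits the scheduler may have queued on this node
    queued: int = 0   # splits assigned to the node but not yet finished


def schedule_interval(state: NodeStageState, pending: int,
                      processed_per_interval: int, max_limit: int) -> int:
    """Simulate one check interval and return how many splits are still unassigned.

    1. Top the node up to its current limit.
    2. Let the node finish up to processed_per_interval splits.
    3. If the node drained its whole queue, it was likely idle part of the
       interval, so double its limit (capped at max_limit) for the next one.
    """
    assign = min(state.limit - state.queued, pending)
    state.queued += assign
    pending -= assign

    done = min(state.queued, processed_per_interval)
    state.queued -= done

    if state.queued == 0 and done > 0:
        state.limit = min(state.limit * 2, max_limit)
    return pending


def intervals_needed(total_splits: int, start_limit: int,
                     max_limit: int, rate: int) -> int:
    """Count check intervals until all splits are finished (rate must be > 0)."""
    state = NodeStageState(limit=start_limit)
    pending, intervals = total_splits, 0
    while pending > 0 or state.queued > 0:
        pending = schedule_interval(state, pending, rate, max_limit)
        intervals += 1
    return intervals


# A fast worker (500 splits per interval) with a fixed limit of 100 vs. an adaptive limit.
fixed = intervals_needed(1000, start_limit=100, max_limit=100, rate=500)
adaptive = intervals_needed(1000, start_limit=100, max_limit=1000, rate=500)
print(f"fixed limit: {fixed} intervals, adaptive limit: {adaptive} intervals")
```

With a fast worker, the fixed limit needs 10 intervals for 1000 splits, while the adaptive limit grows 100 → 200 → 400 → 800 and finishes in 4. A slow worker that never drains its queue keeps the original limit, which is the point of growing only on evidence of idleness rather than raising max-splits-per-node for every query.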

Steps to reproduce:

select count(orderkey) from sf100.lineitem;

Results:

max-splits-per-node 100: 1140.83 s (current default)
max-splits-per-node 200: 617.16 s
max-splits-per-node 1000: 414.3 s
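The reported times can be turned into speedups relative to the current default with a short calculation. The `results` mapping simply restates the numbers above, and treating them as comparable wall-clock times for the same query is an assumption.

```python
# Times reported in this issue for: select count(orderkey) from sf100.lineitem;
# Key: max-splits-per-node setting, value: reported time in seconds.
results = {100: 1140.83, 200: 617.16, 1000: 414.3}

baseline = results[100]  # current default
speedups = {limit: baseline / seconds for limit, seconds in results.items()}

for limit, speedup in sorted(speedups.items()):
    print(f"max-splits-per-node {limit}: {results[limit]:.2f} s, "
          f"{speedup:.2f}x vs default")
```

Raising the limit to 200 is roughly 1.85x faster than the default, and 1000 is roughly 2.75x faster for this query.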