Add new scheduler supporting multiple table scans within one stage #17265

Merged

radek-kondziolka
@radek-kondziolka radek-kondziolka commented Apr 27, 2023

Description

This PR adds a new scheduler, MultiSourcePartitionedScheduler, that makes it possible to run multiple source-partitioned table scans within one stage. Some UNION queries benefit from this, for example:

  1. tpcds/q02, [orc, unpart, sf1000], before / after:

    • Duration: 14.39s / 9.35s
    • Cpu Time: 32.45m / 23.36m
    • Internal Network Data: 97.4GB / 50.0MB
  2. tpcds/q05, [orc, unpart, sf1000], before / after:

    • Duration: 25.78s / 13s
    • Cpu Time: 30.77m / 25.70m
    • Internal Network Data: 205GB / 54.5GB

Benchmark results: benchmarks.pdf

Release notes

( ) This is not user-visible or docs only and no release notes are required.
(*) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

@cla-bot cla-bot bot added the cla-signed label Apr 27, 2023
@radek-kondziolka radek-kondziolka force-pushed the rk/union_in_source_stage_one_catalog branch 2 times, most recently from c005dfb to 64c5403 Compare April 27, 2023 11:25
@sopel39 sopel39 left a comment

lgtm % comments

@radek-kondziolka radek-kondziolka force-pushed the rk/union_in_source_stage_one_catalog branch from 64c5403 to 4193651 Compare April 28, 2023 09:52
@radek-kondziolka
@sopel39 , comments addressed.

@sopel39 sopel39 left a comment

lgtm % comments

@radek-kondziolka radek-kondziolka force-pushed the rk/union_in_source_stage_one_catalog branch from 4193651 to f6519e0 Compare May 4, 2023 08:18
The new scheduler MultiSourcePartitionedScheduler was added. This scheduler
makes it possible to run multiple source-partitioned table scans within one stage.

The planner rule AddExchanges was changed for UNION to avoid unnecessary
reshuffling of data between nodes: where possible, it places source-partitioned
table scans within one stage. This kind of stage is scheduled by
MultiSourcePartitionedScheduler.
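
For readers unfamiliar with the idea, here is a toy model of what "several source-partitioned scans in one stage" can look like from a scheduler's point of view. It is only a sketch under loud assumptions: `MultiSourceSketch`, `ScanSource`, and `scheduleStage` are hypothetical names invented for illustration, splits are plain strings, and the round-robin worker assignment is a stand-in. None of this is taken from the actual MultiSourcePartitionedScheduler code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Toy model only: these types are hypothetical and are NOT Trino classes.
public class MultiSourceSketch {
    // One table scan inside the stage, with the splits it still has to hand out.
    static final class ScanSource {
        final String table;
        final Iterator<String> splits;

        ScanSource(String table, List<String> splits) {
            this.table = table;
            this.splits = splits.iterator();
        }
    }

    // Drains every scan source of a single stage, one source at a time,
    // assigning splits to workers round-robin (an assumed placement policy).
    static List<String> scheduleStage(List<ScanSource> sources, int workerCount, int batchSize) {
        if (workerCount < 1 || batchSize < 1) {
            throw new IllegalArgumentException("workerCount and batchSize must be positive");
        }
        Deque<ScanSource> pending = new ArrayDeque<>(sources);
        List<String> assignments = new ArrayList<>();
        int nextWorker = 0;
        while (!pending.isEmpty()) {
            ScanSource source = pending.peekFirst();
            // Hand out at most batchSize splits per pass over the current source.
            for (int i = 0; i < batchSize && source.splits.hasNext(); i++) {
                assignments.add("worker-" + nextWorker + " <- " + source.table + ":" + source.splits.next());
                nextWorker = (nextWorker + 1) % workerCount;
            }
            // A source leaves the queue only once all of its splits are assigned.
            if (!source.splits.hasNext()) {
                pending.removeFirst();
            }
        }
        return assignments;
    }

    public static void main(String[] args) {
        // Two UNION branches scanned in the same stage, so no exchange is needed between them.
        List<ScanSource> sources = List.of(
                new ScanSource("store_sales", List.of("split-1", "split-2", "split-3")),
                new ScanSource("web_sales", List.of("split-1")));
        scheduleStage(sources, 2, 2).forEach(System.out::println);
    }
}
```

The point of the sketch is that every branch of the UNION is read by tasks of the same stage, so rows do not have to be shuffled to a separate stage just to be concatenated.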
@radek-kondziolka radek-kondziolka force-pushed the rk/union_in_source_stage_one_catalog branch from f6519e0 to 9d967ed Compare May 8, 2023 10:28
@radek-kondziolka radek-kondziolka requested a review from sopel39 May 8, 2023 10:28
@radek-kondziolka radek-kondziolka force-pushed the rk/union_in_source_stage_one_catalog branch from 9d967ed to 2b4d69b Compare May 8, 2023 14:10
@radek-kondziolka radek-kondziolka force-pushed the rk/union_in_source_stage_one_catalog branch from 2b4d69b to 1c512d5 Compare May 8, 2023 20:19
@radek-kondziolka
One test is failing but it is annotated as flaky here: #16933

@sopel39 sopel39 merged commit b876508 into trinodb:master May 9, 2023
@sopel39 sopel39 mentioned this pull request May 9, 2023
@sopel39
sopel39 commented May 9, 2023

Failed due to #16933

2 participants