Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[data] Read->SplitBlocks to ensure requested read parallelism is always met #36352
[data] Read->SplitBlocks to ensure requested read parallelism is always met #36352
Changes from 26 commits
20c8d2b
4a9bc3e
24a2d55
d37383c
ac457d8
886c226
835a439
1e0b4d0
7400c2e
024b328
b7248af
4a323c3
8c3a358
d746470
705869d
914d976
285279c
8ca9cc9
6708870
d7d01a7
7926ba3
8812481
f26f4f2
c5e09dc
32c2e2f
9171b83
8fc17d9
3ee6717
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like SplitBlocks is a separate op. What about
Read(spit_blocks=N)
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, +1 for
ReadXXX(split_blocks=N)
, otherwiseDataset.__repr__
would become confusing.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand this--- the original proposal is that SplitBlock is supposed to be a logical operator, since it only applies to the output of the read. It seems more clear therefore using the chaining syntax of
->
instead of making it part of the Read.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry if I miss any context, why don't we implement
SplitBlock
as a separate logical & physical operator?The current implementation is inside
Datasource
, so it looks like part of Read & InputDataBuffer.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should, but it would get fused with Read anyways. So here we only implement it as part of Read since we have yet to decide whether it should be a general operator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E.g., for
dynamic_repartition()
or such.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, +1 to make it a general operator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the extra check if it's a
Read
op?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
fusable
method is part of the Read class only.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can define
fusable
in the base LogicalOperator class. Other op may need it as well in the future.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my understanding, what does this change mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a test rule where ellipsis can match any value.