Adding Fetch Support to CoalesceBatchesExec #9792
Comments
I can do this one.
Cross-posting from #9815, as I am not sure about this proposal. It seems like adding a limit to …
We can of course inform the Limit: fetch=5 … Assuming the plan above, I think …
What I was suggesting is that …
The Aggregate would need 500 rows to produce 5 rows, but we don't know that until the 501st row reaches the Aggregate. So we cannot limit the …
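The row-count argument above can be made concrete with a small model of an aggregate that has its own internal fetch. This is a hypothetical sketch, not DataFusion code. It assumes the input is sorted on the grouping key (a streaming aggregate), which is what lets groups be emitted before the input ends at all. Rows are reduced to their `u32` key, and `sorted_count_with_fetch` is an invented name. A group is only known to be complete when the first row of the next key arrives, so emitting `fetch` groups means reading one row past the last emitted group.

```rust
/// Hypothetical sketch of a streaming aggregate with an internal fetch.
/// Input is assumed to be sorted by group key; each row is just its key.
/// Returns the completed (key, count) groups and how many input rows were read.
fn sorted_count_with_fetch(keys: &[u32], fetch: usize) -> (Vec<(u32, usize)>, usize) {
    let mut groups: Vec<(u32, usize)> = Vec::new();
    let mut current: Option<(u32, usize)> = None;
    let mut rows_read = 0;

    if fetch == 0 {
        return (groups, rows_read);
    }

    for &key in keys {
        rows_read += 1;
        match current {
            // Same key: the open group keeps growing.
            Some((open_key, count)) if open_key == key => current = Some((open_key, count + 1)),
            // A new key proves the open group is complete, so it can be emitted.
            Some(done) => {
                groups.push(done);
                if groups.len() == fetch {
                    // Internal fetch reached: stop pulling input.
                    return (groups, rows_read);
                }
                current = Some((key, 1));
            }
            None => current = Some((key, 1)),
        }
    }

    // End of input also completes the last open group.
    if let Some(done) = current {
        groups.push(done);
    }
    (groups, rows_read)
}

fn main() {
    // 6 groups of 100 rows each (keys 0..=5), sorted by key.
    let keys: Vec<u32> = (0..600).map(|i| i / 100).collect();
    let (groups, rows_read) = sorted_count_with_fetch(&keys, 5);
    println!("{} groups after reading {} rows", groups.len(), rows_read);
}
```

With six sorted groups of 100 rows and a fetch of 5, the loop stops after reading 501 rows. The 501st row (the first row of key 5) is what proves that group 4 is complete. A limit placed on the aggregate's input cannot know this in advance.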
Yeah, while implementing this, I was planning to pass fetch directly to StreamTableExec, but found out the number should be passed via CoalesceBatchesExec.
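Passing fetch via CoalesceBatchesExec can be pictured with a simplified coalescer. This is a hypothetical sketch and not DataFusion's CoalesceBatchesExec. Rows are plain `i64` values instead of Arrow RecordBatches, and `Coalescer`, `push`, and `flush` are invented names. Without a fetch, nothing is emitted until `target_batch_size` rows have accumulated or the input ends. With a fetch, the buffer is flushed once it holds `min(target_batch_size, remaining fetch)` rows, and any further input is ignored.

```rust
/// Hypothetical, simplified model of CoalesceBatchesExec with a fetch.
/// Rows are plain i64 values here rather than Arrow RecordBatches.
struct Coalescer {
    target_batch_size: usize,
    fetch: Option<usize>,
    buffer: Vec<i64>,
    emitted: usize,
}

impl Coalescer {
    fn new(target_batch_size: usize, fetch: Option<usize>) -> Self {
        Coalescer { target_batch_size, fetch, buffer: Vec::new(), emitted: 0 }
    }

    /// Rows still allowed by the fetch (unbounded without one).
    fn remaining(&self) -> usize {
        self.fetch.map_or(usize::MAX, |f| f - self.emitted)
    }

    fn is_done(&self) -> bool {
        self.remaining() == 0
    }

    /// Buffers an input batch and returns a coalesced batch once the buffer
    /// holds min(target_batch_size, remaining fetch) rows.
    fn push(&mut self, batch: &[i64]) -> Option<Vec<i64>> {
        if self.is_done() {
            return None;
        }
        // Never buffer more rows than the fetch still allows.
        let room = self.remaining() - self.buffer.len();
        self.buffer.extend(batch.iter().copied().take(room));
        if self.buffer.len() >= self.target_batch_size.min(self.remaining()) {
            self.flush()
        } else {
            None
        }
    }

    /// Emits whatever is buffered (e.g. at end of input).
    fn flush(&mut self) -> Option<Vec<i64>> {
        if self.buffer.is_empty() {
            return None;
        }
        self.emitted += self.buffer.len();
        Some(std::mem::take(&mut self.buffer))
    }
}

fn main() {
    let input: Vec<Vec<i64>> = vec![vec![1, 2, 3], vec![4, 5, 6], vec![7, 8, 9]];

    // With fetch = 5, a batch is emitted after the second input batch,
    // long before 8192 rows would have accumulated.
    let mut with_fetch = Coalescer::new(8192, Some(5));
    for (i, batch) in input.iter().enumerate() {
        if let Some(out) = with_fetch.push(batch) {
            println!("emitted {:?} after input batch {}", out, i);
        }
        if with_fetch.is_done() {
            break;
        }
    }
}
```

This prints `emitted [1, 2, 3, 4, 5] after input batch 1`. With `fetch` set to `None`, the same input would produce nothing until `flush` is called at end of input, which corresponds to the delay described in the issue.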
I see, thank you @berkaysynnada; #9792 (comment) makes sense. Something still feels a little off with limiting in CoalesceBatches, as it seems it would always be better to do the fetch below that ExecutionPlan. For example, in this plan it seems like it would be best to have the Aggregate stop after 5 rows:
It looks like there is something similar:
So what would be a better design than passing fetch via CoalesceBatchesExec?
Even better would be if every operator accepted fetch, as @alamb suggests for Aggregate. I wonder whether, for the purpose of this ticket, we can also put the limit below …
If every operator accepts fetch, I guess there will be no need for LimitExecs in the final plan, but it may make plans more complicated. Only a few operators should be affected by an internal fetching mechanism, and adding that support to just those operators could be more straightforward.
Well said: I think this is exactly the tradeoff.
Is your feature request related to a problem or challenge?
The example query in repartition.slt waits until the target_batch_size of CoalesceBatchesExec fills. That delays when the query result can be observed. We can push the limit down into CoalesceBatchesExec here.
Describe the solution you'd like
A similar rule already exists in logical planning. We can add a physical optimizer rule that pushes the limit count down until it reaches a limit-breaking operator (joins, windows, sorts). If the limit reaches a CoalesceBatchesExec before that, it can set a new requested batch size on it.
Describe alternatives you've considered
No response
Additional context
No response
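The physical optimizer rule proposed under "Describe the solution you'd like" can be sketched over a toy plan tree. This is a minimal, hypothetical model: `Plan`, `push_down_limits`, and `push_fetch` are invented for illustration and are not DataFusion's ExecutionPlan or optimizer-rule API. The pass walks down from each `Limit` through nodes that this sketch assumes keep every row (`Projection`, `Repartition`). It records the fetch on any `CoalesceBatches` node it meets, which effectively caps that node's requested batch size at the fetch. It stops at anything else: `Sort` stands in for the limit breakers named in the issue, and `Filter` is also a stop because it drops rows.

```rust
/// Hypothetical toy physical plan with single-child operators only.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Limit { fetch: usize, input: Box<Plan> },
    CoalesceBatches { target_batch_size: usize, fetch: Option<usize>, input: Box<Plan> },
    Projection { input: Box<Plan> },
    Repartition { input: Box<Plan> },
    Filter { input: Box<Plan> },
    Sort { input: Box<Plan> },
    Scan,
}

impl Plan {
    /// Rebuilds this node with `f` applied to its child, if it has one.
    fn map_input(self, f: impl FnOnce(Plan) -> Plan) -> Plan {
        match self {
            Plan::Limit { fetch, input } => Plan::Limit { fetch, input: Box::new(f(*input)) },
            Plan::CoalesceBatches { target_batch_size, fetch, input } => Plan::CoalesceBatches {
                target_batch_size,
                fetch,
                input: Box::new(f(*input)),
            },
            Plan::Projection { input } => Plan::Projection { input: Box::new(f(*input)) },
            Plan::Repartition { input } => Plan::Repartition { input: Box::new(f(*input)) },
            Plan::Filter { input } => Plan::Filter { input: Box::new(f(*input)) },
            Plan::Sort { input } => Plan::Sort { input: Box::new(f(*input)) },
            Plan::Scan => Plan::Scan,
        }
    }
}

/// Pushes `fetch` down through row-preserving nodes, recording it on each
/// CoalesceBatches node, and stops at anything else.
fn push_fetch(plan: Plan, fetch: usize) -> Plan {
    match plan {
        Plan::CoalesceBatches { target_batch_size, fetch: old, input } => Plan::CoalesceBatches {
            target_batch_size,
            // Keep the tighter bound if a fetch was already set.
            fetch: Some(old.map_or(fetch, |f| f.min(fetch))),
            input: Box::new(push_fetch(*input, fetch)),
        },
        // Row-preserving nodes (an assumption of this sketch): keep pushing.
        node @ (Plan::Projection { .. } | Plan::Repartition { .. }) => {
            node.map_input(|child| push_fetch(child, fetch))
        }
        // Limit breakers (Sort here; joins and windows would be similar),
        // Filter (drops rows), nested Limits and leaves: stop.
        other => other,
    }
}

/// Optimizer pass: handle children first, then push each Limit's fetch down.
/// The Limit itself is kept, so the pushed fetch only allows earlier stopping.
fn push_down_limits(plan: Plan) -> Plan {
    match plan.map_input(push_down_limits) {
        Plan::Limit { fetch, input } => Plan::Limit { fetch, input: Box::new(push_fetch(*input, fetch)) },
        other => other,
    }
}

fn main() {
    // Limit(5) -> CoalesceBatches -> Filter -> Scan
    let plan = Plan::Limit {
        fetch: 5,
        input: Box::new(Plan::CoalesceBatches {
            target_batch_size: 8192,
            fetch: None,
            input: Box::new(Plan::Filter { input: Box::new(Plan::Scan) }),
        }),
    };
    println!("{:#?}", push_down_limits(plan));
}
```

Keeping the `Limit` node in place is a deliberate choice in this sketch. The pushed fetch only lets operators below stop early and never has to enforce the exact row count, which keeps the rule safe when it passes through a repartition.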