Does the current SortExec consider input ordering? #7330
Replies: 2 comments
-
I do think this is the case. I also think a partial sort operator would be useful in contexts other than streaming: among other things, it could use much less memory when sorting large datasets. I think we can take a similar approach to the (new) streaming group by and update the ExternalSorter to alternate between accumulating and producing. Maybe we could wrap the ExternalSorter 🤔
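To make "alternate between accumulating and producing" concrete, here is a minimal Python sketch of a push-based sorter. It is only an illustration: `PartialSorter`, `push`, and `finish` are hypothetical names, rows are plain tuples, and nothing here models the real ExternalSorter (in particular, there is no spilling). It assumes the input is already sorted on the prefix key.

```python
class PartialSorter:
    """Hypothetical operator: input sorted by `prefix`, output sorted by (prefix, suffix)."""

    def __init__(self, prefix, suffix):
        self.prefix = prefix  # key function for the already-sorted columns
        self.suffix = suffix  # key function for the columns still to be sorted
        self.buffer = []      # rows sharing the current prefix value

    def push(self, batch):
        """Accumulate a batch and return any rows that are now safe to emit."""
        out = []
        for row in batch:
            # A new prefix value means the buffered group is complete.
            if self.buffer and self.prefix(row) != self.prefix(self.buffer[-1]):
                out.extend(self._flush())
            self.buffer.append(row)
        return out

    def finish(self):
        """Emit whatever is still buffered once the input is exhausted."""
        return self._flush()

    def _flush(self):
        ready = sorted(self.buffer, key=self.suffix)
        self.buffer = []
        return ready


# Made-up rows of (a, b), arriving in batches already sorted on a.
sorter = PartialSorter(prefix=lambda r: r[0], suffix=lambda r: r[1])
out1 = sorter.push([(1, 3), (1, 1)])  # still accumulating a == 1
out2 = sorter.push([(1, 2), (2, 5)])  # a changed, so the a == 1 group is produced
out3 = sorter.push([(2, 4), (3, 1)])  # a changed again, so the a == 2 group is produced
out4 = sorter.finish()                # last group
```

In this sketch the buffer never holds more than one group of equal prefix values, which is where the potential memory saving would come from.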
-
Thanks @alamb for your response. I also think updating ExternalSorter would be the better option, because existing plans would benefit from it immediately. Otherwise we would need to write a rule to choose between existing
-
Consider a use case where the required ordering is (a ASC, b ASC) and the existing ordering is (a ASC).
As an example, the input is like the following:
The expected output is like the following:
If we were to use information about the existing ordering, we could buffer rows until the value of a changes, like below:
When 2 is received for the value of a, we could sort the buffered subtable according to the desired ordering (b ASC), then emit the following result:
I think this would enable us to use SortExec without breaking the pipeline for some use cases (we could also write a new operator for this behaviour). Also, some sort algorithms have friendlier paths when their input is almost sorted. However, as far as I know, the current SortExec cannot produce results without consuming all of its input. Is this the case? If so, do you think this operator would be useful?
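The buffering idea can be sketched in a few lines of Python. This is an illustration only, not DataFusion code: `partial_sort` is a hypothetical helper, and the rows are made-up (a, b) tuples, not the example tables from the post. It assumes the input is already sorted on a.

```python
from itertools import groupby
from operator import itemgetter


def partial_sort(rows, prefix_key, suffix_key):
    """Yield rows ordered by (prefix_key, suffix_key).

    Assumes `rows` already arrives ordered by prefix_key.
    """
    # groupby collects consecutive rows with the same prefix value,
    # so only one such group is buffered and sorted at a time.
    for _, run in groupby(rows, key=prefix_key):
        yield from sorted(run, key=suffix_key)


# Made-up input, sorted on a (first field) but not on b (second field).
rows = [(1, 3), (1, 1), (1, 2), (2, 5), (2, 4), (3, 1)]
result = list(partial_sort(rows, itemgetter(0), itemgetter(1)))
```

Because `partial_sort` is a generator, the rows with a = 1 can be emitted as soon as the first row with a = 2 is seen, without reading the rest of the input.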