fix: issue #8838 discard extra sort when sorted element is wrapped #9127
Thanks @Lordworms for this PR. I have examined this PR and it is really neat. However,
where table satisfies ordering
where ordering Currently, we do not have |
@suremarc as the requester of this feature, do you have some time to review this PR?
)
LOCATION '../../testing/data/csv/aggregate_test_100.csv';

# test for substitute CAST scenario
Can we also please add a test for times when casting may not preserve ordering?
For example, if the input is INT
0
1
2
10
If that is cast to UTF8
the data is now
"0"
"1"
"2"
"10"
Which is no longer sorted correctly
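The point above can be checked with a small sketch. It uses Python's built-in string comparison as a stand-in for UTF8 ordering; the helper name `is_sorted` is made up for illustration and is not part of this PR.

```python
def is_sorted(xs):
    """Return True if xs is in non-decreasing order."""
    return all(a <= b for a, b in zip(xs, xs[1:]))


ints = [0, 1, 2, 10]               # sorted as integers
as_text = [str(v) for v in ints]   # the same rows after a cast to text

print(is_sorted(ints))      # True
print(is_sorted(as_text))   # False: "2" compares greater than "10"
print(sorted(as_text))      # ['0', '1', '10', '2']
```

So a sort on the casted column cannot be dropped just because the source column was already ordered.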
I can add that (maybe on the weekend)
added now
I got it. So in this case I just need to focus on CAST to a bigger data type and ignore the ScalarFunction?
Exactly
fix: issue apache#8838 discard extra sort when sorted element is wrapped
b9499fa to 9b912a7
Thanks @Lordworms for this PR. There are some small problems with it; see my comments below. I have filed another PR on top of the commits in this PR. Maybe @Lordworms can bring the changes from that PR into this one to address my comments. However, I think we can merge this PR as is and then continue the discussion in the follow-up PR. I do not think anything is blocking this PR from merging.
LGTM!
c_customer_sk DESC,
c_current_cdemo_sk DESC
)
LOCATION '../../testing/data/csv/aggregate_test_100.csv';
The schema of the file and the schema of this table are not consistent. Additionally, the file doesn't satisfy the WITH_ORDER
invariant given during creation.
SELECT
CAST(c_customer_sk AS BIGINT) AS c_customer_sk_big,
c_current_cdemo_sk
If the select expressions were c_customer_sk, CAST(c_customer_sk AS BIGINT) AS c_customer_sk_big, c_current_cdemo_sk,
the planner wouldn't remove the ORDER BY c_customer_sk_big DESC, c_current_cdemo_sk DESC
sort from the plan, even though removing it should be possible. This is because, in the current implementation, substitution cannot generate more than one valid ordering from a single ordering.
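To make the comment above concrete, here is a rough sketch of what generating more than one valid ordering from a single ordering could look like. It is a toy model, not DataFusion code: the function `equivalent_orderings`, the tuple layout, and the `order_preserving` flag are all assumptions made for illustration.

```python
from itertools import product


def equivalent_orderings(ordering, projection):
    """Enumerate output orderings implied by one input ordering.

    ordering:   list of (source_column, direction)
    projection: list of (source_column, output_name, order_preserving)
    """
    candidates_per_key = []
    for col, direction in ordering:
        # Every order-preserving projected expression over this column
        # is a valid stand-in for the sort key.
        names = [out for src, out, keeps in projection if src == col and keeps]
        if not names:
            break  # later keys only make sense after all earlier keys
        candidates_per_key.append([(name, direction) for name in names])
    if not candidates_per_key:
        return []
    return [list(combo) for combo in product(*candidates_per_key)]


ordering = [("c_customer_sk", "DESC"), ("c_current_cdemo_sk", "DESC")]
projection = [
    ("c_customer_sk", "c_customer_sk", True),
    ("c_customer_sk", "c_customer_sk_big", True),   # widening CAST to BIGINT
    ("c_current_cdemo_sk", "c_current_cdemo_sk", True),
]

result = equivalent_orderings(ordering, projection)
for o in result:
    print(o)
```

With both c_customer_sk and c_customer_sk_big in the select list, this yields two orderings, one starting with each name, which is what would let the ORDER BY on c_customer_sk_big be recognized as already satisfied.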
Thanks @Lordworms for this PR. Merging this PR. We can continue the discussion in the PR which builds on the commits of this PR.
fix: issue #8838 discard extra sort when sorted element is wrapped
Which issue does this PR close?
improves the state in #8838
Closes #.
Rationale for this change
In the previous design, when constructing a dependency_map, we just used the ordering obtained from the previous node (in this issue, when constructing ProjectionExec, we just used the ordering obtained from CsvExec). However, the previous design does not support a wrapper around an existing column. In this case, when the ordering is [a DESC, b DESC] and the projection is CAST(a AS BIGINT), we should change the ordering from a to CAST(a AS BIGINT) so that we do not generate a false dependency map (in this issue, when we construct the dependency map using the original expression, we break prematurely and lose dependencies). To address this, we need to substitute the original expressions with the new monotonic expressions.
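The substitution described above can be sketched roughly as follows. This is a simplified Python model rather than the actual Rust implementation: the classes `Column` and `Cast`, the `INT_WIDTH` table, and the functions `preserves_order` and `substitute_ordering` are hypothetical names, and only widening integer casts are treated as monotonic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Cast:
    child: Column
    to_type: str


# Assumption for this sketch: only casts to an equal or wider integer
# type are considered order-preserving.
INT_WIDTH = {"TINYINT": 8, "SMALLINT": 16, "INT": 32, "BIGINT": 64}


def preserves_order(expr, schema):
    """Return True if sorting by expr gives the same order as its column."""
    if isinstance(expr, Column):
        return True
    if isinstance(expr, Cast):
        src = schema[expr.child.name]
        dst = expr.to_type
        return src in INT_WIDTH and dst in INT_WIDTH and INT_WIDTH[dst] >= INT_WIDTH[src]
    return False


def substitute_ordering(ordering, projection, schema):
    """Swap each sort key for a projected monotonic wrapper, if one exists."""
    result = []
    for key, direction in ordering:
        replacement = key
        for expr in projection:
            if isinstance(expr, Cast) and expr.child == key and preserves_order(expr, schema):
                replacement = expr
                break
        result.append((replacement, direction))
    return result


schema = {"a": "INT", "b": "INT"}
ordering = [(Column("a"), "DESC"), (Column("b"), "DESC")]

widened = substitute_ordering(ordering, [Cast(Column("a"), "BIGINT"), Column("b")], schema)
as_text = substitute_ordering(ordering, [Cast(Column("a"), "UTF8"), Column("b")], schema)
print(widened)   # first key becomes Cast(a, BIGINT)
print(as_text)   # unchanged: a cast to UTF8 is not treated as monotonic
```

After the substitution, a dependency map built from [CAST(a AS BIGINT) DESC, b DESC] can match the projected expressions directly instead of stopping at the first key.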
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?