Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix operators that should be using IS NOT DISTINCT FROM rather than EQUALS #6472

Merged
merged 4 commits into from
Jan 7, 2021

Conversation

erichwang
Copy link
Contributor

We have a few operators that should be using IS NOT DISTINCT FROM rather than EQUALS to provide the proper grouping semantics. This results in queries incorrectly grouping values that are NaN, as well as structural types that contain null elements.

Copy link
Member

@sopel39 sopel39 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's wait for others (@dain, @martint ) approval too. Should positionEqualsPosition and other methods be removed as they have strange semantics (nulls are equal, but NaN are not)

@sopel39
Copy link
Member

sopel39 commented Dec 30, 2020

per standard:

 If RANK is specified, then the rank of row R is defined as 1 (one) plus the number of rows that precede R and are not peers of R.
NOTE 26 — This implies that if two or more rows are not distinct with respect to the window ordering, then there will be one or more gaps in the sequential rank numbering.

@findepi
Copy link
Member

findepi commented Dec 30, 2020

Can you add query-based tests verifying correctness end-to-end as well?

@erichwang
Copy link
Contributor Author

erichwang commented Dec 30, 2020

Should positionEqualsPosition and other methods be removed as they have strange semantics (nulls are equal, but NaN are not)

@sopel39, we still need some of these because the equality semantics defined here are needed for equijoins, which use SQL equals, and not the distinct from shape. The nulls themselves are irrelevant for most of those use cases because they are assumed to be filtered out ahead of time for joins. That being said, there are certainly some variants of the equals hash strategy that are no longer needed.

@sopel39
Copy link
Member

sopel39 commented Dec 30, 2020

@sopel39, we still need some of these because the equality semantics defined here are needed for equijoins

Join use io.prestosql.operator.PagesHashStrategy#positionEqualsRowIgnoreNulls (they ignore nulls). It seems that every use of positionEqualsPosition (and other corresponding methods) is suspicious

Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sopel39
Copy link
Member

sopel39 commented Dec 30, 2020

Mind automation. There is error prone check failure

@erichwang
Copy link
Contributor Author

Mind automation. There is error prone check failure

Yea saw that, but this check failure doesn't make sense to me. Where is the protobuf coming from? How "error prone" are our error prone checks? =)

Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.8.0:compile (default-compile) on project presto-spi: Resolution of annotationProcessorPath dependencies failed: Unable to get dependency information for com.google.protobuf:protobuf-java:jar:3.4.0: Failed to retrieve POM for com.google.protobuf:protobuf-java:jar:3.4.0: Could not transfer artifact com.google.protobuf:protobuf-java:pom:3.4.0 from/to central (https://repo.maven.apache.org/maven2): /home/runner/.m2/repository/com/google/protobuf/protobuf-java/3.4.0/protobuf-java-3.4.0.pom.part (No such file or directory)

@sopel39
Copy link
Member

sopel39 commented Dec 30, 2020

Failed to retrieve POM for com.google.protobuf:protobuf-java:jar:3.4.0: Could not transfer artifact com.google.protobuf:protobuf-java:pom:3.4.0 from/to central (https://repo.maven.apache.org/maven2): /home/runner/.m2/repository/com/google/protobuf/protobuf-java/3.4.0/protobuf-java-3.4.0.pom.part (No such file or directory)

That is intermittent

@findepi
Copy link
Member

findepi commented Dec 31, 2020

Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.8.0:compile (default-compile) on project presto-spi: Resolution of annotationProcessorPath dependencies failed: Unable to get dependency information for com.google.protobuf:protobuf-java:jar:3.4.0: Failed to retrieve POM for com.google.protobuf:protobuf-java:jar:3.4.0: Could not transfer artifact com.google.protobuf:protobuf-java:pom:3.4.0 from/to central (https://repo.maven.apache.org/maven2): /home/runner/.m2/repository/com/google/protobuf/protobuf-java/3.4.0/protobuf-java-3.4.0.pom.part (No such file or directory)

This looks similar to, but not the same as #6414

Generally, the maven build needs to be retried in general on CI, as communication with Maven Central seems quite common cause of flakes.
It's likely we miss some more retries there.

Would you mind filing an issue?

@erichwang erichwang force-pushed the distinctfix branch 2 times, most recently from 2ac2174 to 1bdc8a0 Compare January 4, 2021 23:21
@erichwang
Copy link
Contributor Author

Can you add query-based tests verifying correctness end-to-end as well?

Already added.

@findepi, any other comments?

@erichwang erichwang mentioned this pull request Jan 4, 2021
@findepi
Copy link
Member

findepi commented Jan 5, 2021

Can you add query-based tests verifying correctness end-to-end as well?

Already added.

@erichwang thanks for adding a test for window. Did you also intend to add one for Join (also being changed here)?

@erichwang
Copy link
Contributor Author

@findepi, this actually doesn't affect SQL joins (even though the logic is stored in something called JoinCompiler). The compiler class is very inaccurately named =), and is used in a lot of places.

@@ -577,11 +577,11 @@ private IntegrityStats verifyHeapIntegrity(long groupId, long heapNodeIndex)
verify(actualPeerGroupCount == peerGroupCount, "Recorded peer group count does not match actual");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commit message is too long

@sopel39 sopel39 merged commit d4a909b into trinodb:master Jan 7, 2021
@sopel39 sopel39 mentioned this pull request Jan 7, 2021
10 tasks
@erichwang erichwang deleted the distinctfix branch January 7, 2021 22:12
@martint martint added this to the 352 milestone Jan 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants