-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix correctness issues in S3 Select pushdown #17563
Conversation
b91bdd1
to
7101e0d
Compare
plugin/trino-hive/src/main/java/io/trino/plugin/hive/s3select/IonSqlQueryBuilder.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/s3select/IonSqlQueryBuilder.java
Outdated
Show resolved
Hide resolved
@findepi mind starting a run with secrets? |
Never mind, let me fix the IonSqlQueryBuilder tests first |
plugin/trino-hive/src/main/java/io/trino/plugin/hive/s3select/IonSqlQueryBuilder.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/s3select/IonSqlQueryBuilder.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/s3select/IonSqlQueryBuilder.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/s3select/IonSqlQueryBuilder.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/s3select/csv/S3SelectCsvRecordReader.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/test/java/io/trino/plugin/hive/BaseTestHiveOnDataLake.java
Outdated
Show resolved
Hide resolved
69163b9
to
f39b46e
Compare
Updated with more Minio tests and some refactoring that makes the queries pushed into S3 a little cleaner. There are still some failures when I run that test against real s3 manually though:
I think these changes are an improvement but there are still bugs to squash |
We can run these tests agains MinIO for developers convenience, but will CI run them against real S3? |
I'd like to do both Minio and S3 tests, but honestly I've been finding enough wrong with the feature I'm tempted to say we should just take it out. |
Agreed. It's hard to take things, so lets rename to "experimental" the config toiggles and session properties |
f39b46e
to
fa43ed3
Compare
@findepi can I get a run with secrets when you get a chance? |
/test-with-secrets sha=fa43ed349ed9918bb0341250d97246e27d549043 |
fa43ed3
to
56a0f20
Compare
56a0f20
to
9235a10
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is another test file with the same naming: https://github.com/trinodb/trino/blob/master/plugin/trino-hive/src/test/java/io/trino/plugin/hive/s3select/TestS3SelectPushdown.java
Would you please rename this to make sure we don't exclude the tests above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, thanks!
plugin/trino-hive/src/test/java/io/trino/plugin/hive/s3/TestS3SelectPushdown.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/s3select/IonSqlQueryBuilder.java
Show resolved
Hide resolved
plugin/trino-hive/src/test/java/io/trino/plugin/hive/BaseTestHiveOnDataLake.java
Show resolved
Hide resolved
plugin/trino-hive/src/test/java/io/trino/plugin/hive/s3select/TestIonSqlQueryBuilder.java
Show resolved
Hide resolved
@dnanuti I think the main issue around some of these edge cases with whitespace or null encoding is that the s3 select CSV encoding is mapped in Trino to Handling for CSVs with whitespace or padding could be an issue, but before these changes data doesn't round-trip properly, even when written by Trino and read back by Trino. This is a much bigger problem. |
assertS3SelectQuery("SELECT id FROM " + table.getName() + " WHERE string_t IS NOT NULL", "VALUES 1, 2, 4"); | ||
|
||
// TODO: Pushdown with equality predicates on decimal types produces incorrect results. https://github.com/trinodb/trino/issues/17775 | ||
// assertS3SelectQuery("SELECT id FROM " + table.getName() + " WHERE decimal_t = 2.2", "VALUES 2"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dnanuti for this query, against a JSON table I've tried:
SELECT s.id, s.decimal_t FROM S3Object s WHERE s.decimal_t IS NOT NULL AND s.decimal_t = 2.20000
or
SELECT s.id, s.decimal_t FROM S3Object s WHERE s.decimal_t IS NOT NULL AND CAST(s.decimal_t AS DECIMAL(10, 5)) = 2.20000
But I get no rows back from either. Here's the table files:
json_table.tar.gz
I'd expect to get back the value 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I'll check tomorrow!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't get the chance to look into this today, as there are other priorities right now. I'll open an issue in our backlog for further investigation. Thanks for calling this out! 👍
There are some Docker tests define that were running as part of the CI after re-enabling Select pushdown, where you can find tables creation statements that are not defining all the separators you mentioned. Tables creation here: Please enhance this if possible with the scenarios that were not covered before. I'll double check about encoding tomorrow, but sounds reasonable. |
bebf596
to
cad4241
Compare
plugin/trino-hive/src/main/java/io/trino/plugin/hive/s3select/IonSqlQueryBuilder.java
Outdated
Show resolved
Hide resolved
...n/trino-hive/src/test/java/io/trino/plugin/hive/metastore/file/TestingFileHiveMetastore.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/test/java/io/trino/plugin/hive/s3/TestS3SelectQueries.java
Outdated
Show resolved
Hide resolved
Thanks @findepi , all set |
f8a58b8
to
5ca6c5b
Compare
Rebased for conflicts. |
plugin/trino-hive/src/test/java/io/trino/plugin/hive/s3/TestTrinoS3FileSystemAwsS3.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"fixup! Fix correctness issues in S3 Select pushdown" lgtm
/test-with-secrets sha=5ca6c5b9fdc468f3e180138f3f1fb216256d759b |
i expect the above to fail per #17563 (comment) |
The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/5268695780 |
5ca6c5b
to
8f135cd
Compare
are |
plugin/trino-hive/src/test/java/io/trino/plugin/hive/s3/TestS3SelectQueries.java
Outdated
Show resolved
Hide resolved
The IonSqlQueryBuilder would produce select queries which were not proper transormations of the given TupleDomain, leading to incorrect results when S3 Select was enabled. When reading JSON files predicates like `x IS NULL` or `x IS NOT NULL` were evaluated as `x = ''` or `x <> ''`. When reading TextFile data the query builder ignores the table's `null_format` field, instead assuming that null fields are encoded as the empty string.# Please enter the commit message for your changes. Lines starting
8f135cd
to
d41a568
Compare
Description
The IonSqlQueryBuilder would produce select queries which were not proper transormations of the given TupleDomain, leading to incorrect results when S3 Select was enabled.
When reading JSON files predicates like
x IS NULL
orx IS NOT NULL
were evaluated asx = ''
orx <> ''
.When reading TextFile data the query builder ignores the table's
null_format
field, instead assuming that null fields are encoded as the empty string.Additional context and related issues
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text: