-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Put S3 Select for TEXTFILE behind an experimental flag #18102
Conversation
a61f404
to
8017b1f
Compare
/test-with-secrets sha=8017b1f1e61240a2150029cdfa4ed3e907274876 |
The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/5444035845 |
8017b1f
to
38bd9f9
Compare
/test-with-secrets sha=38bd9f9a86b5a214573ff3c3359c902980e470b1 |
The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/5472865441 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Capitalize "S3 Select" in commit titles
plugin/trino-hive/src/main/java/io/trino/plugin/hive/s3select/IonSqlQueryBuilder.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/test/java/io/trino/plugin/hive/s3select/TestIonSqlQueryBuilder.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/s3select/S3SelectRecordCursorProvider.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/test/java/io/trino/plugin/hive/BaseTestHiveOnDataLake.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveConfig.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/test/java/io/trino/plugin/hive/BaseTestHiveOnDataLake.java
Outdated
Show resolved
Hide resolved
38bd9f9
to
e72bb89
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Disable S3 Select pushdown on decimal columns"
this requires blind correctness tests.
plugin/trino-hive/src/test/java/io/trino/plugin/hive/BaseTestHiveOnDataLake.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Put S3 Select for CSV files behind an experimental flag"
} | ||
|
||
@Config("hive.s3select-pushdown.experimental-pushdown-enabled") | ||
@ConfigDescription("Enable query pushdown to TEXTFILE tables using the AWS S3 Select service. Requires 'hive.s3select-pushdown.enabled' also be set.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Omit Requires 'hive.s3select-pushdown.enabled' also be set.
here. It's kind of obvious.
Also, it's possible to enforce using @AssertTrue
annotation (add a test, since recent changes from javax to jakarta broke some validations)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Omit Requires 'hive.s3select-pushdown.enabled' also be set. here. It's kind of obvious.
Sure
Also, it's possible to enforce using @AssertTrue annotation (add a test, since recent changes from javax to jakarta broke some validations)
I'd rather not require this, as the main toggle can also be enabled/disabled using a session property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this makes things more complicated, agreed. so in the code we cannot assume one is enabled only IF the other is
however, that doesn't mean we should be accepting catalog configuration that makes no sense. no need to & easy to address
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it still makes sense to have the main on/off toggle default to off but have textfile pushdown enabled. For example if you want to test the experimental feature using the session property but want it to stay disabled by default.
...trino-hive/src/test/java/io/trino/plugin/hive/s3select/TestS3SelectRecordCursorProvider.java
Show resolved
Hide resolved
// These two should return a result, but incorrectly return nothing | ||
assertThat(query(withS3SelectPushdown, "SELECT id FROM " + table.getName() + " WHERE string_t ='a,comma'")).returnsEmptyResult(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
verify in the test that results are incorrect: run a query without s3select pushdown and assert actual results, so that the test self-validates the comment "... but incorrectly return ..."
-
We also need a blind correctness test ensuring that when S3Select is enabled (but the experimental pushdown is not), results are correct.
- In other words -- which test would fail if I simply change
HiveConfig#s3SelectExperimentalPushdownEnabled
default value? Which test prevents me from enabling it before correctness problems are fixed?
- In other words -- which test would fail if I simply change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, adding the first bullet exposed a bug in the native textfile reader implementation, submitted a separate issue: #18215
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need a blind correctness test ensuring that when S3Select is enabled (but the experimental pushdown is not), results are correct.
Added a separate test class for this, since the experimental flag is not configurable by a session property.
e72bb89
to
7d4dad8
Compare
Minio tests produced the correct results, however tests against a real S3 bucket did not.
7d4dad8
to
3557abf
Compare
Can you check the test failures? |
3557abf
to
d5aea6b
Compare
Yeah, I forgot to check my RST formatting. Should be good this time. |
/test-with-secrets sha=c65d4ba9834ed5433ba23533f35d1b0eb35cd641 |
The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/5622741842 |
S3 Select queries on CSV files are shown to have correctness problems. JSON files can still be enabled/disabled using the existing config and session properties.
c65d4ba
to
eeda594
Compare
Test failure was because I forgot to update one of the tests when I changed the config property name... |
/test-with-secrets sha=eeda5941b6e05f7ec28f4d07adbf7c3b146eca7c |
The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/5650768931 |
Failure is unrelated, had to do with: #18227 |
@@ -1799,14 +1800,14 @@ public void testS3SelectPushdown(String tableProperties) | |||
.setCatalogSessionProperty("hive", "insert_existing_partitions_behavior", "APPEND") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now that this class grew beyond its initial purpose, this should be the class default and the insert-overwrite tests should pick OVERWRITE
Description
Fixes some correctness issues in JSON pushdown related to quote characters and gate pushdown for TEXTFILE (using CSV in S3 Select) behind a separate HiveConfig property.
Additional context and related issues
Relates to: #17775
Based on: #17563
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text: