Enabled hive splits for uncompressed CSV files with S3 Select pushdown #13754
LGTM % comments
nit: Please keep the number of chars per line in the commit detail less than 80 (as described in https://github.com/trinodb/trino/blob/master/.github/DEVELOPMENT.md#format-git-commit-messages)
Totally missed that, thanks a lot for flagging this, updated!
Description
Scan ranges let S3 Select query uncompressed files at a finer granularity than the entire object by passing a byte range in SelectObjectContent requests. This change enables Hive internal splits for S3 Select by sending scan range requests for uncompressed CSV files.
This PR is a performance optimization for the Hive S3 Select connector with uncompressed CSV input, leveraging the scan range feature of S3 Select. JSON support will be added in a separate PR.
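As a rough illustration of the idea (not the code in this PR), the sketch below cuts one uncompressed object into contiguous byte ranges, one per split, so that each range could be sent as the scan range of its own SelectObjectContent request. The names `ScanRangeSketch`, `ByteRange`, and `splitIntoRanges` are hypothetical, and the half-open `[start, end)` convention is an assumption of this sketch rather than a statement about how S3 interprets the range bounds.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Trino or the AWS SDK.
final class ScanRangeSketch {
    // Half-open byte range [start, end) covered by one split (convention assumed here).
    static final class ByteRange {
        final long start;
        final long end;

        ByteRange(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    // Cuts a file of fileSize bytes into contiguous ranges of at most splitSize bytes.
    static List<ByteRange> splitIntoRanges(long fileSize, long splitSize) {
        if (fileSize < 0 || splitSize <= 0) {
            throw new IllegalArgumentException("fileSize must be >= 0 and splitSize must be > 0");
        }
        List<ByteRange> ranges = new ArrayList<>();
        long start = 0;
        while (start < fileSize) {
            // The last range may be shorter than splitSize.
            long length = Math.min(splitSize, fileSize - start);
            ranges.add(new ByteRange(start, start + length));
            start += length;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Print the ranges for a 250-byte object with 100-byte splits.
        for (ByteRange range : splitIntoRanges(250L, 100L)) {
            System.out.println(range.start + "-" + range.end);
        }
    }
}
```

Cutting at arbitrary byte offsets only makes sense because the service, not the client, resolves record boundaries: as far as I understand the S3 Select documentation, a record that starts inside a scan range is processed in full even if it extends past the end of the range. That is also why this approach is limited to uncompressed input.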
File splitting is configurable on the client side through the already existing session properties, such as:
Hive S3 Select connector
Trino client will return results faster when S3 Select pushdown is enabled for uncompressed CSV files:
SET SESSION hive.s3_select_pushdown_enabled = true;
Related issues, pull requests, and links
The previous PR, #13417, was accidentally closed by a wrong fork sync.
Documentation
( ) No documentation is needed.
(x) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
(x) Release notes entries required with the following suggested text: