Exclude column schema when we fetch Glue partitions based on filter #14206
Description
getPartitionNamesByFilter requires only partition values, so including the column schema in the result is unnecessary overhead. The additional call to fetch the table information is also avoided. This could improve planning time for queries on tables with many columns (1000+). We tested locally with a Glue table that has 1000 data columns, 3 partition columns, and 1000 partitions.
For a query like this, with table_statistics disabled:
EXPLAIN SELECT count(*) FROM GLUE_TABLE GROUP BY part_column_2 LIMIT 1
Overall execution time before this change: 7-8s (multiple runs)
Overall execution time after this change: 1.1-1.7s (multiple runs)
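To illustrate why partition values alone are enough, here is a minimal sketch, not the code from this change. It does not call the Glue API. The class PartitionNameSketch, its Partition holder, and the column names part_column_1 and part_column_3 are hypothetical. It assumes each fetched partition would otherwise repeat all of the table's data columns, and it skips the character escaping that real Hive partition names need.

```java
import java.util.List;
import java.util.StringJoiner;

public class PartitionNameSketch {
    // Hypothetical stand-in for a fetched partition. columnSchema may be null
    // when the schema was excluded from the response.
    public static final class Partition {
        final List<String> values;
        final List<String> columnSchema;

        public Partition(List<String> values, List<String> columnSchema) {
            this.values = values;
            this.columnSchema = columnSchema;
        }
    }

    // Joins partition column names and values into "col=value/col=value".
    // Simplified: real Hive partition names also escape special characters.
    // Note that columnSchema is never read here.
    public static String partitionName(List<String> partitionColumns, Partition partition) {
        if (partitionColumns.size() != partition.values.size()) {
            throw new IllegalArgumentException("column/value count mismatch");
        }
        StringJoiner name = new StringJoiner("/");
        for (int i = 0; i < partitionColumns.size(); i++) {
            name.add(partitionColumns.get(i) + "=" + partition.values.get(i));
        }
        return name.toString();
    }

    // Column-schema entries a listing would carry, assuming every partition
    // repeats all of the table's data columns.
    public static long schemaEntries(int partitions, int dataColumns) {
        return (long) partitions * dataColumns;
    }

    public static void main(String[] args) {
        List<String> columns = List.of("part_column_1", "part_column_2", "part_column_3");
        Partition partition = new Partition(List.of("a", "b", "c"), null);
        System.out.println(partitionName(columns, partition)); // part_column_1=a/part_column_2=b/part_column_3=c
        System.out.println(schemaEntries(1000, 1000));         // 1000000
    }
}
```

Under that assumption, the test table above (1000 partitions, 1000 data columns) would carry on the order of a million column entries that the partition-name lookup never reads.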
Non-technical explanation
Improves planning time for Glue tables.
Release notes
( ) This is not user-visible and no release notes are required.
(x) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text: