Exclude column schema when we fetch Glue partitions based on filter #14206

Praveen2112 · 2022-09-20T05:59:35Z

Description

`getPartitionNamesByFilter` requires only partition values, so including the column schema in the result is unnecessary overhead. An additional call to get the table information is also avoided. This could improve planning time for queries on tables with a very large number of columns (1000+).
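As a rough illustration of why partition values are enough, here is a minimal, self-contained sketch. It does not call AWS or Trino code; the class and method names (`PartitionNameSketch`, `toPartitionName`, `partitionNames`) are hypothetical. It assumes the caller already knows the partition column names, and that each partition fetched with the Glue `GetPartitions` option `ExcludeColumnSchema` still carries its values. Escaping of special characters in partition names is deliberately left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in; not the Trino or AWS SDK classes.
final class PartitionNameSketch {
    private PartitionNameSketch() {}

    // Builds a Hive-style partition name such as "year=2022/month=09".
    // Assumption: no escaping of special characters, which a real
    // implementation would need.
    static String toPartitionName(List<String> partitionColumns, List<String> values) {
        if (partitionColumns.size() != values.size()) {
            throw new IllegalArgumentException("Expected one value per partition column");
        }
        StringBuilder name = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                name.append('/');
            }
            name.append(partitionColumns.get(i)).append('=').append(values.get(i));
        }
        return name.toString();
    }

    // Maps each partition's value list to a name. The schema of the
    // data columns is never consulted.
    static List<String> partitionNames(List<String> partitionColumns, List<List<String>> partitionValues) {
        List<String> names = new ArrayList<>();
        for (List<String> values : partitionValues) {
            names.add(toPartitionName(partitionColumns, values));
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("part_column_1", "part_column_2", "part_column_3");
        List<List<String>> values = Arrays.asList(
                Arrays.asList("2022", "09", "20"),
                Arrays.asList("2022", "09", "21"));
        for (String name : partitionNames(columns, values)) {
            System.out.println(name);
        }
    }
}
```

Because the name depends only on the partition column names and the per-partition values, a response that omits the column schema for each of the 1000 partitions loses nothing this code path needs.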

We tested locally with a Glue table having 1000 data columns, 3 partition columns, and 1000 partitions, using the following query with table_statistics disabled:

`EXPLAIN SELECT count(*) FROM GLUE_TABLE GROUP BY part_column_2 LIMIT 1`

Overall execution time before this change: 7-8s (multiple runs)

Overall execution time after this change: 1.1-1.7s (multiple runs)

Non-technical explanation

Improves planning time for Glue tables.

Release notes

( ) This is not user-visible and no release notes are required.
(x) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Improve planning time for wide Glue tables

findepi · 2022-09-20T06:54:15Z

@Praveen2112 see TestIcebergGlueCatalogAccessOperations failure.

findepi · 2022-09-20T06:54:41Z

cc @alexjo2144 @findinpath @homar

Praveen2112 · 2022-09-20T07:30:06Z

I think TestIcebergGlueCatalogAccessOperations will be fixed by this PR - #14207

plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/glue/GlueHiveMetastore.java

skrzypo987

lgtm

Praveen2112 requested review from electrum, findepi, s2lomon and skrzypo987 September 20, 2022 05:59

cla-bot bot added the cla-signed label Sep 20, 2022

github-actions bot added the tests:hive label Sep 20, 2022

findepi approved these changes Sep 20, 2022

homar approved these changes Sep 20, 2022

findinpath approved these changes Sep 20, 2022

skrzypo987 approved these changes Sep 20, 2022

Exclude column schema when we fetch Glue partitions based on filter

e669529

`getPartitionNamesByFilter` requires only partition values, so including the column schema in the result is unnecessary overhead. An additional call to get the table information is also avoided.

Praveen2112 force-pushed the praveen/minor_glue_improvement branch from 20d8dea to e669529 Compare September 20, 2022 10:56

findinpath approved these changes Sep 20, 2022

Praveen2112 merged commit 5e066e2 into master Sep 20, 2022

Praveen2112 deleted the praveen/minor_glue_improvement branch September 20, 2022 15:35

colebow mentioned this pull request Sep 20, 2022

Add Trino 397 release notes #14194

Merged

github-actions bot added this to the 397 milestone Sep 20, 2022
