INSERT/REPLACE can omit clustering when catalog has default #16260

zachjsh · 2024-04-10T20:12:35Z

Description

This PR contains a portion of the changes from the inactive draft PR for integrating the catalog with the Calcite planner (#13686) from @paul-rogers. It allows tables that are defined in the catalog to have their defined clustering columns used in DML INSERT/REPLACE operations without the user having to specify them at query time. If the user specifies clustering columns at query time, those columns take precedence over the catalog-defined clustering columns.
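The precedence rule described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the code in `DruidSqlValidator`: the class `ClusteringSketch`, the method `effectiveClusterBy`, and the map standing in for the catalog are all hypothetical. Column lists are modeled as `List<String>`, and an empty query-time list means the statement had no `CLUSTERED BY` clause.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Druid's actual API: picks the clustering columns
// for an INSERT/REPLACE target using the precedence described in this PR.
final class ClusteringSketch {
  private ClusteringSketch() {}

  /**
   * @param table            target table name
   * @param queryClusterBy   columns from the statement's CLUSTERED BY clause (empty if absent)
   * @param catalogClusterBy hypothetical stand-in for the catalog: table name -> clustering columns
   * @return the clustering columns the ingestion should use
   */
  static List<String> effectiveClusterBy(
      String table,
      List<String> queryClusterBy,
      Map<String, List<String>> catalogClusterBy
  ) {
    // Query-time clustering always wins, whether or not the table is in the catalog.
    if (!queryClusterBy.isEmpty()) {
      return queryClusterBy;
    }
    // No CLUSTERED BY in the statement: fall back to the catalog default, if any.
    // Tables the catalog does not know about get an empty list (no clustering).
    return catalogClusterBy.getOrDefault(table, List.of());
  }
}
```

For example, `effectiveClusterBy("wiki", List.of(), Map.of("wiki", List.of("channel")))` yields `[channel]`, while passing `List.of("page")` as the query-time list yields `[page]`.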

This PR has:


kgyrtkirk

+1; just left some questions :)

kgyrtkirk · 2024-04-24T07:55:18Z

sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidSqlValidator.java

+      final SqlIdentifier colIdent = new SqlIdentifier(
+          Collections.singletonList(keyCol.expr()),
+          null, SqlParserPos.ZERO,
+          Collections.singletonList(SqlParserPos.ZERO)
+      );


I was wondering what will happen in the following case: say column c is a clusterKey, and we are selecting from a join which has column c on both sides.

But it seems like the column in the select list will take precedence.

One more thing I was wondering about: do we have a check that all keyCols are present in the selected column list?

zachjsh · 2024-04-25

About whether there is a check that all keyCols are present in the selected column list, see the following tests:

testInsertTableWithClusteringWithClusteringOnNewColumnFromQuery
testInsertTableWithClusteringWithClusteringOnBadColumn

Do these cover the cases you are talking about?

About the join issue, do you have a concrete example query, just to clarify?
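The reviewer's reading (clustering columns resolve against the select list) and the presence check can be sketched together in plain Java. This is an illustration under assumptions, not how `DruidSqlValidator` or Calcite actually resolves the `SqlIdentifier` built in the snippet above: the class `ClusterKeyCheck`, the method `resolveAgainstSelectList`, and the rejection of duplicate output names are hypothetical. The select list is modeled simply as the query's output column names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the validator's real code: checks clustering
// columns against the names the query actually outputs.
final class ClusterKeyCheck {
  private ClusterKeyCheck() {}

  /**
   * Resolves each clustering column by name against the query's output
   * (select-list) column names and returns the matching output positions.
   *
   * @throws IllegalArgumentException if a clustering column is missing from,
   *         or appears more than once in, the select list
   */
  static List<Integer> resolveAgainstSelectList(List<String> selectList, List<String> clusterBy) {
    List<Integer> positions = new ArrayList<>();
    for (String col : clusterBy) {
      // Only output names are consulted: a join input column named `col`
      // matters only if the select list exposes it under that name.
      int pos = selectList.indexOf(col);
      if (pos < 0) {
        throw new IllegalArgumentException(
            "Clustering column [" + col + "] is not in the select list " + selectList);
      }
      // Defensive choice of this sketch: refuse to guess between duplicate output names.
      if (selectList.lastIndexOf(col) != pos) {
        throw new IllegalArgumentException(
            "Clustering column [" + col + "] is ambiguous in the select list " + selectList);
      }
      positions.add(pos);
    }
    return positions;
  }
}
```

With output columns `[__time, channel, page]` and clustering columns `[page, channel]`, the sketch returns the positions `[2, 1]`. A clustering column missing from the output, such as `channel` against `[__time, page]`, raises an error, which is presumably the kind of case the tests named above exercise.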

kgyrtkirk · 2024-04-24T08:12:40Z

sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidSqlValidator.java

+      final IdentifierNamespace insertNs = (IdentifierNamespace) targetNamespace;
+      SqlIdentifier identifier = insertNs.getId();
+      SqlValidatorTable catalogTable = getCatalogReader().getTable(identifier.names);
+      if (catalogTable != null) {


Wouldn't falling through this conditional mean that the CLUSTER BY on the ingestNode is not applied (line 399 right now), even if it is there? Is that okay?

zachjsh · 2024-04-25

If the ingestNode already has clustering columns, they will be used. Existing tests verify that the clustering columns are used in the plan returned from a DML query when clustering is defined at query time, whether or not the table is in the catalog. Let me know if this covers the issue you think could occur.


zachjsh added 10 commits March 27, 2024 15:48

* fix

fafcc76

* fix

fe2c407

Merge remote-tracking branch 'apache/master' into fix-complex-types-s…

80151fc

…ql-type-inference

* address review comments

357e6a7

* fix

fd6cb24

* simplify tests

3012773

* fix complex type nullability issue

853ea76

Merge remote-tracking branch 'apache/master' into validate-catalog-co…

7b20b83

…mplex-columns

Merge remote-tracking branch 'apache/master' into validate-catalog-co…

9890d91

…mplex-columns

* implement and add tests

758a414

zachjsh requested a review from kgyrtkirk April 10, 2024 20:12

github-actions bot added the Area - Querying label Apr 10, 2024

zachjsh requested review from clintropolis, abhishekrb19 and jon-wei April 10, 2024 20:12

github-advanced-security bot found potential problems Apr 10, 2024

sql/src/test/java/org/apache/druid/sql/calcite/CalciteCatalogIngestionDmlTest.java Fixed

sql/src/test/java/org/apache/druid/sql/calcite/CalciteCatalogIngestionDmlTest.java Fixed

Merge remote-tracking branch 'apache/master' into use-catalog-cluster…

0401766

…ing-columns

kgyrtkirk reviewed Apr 12, 2024

sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidSqlValidator.java Outdated

kgyrtkirk reviewed Apr 12, 2024

sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidSqlValidator.java

zachjsh added 7 commits April 16, 2024 11:22

Merge remote-tracking branch 'apache/master' into validate-catalog-co…

b042bb6

…mplex-columns

* address review comments

fdf2140

* address test review comments

736a7c8

Merge remote-tracking branch 'origin/validate-catalog-complex-columns…

000e015

…' into use-catalog-clustering-columns

* fix checkstyle

7ad8289

Merge remote-tracking branch 'origin/validate-catalog-complex-columns…

c4bb77d

…' into use-catalog-clustering-columns

* fix dependencies

6181eef

github-actions bot added the Area - Dependencies label Apr 16, 2024

zachjsh added 3 commits April 17, 2024 14:36

* all tests passing

87b4dd2

* cleanup

f9f6b7b

Merge remote-tracking branch 'apache/master' into use-catalog-cluster…

009b684

…ing-columns

* remove unneeded code

7cc749a

zachjsh requested a review from kgyrtkirk April 17, 2024 19:10

* remove unused dependency

03838bf

kgyrtkirk approved these changes Apr 24, 2024


zachjsh added 3 commits April 25, 2024 13:06

Merge remote-tracking branch 'apache/master' into use-catalog-cluster…

3b9d78d

…ing-columns

* fix checkstyle

659cac0

Merge remote-tracking branch 'apache/master' into use-catalog-cluster…

a0880bc

…ing-columns

zachjsh merged commit 365cd7e into apache:master Apr 26, 2024
87 checks passed

zachjsh deleted the use-catalog-clustering-columns branch April 26, 2024 14:19

This pull request was closed.
