Add a limit to the number of columns in the CLUSTERED BY clause #13352

LakshSingla · 2022-11-11T06:24:28Z

If a large number of columns is passed to the CLUSTERED BY clause while ingesting via MSQ, worker tasks can run out of memory (OOM). (With sequential merge in place, controller tasks shouldn't OOM.)
This PR adds a limit to the number of CLUSTERED BY columns that can be passed in a query and throws a fault if the limit is exceeded.
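A minimal sketch of the check described above, assuming the limit is enforced on the final stage only (as in the first revision of `QueryValidator` that the review comments refer to). Everything here is hypothetical: `StageSketch`, `TooManyClusteredByColumnsException`, `FinalStageClusterByCheck`, and the message text are simplified stand-ins for Druid's `QueryDefinition`, `ClusterBy`, and `TooManyClusteredByColumnsFault`, and the limit is passed in as a parameter rather than taken from Druid's actual limits.

```java
import java.util.List;

/** Hypothetical stand-in for a stage definition: its number and its cluster-by columns. */
final class StageSketch {
  final int stageNumber;
  final List<String> clusterByColumns;

  StageSketch(int stageNumber, List<String> clusterByColumns) {
    this.stageNumber = stageNumber;
    this.clusterByColumns = List.copyOf(clusterByColumns);
  }
}

/** Hypothetical exception standing in for the MSQ fault; the message format is illustrative only. */
final class TooManyClusteredByColumnsException extends RuntimeException {
  final int numColumns;
  final int maxColumns;

  TooManyClusteredByColumnsException(int numColumns, int maxColumns) {
    super("Too many CLUSTERED BY columns: " + numColumns + " (max " + maxColumns + ")");
    this.numColumns = numColumns;
    this.maxColumns = maxColumns;
  }
}

final class FinalStageClusterByCheck {
  private FinalStageClusterByCheck() {}

  /**
   * Checks only the last stage, whose cluster-by columns correspond to the
   * CLUSTERED BY clause the user wrote. Earlier stages are not inspected.
   */
  static void validate(List<StageSketch> stages, int maxClusteredByColumns) {
    if (stages.isEmpty()) {
      throw new IllegalArgumentException("query must have at least one stage");
    }
    StageSketch finalStage = stages.get(stages.size() - 1);
    int numColumns = finalStage.clusterByColumns.size();
    if (numColumns > maxClusteredByColumns) {
      throw new TooManyClusteredByColumnsException(numColumns, maxClusteredByColumns);
    }
  }
}
```

Because only the last stage is inspected, an earlier stage whose cluster-by key has more columns than the limit still passes this check; that trade-off is what the review thread debates.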

Release note

There is a limit to the number of columns that can be passed in the CLUSTERED BY clause while ingesting via MSQ.


adarshsanjeev

Thanks for the PR! Had one comment.

adarshsanjeev · 2022-11-13T11:41:36Z

extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/QueryValidator.java

@@ -55,6 +57,15 @@ public static void validateQueryDef(final QueryDefinition queryDef)
        throw new ISE("Number of workers must be greater than 0");
      }
    }
+
+    // Check if the number of columns in the query's CLUSTERED BY clause do not exceed the limit
+    ClusterBy queryClusteredBy = queryDef.getFinalStageDefinition().getClusterBy();


Does only the final stage lead to an OOM? Wouldn't it be possible for more cluster by columns to be present in earlier stages than the final one?

The cluster-by columns in the earlier stages might not have a 1:1 correspondence with the query the user has written, so raising a cluster-by error for them wouldn't be actionable for the user, IMO. Hence I only added the limit to the final stage (which corresponds to the query the user has written). Together with sequential merge mode, I think there should be enough guardrails in place to prevent an OOM.

However, we could add a limit on the cluster-by columns in the other stages if we rephrase the error message as something like "Enough grouping keys present in stage [xx], the query might OOM". Those cluster-by keys can correspond to something in the GROUP BY clause, for example. WDYT?

Looking at TooManyColumnsFault, I think we can also go ahead with the second proposition, since that fault is also imposed at a per-stage level, which might not correspond to the final result the user expects. (The wording might need to change, though.)
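The second proposition, checking every stage and naming the offending stage in the error, could look roughly like this. It is a hypothetical sketch: each stage is reduced to its list of cluster-by column names, and `PerStageClusterByCheck`, `StageClusterByLimitException`, and the message text are illustrative, not Druid code.

```java
import java.util.List;

/** Hypothetical exception for the per-stage variant; it records which stage exceeded the limit. */
final class StageClusterByLimitException extends RuntimeException {
  final int stageNumber;
  final int numColumns;
  final int maxColumns;

  StageClusterByLimitException(int stageNumber, int numColumns, int maxColumns) {
    super("Too many cluster-by columns present in stage [" + stageNumber + "] ("
        + numColumns + " > " + maxColumns + "); the query might run out of memory");
    this.stageNumber = stageNumber;
    this.numColumns = numColumns;
    this.maxColumns = maxColumns;
  }
}

final class PerStageClusterByCheck {
  private PerStageClusterByCheck() {}

  /**
   * Checks every stage, not just the final one. clusterByColumnsPerStage.get(i)
   * holds the cluster-by columns of stage i (a simplified, hypothetical representation).
   * Throws for the first stage, in stage order, that exceeds the limit.
   */
  static void validate(List<List<String>> clusterByColumnsPerStage, int maxClusteredByColumns) {
    for (int stage = 0; stage < clusterByColumnsPerStage.size(); stage++) {
      int numColumns = clusterByColumnsPerStage.get(stage).size();
      if (numColumns > maxClusteredByColumns) {
        throw new StageClusterByLimitException(stage, numColumns, maxClusteredByColumns);
      }
    }
  }
}
```

Naming the stage keeps the error somewhat actionable even when an intermediate stage's keys (for example, ones derived from a GROUP BY) do not map one-to-one onto the user's CLUSTERED BY clause.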

cryptoe · 2022-11-13T20:13:29Z

...-query/src/main/java/org/apache/druid/msq/indexing/error/TooManyClusteredByColumnsFault.java

+import java.util.Objects;
+
+@JsonTypeName(TooManyClusteredByColumnsFault.CODE)
+public class TooManyClusteredByColumnsFault extends BaseMSQFault


Let's document this fault as well.

Thanks for pointing it out, updated!
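For readers unfamiliar with MSQ faults, here is a rough plain-Java sketch of the shape such a fault class takes: a fixed error code, the values needed to build a message, and value-based `equals`/`hashCode` (which is where a `java.util.Objects` import like the one in the diff would typically be used). It deliberately omits `BaseMSQFault` and the Jackson annotations so it compiles on its own; the class name, the `CODE` string, the field names, and the message format are assumptions, not the merged implementation.

```java
import java.util.Objects;

/**
 * Hypothetical, dependency-free sketch of a fault value object.
 * The real class extends BaseMSQFault and carries @JsonTypeName; both are omitted here.
 */
final class ClusteredByLimitFaultSketch {
  // Assumed error code; the real value is defined by TooManyClusteredByColumnsFault.CODE.
  static final String CODE = "TooManyClusteredByColumns";

  private final int numColumns;
  private final int maxColumns;
  private final int stage;

  ClusteredByLimitFaultSketch(int numColumns, int maxColumns, int stage) {
    this.numColumns = numColumns;
    this.maxColumns = maxColumns;
    this.stage = stage;
  }

  String getErrorCode() {
    return CODE;
  }

  /** Illustrative message built from the fault's fields. */
  String getErrorMessage() {
    return String.format(
        "Too many cluster by columns present in stage [%d] (max = %d, actual = %d)",
        stage, maxColumns, numColumns);
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof ClusteredByLimitFaultSketch)) {
      return false;
    }
    ClusteredByLimitFaultSketch that = (ClusteredByLimitFaultSketch) o;
    return numColumns == that.numColumns && maxColumns == that.maxColumns && stage == that.stage;
  }

  @Override
  public int hashCode() {
    return Objects.hash(CODE, numColumns, maxColumns, stage);
  }
}
```

Value-based equality lets a test compare a fault against a copy of itself, for example after a serialization round trip.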

adarshsanjeev

LGTM after resolving the merge conflict!

cryptoe

LGTM. Will merge once the conflicts are resolved.
Thanks @LakshSingla

cryptoe · 2022-11-14T13:14:59Z

...-query/src/main/java/org/apache/druid/msq/indexing/error/TooManyClusteredByColumnsFault.java

+
+import java.util.Objects;
+
+@JsonTypeName(TooManyClusteredByColumnsFault.CODE)


We might need to add this to MSQIndexingModule.java

Thanks for pointing it out, I added it to the module.

LakshSingla · 2022-11-15T10:25:08Z

Test failures seem unrelated/flaky; can the second stage of CI/CD be run again?

cryptoe · 2022-11-15T16:35:33Z

Failures look unrelated.
Thanks for the PR @LakshSingla.

Add clustered by limit

29ab09e

adarshsanjeev reviewed Nov 13, 2022

cryptoe reviewed Nov 13, 2022

cryptoe added the Area - MSQ (For multi stage queries - https://github.com/apache/druid/issues/12262) and Release Notes labels Nov 13, 2022

change semantics, add docs

7b4e553

adarshsanjeev approved these changes Nov 14, 2022

cryptoe approved these changes Nov 14, 2022

Merge branch 'master' into clustered-limit

7b1582f

cryptoe reviewed Nov 14, 2022

LakshSingla added 3 commits November 14, 2022 18:49

add fault class to the module

ce2588b

add test

b47e158

unambiguate test

b8b9178

cryptoe merged commit 9e938b5 into apache:master Nov 15, 2022

kfaraz added this to the 25.0 milestone Nov 21, 2022

This was referenced Dec 18, 2022

[Draft] 25.0.0 Release Notes #13592

Closed

Add SegmentAllocationQueue to batch allocation actions #13369

Merged


LakshSingla commented Nov 11, 2022 (edited)

adarshsanjeev left a comment

adarshsanjeev Nov 13, 2022

LakshSingla Nov 14, 2022

LakshSingla Nov 14, 2022

cryptoe Nov 13, 2022

LakshSingla Nov 14, 2022

adarshsanjeev left a comment (edited)

cryptoe left a comment

cryptoe Nov 14, 2022

LakshSingla Nov 15, 2022

LakshSingla commented Nov 15, 2022

cryptoe commented Nov 15, 2022