Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: Generate relevant error message when grouping functions are not used in GROUP BY #38017

Merged
merged 10 commits into from
Feb 2, 2019

Conversation

astefan
Copy link
Contributor

@astefan astefan commented Jan 30, 2019

Grouping functions (HISTOGRAM, so far) have a requirement of being used inside the GROUP BY. This PR adds checks to make sure this happens. Otherwise, an error message is generated.

Fixes #37952.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check needs to be extended so it traverses the tree in case the grouping function is embedded.
For example SELECT YEAR(HISTOGRAM(date, 1 YEAR)) FROM x GROUP BY HISTOGRAM(date, 1 YEAR).
Additionally (and this might be a separate PR), it's worth investigating how we handle functions over histograms FROM x GROUP BY YEAR(HISTOGRAM(date, 1 YEAR)).

Copy link
Contributor

@matriv matriv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a coupe of questions, otherwise LGTM


proj.projections().forEach(e -> {
if (Functions.isGrouping(e) == false) {
e.collectFirstChildren(c -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

collectFirstChildren uses a List to add the matches which we don't need in this impl.
Why don't you use forEachDown?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the approach and pushed a new commit.


a.aggregates().forEach(as -> {
Expression exp = as instanceof Alias ? ((Alias) as).child() : as;
if (Functions.isGrouping(exp) && false == Expressions.anyMatch(a.groupings(), g -> exp.semanticEquals(g))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if the histogram is wrapped with a Scalar function? is it caught somewhere else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was already handled. See the test here.

@astefan
Copy link
Contributor Author

astefan commented Jan 31, 2019

@costin I pushed a new commit.

@astefan
Copy link
Contributor Author

astefan commented Jan 31, 2019

@elasticmachine run elasticsearch-ci/2

2 similar comments
@astefan
Copy link
Contributor Author

astefan commented Feb 1, 2019

@elasticmachine run elasticsearch-ci/2

@astefan
Copy link
Contributor Author

astefan commented Feb 1, 2019

@elasticmachine run elasticsearch-ci/2

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments, I'm unclear on why extraChildren is needed...

Project proj = (Project) p;
proj.projections().forEach(e -> e.forEachDown(f -> localFailures.add(fail(f, "[%s] needs to be part of the grouping",
Expressions.name(f))), GroupingFunction.class));
} else if (p instanceof Aggregate) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this block is a bit condensed and could use some whitespace and indentation to make it more readable.
For example put localFailures on a new line.

// if it does have a GROUP BY, check if the groupings contain the grouping functions (Histograms)
Aggregate a = (Aggregate) p;
a.aggregates().forEach(agg -> agg.forEachDown(e -> {
if (Expressions.anyMatch(a.groupings(), g -> {return g instanceof Function && e.functionEquals((Function) g);}) == false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

anyMatch(a.groupings(), g -> g instanceof Function && functionEquals((Function) g)
Avoids the use of {, return and ;
Also like above it will be readable if you put each statement on a new line

private final List<Expression> parameters;

private GroupingFunctionAttribute lazyAttribute;

protected GroupingFunction(Source source, Expression field) {
this(source, field, emptyList());
protected GroupingFunction(Source source, Expression field, List<Expression> extraChildren) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extraChildren is meaningless - use arguments instead (to preserve the terminology from the parent) or parameters instead.

}

protected GroupingFunction(Source source, Expression field, List<Expression> parameters) {
super(source, CollectionUtils.combine(singletonList(field), parameters));
protected GroupingFunction(Source source, Expression field, List<Expression> extraChildren, List<Expression> parameters) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there extraChildren and parameters?

@@ -24,8 +26,8 @@
private final ZoneId zoneId;

public Histogram(Source source, Expression field, Expression interval, ZoneId zoneId) {
super(source, field);
this.interval = (Literal) interval;
super(source, field, Arrays.asList(interval));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not have this passed through parameters instead of extraChildren?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor improvement, use: Collections.singletonList(interval)

@astefan
Copy link
Contributor Author

astefan commented Feb 1, 2019

@costin the idea is to make interval part of the node's children so that the functionEquals check here does actually make two Histograms be equal. Since the ZoneId isn't really a children, that's why I defined a List of "extra" children, to make it more generic for any future (if any) function that will need a variable list of children...

@astefan
Copy link
Contributor Author

astefan commented Feb 2, 2019

@costin addressed your comments. Initially I looked at parameters, but I regarded them as having a different role than field and interval.

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -24,8 +26,8 @@
private final ZoneId zoneId;

public Histogram(Source source, Expression field, Expression interval, ZoneId zoneId) {
super(source, field);
this.interval = (Literal) interval;
super(source, field, Arrays.asList(interval));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor improvement, use: Collections.singletonList(interval)

@@ -51,10 +53,13 @@ protected TypeResolution resolveType() {

return resolution;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really minor: remove the whitespaces.

@astefan astefan merged commit 6968f09 into elastic:master Feb 2, 2019
@astefan astefan deleted the 37952_fix branch February 2, 2019 20:43
astefan added a commit that referenced this pull request Feb 2, 2019
…used in GROUP BY (#38017)

* Add checks for Grouping functions restriction to be placed inside GROUP BY
* Fixed bug where GROUP BY HISTOGRAM (not using alias) wasn't recognized
properly in the Verifier due to functions equality not working correctly.

(cherry picked from commit 6968f09)
astefan added a commit that referenced this pull request Feb 4, 2019
…used in GROUP BY (#38017)

* Add checks for Grouping functions restriction to be placed inside GROUP BY
* Fixed bug where GROUP BY HISTOGRAM (not using alias) wasn't recognized
properly in the Verifier due to functions equality not working correctly.

(cherry picked from commit 6968f09)
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Feb 4, 2019
…round-sync-6.x

* elastic/6.x:
  Fix testRestoreIncreasesPrimaryTerms on 6.x (elastic#38314)
  SQL: Remove exceptions from Analyzer (elastic#38260) (elastic#38287)
  SQL: Move metrics tracking inside PlanExecutor (elastic#38259) (elastic#38288)
  Backport of elastic#38311: Move TokenService to seqno powered cas
  Handle scheduler exceptions (elastic#38183)
  Mute MlMigrationFullClusterRestartIT#testMigration (elastic#38316)
  6.x Backport of elastic#38278: Move ML Optimistic Concurrency Control to Seq No
  Cleanup construction of interceptors (elastic#38296)
  Throw if two inner_hits have the same name (elastic#37645) (elastic#38194)
  AsyncTwoPhaseIndexerTests race condition fixed elastic#38195 Backport#37830
  Enable SSL in reindex with security QA tests (elastic#38293)
  Ensure ILM policies run safely on leader indices  (elastic#38140)
  Introduce ssl settings to reindex from remote (elastic#38292)
  Fix ordering problem in add or renew lease test (elastic#38281)
  Mute ReplicationTrackerRetentionLeaseTests#testAddOrRenewRetentionLease (elastic#38276)
  Fix NPE in Logfile Audit Filter (elastic#38120) (elastic#38271)
  Enable trace log in FollowerFailOverIT (elastic#38148)
  SQL: Generate relevant error message when grouping functions are not used in GROUP BY (elastic#38017)
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Feb 8, 2019
* 6.6: (121 commits)
  [DOCS] Add warning about bypassing ML PUT APIs (elastic#38608)
  fix dissect doc "ip" --> "clientip" (elastic#38512)
  bad formatted JSON object (elastic#38515)
  SQL: Fix issue with IN not resolving to underlying keyword field (elastic#38440)
  Update ilm-api.asciidoc, point to REMOVE policy (elastic#38235)
  Backport changes to the release notes script. (elastic#38347)
  Change the milliseconds precision to 3 digits for intervals. (elastic#38297)
  SecuritySettingsSource license.self_generated: trial (elastic#38233) (elastic#38398)
  Fix IndexAuditTrail rolling upgrade on rollover edge 2 (elastic#38286) (elastic#38381)
  Cleanup construction of interceptors (elastic#38388)
  Skip unsupported languages for tests (elastic#38328) (elastic#38385)
  [ILM][TEST] increase assertBusy timeout (elastic#36864) (elastic#38354)
  Docs: Drop inline callout from scroll example (elastic#38340) (elastic#38365)
  Preserve ILM operation mode when creating new lifecycles (elastic#38134) (elastic#38230)
  [ML] Add explanation so far to file structure finder exceptions (elastic#38337)
  ML: Fix error race condition on stop _all datafeeds and close _all jobs (elastic#38113) (elastic#38211) (elastic#38222)
  SQL: Generate relevant error message when grouping functions are not used in GROUP BY (elastic#38017)
  Fix NPE in Logfile Audit Filter (elastic#38120) (elastic#38273)
  Enable trace log in FollowerFailOverIT (elastic#38148)
  Replace awaitBusy with assertBusy in atLeastDocsIndexed (elastic#38190)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants