Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: Enable aggregations to create a separate bucket for missing values #32832

Merged
merged 5 commits into from
Aug 27, 2018

Conversation

astefan
Copy link
Contributor

@astefan astefan commented Aug 14, 2018

At the moment, the GROUP BY statements in SELECTs will not consider inexistent values in columns. This PR enables missing values to be added as a separate bucket in the aggregation results.
This is part of a larger effort regarding NULL/missing values handling.
This fixes #32831

Copy link
Contributor

@imotov imotov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general it LGTM, but I have a couple of small concerns. 1) I am not sure it is worth adding a separate test_emp_with_nulls. I think it might be better to modify test_emp to have some null values. Otherwise we will only ever use it when we have intent to test nulls. 2) It is a breaking change. We can are still in experimental phase so we can break things, but I think it would be nice to document it somewhere.

@astefan
Copy link
Contributor Author

astefan commented Aug 16, 2018

@imotov the reason for creating - what I hope to be - a temporary set of test data is that the NULLs checking is an wide scenario covering many more areas and functionalities. I do have a more general issue covering NULLs in general.
In this way, we can enable grouping by non-existent values easily and quickly and then covering the rest of the scenarios (especially those involving all kinds of sql functions and all the H2 comparative tests added so far in the project) in a later push.

I could probably integrate the NULLs in the main set of test data, but the effort involved in fixing all the implications doing this might be more significant.

Regarding the breaking changes awareness, where would be the best place to document such a change?

@imotov
Copy link
Contributor

imotov commented Aug 16, 2018

I am fine with having two tables for now.

Regarding the breaking changes awareness, where would be the best place to document such a change?

It's, probably, time to create a new file for sql in docs/reference/migration/migrate_* folder.

@astefan
Copy link
Contributor Author

astefan commented Aug 20, 2018

@imotov I've added the breaking change page for SQL.

@astefan
Copy link
Contributor Author

astefan commented Aug 20, 2018

Also, @imotov, should this go into 6.x as well? If it would then a different breaking page should be added, right?

@imotov
Copy link
Contributor

imotov commented Aug 20, 2018

Yeah, it's an experimental features, so I think it's ok to do it in 6.x as well. Indeed, it will be a similar file only in different directory.

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be nice to have an example in the docs with an aggregation that includes null so I can get a good look at what the output is. At this point we just assert that it lines up with jdbc, but it'd still be nice to see.


==== Grouping by columns with missing values will create an additional group

An additional group will be present in the result of requests containing a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this goes to 6.x then this shouldn't actually be committed to migrate_7_0, only the 6.x version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... good point @nik9000... a breaking change happens once on the versions timeline.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaking change in 6.x should go to \docs\reference\migration\migrate_6_5.asciidoc, right?

@@ -0,0 +1,14 @@
selectGenderWithNullsAndGroupByGender
SELECT gender, COUNT(*) count FROM test_emp_with_nulls GROUP BY gender ORDER BY gender;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a separate file? Maybe should just be part of the aggs file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nik9000 thank you for reviewing, much appreciated.
I've done it this way with the idea of refactoring this in the near future as a wider effort (part of #32079). Then the test_emp_with_nulls would disappear and the null values will be part of test_emp itself and more tests will be added to various test files (functions, selects with group by, IS NULL type of selects and possibly other sections). I considered the small change of allowing null values as part of the aggregations results worthy of being added now relatively quickly (and allow a minor functionality be available), instead of tackling the null values support wider task which will probably take more time. Also, when the wider null-handling task is considered, I can add the documentation covering the null group result.
Let me know your thoughts.

@astefan astefan merged commit 3d9ca4b into elastic:master Aug 27, 2018
astefan added a commit that referenced this pull request Aug 27, 2018
…es (#32832)

Enable aggregations to create a separate bucket for missing values.
astefan added a commit that referenced this pull request Aug 27, 2018
…rs null or empty values as a separate group/bucket. Previously, they were ignored.

* This is part of backporting of #32832
@astefan
Copy link
Contributor Author

astefan commented Aug 27, 2018

Pushed to 6.x as well, together with breaking changes documentation update.

jasontedor added a commit that referenced this pull request Aug 27, 2018
* master:
  Adjust BWC version on mapping version
  Token API supports the client_credentials grant (#33106)
  Build: forked compiler max memory matches jvmArgs (#33138)
  Introduce mapping version to index metadata (#33147)
  SQL: Enable aggregations to create a separate bucket for missing values (#32832)
  Fix grammar in contributing docs
  SECURITY: Fix Compile Error in ReservedRealmTests (#33166)
  APM server monitoring (#32515)
  Support only string `format` in date, root object & date range (#28117)
  [Rollup] Move toBuilders() methods out of rollup config objects (#32585)
  Fix forbiddenapis on java 11  (#33116)
  Apply publishing to genreate pom (#33094)
  Have circuit breaker succeed on unknown mem usage
  Do not lose default mapper on metadata updates (#33153)
  Fix a mappings update test (#33146)
  Reload Secure Settings REST specs & docs (#32990)
  Refactor CachingUsernamePassword realm (#32646)
jasontedor added a commit that referenced this pull request Aug 27, 2018
* 6.x:
  Introduce mapping version to index metadata (#33147)
  Move non duplicated actions back into xpack core (#32952)
  HLRC: Create server agnostic request and response (#32912)
  Build: forked compiler max memory matches jvmArgs (#33138)
  * Added breaking change section for GROUP BY behavior: now it considers null or empty values as a separate group/bucket. Previously, they were ignored. * This is part of backporting of #32832
  SQL: Enable aggregations to create a separate bucket for missing values (#32832)
  [TEST] version guard for reload rest-api-spec
  Fix grammar in contributing docs
  APM server monitoring (#32515)
  Support only string `format` in date, root object & date range (#28117)
  [Rollup] Move toBuilders() methods out of rollup config objects (#32585)
  Accept Gradle build scan agreement (#30645)
  Fix forbiddenapis on java 11  (#33116)
  Run forbidden api checks with runtimeJavaVersion (#32947)
  Apply publishing to genreate pom (#33094)
  Fix a mappings update test (#33146)
  Reload Secure Settings REST specs & docs (#32990)
  Refactor CachingUsernamePassword realm (#32646)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SQL: allow null values to be considered in GROUP BYs
4 participants