Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: Implement null handling for IN(v1, v2, ...) #34750

Merged
merged 3 commits into from
Oct 24, 2018

Conversation

matriv
Copy link
Contributor

@matriv matriv commented Oct 23, 2018

Implemented null handling for both the value tested but also for
values inside the list of values tested against.

The null handling is implemented for local processors, painless scripts
and Lucene Terms queries making it available for IN expressions occuring
in SELECT, WHERE and HAVING clauses.

Closes: #34582

Implemented null handling for both the value tested but also for
values inside the list of values tested against.

The null handling is implemented for local processors, painless scripts
and Lucene Terms queries making it available for `IN` expressions occuring
in `SELECT`, `WHERE` and `HAVING` clauses.

Closes: elastic#34582
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the NULL handling inside the values is incorrect since it should signify missing value.
However this can be a separate PR.


for (int i = 0; i < processsors.size() - 1; i++) {
Boolean compResult = Comparisons.eq(leftValue, processsors.get(i).process(input));
if (compResult == null) {
result = null;
}
if (compResult != null && compResult) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compResult is not null already so no need to check it;
if (compResult == null) {..} else if(compResult) { ..}

return valuesOf(list, to, false);
}

public static <T> List<T> valuesOf(List<Expression> list, DataType to, boolean removeNulls) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why nulls need to be removed - if the expression in the list is NULL, it should be return accordingly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check my comment here: #34750 (comment)

for (Expression rightValue : list) {
Boolean compResult = Comparisons.eq(foldedLeftValue, rightValue.fold());
if (compResult != null && compResult) {
if (compResult == null) {
// if (value().dataType() == DataType.NULL || rightValue.dataType() == DataType.NULL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comments are confusing (not sure if this is the final version or not).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about that, some mixup with git stash.

@@ -24,7 +24,7 @@
public TermsQuery(Location location, String term, List<Expression> values) {
super(location);
this.term = term;
this.values = new LinkedHashSet<>(Foldables.valuesOf(values, values.get(0).dataType()));
this.values = new LinkedHashSet<>(Foldables.valuesOf(values, values.get(0).dataType(), true));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid having the overloaded method, it might be more convenient to use Collections.removeIf(p -> p == null) since this is something local to the TermsQuery. On the other hand, it might also imply that when the value is missing (is null), there should be a match.

Copy link
Contributor Author

@matriv matriv Oct 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did that only to avoid a second iteration on the list (to remove the nulls), but I can change it. what do you think? I guess it can make a difference only if the list is really long.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. I wouldn't worry about it. I'm with you regarding avoiding the iteration but again, the Foldable method looks obscure with the null exclusion.
In other words, it's not Foldable that should handle null - I'm fain with handling it inside Terms I wonder though if that has any implications for inline IN.

@@ -173,6 +173,19 @@ public void testTranslateInExpression_WhereClause() throws IOException {
assertEquals("keyword:(bar foo lala)", tq.asBuilder().toQuery(createShardContext()).toString());
}

public void testTranslateInExpression_WhereClauseAndNullHAndling() throws IOException {
LogicalPlan p = plan("SELECT * FROM test WHERE keyword IN ('foo', null, 'lala', null, 'foo', concat('la', 'la'))");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think null means the value is missing. That is the IN should become a bool query between missing and terms.

Copy link
Contributor Author

@matriv matriv Oct 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to copy the behaviour to Postgres:

test=# select * from t1 where a in (null);
 a
---
(0 rows)

test=# select * from t1 where a is null;
 a
---




(4 rows)

WHERE col IN (null) is different than WHERE col is NULL, the first one evaluates to NULL which in turn becomes false for WHERE and HAVING clauses.

MySQL behaves the same way too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. IN means = and that fails against null (need to use IS (NOT) NULL).

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Left just one comment.

@@ -229,6 +229,8 @@ public static DataType fromEsType(String esType) {
public boolean isCompatibleWith(DataType other) {
if (this == other) {
return true;
} else if (this == NULL || other == NULL) {
return true;
} else if (isString() && other.isString()) {
return true;
} else if (isNumeric() && other.isNumeric()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these conditions that return true wouldn't look better if there is one if () only?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it only to be more readable, can combine them in one if.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use one if with one condition evaluation per line - best of both worlds.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@matriv matriv merged commit 4c73854 into elastic:master Oct 24, 2018
@matriv matriv deleted the mt/fix-34582-new branch October 24, 2018 12:42
matriv pushed a commit that referenced this pull request Oct 24, 2018
Implemented null handling for both the value tested but also for
values inside the list of values tested against.

The null handling is implemented for local processors, painless scripts
and Lucene Terms queries making it available for `IN` expressions occuring
in `SELECT`, `WHERE` and `HAVING` clauses.

Closes: #34582
@matriv
Copy link
Contributor Author

matriv commented Oct 24, 2018

Backported to 6.x with 54f3b69

matriv added a commit to matriv/elasticsearch that referenced this pull request Oct 24, 2018
Handle the case when `null` is the only value in the list so that
it's translated to a `MatchNoDocsQuery`.

Followup to: elastic#34750
matriv pushed a commit that referenced this pull request Oct 25, 2018
Handle the case when `null` is the only value in the list so that
it's translated to a `MatchNoDocsQuery`.

Followup to: #34750
matriv pushed a commit that referenced this pull request Oct 25, 2018
Handle the case when `null` is the only value in the list so that
it's translated to a `MatchNoDocsQuery`.

Followup to: #34750
matriv pushed a commit that referenced this pull request Oct 25, 2018
Handle the case when `null` is the only value in the list so that
it's translated to a `MatchNoDocsQuery`.

Followup to: #34750
kcm pushed a commit that referenced this pull request Oct 30, 2018
Implemented null handling for both the value tested but also for
values inside the list of values tested against.

The null handling is implemented for local processors, painless scripts
and Lucene Terms queries making it available for `IN` expressions occuring
in `SELECT`, `WHERE` and `HAVING` clauses.

Closes: #34582
kcm pushed a commit that referenced this pull request Oct 30, 2018
Handle the case when `null` is the only value in the list so that
it's translated to a `MatchNoDocsQuery`.

Followup to: #34750
matriv added a commit to matriv/elasticsearch that referenced this pull request Oct 30, 2018
Replace standard `||` and `==` painless operators with
`or` and `eq` null safe alternatives from `InternalSqlScriptUtils`.

Follow up to elastic#34750
matriv pushed a commit that referenced this pull request Nov 1, 2018
Replace standard `||` and `==` painless operators with
new `in` method introduced in `InternalSqlScriptUtils`.
This allows the list of values to become a script variable
which is replaced each time with the list of  values provided
by the user.

Move In to the same package as InPipe & InProcessor

Follow up to #34750

Co-authored-by: Costin Leau <costin.leau@gmail.com>
matriv pushed a commit that referenced this pull request Nov 1, 2018
Replace standard `||` and `==` painless operators with
new `in` method introduced in `InternalSqlScriptUtils`.
This allows the list of values to become a script variable
which is replaced each time with the list of  values provided
by the user.

Move In to the same package as InPipe & InProcessor

Follow up to #34750

Co-authored-by: Costin Leau <costin.leau@gmail.com>
matriv pushed a commit that referenced this pull request Nov 1, 2018
Replace standard `||` and `==` painless operators with
new `in` method introduced in `InternalSqlScriptUtils`.
This allows the list of values to become a script variable
which is replaced each time with the list of  values provided
by the user.

Move In to the same package as InPipe & InProcessor

Follow up to #34750

Co-authored-by: Costin Leau <costin.leau@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants