Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: Implement null handling for IN(v1, v2, ...) #34750

Merged
merged 3 commits into from
Oct 24, 2018
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -229,6 +229,8 @@ public static DataType fromEsType(String esType) {
public boolean isCompatibleWith(DataType other) {
if (this == other) {
return true;
} else if (this == NULL || other == NULL) {
return true;
} else if (isString() && other.isString()) {
return true;
} else if (isNumeric() && other.isNumeric()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these conditions that return true wouldn't look better if there is one if () only?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it only to be more readable, can combine them in one if.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use one if with one condition evaluation per line - best of both worlds.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -46,9 +46,15 @@ public static double doubleValueOf(Expression e) {
}

public static <T> List<T> valuesOf(List<Expression> list, DataType to) {
return valuesOf(list, to, false);
}

public static <T> List<T> valuesOf(List<Expression> list, DataType to, boolean removeNulls) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why nulls need to be removed - if the expression in the list is NULL, it should be return accordingly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check my comment here: #34750 (comment)

List<T> l = new ArrayList<>(list.size());
for (Expression e : list) {
l.add(valueOf(e, to));
if (!removeNulls || e.dataType() != DataType.NULL) {
l.add(valueOf(e, to));
}
}
return l;
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -82,16 +82,20 @@ public boolean foldable() {
}

@Override
public Object fold() {
public Boolean fold() {
Object foldedLeftValue = value.fold();

Boolean result = false;
for (Expression rightValue : list) {
Boolean compResult = Comparisons.eq(foldedLeftValue, rightValue.fold());
if (compResult != null && compResult) {
if (compResult == null) {
// if (value().dataType() == DataType.NULL || rightValue.dataType() == DataType.NULL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comments are confusing (not sure if this is the final version or not).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about that, some mixup with git stash.

result = null;
// }
} else if (compResult) {
return true;
}
}
return false;
return result;
}

@Override
Expand All @@ -118,15 +122,18 @@ public ScriptTemplate asScript() {
String scriptPrefix = leftScript + "==";
LinkedHashSet<Object> values = list.stream().map(Expression::fold).collect(Collectors.toCollection(LinkedHashSet::new));
for (Object valueFromList : values) {
if (valueFromList instanceof Expression) {
ScriptTemplate rightScript = asScript((Expression) valueFromList);
sj.add(scriptPrefix + rightScript.template());
rightParams.add(rightScript.params());
} else {
if (valueFromList instanceof String) {
sj.add(scriptPrefix + '"' + valueFromList + '"');
// if checked against null => false
if (valueFromList != null) {
if (valueFromList instanceof Expression) {
ScriptTemplate rightScript = asScript((Expression) valueFromList);
sj.add(scriptPrefix + rightScript.template());
rightParams.add(rightScript.params());
} else {
sj.add(scriptPrefix + valueFromList.toString());
if (valueFromList instanceof String) {
sj.add(scriptPrefix + '"' + valueFromList + '"');
} else {
sj.add(scriptPrefix + valueFromList.toString());
}
}
}
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -40,14 +40,18 @@ public final void writeTo(StreamOutput out) throws IOException {
@Override
public Object process(Object input) {
Object leftValue = processsors.get(processsors.size() - 1).process(input);
Boolean result = false;

for (int i = 0; i < processsors.size() - 1; i++) {
Boolean compResult = Comparisons.eq(leftValue, processsors.get(i).process(input));
if (compResult == null) {
result = null;
}
if (compResult != null && compResult) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compResult is not null already so no need to check it;
if (compResult == null) {..} else if(compResult) { ..}

return true;
}
}
return false;
return result;
}

@Override
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@ public class TermsQuery extends LeafQuery {
public TermsQuery(Location location, String term, List<Expression> values) {
super(location);
this.term = term;
this.values = new LinkedHashSet<>(Foldables.valuesOf(values, values.get(0).dataType()));
this.values = new LinkedHashSet<>(Foldables.valuesOf(values, values.get(0).dataType(), true));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid having the overloaded method, it might be more convenient to use Collections.removeIf(p -> p == null) since this is something local to the TermsQuery. On the other hand, it might also imply that when the value is missing (is null), there should be a match.

Copy link
Contributor Author

@matriv matriv Oct 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did that only to avoid a second iteration on the list (to remove the nulls), but I can change it. what do you think? I guess it can make a difference only if the list is really long.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. I wouldn't worry about it. I'm with you regarding avoiding the iteration but again, the Foldable method looks obscure with the null exclusion.
In other words, it's not Foldable that should handle null - I'm fain with handling it inside Terms I wonder though if that has any implications for inline IN.

}

@Override
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ public class InProcessorTests extends AbstractWireSerializingTestCase<InProcesso
private static final Literal ONE = L(1);
private static final Literal TWO = L(2);
private static final Literal THREE = L(3);
private static final Literal NULL = L(null);

public static InProcessor randomProcessor() {
return new InProcessor(Arrays.asList(new ConstantProcessor(randomLong()), new ConstantProcessor(randomLong())));
Expand All @@ -47,6 +48,16 @@ public void testEq() {
assertEquals(false, new In(EMPTY, THREE, Arrays.asList(ONE, TWO)).makePipe().asProcessor().process(null));
}

public void testHandleNullOnLeftValue() {
assertNull(new In(EMPTY, NULL, Arrays.asList(ONE, TWO, THREE)).makePipe().asProcessor().process(null));
assertNull(new In(EMPTY, NULL, Arrays.asList(ONE, NULL, TWO)).makePipe().asProcessor().process(null));
}

public void testHandleNullOnRightValue() {
assertEquals(true, new In(EMPTY, THREE, Arrays.asList(ONE, NULL, THREE)).makePipe().asProcessor().process(null));
assertNull(new In(EMPTY, TWO, Arrays.asList(ONE, NULL, THREE)).makePipe().asProcessor().process(null));
}

private static Literal L(Object value) {
return Literal.of(EMPTY, value);
}
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License;
* you may not use this file except in compliance with the Elastic License.
*/
package org.elasticsearch.xpack.sql.expression.predicate;

import org.elasticsearch.test.ESTestCase;
import org.elasticsearch.xpack.sql.expression.Literal;

import java.util.Arrays;

import static org.elasticsearch.xpack.sql.tree.Location.EMPTY;

public class InTests extends ESTestCase {

private static final Literal ONE = L(1);
private static final Literal TWO = L(2);
private static final Literal THREE = L(3);
private static final Literal NULL = L(null);

public void testInWithContainedValue() {
In in = new In(EMPTY, TWO, Arrays.asList(ONE, TWO, THREE));
assertTrue(in.fold());
}

public void testInWithNotContainedValue() {
In in = new In(EMPTY, THREE, Arrays.asList(ONE, TWO));
assertFalse(in.fold());
}

public void testHandleNullOnLeftValue() {
In in = new In(EMPTY, NULL, Arrays.asList(ONE, TWO, THREE));
assertNull(in.fold());
in = new In(EMPTY, NULL, Arrays.asList(ONE, NULL, THREE));
assertNull(in.fold());

}

public void testHandleNullsOnRightValue() {
In in = new In(EMPTY, THREE, Arrays.asList(ONE, NULL, THREE));
assertTrue(in.fold());
in = new In(EMPTY, ONE, Arrays.asList(TWO, NULL, THREE));
assertNull(in.fold());
}

private static Literal L(Object value) {
return Literal.of(EMPTY, value);
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -173,6 +173,19 @@ public void testTranslateInExpression_WhereClause() throws IOException {
assertEquals("keyword:(bar foo lala)", tq.asBuilder().toQuery(createShardContext()).toString());
}

public void testTranslateInExpression_WhereClauseAndNullHAndling() throws IOException {
LogicalPlan p = plan("SELECT * FROM test WHERE keyword IN ('foo', null, 'lala', null, 'foo', concat('la', 'la'))");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think null means the value is missing. That is the IN should become a bool query between missing and terms.

Copy link
Contributor Author

@matriv matriv Oct 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to copy the behaviour to Postgres:

test=# select * from t1 where a in (null);
 a
---
(0 rows)

test=# select * from t1 where a is null;
 a
---




(4 rows)

WHERE col IN (null) is different than WHERE col is NULL, the first one evaluates to NULL which in turn becomes false for WHERE and HAVING clauses.

MySQL behaves the same way too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. IN means = and that fails against null (need to use IS (NOT) NULL).

assertTrue(p instanceof Project);
assertTrue(p.children().get(0) instanceof Filter);
Expression condition = ((Filter) p.children().get(0)).condition();
assertFalse(condition.foldable());
QueryTranslation translation = QueryTranslator.toQuery(condition, false);
Query query = translation.query;
assertTrue(query instanceof TermsQuery);
TermsQuery tq = (TermsQuery) query;
assertEquals("keyword:(foo lala)", tq.asBuilder().toQuery(createShardContext()).toString());
}

public void testTranslateInExpressionInvalidValues_WhereClause() {
LogicalPlan p = plan("SELECT * FROM test WHERE keyword IN ('foo', 'bar', keyword)");
assertTrue(p instanceof Project);
Expand All @@ -196,4 +209,17 @@ public void testTranslateInExpression_HavingClause_Painless() {
assertEquals("InternalSqlScriptUtils.nullSafeFilter(params.a0==10 || params.a0==20)", sq.script().toString());
assertThat(sq.script().params().toString(), startsWith("[{a=MAX(int){a->"));
}

public void testTranslateInExpression_HavingClauseAndNullHandling_Painless() {
LogicalPlan p = plan("SELECT keyword, max(int) FROM test GROUP BY keyword HAVING max(int) in (10, null, 20, null, 30 - 10)");
assertTrue(p instanceof Project);
assertTrue(p.children().get(0) instanceof Filter);
Expression condition = ((Filter) p.children().get(0)).condition();
assertFalse(condition.foldable());
QueryTranslation translation = QueryTranslator.toQuery(condition, false);
assertTrue(translation.query instanceof ScriptQuery);
ScriptQuery sq = (ScriptQuery) translation.query;
assertEquals("InternalSqlScriptUtils.nullSafeFilter(params.a0==10 || params.a0==20)", sq.script().toString());
assertThat(sq.script().params().toString(), startsWith("[{a=MAX(int){a->"));
}
}
6 changes: 6 additions & 0 deletions x-pack/qa/sql/src/main/resources/agg.sql-spec
Original file line number Diff line number Diff line change
Expand Up @@ -450,3 +450,9 @@ selectHireDateGroupByHireDate
SELECT hire_date HD, COUNT(*) c FROM test_emp GROUP BY hire_date ORDER BY hire_date DESC;
selectSalaryGroupBySalary
SELECT salary, COUNT(*) c FROM test_emp GROUP BY salary ORDER BY salary DESC;

// filter with IN
aggMultiWithHavingUsingInAndNullHandling
SELECT MIN(salary) min, MAX(salary) max, gender g, COUNT(*) c FROM "test_emp" WHERE languages > 0 GROUP BY g HAVING max IN(74999, null, 74600) ORDER BY gender;
aggMultiGroupByMultiWithHavingUsingInAndNullHandling
SELECT MIN(salary) min, MAX(salary) max, gender g, languages l, COUNT(*) c FROM "test_emp" WHERE languages > 0 GROUP BY g, languages HAVING max IN (74500, null, 74600) ORDER BY gender, languages;
5 changes: 5 additions & 0 deletions x-pack/qa/sql/src/main/resources/filter.sql-spec
Original file line number Diff line number Diff line change
Expand Up @@ -96,3 +96,8 @@ SELECT last_name l FROM "test_emp" WHERE emp_no NOT IN (10000, 10001, 10002, 999

whereWithInAndComplexFunctions
SELECT last_name l FROM "test_emp" WHERE emp_no NOT IN (10000, abs(2 - 10003), 10002, 999) AND lcase(first_name) IN ('sumant', 'mary', 'patricio', 'No''Match') ORDER BY emp_no LIMIT 5;

whereWithInAndNullHandling1
SELECT last_name l FROM "test_emp" WHERE birth_date in (CAST('2018-10-01T00:00:00Z' AS TIMESTAMP), CAST('1959-10-01T00:00:00Z' AS TIMESTAMP)) AND (emp_no = 10038 OR emp_no = 10039 OR emp_no = 10040) ORDER BY emp_no;
whereWithInAndNullHandling2
SELECT last_name l FROM "test_emp" WHERE birth_date in (CAST('2018-10-01T00:00:00Z' AS TIMESTAMP), null, CAST('1959-10-01T00:00:00Z' AS TIMESTAMP)) AND (emp_no = 10038 OR emp_no = 10039 OR emp_no = 10040) ORDER BY emp_no;
37 changes: 36 additions & 1 deletion x-pack/qa/sql/src/main/resources/select.csv-spec
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,22 @@ false |true
;


inWithNullHandling
SELECT 2 IN (1, null, 3), 3 IN (1, null, 3), null IN (1, null, 3), null IN (1, 2, 3);

2 IN (1, null, 3) | 3 IN (1, null, 3) | null IN (1, null, 3) | null IN (1, 2, 3)
--------------------+--------------------+-----------------------+-------------------
null |true |null | null
;

inWithNullHandlingAndNegation
SELECT NOT 2 IN (1, null, 3), NOT 3 IN (1, null, 3), NOT null IN (1, null, 3), NOT null IN (1, 2, 3);

NOT 2 IN (1, null, 3) | NOT 3 IN (1, null, 3) | NOT null IN (1, null, 3) | null IN (1, 2, 3)
------------------------+------------------------+---------------------------+--------------------
null |false |null | null
;

//
// SELECT with IN and table columns
//
Expand Down Expand Up @@ -64,4 +80,23 @@ SELECT 1 IN (1, abs(2 - 4), 3) OR emp_no NOT IN (10000, 10000 + 1, 10002) FROM t
10003
10004
10005
;
;

inWithTableColumnAndNullHandling
SELECT emp_no, birth_date in (CAST('2018-10-01T00:00:00Z' AS TIMESTAMP), CAST('1959-10-01T00:00:00Z' AS TIMESTAMP)), birth_date in (CAST('2018-10-01T00:00:00Z' AS TIMESTAMP), null, CAST('1959-10-01T00:00:00Z' AS TIMESTAMP)) FROM test_emp WHERE emp_no = 10038 OR emp_no = 10039 OR emp_no = 10040 ORDER BY 1;

emp_no | birth_date in (CAST('2018-10-01T00:00:00Z' AS TIMESTAMP), CAST('1959-10-01T00:00:00Z' AS TIMESTAMP)) | birth_date in (CAST('2018-10-01T00:00:00Z' AS TIMESTAMP), null, CAST('1959-10-01T00:00:00Z' AS TIMESTAMP))
--------+-------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------
10038 | true | true
10039 | null | null
10040 | false | null


inWithTableColumnAndNullHandlingAndNegation
SELECT emp_no, NOT birth_date in (CAST('2018-10-01T00:00:00Z' AS TIMESTAMP), CAST('1959-10-01T00:00:00Z' AS TIMESTAMP)), NOT birth_date in (CAST('2018-10-01T00:00:00Z' AS TIMESTAMP), null, CAST('1959-10-01T00:00:00Z' AS TIMESTAMP)) FROM test_emp WHERE emp_no = 10038 OR emp_no = 10039 OR emp_no = 10040 ORDER BY 1;

emp_no | NOT birth_date in (CAST('2018-10-01T00:00:00Z' AS TIMESTAMP), CAST('1959-10-01T00:00:00Z' AS TIMESTAMP)) | NOT birth_date in (CAST('2018-10-01T00:00:00Z' AS TIMESTAMP), null, CAST('1959-10-01T00:00:00Z' AS TIMESTAMP))
--------+-----------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------
10038 | false | false
10039 | null | null
10040 | true | null