Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: Introduce support for NULL values #34573

Merged
merged 7 commits into from
Oct 19, 2018
Merged

Conversation

costin
Copy link
Member

@costin costin commented Oct 17, 2018

Make SQL aware of missing and/or unmapped fields treating them as NULL
Make all functions and operators null-safe aware, including when used
in filtering or sorting contexts
Add missing and null-safe doc value extractor
Modify dataset to have null fields spread around (in groups of 10)
Enforce missing last and unmapped_type inside sorting
Consolidate Predicate templating and declaration
Add support for Like/RLike in scripting
Introduce early schema declaration for CSV spec tests: to keep the doc
snippets in place, introduce schema:: prefix to declare the CSV schema
upfront.

Fix #32079

Make SQL aware of missing and/or unmapped fields treating them as NULL
Make _all_ functions and operators null-safe aware, including when used
in filtering or sorting contexts
Add missing and null-safe doc value extractor
Modify dataset to have null fields spread around (in groups of 10)
Enforce missing last and unmapped_type inside sorting
Consolidate Predicate templating and declaration
Add support for Like/RLike in scripting
Introduce early schema declaration for CSV spec tests: to keep the doc
snippets in place, introduce schema:: prefix to declare the CSV schema
upfront.

Fix elastic#32079
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left few comments.

}

public static Boolean match(Object value, Object pattern) {
if (value == null && pattern == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't here be a || instead of &&? If at least one of those values are null one of the next statements will throw a NPE.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should. Fixed.

@@ -88,7 +88,7 @@ queryTerm
;

orderBy
: expression ordering=(ASC | DESC)?
: expression ordering=(ASC | DESC)? (NULLS nullOrdering=(FIRST | LAST))?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If null ordering can only be LAST, FIRST is here present for a future improvement?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've generalized this in the latest commit. Regardless the reason for it is to properly support the correct grammar if functionality-wise, this is not implemented.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've generalized this in the latest commit. Regardless the reason for it is to properly support the correct grammar if functionality-wise, this is not implemented.

@@ -8,46 +8,46 @@ birth_date,emp_no,first_name,gender,hire_date,languages,last_name,salary
1957-05-23T00:00:00Z,10007,Tzvetan,F,1989-02-10T00:00:00Z,4,Zielinski,74572
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test data has no document where more than one field is null and things like CONCAT(first_name,last_name) where both fields are NULL cannot be tested.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather test such functions individually simply because the test data is not that large and the nulls already limit its usefulness.

throw new SqlIllegalArgumentException("Invalid date encountered [{}]", dateTime);
}

public static String dayName(Long millis, String tzId) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing select day_name(birth_date) from test_emp group by day_name(birth_date); throws a NPE. Using the _translate API the error is summed up as

          "reason": {
                    "type": "script_exception",
                    "reason": "runtime error",
                    "script_stack": [
                        "InternalSqlScriptUtils.dayName(InternalSqlScriptUtils.docValue(doc,params.v0).millis, params.v1)",
                        "                                                                             ^---- HERE"
                    ],
                    "script": "InternalSqlScriptUtils.dayName(InternalSqlScriptUtils.docValue(doc,params.v0).millis, params.v1)",
                    "lang": "painless",
                    "caused_by": {
                        "type": "null_pointer_exception",
                        "reason": null
                    }
                }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the current test suite didn't catch this. I've added some tests for day_name, month_name and quarter as they all suffer from the same problem.


return Objects.equals(symbol, other.symbol)
return Objects.equals(other.symbol(), other.symbol())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo? These will always be equal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed


public RLike(Location location, Expression left, Literal right) {
super(location, left, right, "RLIKE");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we don't have any RLIKE tests, but can be added in a separate issue.

Copy link
Contributor

@matriv matriv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice work! Left some minor comments.

Question: why did you choose to remove the table with null values and merge it with the test_emp?
Wouldn't be useful to keep separate and have null-value specific tests?

@@ -71,29 +71,12 @@ default R apply(Object o) {
private final Function<Object, Object> apply;

StringOperation(StringFunction<Object> apply) {
this.apply = l -> l == null ? null : apply.apply(l);
this.apply = l -> l == null ? null : apply.apply((l));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: remove extra parentheses

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

if (millis == null || tzId == null) {
return null;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: remove new line

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

if (millis == null || tzId == null) {
return null;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: remove new line

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

private static final Map<Pattern, String> FORMATTING_PATTERNS;

static {
Map<String, String> patterns = new LinkedHashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: You could do it in one step without the intermediate patterns map.

Maybe you can use the ImmutableMap.of() or

Collections.unmodifiableMap(Stream.of(
    new SimpleEntry<>(Pattens.compile("doc[{}].value", Pattern.LITERAL), "{sql}.docValue(doc,{})"),
    ...
    .collect(Collectors.toMap((e) -> e.getKey(), (e) -> e.getValue())));
``` and avoid the `static` block.
 

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're trying to stay away from Guava especially for simple things.
The Stream suggestion is good however after adding it I think it decreases readability (there's a lot of Pattern.compile repetition and I find the map collector complicated).
The static block might be a bit more verbose but I find it clearer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, was just a suggestion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved the pattern compile in the collector and got rid of the static block in the latest commit.

I need to merge master after all.

@@ -459,10 +459,10 @@ protected QueryTranslation asQuery(MultiMatchQueryPredicate q, boolean onAggs) {
}
}

static class BinaryLogic extends ExpressionTranslator<BinaryPredicate> {
static class BinaryLogic extends ExpressionTranslator<org.elasticsearch.xpack.sql.expression.predicate.logical.BinaryLogic> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm missing something but why do you need fully qName here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name clash on BinaryLogic.


@Override
protected QueryTranslation asQuery(BinaryPredicate e, boolean onAggs) {
protected QueryTranslation asQuery(org.elasticsearch.xpack.sql.expression.predicate.logical.BinaryLogic e, boolean onAggs) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a BinaryLogic class declared inside QueryTranslator (the one above).

@@ -177,7 +178,12 @@ private static void doAssertResultSetData(ResultSet expected, ResultSet actual,

// handle nulls first
if (expectedObject == null || actualObject == null) {
assertEquals(msg, expectedObject, actualObject);
// hack for JDBC CSV nulls
if ("null".equals(expectedObject)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe check with use of lower or uppercase to allow null and NULL

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@astefan
Copy link
Contributor

astefan commented Oct 18, 2018

@matriv the null values table was added in the past specifically for testing the empty bucket being created for null values as part of #32831. The intention was to be removed when NULLs were in place and integrate the NULL values documents in the main table.

@costin
Copy link
Member Author

costin commented Oct 18, 2018

@matriv to add to what @astefan said, if null would be a separate table we'd have to either run the tests across multiple indices and take into account duplicated data or come with a completely different dataset/data.
Plus it also complicated debugging - what table am I looking into.
Turns out that merging the two together makes things simple from this perspective and increases the randomness of the data.

Copy link
Contributor

@matriv matriv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@costin
Copy link
Member Author

costin commented Oct 18, 2018

retest this please

@costin costin removed the v6.5.0 label Oct 19, 2018
@costin costin merged commit 52104aa into elastic:master Oct 19, 2018
@costin costin deleted the fix-for-32079 branch October 19, 2018 14:02
costin added a commit to costin/elasticsearch that referenced this pull request Oct 19, 2018
Make SQL aware of missing and/or unmapped fields treating them as NULL
Make _all_ functions and operators null-safe aware, including when used
in filtering or sorting contexts
Add missing and null-safe doc value extractor
Modify dataset to have null fields spread around (in groups of 10)
Enforce missing last and unmapped_type inside sorting
Consolidate Predicate templating and declaration
Add support for Like/RLike in scripting
Generalize NULLS LAST/FIRST
Introduce early schema declaration for CSV spec tests: to keep the doc
snippets in place (introduce schema:: prefix for declaration)
upfront.

Fix elastic#32079

(cherry picked from commit 52104aa)
kcm pushed a commit that referenced this pull request Oct 30, 2018
Make SQL aware of missing and/or unmapped fields treating them as NULL
Make _all_ functions and operators null-safe aware, including when used
in filtering or sorting contexts
Add missing and null-safe doc value extractor
Modify dataset to have null fields spread around (in groups of 10)
Enforce missing last and unmapped_type inside sorting
Consolidate Predicate templating and declaration
Add support for Like/RLike in scripting
Generalize NULLS LAST/FIRST
Introduce early schema declaration for CSV spec tests: to keep the doc
snippets in place (introduce schema:: prefix for declaration)
upfront.

Fix #32079
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SQL: improve tests with respect to NULL values in columns
5 participants