-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL: Introduce support for NULL values #34573
Conversation
Make SQL aware of missing and/or unmapped fields treating them as NULL Make _all_ functions and operators null-safe aware, including when used in filtering or sorting contexts Add missing and null-safe doc value extractor Modify dataset to have null fields spread around (in groups of 10) Enforce missing last and unmapped_type inside sorting Consolidate Predicate templating and declaration Add support for Like/RLike in scripting Introduce early schema declaration for CSV spec tests: to keep the doc snippets in place, introduce schema:: prefix to declare the CSV schema upfront. Fix elastic#32079
Pinging @elastic/es-search-aggs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left few comments.
} | ||
|
||
public static Boolean match(Object value, Object pattern) { | ||
if (value == null && pattern == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't here be a ||
instead of &&
? If at least one of those values are null one of the next statements will throw a NPE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should. Fixed.
@@ -88,7 +88,7 @@ queryTerm | |||
; | |||
|
|||
orderBy | |||
: expression ordering=(ASC | DESC)? | |||
: expression ordering=(ASC | DESC)? (NULLS nullOrdering=(FIRST | LAST))? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If null ordering can only be LAST
, FIRST
is here present for a future improvement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've generalized this in the latest commit. Regardless the reason for it is to properly support the correct grammar if functionality-wise, this is not implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've generalized this in the latest commit. Regardless the reason for it is to properly support the correct grammar if functionality-wise, this is not implemented.
@@ -8,46 +8,46 @@ birth_date,emp_no,first_name,gender,hire_date,languages,last_name,salary | |||
1957-05-23T00:00:00Z,10007,Tzvetan,F,1989-02-10T00:00:00Z,4,Zielinski,74572 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test data has no document where more than one field is null
and things like CONCAT(first_name,last_name)
where both fields are NULL cannot be tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather test such functions individually simply because the test data is not that large and the nulls already limit its usefulness.
throw new SqlIllegalArgumentException("Invalid date encountered [{}]", dateTime); | ||
} | ||
|
||
public static String dayName(Long millis, String tzId) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing select day_name(birth_date) from test_emp group by day_name(birth_date);
throws a NPE. Using the _translate
API the error is summed up as
"reason": {
"type": "script_exception",
"reason": "runtime error",
"script_stack": [
"InternalSqlScriptUtils.dayName(InternalSqlScriptUtils.docValue(doc,params.v0).millis, params.v1)",
" ^---- HERE"
],
"script": "InternalSqlScriptUtils.dayName(InternalSqlScriptUtils.docValue(doc,params.v0).millis, params.v1)",
"lang": "painless",
"caused_by": {
"type": "null_pointer_exception",
"reason": null
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the current test suite didn't catch this. I've added some tests for day_name, month_name and quarter as they all suffer from the same problem.
|
||
return Objects.equals(symbol, other.symbol) | ||
return Objects.equals(other.symbol(), other.symbol()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo? These will always be equal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
|
||
public RLike(Location location, Expression left, Literal right) { | ||
super(location, left, right, "RLIKE"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems we don't have any RLIKE tests, but can be added in a separate issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice work! Left some minor comments.
Question: why did you choose to remove the table with null values and merge it with the test_emp?
Wouldn't be useful to keep separate and have null-value specific tests?
@@ -71,29 +71,12 @@ default R apply(Object o) { | |||
private final Function<Object, Object> apply; | |||
|
|||
StringOperation(StringFunction<Object> apply) { | |||
this.apply = l -> l == null ? null : apply.apply(l); | |||
this.apply = l -> l == null ? null : apply.apply((l)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: remove extra parentheses
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
if (millis == null || tzId == null) { | ||
return null; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: remove new line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
if (millis == null || tzId == null) { | ||
return null; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: remove new line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
private static final Map<Pattern, String> FORMATTING_PATTERNS; | ||
|
||
static { | ||
Map<String, String> patterns = new LinkedHashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: You could do it in one step without the intermediate patterns
map.
Maybe you can use the ImmutableMap.of()
or
Collections.unmodifiableMap(Stream.of(
new SimpleEntry<>(Pattens.compile("doc[{}].value", Pattern.LITERAL), "{sql}.docValue(doc,{})"),
...
.collect(Collectors.toMap((e) -> e.getKey(), (e) -> e.getValue())));
``` and avoid the `static` block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're trying to stay away from Guava especially for simple things.
The Stream
suggestion is good however after adding it I think it decreases readability (there's a lot of Pattern.compile
repetition and I find the map collector complicated).
The static
block might be a bit more verbose but I find it clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries, was just a suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved the pattern compile in the collector and got rid of the static block in the latest commit.
I need to merge master after all.
@@ -459,10 +459,10 @@ protected QueryTranslation asQuery(MultiMatchQueryPredicate q, boolean onAggs) { | |||
} | |||
} | |||
|
|||
static class BinaryLogic extends ExpressionTranslator<BinaryPredicate> { | |||
static class BinaryLogic extends ExpressionTranslator<org.elasticsearch.xpack.sql.expression.predicate.logical.BinaryLogic> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm missing something but why do you need fully qName here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Name clash on BinaryLogic
.
|
||
@Override | ||
protected QueryTranslation asQuery(BinaryPredicate e, boolean onAggs) { | ||
protected QueryTranslation asQuery(org.elasticsearch.xpack.sql.expression.predicate.logical.BinaryLogic e, boolean onAggs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a BinaryLogic
class declared inside QueryTranslator
(the one above).
@@ -177,7 +178,12 @@ private static void doAssertResultSetData(ResultSet expected, ResultSet actual, | |||
|
|||
// handle nulls first | |||
if (expectedObject == null || actualObject == null) { | |||
assertEquals(msg, expectedObject, actualObject); | |||
// hack for JDBC CSV nulls | |||
if ("null".equals(expectedObject)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe check with use of lower or uppercase to allow null
and NULL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@matriv to add to what @astefan said, if null would be a separate table we'd have to either run the tests across multiple indices and take into account duplicated data or come with a completely different dataset/data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
retest this please |
Push REST/YAML fix Update docs
Make SQL aware of missing and/or unmapped fields treating them as NULL Make _all_ functions and operators null-safe aware, including when used in filtering or sorting contexts Add missing and null-safe doc value extractor Modify dataset to have null fields spread around (in groups of 10) Enforce missing last and unmapped_type inside sorting Consolidate Predicate templating and declaration Add support for Like/RLike in scripting Generalize NULLS LAST/FIRST Introduce early schema declaration for CSV spec tests: to keep the doc snippets in place (introduce schema:: prefix for declaration) upfront. Fix elastic#32079 (cherry picked from commit 52104aa)
Make SQL aware of missing and/or unmapped fields treating them as NULL Make _all_ functions and operators null-safe aware, including when used in filtering or sorting contexts Add missing and null-safe doc value extractor Modify dataset to have null fields spread around (in groups of 10) Enforce missing last and unmapped_type inside sorting Consolidate Predicate templating and declaration Add support for Like/RLike in scripting Generalize NULLS LAST/FIRST Introduce early schema declaration for CSV spec tests: to keep the doc snippets in place (introduce schema:: prefix for declaration) upfront. Fix #32079
Make SQL aware of missing and/or unmapped fields treating them as NULL
Make all functions and operators null-safe aware, including when used
in filtering or sorting contexts
Add missing and null-safe doc value extractor
Modify dataset to have null fields spread around (in groups of 10)
Enforce missing last and unmapped_type inside sorting
Consolidate Predicate templating and declaration
Add support for Like/RLike in scripting
Introduce early schema declaration for CSV spec tests: to keep the doc
snippets in place, introduce schema:: prefix to declare the CSV schema
upfront.
Fix #32079