Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: Implement IN(value1, value2, ...) expression. #34581

Merged
merged 26 commits into from
Oct 23, 2018

Conversation

matriv
Copy link
Contributor

@matriv matriv commented Oct 18, 2018

Implement the functionality to translate the
`field IN (value1, value2,...) expressions to proper Lucene queries
or painless script depending on the use case.

The IN expression can be used in SELECT, WHERE and HAVING clauses.

Closes: #32955

Implement the functionality to translate the
`field IN (value1, value2,...) expressions to proper Lucene queries
or painless script depending on the use case.

The `IN` expression can be used in SELECT, WHERE and HAVING clauses.

Closes: elastic#32955
@matriv matriv requested review from costin and astefan October 18, 2018 10:42
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

@matriv
Copy link
Contributor Author

matriv commented Oct 18, 2018

Currently there is no check for null handling. It will be done in a next step once #34573 is merged.
Tracked by #34582.

There are no tests for negation when there is a translation to painless as this: #34558 must be fixed first

@astefan
Copy link
Contributor

astefan commented Oct 18, 2018

I tested this a bit and a combination of a function and IN values of a different type (select * from test_emp where lcase(first_name) in (1,2)) would be incorrect, but the error message presented to the user is rather verbose and exposing the full stack trace of the error.

@astefan
Copy link
Contributor

astefan commented Oct 18, 2018

@matriv this one fails and I think it shouldn't:

sql> select * from test_emp where year(birth_date) in (1990);
Server error [Server sent bad type [search_phase_execution_exception]. Original type was [all shards failed]. [Failed to execute phase [query], all shards failed; shardFailures {[1xf7J15UQ6-6Eagq0pzBBg][test_emp][0]: RemoteTransportException[[node-0][127.0.0.1:9300][indices:data/read/search[phase/query]]]; nested: QueryShardException[failed to create query: {
  "script" : {
    "script" : {
      "source" : "InternalSqlScriptUtils.dateTimeChrono(doc[params.v0].value.millis, params.v1, params.v2)==1990",
      "lang" : "painless",
      "params" : {
        "v0" : "birth_date",
        "v1" : "UTC",
        "v2" : "YEAR"
      }
    },
    "boost" : 1.0
  }
}]; nested: ScriptException[compile error]; nested: ClassCastException[Cannot cast from [int] to [java.lang.Object].]; }
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:293)
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:133)
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:254)

@costin
Copy link
Member

costin commented Oct 18, 2018

The above comparison is fixed in #34573 (the underlying null-safe equality does proper widening when comparing Numbers).

@matriv
Copy link
Contributor Author

matriv commented Oct 18, 2018

@astefan

I tested this a bit and a combination of a function and IN values of a different type (select * from test_emp where lcase(first_name) in (1,2)) would be incorrect, but the error message presented to the user is rather verbose and exposing the full stack trace of the error.

Thanks for catching that. Added validation and tests for nice error message.

@astefan
Copy link
Contributor

astefan commented Oct 18, 2018

sql> select * from test_emp where 1<2 in (false);
Bad request [Found 1 problem(s)
line 1:8: Unexecutable item]

Strange that the one above fails, but the next one doesn't: select * from test_emp where 1<2 in (true).

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments, otherwise LGTM.

@matriv
Copy link
Contributor Author

matriv commented Oct 18, 2018

@astefan Thanks for digging in and testing!

select * from test_emp where 1<2 in (false);

This is a general problem, not related to in. Opened issue #34613.

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!
I've left a number of comments but most of them are stylistic.
I wonder if there's an optimization rule that we can use to removes from the list the items that are known to not match in order to minimize the list and thus the number of pipes, etc.. that follows.
That would only work though if the value (the left as you say) is constant and thus all not matching constants from the list could be removed:

`SELECT 1 IN (2,3, foo) FROM TABLE;

if (this == other) {
return true;
} else if (this.isString() && other.isString()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need to specify this : if (isString() && other.isString())

@@ -188,6 +193,16 @@ private static Failure fail(Node<?> source, String message, Object... args) {

Set<Failure> localFailures = new LinkedHashSet<>();

if (p instanceof Filter) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To increase readability please move the two ifs to a separate method (checkInExpression?) similar to checkGroupBy.

In in = (In) e;
DataType dt = null;
for (Expression rightValue : in.list()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rightValue is a bit confusing since it also means the correct value. How about inValue or just value?

DataType dt = null;
for (Expression rightValue : in.list()) {
if (dt == null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm misunderstanding the rule but I think this can be simplified by initializing dt with a first value and avoid the double ifs (which care very similar).
If the rule check the the values between themselves, this can be done by picking the first item and then comparing it with the rest (through an index (a bit verbose but fast) or a sublist).
If the rule checks the value against the list (which includes the former but can't be as precise in the message if the former occurs) dt is initialized to the In.value() and then iterates through the list.


private static void validateInExpression(Expression e, Set<Failure> localFailures) {
if (e instanceof In) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can avoid the instanceof check using plan.forEachExpression(method, In.class) so for Filter you can do
filterPlan.condition().forEachExpression(validateIn, In.class)

validateIn(In in) { ...}
the Set<Failure> can be passed to the closure directly - see checkGroupBy & co for examples.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Thx for the suggestion.

for (Pipe p : right) {
newRight.add(p.resolveAttributes(resolver));
}
return replaceChildren(Stream.concat(Stream.of(newLeft), newRight.stream()).collect(Collectors.toList()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, if there's only one list no need to separate, concatenate things.


@Override
public boolean resolved() {
return left().resolved() && right().stream().allMatch(Pipe::resolved);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolvables.resolved(list).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm cannot use that. Pipe is not Resolvable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing Pipe to implement Resolvable


public class InPipe extends Pipe {

private Pipe left;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think left and right are a bit confusing - why not use value and list?

// if the code gets here it's a bug
//
else {
throw new UnsupportedOperationException("No idea how to translate " + in.value());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use SqlIllegalArgumentException instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private Analyzer analyzer;

public QueryTranslatorTests() {
public class QueryTranslatorTests extends AbstractBuilderTestCase {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is AbstractBuilderTestCase used anywhere (maybe I'm missing it)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used it for the shardContext here: https://github.com/elastic/elasticsearch/pull/34581/files/ab7c1502d1b6341b2b3cc3be7eda72b00e606ada#diff-aef2b0ce456b8fdd5cc09d6cfd55f0c2R173.
I tried to manually mock it but ended up with ugly code. The AbstractBuilderTestCase is parent class for many tests in .search.aggregations and .index.query packages.

@costin
Copy link
Member

costin commented Oct 18, 2018

By the way, this should go in 6.5 as well.

droberts195 and others added 8 commits October 19, 2018 13:06
The setting that reduces the disk space requirement
for the forecasting integration tests was accidentally
removed in elastic#31757 when files were moved around.  This
change simply adds back the setting that existed before
that.
Applies our standard column wrapping to the `discovery-ec2` and
`repository-s3` plugins.
Changes wording in the FIPS 140-2 related documentation. 

Co-authored-by: derickson <dave@elastic.co>
Adds support for query-time formatting of the date histo keys
when executing a rollup search.

Closes elastic#34391
Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
I've added some minor comments regarding styling.
Also, it's worth adding two unit tests : one to check the optimizer does folding of the in expressions (see optimizer tests - something like 1 in (2-1, 2, 3), it should return true and another to see whether In removes duplicates 1 in (1,2,3,1,2,3,1,2,3) which is handled by passing the list through an insertion-order set. From there on it can be treated as a list, knowing there are no duplicates.

@@ -144,7 +150,7 @@
}

static QueryTranslation toQuery(Expression e, boolean onAggs) {
QueryTranslation translation = null;
QueryTranslation translation;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a formatting issue - I'd keep the explicit initialization (to keep introspection tools at bay that yes, null is expected).

public TermsQuery(Location location, String term, List<Expression> values) {
super(location);
this.term = term;
this.values = values.stream().map(Expression::fold).collect(Collectors.toList());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the collector could be a set (as oppose to a list) to remove duplicates.

/**
* Comparison utilities.
*/
abstract class Comparisons {
public final class Comparisons {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nitpick, I tend to explicitly use valueOf/xxxValue to clarify the use of boxing.

}

@Override
public boolean foldable() {
return foldable;
return children().stream().allMatch(Expression::foldable);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be moved to Expressions#foldable similar to nullable and resolvable.

@matriv
Copy link
Contributor Author

matriv commented Oct 20, 2018

@costin Addressed comments. Please take another look.

If the whole IN expression is not foldable then the values in the list IN(v1, v2, ...) are not folded because the ConstantFolding is a DOWN rule. This is not a problem as when we build the TermsQuery or the painless script we call fold().

Making the ConstantFolding an UP rule solves this more elegantly, but it generates a serious of other issues. Maybe we can introduce a special ConstantFolding rule just for the IN or you think the current solution is ok?

public TermsQuery(Location location, String term, List<Expression> values) {
super(location);
this.term = term;
this.values = values.stream().map(Expression::fold).collect(Collectors.toCollection(LinkedHashSet::new));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an alternative, use Foldables.*: new LinkedHashSet(Foldables.valuesOf(values, datatType()))

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks.

The optimization rule is not critical since the end result is the same though I'm curious why it doesn't kick in.

@matriv
Copy link
Contributor Author

matriv commented Oct 22, 2018

@costin as discussed, I was wrong the optimisation kicks in, it was just not tested properly, I now have a test for that. Thank you!

@matriv
Copy link
Contributor Author

matriv commented Oct 22, 2018

retest this please

@matriv matriv merged commit 4a8386f into elastic:master Oct 23, 2018
@matriv matriv deleted the mt/impl-in-32955 branch October 23, 2018 12:31
matriv pushed a commit that referenced this pull request Oct 23, 2018
Implement the functionality to translate the
`field IN (value1, value2,...)` expressions to proper Lucene queries
or painless script or local processors depending on the use case.

The `IN` expression can be used in SELECT, WHERE and HAVING clauses.

Closes: #32955
@matriv
Copy link
Contributor Author

matriv commented Oct 23, 2018

Backported to 6.x with 528e2b2

jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Oct 23, 2018
* master: (24 commits)
  ingest: better support for conditionals with simulate?verbose (elastic#34155)
  [Rollup] Job deletion should be invoked on the allocated task (elastic#34574)
  [DOCS] .Security index is never auto created (elastic#34589)
  CCR: Requires soft-deletes on the follower (elastic#34725)
  re-enable bwc tests (elastic#34743)
  Empty GetAliases authorization fix (elastic#34444)
  INGEST: Document Processor Conditional (elastic#33388)
  [CCR] Add total fetch time leader stat (elastic#34577)
  SQL: Support pattern against compatible indices (elastic#34718)
  [CCR] Auto follow pattern APIs adjustments (elastic#34518)
  [Test] Remove dead code from ExceptionSerializationTests (elastic#34713)
  A small typo in migration-assistance doc (elastic#34704)
  ingest: processor stats (elastic#34724)
  SQL: Implement IN(value1, value2, ...) expression. (elastic#34581)
  Tests: Add checks to GeoDistanceQueryBuilderTests (elastic#34273)
  INGEST: Rename Pipeline Processor Param. (elastic#34733)
  Core: Move IndexNameExpressionResolver to java time (elastic#34507)
  [DOCS] Force Merge: clarify execution and storage requirements (elastic#33882)
  TESTING.asciidoc fix examples using forbidden annotation (elastic#34515)
  SQL: Implement `CONVERT`, an alternative to `CAST` (elastic#34660)
  ...
kcm pushed a commit that referenced this pull request Oct 30, 2018
Implement the functionality to translate the
`field IN (value1, value2,...)` expressions to proper Lucene queries
or painless script or local processors depending on the use case.

The `IN` expression can be used in SELECT, WHERE and HAVING clauses.

Closes: #32955
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.