Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: Resolve attributes recursively for improved subquery support #69765

Merged
merged 9 commits into from
Mar 11, 2021
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -238,30 +238,52 @@ public boolean isEmpty() {
@Override
public boolean containsKey(Object key) {
if (key instanceof NamedExpression) {
return delegate.keySet().contains(new AttributeWrapper(((NamedExpression) key).toAttribute()));
return delegate.containsKey(new AttributeWrapper(((NamedExpression) key).toAttribute()));
}
return false;
}

@Override
public boolean containsValue(Object value) {
return delegate.values().contains(value);
return delegate.containsValue(value);
}

/**
* @param key {@link NamedExpression} to look up
* @return Looks up the key in the AttributeMap recursively, meaning if the value for this key
* is another key in the map, it will follow it. It will return the final value or null if the key
* does not exist in the map.
*/
@Override
public E get(Object key) {
if (key instanceof NamedExpression) {
return delegate.get(new AttributeWrapper(((NamedExpression) key).toAttribute()));
}
return null;
return getOrDefault(key, null);
}

/**
* @param key {@link NamedExpression} to look up
* @return Looks up the key in the AttributeMap recursively, meaning if the value for this key
* is another key in the map, it will follow it. It will return the final value or the
* specified default value if the key does not exist in the map.
*/
@Override
public E getOrDefault(Object key, E defaultValue) {
E e;
return (((e = get(key)) != null) || containsKey(key))
? e
: defaultValue;
E candidate = defaultValue;
E value = null;
while (((value = lookup(key)) != null || containsKey(key)) && value != key) {
key = candidate = value;
}
return candidate;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is backwards.
getOrDefault should delegate to get not vice-versa. The former is an extension of the latter historically and conceptually.

Moreover by making get and getOrDefault recursive the Map contract has been fundamentally broken:

m.put(a, b);
assertTrue(b, m.get(a));
m.put(b,c);
assertTrue(b, m.get(a)); // fails

Instead of renaming get to lookup and modifying the contract, simply introduce a separate method (e.g. traverse(key), getResolvedValue(key), getRecursive(key) or something else) to the original map that does the recursive lookup and use that instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree as well on this, I'd prefer to have a dedicated method that does this more complex logic rather than overriding the default get.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points, but maybe the solution here is not just a simple rename of the method (resolve(key)), but to not implement the Map<Attribute, E> interface anymore. The Map contract was broken since the beginning because instead of equals() the semanticEquals() is used for key equality checks (which for example violates "containsKey(key) returns true if and only if this map contains a mapping for a key k such that (key==null ? k==null : key.equals(k)).")
After noget()s are required anymore, unless you can think of a place where recursive key resolution would actually cause harm (if that is the case, I'd say the AttributeMap building should be fixed instead to limit the scope from where the attributes are picked up from).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AttributeMap is a Map for AttributeWrappers; the get, contains rely on equals. For convenience the underlying class is not exposed instead Attribute is used with the wrapping happening automatically.
It's similar to IdentityHashMap for example.

resolve(key) is a compound method that relies on get, the latter might not be used publicly now however I see too many downsides in removing it (and the Map contract) instead of keeping it.
For example get/put do per key read/write but resolve, which does multi-key read has not write counter-party; removing/replacing get breaks too many assumptions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about the downsides or the breaking assumptions, since we barely utilize anything from the Map interface, practically only some read methods: isEmpty, size and get/getOrDefault (which can be renamed to resolve). The building of the map (puts) happen through the Builder methods today, here is how a full minimalization of the public methods would look like (at this point the AttributeMap is rather an AttributeResolver): palesz@fe814ec

Nonetheless, I can keep the current Map implementation and create a new resolve method for the recursive resolution, switch out all the get/getOrDefault calls to resolve and leave the gets there between the other unused methods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


/**
* @param key {@link NamedExpression} to look up
* @return Result of the non-recursive lookup of the key in the map.
*/
private E lookup(Object key) {
if (key instanceof NamedExpression) {
return delegate.get(new AttributeWrapper(((NamedExpression) key).toAttribute()));
}
return null;
}

@Override
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,8 @@
import java.util.Set;

import static java.util.stream.Collectors.toList;
import static org.elasticsearch.xpack.ql.TestUtils.fieldAttribute;
import static org.elasticsearch.xpack.ql.TestUtils.of;
import static org.hamcrest.Matchers.arrayContaining;
import static org.hamcrest.Matchers.arrayWithSize;
import static org.hamcrest.Matchers.contains;
Expand Down Expand Up @@ -61,6 +63,35 @@ public void testAttributeMapWithSameAliasesCanResolveAttributes() {
assertTrue(newAttributeMap.get(param2.toAttribute()) == param2.child());
}

public void testResolveRecursively() {
AttributeMap.Builder<Object> builder = AttributeMap.builder();
Attribute one = a("one");
Attribute two = fieldAttribute("two", DataTypes.INTEGER);
Attribute three = fieldAttribute("three", DataTypes.INTEGER);
Alias threeAlias = new Alias(Source.EMPTY, "three_alias", three);
Alias threeAliasAlias = new Alias(Source.EMPTY, "three_alias_alias", threeAlias);
builder.put(one, of("one"));
builder.put(two, "two");
builder.put(three, of("three"));
builder.put(threeAlias.toAttribute(), threeAlias.child());
builder.put(threeAliasAlias.toAttribute(), threeAliasAlias.child());
AttributeMap<Object> map = builder.build();

assertEquals(of("one"), map.get(one));
assertEquals(map.get(one), map.getOrDefault(one, null));
assertEquals("two", map.get(two));
assertEquals(map.get(two), map.getOrDefault(two, null));
assertEquals(of("three"), map.get(three));
assertEquals(map.get(three), map.getOrDefault(three, null));
assertEquals(map.get(three), map.getOrDefault(threeAlias, null));
assertEquals(map.get(three), map.get(threeAlias));
assertEquals(map.get(three), map.getOrDefault(threeAliasAlias, null));
assertEquals(map.get(three), map.get(threeAliasAlias));
Attribute four = a("four");
assertEquals("not found", map.getOrDefault(four, "not found"));
assertNull(map.get(four));
}

private Alias createIntParameterAlias(int index, int value) {
Source source = new Source(1, index * 5, "?");
Literal literal = new Literal(source, value, DataTypes.INTEGER);
Expand Down
22 changes: 20 additions & 2 deletions x-pack/plugin/sql/qa/server/src/main/resources/subselect.sql-spec
Original file line number Diff line number Diff line change
Expand Up @@ -16,10 +16,28 @@ basicGroupBy
SELECT gender FROM (SELECT first_name AS f, last_name, gender FROM test_emp) GROUP BY gender ORDER BY gender ASC;
basicGroupByAlias
SELECT g FROM (SELECT first_name AS f, last_name, gender AS g FROM test_emp) GROUP BY g ORDER BY g ASC;
// @AwaitsFix(bugUrl = "follow-up to https://github.com/elastic/elasticsearch/pull/67216")
basicGroupByWithFilterByAlias-Ignore
basicGroupByWithFilterByAlias
SELECT g FROM (SELECT first_name AS f, last_name, gender AS g FROM test_emp) WHERE g IS NOT NULL GROUP BY g ORDER BY g ASC;
basicGroupByRealiased
SELECT g AS h FROM (SELECT first_name AS f, last_name, gender AS g FROM test_emp) GROUP BY g ORDER BY g DESC NULLS last;
basicGroupByRealiasedTwice
SELECT g AS h FROM (SELECT first_name AS f, last_name, gender AS g FROM test_emp) GROUP BY h ORDER BY h DESC NULLS last;
basicOrderByRealiasedField
SELECT g AS h FROM (SELECT first_name AS f, last_name, gender AS g FROM test_emp) ORDER BY g DESC NULLS first;

groupAndOrderByRealiasedExpression
SELECT emp_group AS e, min_high_salary AS s
FROM (
SELECT emp_no % 2 AS emp_group, MIN(salary) AS min_high_salary
FROM test_emp
WHERE salary > 50000
GROUP BY emp_group
)
ORDER BY e DESC;

// AwaitsFix: https://github.com/elastic/elasticsearch/issues/69758
filterAfterGroupBy-Ignore
SELECT s2 AS s3 FROM (SELECT s AS s2 FROM ( SELECT salary AS s FROM test_emp) GROUP BY s2) WHERE s2 < 5 ORDER BY s3 DESC NULLS last;

countAndComplexCondition
SELECT COUNT(*) as c FROM (SELECT * FROM test_emp WHERE gender IS NOT NULL) WHERE ABS(salary) > 0 GROUP BY gender ORDER BY gender;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -316,10 +316,11 @@ private static boolean checkGroupByOrder(LogicalPlan p, Set<Failure> localFailur
Map<Expression, Node<?>> missing = new LinkedHashMap<>();

o.order().forEach(oe -> {
Expression e = oe.child();
final Expression e = oe.child();
final Expression resolvedE = attributeRefs.getOrDefault(e, e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment above - it's misleading for a map lookup to do resolution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


// aggregates are allowed
if (Functions.isAggregate(attributeRefs.getOrDefault(e, e))) {
if (Functions.isAggregate(resolvedE)) {
return;
}

Expand All @@ -340,8 +341,12 @@ private static boolean checkGroupByOrder(LogicalPlan p, Set<Failure> localFailur
// e.g.: if "GROUP BY f2(f1(field))" you can "ORDER BY f4(f3(f2(f1(field))))"
//
// Also, make sure to compare attributes directly
if (e.anyMatch(expression -> Expressions.anyMatch(groupingAndMatchingAggregatesAliases,
g -> expression.semanticEquals(expression instanceof Attribute ? Expressions.attribute(g) : g)))) {
if (resolvedE.anyMatch(expression -> Expressions.anyMatch(groupingAndMatchingAggregatesAliases,
g -> {
Expression resolvedG = attributeRefs.getOrDefault(g, g);
resolvedG = expression instanceof Attribute ? Expressions.attribute(resolvedG) : resolvedG;
return expression.semanticEquals(resolvedG);
}))) {
return;
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -131,8 +131,8 @@ protected Iterable<RuleExecutor<LogicalPlan>.Batch> batches() {
);

Batch refs = new Batch("Replace References", Limiter.ONCE,
new ReplaceReferenceAttributeWithSource()
);
new ReplaceReferenceAttributeWithSource()
);

Batch operators = new Batch("Operator Optimization",
// combining
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noise

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Expand Down Expand Up @@ -222,7 +222,7 @@ public LogicalPlan apply(LogicalPlan plan) {
AttributeMap.Builder<Expression> builder = AttributeMap.builder();
// collect aliases
plan.forEachExpressionUp(Alias.class, a -> builder.put(a.toAttribute(), a.child()));
final Map<Attribute, Expression> collectRefs = builder.build();
final AttributeMap<Expression> collectRefs = builder.build();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

java.util.function.Function<ReferenceAttribute, Expression> replaceReference = r -> collectRefs.getOrDefault(r, r);

plan = plan.transformUp(p -> {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -600,10 +600,12 @@ else if (target.foldable()) {
else {
GroupByKey matchingGroup = null;
if (groupingContext != null) {
matchingGroup = groupingContext.groupFor(target);
final Expression resolvedTarget = queryC.aliases().getOrDefault(target, target);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bit of code is confusing due to the left overs.
Target doesn't seem to be used anywhere or at least its initial value is no good - instead of adding a resolvedTarget just reinitialize it:
target = queryC.aliases().getOrDefault(target, target)

Same for id.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

matchingGroup = groupingContext.groupFor(resolvedTarget);
Check.notNull(matchingGroup, "Cannot find group [{}]", Expressions.name(ne));

queryC = queryC.addColumn(new GroupByRef(matchingGroup.id(), null, isDateBased(ne.dataType())), id);
queryC = queryC.addColumn(
new GroupByRef(matchingGroup.id(), null, isDateBased(ne.dataType())), Expressions.id(resolvedTarget));
}
// fallback
else {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -508,6 +508,8 @@ public void testGroupByOrderByScalarOverNonGrouped() {
public void testGroupByOrderByFieldFromGroupByFunction() {
assertEquals("1:54: Cannot order by non-grouped column [int], expected [ABS(int)]",
error("SELECT ABS(int) FROM test GROUP BY ABS(int) ORDER BY int"));
assertEquals("1:91: Cannot order by non-grouped column [c], expected [b] or an aggregate function",
error("SELECT b, abs, 2 as c FROM (SELECT bool as b, ABS(int) abs FROM test) GROUP BY b ORDER BY c"));
}

public void testGroupByOrderByScalarOverNonGrouped_WithHaving() {
Expand All @@ -517,7 +519,14 @@ public void testGroupByOrderByScalarOverNonGrouped_WithHaving() {

public void testGroupByHavingNonGrouped() {
assertEquals("1:48: Cannot use HAVING filter on non-aggregate [int]; use WHERE instead",
error("SELECT AVG(int) FROM test GROUP BY text HAVING int > 10"));
error("SELECT AVG(int) FROM test GROUP BY bool HAVING int > 10"));
accept("SELECT AVG(int) FROM test GROUP BY bool HAVING AVG(int) > 2");
}

@AwaitsFix(bugUrl = "https://github.com/elastic/elasticsearch/issues/69758")
public void testGroupByWhereSubselect() {
accept("SELECT b, a FROM (SELECT bool as b, AVG(int) as a FROM test GROUP BY bool) WHERE b = false");
accept("SELECT b, a FROM (SELECT bool as b, AVG(int) as a FROM test GROUP BY bool HAVING AVG(int) > 2) WHERE b = false");
}

public void testGroupByAggregate() {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2599,8 +2599,7 @@ public void testSubqueryFilterOrderByAlias() throws Exception {
"WHERE i IS NOT NULL " +
"ORDER BY i");
}

@AwaitsFix(bugUrl = "follow-up to https://github.com/elastic/elasticsearch/pull/67216")

public void testSubqueryGroupByFilterAndOrderByByAlias() throws Exception {
PhysicalPlan p = optimizeAndPlan("SELECT i FROM " +
"( SELECT int AS i FROM test ) " +
Expand Down Expand Up @@ -2658,4 +2657,29 @@ public void testSubqueryWithAliasOrderByAlias() throws Exception {
"( SELECT int AS i FROM test ) AS s " +
"ORDER BY s.i > 10");
}

public void testReferenceResolutionInSubqueries() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please break this method into multiple tests that indicate the difference between them in their name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

optimizeAndPlan("SELECT i AS j FROM ( SELECT int AS i FROM test) ORDER BY j");
optimizeAndPlan("SELECT j AS k FROM (SELECT i AS j FROM ( SELECT int AS i FROM test)) ORDER BY k");
optimizeAndPlan("SELECT int_group AS g, min_date AS d " +
"FROM (" +
" SELECT int % 2 AS int_group, MIN(date) AS min_date " +
" FROM test WHERE date > '1970-01-01'::datetime GROUP BY int_group" +
") " +
"ORDER BY d DESC");
optimizeAndPlan("SELECT int_group AS g, min_date AS d " +
"FROM (" +
" SELECT int % 2 AS int_group, MIN(date) AS min_date " +
" FROM test WHERE date > '1970-01-01'::datetime GROUP BY int_group " +
")" +
"ORDER BY g DESC");
optimizeAndPlan("SELECT i AS j FROM ( SELECT int AS i FROM test) GROUP BY j");
optimizeAndPlan("SELECT j AS k FROM (SELECT i AS j FROM ( SELECT int AS i FROM test)) GROUP BY k");
optimizeAndPlan("SELECT g FROM (SELECT date AS f, int AS g FROM test) WHERE g IS NOT NULL GROUP BY g ORDER BY g ASC");
}

@AwaitsFix(bugUrl = "https://github.com/elastic/elasticsearch/issues/69758")
public void testFilterAfterGroupBy() {
optimizeAndPlan("SELECT j AS k FROM (SELECT i AS j FROM ( SELECT int AS i FROM test) GROUP BY j) WHERE j < 5");
}
}