Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: Resolve attributes recursively for improved subquery support #69765

Merged
merged 9 commits into from
Mar 11, 2021
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -148,6 +148,8 @@ public static final <E> AttributeMap<E> emptyAttributeMap() {
return EMPTY;
}

private static final Object NOT_FOUND = new Object();

private final Map<AttributeWrapper, E> delegate;
private Set<Attribute> keySet = null;
private Collection<E> values = null;
Expand Down Expand Up @@ -238,14 +240,14 @@ public boolean isEmpty() {
@Override
public boolean containsKey(Object key) {
if (key instanceof NamedExpression) {
return delegate.keySet().contains(new AttributeWrapper(((NamedExpression) key).toAttribute()));
return delegate.containsKey(new AttributeWrapper(((NamedExpression) key).toAttribute()));
}
return false;
}

@Override
public boolean containsValue(Object value) {
return delegate.values().contains(value);
return delegate.containsValue(value);
}

@Override
Expand All @@ -258,10 +260,21 @@ public E get(Object key) {

@Override
public E getOrDefault(Object key, E defaultValue) {
E e;
return (((e = get(key)) != null) || containsKey(key))
? e
: defaultValue;
if (key instanceof NamedExpression) {
return delegate.getOrDefault(new AttributeWrapper(((NamedExpression) key).toAttribute()), defaultValue);
}
return defaultValue;
}

@SuppressWarnings({ "unchecked" })
public E resolve(Object key, E defaultValue) {
AttributeMap<Object> map = (AttributeMap<Object>) this;
Object candidate = defaultValue;
Object value = null;
while ((value = map.getOrDefault(key, NOT_FOUND)) != NOT_FOUND && value != key) {
key = candidate = value;
}
return (E) candidate;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this method builds on other types methods in this class, it shouldn't have unchecked warnings nor used NOT_FOUND. Also not sure why the candidate is returned instead of value, should be the other way around.

Further more this implementation looks buggy: if the map contains (e,e) calling resolve(e, default) would incorrectly return default and not e.
Further more there's no handling of cycles which leads to an infinite loop: (a, b), (b, a).

Below a suggestion to fix both issues:

Suggested change
@SuppressWarnings({ "unchecked" })
public E resolve(Object key, E defaultValue) {
AttributeMap<Object> map = (AttributeMap<Object>) this;
Object candidate = defaultValue;
Object value = null;
while ((value = map.getOrDefault(key, NOT_FOUND)) != NOT_FOUND && value != key) {
key = candidate = value;
}
return (E) candidate;
}
public E resolveOrDefault(Object key, E defaultValue) {
E value = defaultValue;
E candidate;
int allowedLookups = 10;
while (key != value && ((candidate = get(key)) != null || containsKey(key))) {
if (--allowedLookups == 0) {
throw new QlIllegalArgumentException("Potential cycle detected");
}
key = candidate;
value = candidate;
}
return value;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch with the default value. I was thinking of adding the limitation on the number of lookups, but seemed we would need quite a huge bug in our AttributeMap population for it to happen with more than one hops in the cycle. Anyways, I added this defensive check.

The suggested code won't really work, because:

1.resolveOrDefault(e, e) will always return e even if the map has e -> f mapping.
2. The loop will always terminate after the first lookup (key == value == candidate)

Anyways, the updated code fixes the bug you found.


@Override
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,8 @@
import java.util.Set;

import static java.util.stream.Collectors.toList;
import static org.elasticsearch.xpack.ql.TestUtils.fieldAttribute;
import static org.elasticsearch.xpack.ql.TestUtils.of;
import static org.hamcrest.Matchers.arrayContaining;
import static org.hamcrest.Matchers.arrayWithSize;
import static org.hamcrest.Matchers.contains;
Expand Down Expand Up @@ -61,6 +63,30 @@ public void testAttributeMapWithSameAliasesCanResolveAttributes() {
assertTrue(newAttributeMap.get(param2.toAttribute()) == param2.child());
}

public void testResolve() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see more tests for dealing with:
a. same key, value association - (e,e)
b. handling of default value across association: resolveOrDefault(k, default) for a map with (k, default), (default, value) and also resolveOrDefault(k, k)
c. handling of cycles: (a, b), (b, c), (c,a ) with a mixture of default value in place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✓ Done.

AttributeMap.Builder<Object> builder = AttributeMap.builder();
Attribute one = a("one");
Attribute two = fieldAttribute("two", DataTypes.INTEGER);
Attribute three = fieldAttribute("three", DataTypes.INTEGER);
Alias threeAlias = new Alias(Source.EMPTY, "three_alias", three);
Alias threeAliasAlias = new Alias(Source.EMPTY, "three_alias_alias", threeAlias);
builder.put(one, of("one"));
builder.put(two, "two");
builder.put(three, of("three"));
builder.put(threeAlias.toAttribute(), threeAlias.child());
builder.put(threeAliasAlias.toAttribute(), threeAliasAlias.child());
AttributeMap<Object> map = builder.build();

assertEquals(of("one"), map.resolve(one, null));
assertEquals("two", map.resolve(two, null));
assertEquals(of("three"), map.resolve(three, null));
assertEquals(of("three"), map.resolve(threeAlias, null));
assertEquals(of("three"), map.resolve(threeAliasAlias, null));
Attribute four = a("four");
assertEquals("not found", map.resolve(four, "not found"));
assertNull(map.resolve(four, null));
}

private Alias createIntParameterAlias(int index, int value) {
Source source = new Source(1, index * 5, "?");
Literal literal = new Literal(source, value, DataTypes.INTEGER);
Expand Down
28 changes: 26 additions & 2 deletions x-pack/plugin/sql/qa/server/src/main/resources/subselect.sql-spec
Original file line number Diff line number Diff line change
Expand Up @@ -16,10 +16,34 @@ basicGroupBy
SELECT gender FROM (SELECT first_name AS f, last_name, gender FROM test_emp) GROUP BY gender ORDER BY gender ASC;
basicGroupByAlias
SELECT g FROM (SELECT first_name AS f, last_name, gender AS g FROM test_emp) GROUP BY g ORDER BY g ASC;
// @AwaitsFix(bugUrl = "follow-up to https://github.com/elastic/elasticsearch/pull/67216")
basicGroupByWithFilterByAlias-Ignore
basicGroupByWithFilterByAlias
SELECT g FROM (SELECT first_name AS f, last_name, gender AS g FROM test_emp) WHERE g IS NOT NULL GROUP BY g ORDER BY g ASC;
basicGroupByRealiased
SELECT g AS h FROM (SELECT first_name AS f, last_name, gender AS g FROM test_emp) GROUP BY g ORDER BY g DESC NULLS last;
basicGroupByRealiasedTwice
SELECT g AS h FROM (SELECT first_name AS f, last_name, gender AS g FROM test_emp) GROUP BY h ORDER BY h DESC NULLS last;
basicOrderByRealiasedField
SELECT g AS h FROM (SELECT first_name AS f, last_name, gender AS g FROM test_emp) ORDER BY g DESC NULLS first;

groupAndOrderByRealiasedExpression
SELECT emp_group AS e, min_high_salary AS s
FROM (
SELECT emp_no % 2 AS emp_group, MIN(salary) AS min_high_salary
FROM test_emp
WHERE salary > 50000
GROUP BY emp_group
)
ORDER BY e DESC;

multiLevelSelectStar
SELECT * FROM (SELECT * FROM ( SELECT * FROM test_emp ));

multiLevelSelectStarWithAlias
SELECT * FROM (SELECT * FROM ( SELECT * FROM test_emp ) b) c;

// AwaitsFix: https://github.com/elastic/elasticsearch/issues/69758
filterAfterGroupBy-Ignore
SELECT s2 AS s3 FROM (SELECT s AS s2 FROM ( SELECT salary AS s FROM test_emp) GROUP BY s2) WHERE s2 < 5 ORDER BY s3 DESC NULLS last;

countAndComplexCondition
SELECT COUNT(*) as c FROM (SELECT * FROM test_emp WHERE gender IS NOT NULL) WHERE ABS(salary) > 0 GROUP BY gender ORDER BY gender;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -638,12 +638,12 @@ protected LogicalPlan doRule(LogicalPlan plan) {
AttributeMap.Builder<Expression> builder = AttributeMap.builder();
// collect aliases
child.forEachUp(p -> p.forEachExpressionUp(Alias.class, a -> builder.put(a.toAttribute(), a.child())));
final Map<Attribute, Expression> collectRefs = builder.build();
final AttributeMap<Expression> collectRefs = builder.build();

referencesStream = referencesStream.filter(r -> {
for (Attribute attr : child.outputSet()) {
if (attr instanceof ReferenceAttribute) {
Expression source = collectRefs.getOrDefault(attr, attr);
Expression source = collectRefs.resolve(attr, attr);
// found a match, no need to resolve it further
// so filter it out
if (source.equals(r.child())) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -316,10 +316,11 @@ private static boolean checkGroupByOrder(LogicalPlan p, Set<Failure> localFailur
Map<Expression, Node<?>> missing = new LinkedHashMap<>();

o.order().forEach(oe -> {
Expression e = oe.child();
final Expression e = oe.child();
final Expression resolvedE = attributeRefs.resolve(e, e);

// aggregates are allowed
if (Functions.isAggregate(attributeRefs.getOrDefault(e, e))) {
if (Functions.isAggregate(resolvedE)) {
return;
}

Expand All @@ -340,8 +341,12 @@ private static boolean checkGroupByOrder(LogicalPlan p, Set<Failure> localFailur
// e.g.: if "GROUP BY f2(f1(field))" you can "ORDER BY f4(f3(f2(f1(field))))"
//
// Also, make sure to compare attributes directly
if (e.anyMatch(expression -> Expressions.anyMatch(groupingAndMatchingAggregatesAliases,
g -> expression.semanticEquals(expression instanceof Attribute ? Expressions.attribute(g) : g)))) {
if (resolvedE.anyMatch(expression -> Expressions.anyMatch(groupingAndMatchingAggregatesAliases,
g -> {
Expression resolvedG = attributeRefs.resolve(g, g);
resolvedG = expression instanceof Attribute ? Expressions.attribute(resolvedG) : resolvedG;
return expression.semanticEquals(resolvedG);
}))) {
return;
}

Expand Down Expand Up @@ -406,7 +411,7 @@ private static boolean checkGroupByHavingHasOnlyAggs(Expression e, Set<Expressio

// resolve FunctionAttribute to backing functions
if (e instanceof ReferenceAttribute) {
e = attributeRefs.get(e);
e = attributeRefs.resolve(e, null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The null variant is used enough times to make a resolve method, in addition to resolveOrDefault on the main class similar to get and getOrDefault (even if it's just syntactic sugar).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✓ Done.

}

// scalar functions can be a binary tree
Expand Down Expand Up @@ -484,7 +489,7 @@ private static boolean onlyRawFields(Iterable<? extends Expression> expressions,

expressions.forEach(e -> e.forEachDown(c -> {
if (c instanceof ReferenceAttribute) {
c = attributeRefs.getOrDefault(c, c);
c = attributeRefs.resolve(c, c);
}
if (c instanceof Function) {
localFailures.add(fail(c, "No functions allowed (yet); encountered [{}]", c.sourceText()));
Expand Down Expand Up @@ -579,7 +584,7 @@ private static boolean checkGroupMatch(Expression e, Node<?> source, List<Expres

// resolve FunctionAttribute to backing functions
if (e instanceof ReferenceAttribute) {
e = attributeRefs.get(e);
e = attributeRefs.resolve(e, null);
}

// scalar functions can be a binary tree
Expand Down Expand Up @@ -668,7 +673,7 @@ private static void checkFilterOnAggs(LogicalPlan p, Set<Failure> localFailures,
LogicalPlan filterChild = filter.child();
if (filterChild instanceof Aggregate == false) {
filter.condition().forEachDown(Expression.class, e -> {
if (Functions.isAggregate(attributeRefs.getOrDefault(e, e))) {
if (Functions.isAggregate(attributeRefs.resolve(e, e))) {
if (filterChild instanceof Project) {
filter.condition().forEachDown(FieldAttribute.class,
f -> localFailures.add(fail(e, "[{}] field must appear in the GROUP BY clause or in an aggregate function",
Expand All @@ -690,7 +695,7 @@ private static void checkFilterOnGrouping(LogicalPlan p, Set<Failure> localFailu
if (p instanceof Filter) {
Filter filter = (Filter) p;
filter.condition().forEachDown(Expression.class, e -> {
if (Functions.isGrouping(attributeRefs.getOrDefault(e, e))) {
if (Functions.isGrouping(attributeRefs.resolve(e, e))) {
localFailures
.add(fail(e, "Cannot filter on grouping function [{}], use its argument instead", Expressions.name(e)));
}
Expand All @@ -717,7 +722,7 @@ private static void checkNestedUsedInGroupByOrHavingOrWhereOrOrderBy(LogicalPlan
}
};
Consumer<Expression> checkForNested = e ->
attributeRefs.getOrDefault(e, e).forEachUp(FieldAttribute.class, matchNested);
attributeRefs.resolve(e, e).forEachUp(FieldAttribute.class, matchNested);
Consumer<ScalarFunction> checkForNestedInFunction = f -> f.arguments().forEach(
arg -> arg.forEachUp(FieldAttribute.class, matchNested));

Expand All @@ -739,7 +744,7 @@ private static void checkNestedUsedInGroupByOrHavingOrWhereOrOrderBy(LogicalPlan

// check in where (scalars not allowed)
p.forEachDown(Filter.class, f -> f.condition().forEachUp(e ->
attributeRefs.getOrDefault(e, e).forEachUp(ScalarFunction.class, sf -> {
attributeRefs.resolve(e, e).forEachUp(ScalarFunction.class, sf -> {
if (sf instanceof BinaryComparison == false &&
sf instanceof IsNull == false &&
sf instanceof IsNotNull == false &&
Expand All @@ -758,7 +763,7 @@ private static void checkNestedUsedInGroupByOrHavingOrWhereOrOrderBy(LogicalPlan

// check in order by (scalars not allowed)
p.forEachDown(OrderBy.class, ob -> ob.order().forEach(o -> o.forEachUp(e ->
attributeRefs.getOrDefault(e, e).forEachUp(ScalarFunction.class, checkForNestedInFunction)
attributeRefs.resolve(e, e).forEachUp(ScalarFunction.class, checkForNestedInFunction)
)));
if (nested.isEmpty() == false) {
localFailures.add(
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -222,8 +222,8 @@ public LogicalPlan apply(LogicalPlan plan) {
AttributeMap.Builder<Expression> builder = AttributeMap.builder();
// collect aliases
plan.forEachExpressionUp(Alias.class, a -> builder.put(a.toAttribute(), a.child()));
final Map<Attribute, Expression> collectRefs = builder.build();
java.util.function.Function<ReferenceAttribute, Expression> replaceReference = r -> collectRefs.getOrDefault(r, r);
final AttributeMap<Expression> collectRefs = builder.build();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

java.util.function.Function<ReferenceAttribute, Expression> replaceReference = r -> collectRefs.resolve(r, r);

plan = plan.transformUp(p -> {
// non attribute defining plans get their references removed
Expand Down Expand Up @@ -300,7 +300,7 @@ static class PruneOrderByNestedFields extends OptimizerRule<Project> {
private void findNested(Expression exp, AttributeMap<Function> functions, Consumer<FieldAttribute> onFind) {
exp.forEachUp(e -> {
if (e instanceof ReferenceAttribute) {
Function f = functions.get(e);
Function f = functions.resolve(e, null);
if (f != null) {
findNested(f, functions, onFind);
}
Expand Down Expand Up @@ -578,7 +578,7 @@ private List<NamedExpression> combineProjections(List<? extends NamedExpression>
// replace any matching attribute with a lower alias (if there's a match)
// but clean-up non-top aliases at the end
for (NamedExpression ne : upper) {
NamedExpression replacedExp = (NamedExpression) ne.transformUp(Attribute.class, a -> aliases.getOrDefault(a, a));
NamedExpression replacedExp = (NamedExpression) ne.transformUp(Attribute.class, a -> aliases.resolve(a, a));
replaced.add((NamedExpression) CleanAliases.trimNonTopLevelAliases(replacedExp));
}
return replaced;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -600,10 +600,13 @@ else if (target.foldable()) {
else {
GroupByKey matchingGroup = null;
if (groupingContext != null) {
target = queryC.aliases().resolve(target, target);
id = Expressions.id(target);
matchingGroup = groupingContext.groupFor(target);
Check.notNull(matchingGroup, "Cannot find group [{}]", Expressions.name(ne));

queryC = queryC.addColumn(new GroupByRef(matchingGroup.id(), null, isDateBased(ne.dataType())), id);
queryC = queryC.addColumn(
new GroupByRef(matchingGroup.id(), null, isDateBased(ne.dataType())), id);
}
// fallback
else {
Expand Down Expand Up @@ -707,7 +710,7 @@ protected PhysicalPlan rule(OrderExec plan) {

// if it's a reference, get the target expression
if (orderExpression instanceof ReferenceAttribute) {
orderExpression = qContainer.aliases().get(orderExpression);
orderExpression = qContainer.aliases().resolve(orderExpression, null);
}
String lookup = Expressions.id(orderExpression);
GroupByKey group = qContainer.findGroupForAgg(lookup);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -77,7 +77,7 @@ public class QueryContainer {

// scalar function processors - recorded as functions get folded;
// at scrolling, their inputs (leaves) get updated
private final AttributeMap<Pipe> scalarFunctions;
private AttributeMap<Pipe> scalarFunctions;

private final Map<String, Sort> sort;
private final int limit;
Expand Down Expand Up @@ -185,7 +185,7 @@ public BitSet columnMask(List<Attribute> columns) {
aliasName(columns.get(0));

for (Attribute column : columns) {
Expression expression = aliases.getOrDefault(column, column);
Expression expression = aliases.resolve(column, column);

// find the column index
String id = Expressions.id(expression);
Expand Down Expand Up @@ -375,7 +375,11 @@ static Query rewriteToContainNestedField(@Nullable Query query, Source source, S

// replace function/operators's input with references
private Tuple<QueryContainer, FieldExtraction> resolvedTreeComputingRef(ScalarFunction function, Attribute attr) {
Pipe proc = scalarFunctions.computeIfAbsent(attr, v -> function.asPipe());
Pipe proc = null;
if ((proc = scalarFunctions.resolve(attr, null)) == null) {
proc = function.asPipe();
scalarFunctions = AttributeMap.builder(scalarFunctions).put(attr, proc).build();
}

// find the processor inputs (Attributes) and convert them into references
// no need to promote them to the top since the container doesn't have to be aware
Expand Down Expand Up @@ -407,14 +411,14 @@ public FieldExtraction resolve(Attribute attribute) {
}

public QueryContainer addColumn(Attribute attr) {
Expression expression = aliases.getOrDefault(attr, attr);
Expression expression = aliases.resolve(attr, attr);
Tuple<QueryContainer, FieldExtraction> tuple = asFieldExtraction(attr);
return tuple.v1().addColumn(tuple.v2(), Expressions.id(expression));
}

private Tuple<QueryContainer, FieldExtraction> asFieldExtraction(Attribute attr) {
// resolve it Expression
Expression expression = aliases.getOrDefault(attr, attr);
Expression expression = aliases.resolve(attr, attr);

if (expression instanceof FieldAttribute) {
FieldAttribute fa = (FieldAttribute) expression;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -508,6 +508,8 @@ public void testGroupByOrderByScalarOverNonGrouped() {
public void testGroupByOrderByFieldFromGroupByFunction() {
assertEquals("1:54: Cannot order by non-grouped column [int], expected [ABS(int)]",
error("SELECT ABS(int) FROM test GROUP BY ABS(int) ORDER BY int"));
assertEquals("1:91: Cannot order by non-grouped column [c], expected [b] or an aggregate function",
error("SELECT b, abs, 2 as c FROM (SELECT bool as b, ABS(int) abs FROM test) GROUP BY b ORDER BY c"));
}

public void testGroupByOrderByScalarOverNonGrouped_WithHaving() {
Expand All @@ -517,7 +519,14 @@ public void testGroupByOrderByScalarOverNonGrouped_WithHaving() {

public void testGroupByHavingNonGrouped() {
assertEquals("1:48: Cannot use HAVING filter on non-aggregate [int]; use WHERE instead",
error("SELECT AVG(int) FROM test GROUP BY text HAVING int > 10"));
error("SELECT AVG(int) FROM test GROUP BY bool HAVING int > 10"));
accept("SELECT AVG(int) FROM test GROUP BY bool HAVING AVG(int) > 2");
}

@AwaitsFix(bugUrl = "https://github.com/elastic/elasticsearch/issues/69758")
public void testGroupByWhereSubselect() {
accept("SELECT b, a FROM (SELECT bool as b, AVG(int) as a FROM test GROUP BY bool) WHERE b = false");
accept("SELECT b, a FROM (SELECT bool as b, AVG(int) as a FROM test GROUP BY bool HAVING AVG(int) > 2) WHERE b = false");
}

public void testGroupByAggregate() {
Expand Down
Loading