Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow predicate pushdown after aggregation pushdown #6007

Merged

Conversation

alexjo2144
Copy link
Member

@alexjo2144 alexjo2144 commented Nov 18, 2020

Predicates pushed down after aggregations must be on group by keys.

#4112

@cla-bot cla-bot bot added the cla-signed label Nov 18, 2020
@alexjo2144 alexjo2144 force-pushed the predicate-pushdown-with-aggregation branch from 89582f3 to 7208a55 Compare November 18, 2020 22:48
// handle's aggregations are applied after constraint, so we cannot apply filter if aggregates is already set
// TODO (https://github.com/prestosql/presto/issues/4112) allow filter pushdown after aggregation pushdown
return Optional.empty();
if (constraint.getSummary().getDomains().isEmpty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (constraint.getSummary().getDomains().isEmpty()) {
if (constraint.getSummary().isNone()) {

return Optional.empty();
}

Map<ColumnHandle, Domain> domains = constraint.getSummary().getDomains().get();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Map<ColumnHandle, Domain> domains = constraint.getSummary().getDomains().get();
Map<ColumnHandle, Domain> domains = constraint.getSummary().getDomains().orElseThrow();

Comment on lines 124 to 125
boolean canPushDown = domains.keySet().stream().allMatch(predicateColumn ->
groupingSets.stream().allMatch(group -> group.contains(predicateColumn)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we can pushdown if all the constraint columns are fully contained by every grouping set.
it's not testable right now, since the SPI supoports grouping set pushdown, but the engine actually does not yet

i propose

Set<ColumnHandle> constraintColumns = constraint.getSummary().getDomains().orElseThrow().keySet();
List<List<JdbcColumnHandle>> groupingSets = handle.getGroupingSets().get();
verify(!groupingSets.isEmpty());
boolean canPushDown = groupingSets.stream()
        .allMatch(groupingSet -> isContainedBy(constraintColumns, groupingSet));
private static boolean isContainedBy(Set<ColumnHandle> searchedFor, List<JdbcColumnHandle> container)
    {
        if (searchedFor.size() < 3) {
            return container.containsAll(searchedFor);
        }
        return ImmutableSet.copyOf(container).containsAll(searchedFor);
    }

(we can also consider the method overkill and just do the defensive version inline)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm reading this correctly the conditions are the same. Right? Every constraint column must be in every group set.

What's the motivation for the defensive version, with the copy?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the motivation for the defensive version, with the copy?

avoid O(m*n) when long group by (m) and many predicated column (n)

If I'm reading this correctly the conditions are the same.

correct. i read the initial code incorrectly though

@alexjo2144
Copy link
Member Author

Comments addressed

ConnectorTableHandle baseTableHandle = metadata.getTableHandle(session, new SchemaTableName("example", "numbers"));
ConnectorTableHandle aggregatedTable = applyCountAggregation(session, baseTableHandle, groupByColumn);

Domain domain = Domain.singleValue(BIGINT, 123L);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a case with two grouping sets (a, b) and (a) and predicate on column (b) only. No predicate pushdown should occur

Copy link
Member Author

@alexjo2144 alexjo2144 Nov 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to set supportsGroupingSets to true in TestingH2JdbcClient to add that case, might break some other tests?

Edit: Seems fine

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit: Seems fine

surprising to me, will take a look

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think i was adding the flag because h2 did not support GROUPING SETS; OTOH tests currently cannot exploit this fact, as engine doesn't push grouping sets yet.
Yet, h2 version did not change, so it may be h2 still doesn't support this.
Maybe instead of changing the return type of supportsGroupingSets, just use a "mock" jdbc client in the TestJdbcMetadata test?
By "mock" i mean a ForwardingJdbcClient delegating to database.getJdbcClient(), but overriding supportsGroupingSets?
This way you'd avoid making a false statement in the io.prestosql.plugin.jdbc.TestingH2JdbcClient#supportsGroupingSets

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Updated.

@alexjo2144 alexjo2144 force-pushed the predicate-pushdown-with-aggregation branch from 5f7760c to 5e6777b Compare November 20, 2020 16:26
@@ -37,7 +37,7 @@ public TestingH2JdbcClient(BaseJdbcConfig config, ConnectionFactory connectionFa
@Override
public boolean supportsGroupingSets()
{
return false;
return true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the overrride, as the default is (and will be) true.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see however #6007 (comment)

ConnectorTableHandle baseTableHandle = metadata.getTableHandle(session, new SchemaTableName("example", "numbers"));
ConnectorTableHandle aggregatedTable = applyCountAggregation(session, baseTableHandle, groupByColumn);

Domain domain = Domain.singleValue(BIGINT, 123L);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit: Seems fine

surprising to me, will take a look

@alexjo2144 alexjo2144 force-pushed the predicate-pushdown-with-aggregation branch from 5e6777b to 32bf131 Compare November 23, 2020 16:06
@alexjo2144 alexjo2144 force-pushed the predicate-pushdown-with-aggregation branch from 32bf131 to c9ff336 Compare November 24, 2020 18:33
Predicates pushed down after aggregations must be on group by keys
@alexjo2144 alexjo2144 force-pushed the predicate-pushdown-with-aggregation branch from c9ff336 to fce761c Compare November 24, 2020 20:52
@findepi findepi merged commit d14fdf9 into trinodb:master Nov 25, 2020
@findepi
Copy link
Member

findepi commented Nov 25, 2020

Merged, thanks!

@findepi findepi added this to the 347 milestone Nov 25, 2020
@findepi findepi mentioned this pull request Nov 25, 2020
10 tasks
@alexjo2144 alexjo2144 deleted the predicate-pushdown-with-aggregation branch November 30, 2020 14:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

2 participants