
Add table stats cache #12196

Closed · wants to merge 3 commits

Conversation

@lxynov (Member) commented Apr 29, 2022

Cache table statistics during query planning so that they can be shared by cost-based optimizer rules such as ReorderJoins and DetermineJoinDistributionType. This reduces Iceberg metadata processing when planning a query that joins Iceberg tables.

To demonstrate the benefit of this PR and #11858, I ran an experiment with a locally running Trino instance and a production join query:

  • Without #11858 and this PR: planning took 2 min 24s
  • With #11858 but not this PR: planning took 1 min 12s
  • With this PR: planning took 49s

Description

Is this change a fix, improvement, new feature, refactoring, or other?

Improvement.

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Engine.

How would you describe this change to a non-technical end user or system administrator?

Improve query planning performance.

Related issues, pull requests, and links

#11708 #11858

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
(x) Release notes entries required with the following suggested text:

# Section
* Improve query planning performance.

@cla-bot added the cla-signed label Apr 29, 2022
@lxynov requested a review from findepi April 29, 2022 19:10
@lxynov (Member, Author) commented May 5, 2022

@findepi Gentle reminder on this PR.

@findepi (Member) commented May 11, 2022

How does this relate to @clemensvonschwerin's #8659?

@lxynov (Member, Author) commented May 11, 2022

@findepi The major difference is that #8659 is on the connector side whereas this PR is on the engine side.

@findepi (Member) commented May 13, 2022

@przemekak is going to help benchmark this

@findepi (Member) commented May 13, 2022

cc @alexjo2144

@alexjo2144 (Member) commented

Can you give a short explanation of how this is different from the existing CachingStatsProvider? It seems to fit in a similar place as the CachingTableStatsProvider added here; the key is just the PlanNode instead of the TableHandle.

@lxynov (Member, Author) commented May 16, 2022

@alexjo2144 Sure. Yeah, it's indeed a bit confusing. Essentially, the difference is that CachingStatsProvider is created per IterativeOptimizer, whereas CachingTableStatsProvider is created per query. CachingTableStatsProvider makes it possible to share the table stats cache among CBO rules such as ReorderJoins, DetermineJoinDistributionType, and DetermineSemiJoinDistributionType.
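
For intuition, the per-query wiring could look roughly like this (a hedged sketch, not the exact code in this PR; the optimizer-loop variables such as planOptimizers, symbolAllocator, and idAllocator are assumed from the surrounding planner context):

    // Hypothetical sketch: one CachingTableStatsProvider per query, shared by
    // every optimizer pass, so cost-based rules all hit the same table-stats cache.
    TableStatsProvider tableStatsProvider = new CachingTableStatsProvider(metadata);

    PlanNode root = plan;
    for (PlanOptimizer optimizer : planOptimizers) {
        // Each optimizer receives the same provider; repeated lookups of the same
        // TableHandle during ReorderJoins, DetermineJoinDistributionType, etc.
        // are served from the cache instead of re-reading connector metadata.
        root = optimizer.optimize(root, session, types, symbolAllocator, idAllocator, warningCollector, tableStatsProvider);
    }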

@@ -34,10 +34,11 @@ PlanNodeStatsEstimate calculateStats(
             StatsProvider sourceStats,
             Lookup lookup,
             Session session,
-            TypeProvider types);
+            TypeProvider types,
+            TableStatsProvider tableStatsProvider);
Member:

Why do we need a new entity here, the tableStatsProvider?
Could StatsProvider sourceStats do the job?

Member Author:

StatsProvider sourceStats is called to get the stats of child plan nodes.

TableStatsProvider tableStatsProvider is used when calculating the "ultimate" source stats: table scan stats. In fact, all this PR does is wire in a CachingTableStatsProvider for TableScanStatsRule to use.
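
As a rough illustration of that wiring (a sketch only; the actual rule code in the PR may differ, and the per-column statistics mapping is elided):

    // Hypothetical sketch of TableScanStatsRule's new overload: the rule asks the
    // shared, per-query TableStatsProvider instead of calling Metadata directly,
    // so repeated scans of the same TableHandle reuse the cached TableStatistics.
    @Override
    protected Optional<PlanNodeStatsEstimate> doCalculate(
            TableScanNode node,
            StatsProvider sourceStats,
            Lookup lookup,
            Session session,
            TypeProvider types,
            TableStatsProvider tableStatsProvider)
    {
        TableStatistics tableStatistics = tableStatsProvider.getTableStatistics(session, node.getTable());

        // The existing per-column statistics mapping is unchanged and omitted here;
        // only the row count is shown to keep the sketch short.
        return Optional.of(PlanNodeStatsEstimate.builder()
                .setOutputRowCount(tableStatistics.getRowCount().getValue())
                .build());
    }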

Member:

You can easily imagine a CachingStatsProvider that caches stats on a per-PlanNode basis. That would be a generally useful implementation, potentially allowing us to cut down some stats-calculation cost.

The currently existing CachingStatsProvider caches stats in a Memo group. This is useful as well -- the group contains alternative plans (currently always exactly one at a time) that produce the same relation. The same relation implies that once the stats are calculated, they are applicable to all group members.

Now, these two concepts are different, but not mutually exclusive:

  • we could have an "L2" cache on a per-PlanNode basis
  • and an "L1" cache based on the Memo group

The question then is whether the per-PlanNode basis and the per-TableHandle basis are significantly different. I think they are not. I don't think we ever have a case where two TableScanNodes have the same TableHandle.

Member:

Now the question is, whether per-PlanNode basis and per-TableHandle basis are significantly different.

Actually, they would be different.

Plan nodes have no equality; they compare by identity. Upon each exit from IterativeOptimizer (exit from Memo), a fully new plan structure is created. This can be improved a bit (e.g., produce a new plan only if anything changed), but that would still break identity-based caching, except for root nodes (table scans). Thus, the solution would look more generic, but would not actually be.

        implements TableStatsProvider
{
    private final Metadata metadata;
    private final Map<TableHandle, TableStatistics> cache = new HashMap<>();
Member:

A connector creates new TableHandles during various ConnectorMetadata.apply* calls.
A table handle may become "seen" in the plan and then be discarded, i.e. made obsolete.

I think we should use weak keys here. Otherwise we need to size this as a regular cache.
(WeakHashMap provides weak keys with equality-based lookup, so I'd recommend that.)
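
For illustration, a weak-keyed variant could look roughly like this (a sketch under the assumptions above; Trino-type imports are omitted, and the getTableStatistics signature follows the interface quoted later in this review):

    import java.util.Map;
    import java.util.WeakHashMap;

    import static java.util.Objects.requireNonNull;

    // Hypothetical weak-key variant of CachingTableStatsProvider; the PR as posted
    // uses a plain HashMap. WeakHashMap keeps equality-based lookup while letting
    // entries be collected once the engine no longer references a TableHandle
    // (e.g. handles made obsolete by ConnectorMetadata.apply* calls).
    public class CachingTableStatsProvider
            implements TableStatsProvider
    {
        private final Metadata metadata;
        private final Map<TableHandle, TableStatistics> cache = new WeakHashMap<>();

        public CachingTableStatsProvider(Metadata metadata)
        {
            this.metadata = requireNonNull(metadata, "metadata is null");
        }

        @Override
        public TableStatistics getTableStatistics(Session session, TableHandle tableHandle)
        {
            TableStatistics stats = cache.get(tableHandle);
            if (stats == null) {
                stats = metadata.getTableStatistics(session, tableHandle);
                cache.put(tableHandle, stats);
            }
            return stats;
        }
    }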

Member Author:

I don't fully understand this comment. My understanding is that CachingTableStatsProvider is created per query and its lifecycle is confined to query planning, so the cache here can be GCed after query planning finishes. What issue did you see with it?

@@ -106,7 +107,7 @@ public HashGenerationOptimizer(Metadata metadata)
     }

     @Override
-    public PlanNode optimize(PlanNode plan, Session session, TypeProvider types, SymbolAllocator symbolAllocator, PlanNodeIdAllocator idAllocator, WarningCollector warningCollector)
+    public PlanNode optimize(PlanNode plan, Session session, TypeProvider types, SymbolAllocator symbolAllocator, PlanNodeIdAllocator idAllocator, WarningCollector warningCollector, TableStatsProvider tableStatsProvider)
Member:

It would greatly help reviewing if you had two commits (a sketch of step 1 follows below):

  1. all the logic, which adds a new PlanOptimizer.optimize overload delegating to the old one
    • this would change only those PlanOptimizers which use this new info, or which delegate to some other PlanOptimizer
  2. remove the old method overload, which adds , TableStatsProvider tableStatsProvider to many files (a mechanical change)
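
A sketch of what step 1 might look like (assuming PlanOptimizer is an interface; the exact shape in the PR may differ):

    // Hypothetical step 1: add the new overload with a default implementation that
    // delegates to the old one, so only optimizers that actually use the
    // TableStatsProvider need to change in the first commit.
    public interface PlanOptimizer
    {
        PlanNode optimize(
                PlanNode plan,
                Session session,
                TypeProvider types,
                SymbolAllocator symbolAllocator,
                PlanNodeIdAllocator idAllocator,
                WarningCollector warningCollector);

        default PlanNode optimize(
                PlanNode plan,
                Session session,
                TypeProvider types,
                SymbolAllocator symbolAllocator,
                PlanNodeIdAllocator idAllocator,
                WarningCollector warningCollector,
                TableStatsProvider tableStatsProvider)
        {
            // Step 2 later removes the old overload and makes this the only entry point.
            return optimize(plan, session, types, symbolAllocator, idAllocator, warningCollector);
        }
    }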

Member Author:

Makes sense. I separated them. Please feel free to squash them when merging if you think that's appropriate.

testing/trino-server-dev/etc/catalog/mariadb.properties (review comment outdated, resolved)
@findepi (Member) commented May 17, 2022

Please don't rebase when applying comments.

@findepi (Member) commented May 23, 2022

Please don't rebase when applying comments.

I see a conflict here, so you'll need a rebase. Please separate the rebase and the changes using a fixup commit.

@findepi (Member) commented May 27, 2022

@lxynov let me know if you need help with the rebase

@lxynov force-pushed the caching-tablestats branch 2 times, most recently from bfb1298 to be37e0d on June 7, 2022 00:30
@lxynov force-pushed the caching-tablestats branch from be37e0d to 5d549d3 on June 7, 2022 05:03
         this.statsCalculator = requireNonNull(statsCalculator, "statsCalculator cannot be null");
-        this.session = requireNonNull(session, "session cannot be null");
+        this.session = requireNonNull(session, "sesssion cannot be null");
Member:

revert

@@ -825,11 +824,15 @@ private AggregationNode aggregation(String id, PlanNode source)
                Optional.empty(),
                Optional.empty());

        return singleAggregation(
Member:

why this change?

@@ -29,7 +31,7 @@

     public CachingTableStatsProvider(Metadata metadata)
     {
-        this.metadata = metadata;
+        this.metadata = requireNonNull(metadata, "metadata is null");
Member:

Squash "Address comments" with respective commits.

(You did rebase anyway, and split the commit into two in this PR, so keeping some changes as a fixup commit doesn't make review easier)

    @Override
    public final Optional<PlanNodeStatsEstimate> calculate(T node, StatsProvider sourceStats, Lookup lookup, Session session, TypeProvider types, TableStatsProvider tableStatsProvider)
    {
        return doCalculate(node, sourceStats, lookup, session, types, tableStatsProvider)
                .map(estimate -> normalizer.normalize(estimate, node.getOutputSymbols(), types));
    }

    protected abstract Optional<PlanNodeStatsEstimate> doCalculate(T node, StatsProvider sourceStats, Lookup lookup, Session session, TypeProvider types);

Member:

Can you give the "Add table stats cache 2: remove method overloading" commit a better title?
E.g., how would you name the commit if it was going in a separate PR?

{
    TableStatsProvider EMPTY = (session, tableHandle) -> TableStatistics.empty();

    public TableStatistics getTableStatistics(Session session, TableHandle tableHandle);
Member:

It's instantiated on a per-query basis, so it can take the session as a constructor argument.
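
In other words, the interface could be shaped roughly like this (a hypothetical variant, not what the PR currently has):

    // Hypothetical per-query shape: the Session is captured by the implementation's
    // constructor, so the lookup method only needs the TableHandle.
    public interface TableStatsProvider
    {
        TableStatsProvider EMPTY = tableHandle -> TableStatistics.empty();

        TableStatistics getTableStatistics(TableHandle tableHandle);
    }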

@@ -56,12 +52,18 @@ public Pattern<TableScanNode> getPattern()

    @Override
    protected Optional<PlanNodeStatsEstimate> doCalculate(TableScanNode node, StatsProvider sourceStats, Lookup lookup, Session session, TypeProvider types)
    {
        throw new IllegalStateException("This is not expected to be called because the other overload is implemented.");
Member:

Use UnsupportedOperationException here.
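
That is, something along these lines (a one-line sketch):

    throw new UnsupportedOperationException("Not expected to be called; the overload taking TableStatsProvider is implemented instead");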

@findepi mentioned this pull request Jun 30, 2022
@findepi (Member) commented Jun 30, 2022

I applied comments and posted a copy of this PR: #13047
