-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add table stats cache #13047
Add table stats cache #13047
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
/* | ||
* Licensed under the Apache License, Version 2.0 (the "License"); | ||
* you may not use this file except in compliance with the License. | ||
* You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
package io.trino.cost; | ||
|
||
import io.trino.Session; | ||
import io.trino.metadata.Metadata; | ||
import io.trino.metadata.TableHandle; | ||
import io.trino.spi.statistics.TableStatistics; | ||
|
||
import java.util.Map; | ||
import java.util.WeakHashMap; | ||
|
||
import static java.util.Objects.requireNonNull; | ||
|
||
public class CachingTableStatsProvider | ||
implements TableStatsProvider | ||
{ | ||
private final Metadata metadata; | ||
private final Session session; | ||
|
||
private final Map<TableHandle, TableStatistics> cache = new WeakHashMap<>(); | ||
|
||
public CachingTableStatsProvider(Metadata metadata, Session session) | ||
{ | ||
this.metadata = requireNonNull(metadata, "metadata is null"); | ||
this.session = requireNonNull(session, "session is null"); | ||
} | ||
|
||
@Override | ||
public TableStatistics getTableStatistics(TableHandle tableHandle) | ||
{ | ||
TableStatistics stats = cache.get(tableHandle); | ||
if (stats == null) { | ||
stats = metadata.getTableStatistics(session, tableHandle); | ||
cache.put(tableHandle, stats); | ||
} | ||
return stats; | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -28,16 +28,18 @@ public interface StatsCalculator | |
* @param sourceStats The stats provider for any child nodes' stats, if needed to compute stats for the {@code node} | ||
* @param lookup Lookup to be used when resolving source nodes, allowing stats calculation to work within {@link IterativeOptimizer} | ||
* @param types The type provider for all symbols in the scope. | ||
* @param tableStatsProvider The table stats provider. | ||
*/ | ||
PlanNodeStatsEstimate calculateStats( | ||
PlanNode node, | ||
StatsProvider sourceStats, | ||
Lookup lookup, | ||
Session session, | ||
TypeProvider types); | ||
TypeProvider types, | ||
TableStatsProvider tableStatsProvider); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why does There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
other option would be to have stats rules query-scoped and create new There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
why? Maybe we can just preserve cached stats across planner/optimizer rules? The root cause here is really that we have a lot of iterative optimizer instances. If we just had one, then there would be no caching problem. Anyways, I'm more in favor of preserving cached stats (maybe with weak references) across optimizer executions rather than adding artificial components to a clean interface. This is similar issue as with caching expression optimizer/analyzer results. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
that's what the PR is doing
i agree. Memo would do all of the work for now.
I am not following. @sopel39 what are you suggesting? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I suggest introducing some kind of context to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I've implemented it as part of this PR: #12016 Basically like we have There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
We still need to pass a parameter. I agree we could introduce a more generic context as well. |
||
|
||
static StatsCalculator noopStatsCalculator() | ||
{ | ||
return (node, sourceStats, lookup, ignore, types) -> PlanNodeStatsEstimate.unknown(); | ||
return (node, sourceStats, lookup, ignore, types, tableStatsProvider) -> PlanNodeStatsEstimate.unknown(); | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -15,7 +15,6 @@ | |
|
||
import io.trino.Session; | ||
import io.trino.matching.Pattern; | ||
import io.trino.metadata.Metadata; | ||
import io.trino.spi.connector.ColumnHandle; | ||
import io.trino.spi.statistics.ColumnStatistics; | ||
import io.trino.spi.statistics.TableStatistics; | ||
|
@@ -40,12 +39,9 @@ public class TableScanStatsRule | |
{ | ||
private static final Pattern<TableScanNode> PATTERN = tableScan(); | ||
|
||
private final Metadata metadata; | ||
|
||
public TableScanStatsRule(Metadata metadata, StatsNormalizer normalizer) | ||
public TableScanStatsRule(StatsNormalizer normalizer) | ||
{ | ||
super(normalizer); // Use stats normalization since connector can return inconsistent stats values | ||
this.metadata = requireNonNull(metadata, "metadata is null"); | ||
} | ||
|
||
@Override | ||
|
@@ -55,13 +51,13 @@ public Pattern<TableScanNode> getPattern() | |
} | ||
|
||
@Override | ||
protected Optional<PlanNodeStatsEstimate> doCalculate(TableScanNode node, StatsProvider sourceStats, Lookup lookup, Session session, TypeProvider types) | ||
protected Optional<PlanNodeStatsEstimate> doCalculate(TableScanNode node, StatsProvider sourceStats, Lookup lookup, Session session, TypeProvider types, TableStatsProvider tableStatsProvider) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. that was my initial idea as well. see #12196 (comment) . |
||
{ | ||
if (isStatisticsPrecalculationForPushdownEnabled(session) && node.getStatistics().isPresent()) { | ||
return node.getStatistics(); | ||
} | ||
|
||
TableStatistics tableStatistics = metadata.getTableStatistics(session, node.getTable()); | ||
TableStatistics tableStatistics = tableStatsProvider.getTableStatistics(node.getTable()); | ||
|
||
Map<Symbol, SymbolStatsEstimate> outputSymbolStats = new HashMap<>(); | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: why do you need WeakHashMap if CTSP is created per query. Do you expect that
TableHandle
objects will be GCed often while query is running. It is possible for sure but I am not sure if often in practice.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During optimizations we can create large number of TableHandle objects (eg subsequent pushdowns) and old TableHandle objects may not be reachable in the plan. I don't think "large number" would be a common situation (there isn't an unlimited number of pushdown opportunities for a given query), but i still don't think we should be using strong references here.