Speed up IN list types analysis #11907
I have some thoughts on the ExpressionAnalyzer part. Please give me a chance to comment, I'll hopefully get back to it tomorrow.
Together with #11902, it yields the following performance improvement:
// We need an explicit copy to avoid ConcurrentModificationException
Set<Type> types = typeExpressions.keySet();
I don't see an explicit copy here.
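To illustrate the point of this comment: keySet() returns a live view of the map, not a copy, so modifying the map while iterating the view fails. The sketch below uses plain JDK collections and hypothetical String keys in place of the PR's Type keys; it is not the PR's code, only a minimal demonstration of the difference between the view and an explicit snapshot.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class KeySetCopyDemo
{
    private KeySetCopyDemo() {}

    // Hypothetical stand-in for the typeExpressions map in the PR.
    private static Map<String, List<String>> newTypeExpressions()
    {
        Map<String, List<String>> typeExpressions = new HashMap<>();
        typeExpressions.put("integer", new ArrayList<>());
        typeExpressions.put("bigint", new ArrayList<>());
        return typeExpressions;
    }

    // Iterates the live keySet() view while adding entries to the map.
    // Returns true if ConcurrentModificationException was thrown.
    static boolean mutateWhileIteratingView()
    {
        Map<String, List<String>> typeExpressions = newTypeExpressions();
        Set<String> types = typeExpressions.keySet(); // a view, not a copy
        try {
            for (String type : types) {
                typeExpressions.put(type + "-coerced", new ArrayList<>());
            }
            return false;
        }
        catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Iterates an explicit snapshot of the keys, so the map can be modified safely.
    // Returns the final number of entries in the map.
    static int mutateWhileIteratingCopy()
    {
        Map<String, List<String>> typeExpressions = newTypeExpressions();
        Set<String> types = new HashSet<>(typeExpressions.keySet()); // explicit copy
        for (String type : types) {
            typeExpressions.put(type + "-coerced", new ArrayList<>());
        }
        return typeExpressions.size();
    }
}
```

In the Trino code base, ImmutableSet.copyOf(...) from Guava would be the more conventional way to take the snapshot; HashSet is used here only to keep the sketch free of dependencies.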

private void addOrReplaceExpressionsCoercion(List<Expression> expressions, Type type, Type superType)
{
Map<NodeRef<Expression>, Type> expressionRefs = expressions
"expressionRefs"?
(the name could be OK for a list, but that's a Map)
@@ -45,6 +55,14 @@
private final PlannerContext plannerContext;
private final StatementAnalyzerFactory statementAnalyzerFactory;

private final NonEvictableCache<CacheIdentityKey, ExpressionAnalyzer> analyzerCache = buildNonEvictableCache(
ExpressionAnalyzer holds a ton of mutable state; it's totally unsuitable for caching & sharing.
That was proposed by @sopel39. My previous version was actually caching only Map<NodeRef<Expression>, Type>.
Type is immutable, so it was definitely a better idea (not judging whether this particular cache in this place is required).
@sopel39 wdyt?
> ExpressionAnalyzer holds a ton of mutable state; it's totally unsuitable for caching & sharing.
I don't think it's a problem, since it only adds new information to the analysis. It already caches via io.trino.sql.analyzer.ExpressionAnalyzer.Visitor#process.
> That was proposed by @sopel39. My previous version was actually caching only Map<NodeRef, Type>
IIRC that map was not passed to ExpressionAnalyzer, hence it wouldn't work with sub-expressions.
Anyway, for me planning is a process, so keeping ExpressionAnalyzer is not really an issue. Actually, it's probably desirable, since I want to cache ExpressionInterpreter too and it needs to be able to fetch type information for new expressions.
I strongly opt for caching Map<NodeRef, Type>, as that's what you actually need here.
The ExpressionAnalyzer was not designed to be created in multiple instances throughout the Planner. It is intended for the Analysis phase, where it provides different kinds of information to the Planner via the Analysis object. If you need to analyze expressions later for some reason, there's the analyzeExpressions method and the createConstantAnalyzer method, depending on your use case. Those methods give you access to the "analyzing" capability of the ExpressionAnalyzer while hiding the complexity that is not relevant at that point.
> The ExpressionAnalyzer was not designed to be created in multiple instances throughout the Planner.
@kasiafi This is exactly what is happening right now. Every io.trino.sql.planner.TypeAnalyzer#getTypes call creates a new ExpressionAnalyzer instance (via analyzeExpressions) and does a full analysis. Overall, it's expensive if done repeatedly.
We want to reduce that cost by keeping the ExpressionAnalyzer instance during planning, which should be fine since planning is a process rather than a move from one immutable state to another.
> I strongly opt for caching Map<NodeRef, Type>, as that's what you actually need here.
I would like to keep ExpressionInterpreter too. For that I need a utility that returns the type for a (sub)expression rather than analyzing the entire expression to get the full Map<NodeRef<Expression>, Type> map. In that context, keeping ExpressionAnalyzer is more natural.
"Cache expensive TypeAnalyzer instance creation and reduce type analysis cost"
.orElse(session.getQueryId().getId());
}

private static class CacheIdentityKey
Can you describe the meaning of the class's name?
typeExpressions.put(process(expression, context), expression);
}

// We need an explicit copy to avoid ConcurrentModificationException
remove obsolete comment
@@ -2592,13 +2594,22 @@ private Type coerceToSingleType(StackableAstVisitorContext<Context> context, Str

private void addOrReplaceExpressionCoercion(Expression expression, Type type, Type superType)
{
NodeRef<Expression> ref = NodeRef.of(expression);
expressionCoercions.put(ref, superType);
List.of(expression) -> ImmutableList.of(expression)
@@ -45,6 +55,14 @@
private final PlannerContext plannerContext;
private final StatementAnalyzerFactory statementAnalyzerFactory;

private final NonEvictableCache<CacheIdentityKey, ExpressionAnalyzer> analyzerCache = buildNonEvictableCache(
CacheBuilder.newBuilder()
// Try to evict entries as soon as possible to keep cache relatively small
IMO TypeAnalyzer should be created per query and possibly extracted as an interface. It should then be created and used in io.trino.sql.planner.LogicalPlanner#plan. No NonEvictableCache<CacheIdentityKey, ExpressionAnalyzer> analyzerCache is needed then.
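The suggestion above could look roughly like the following sketch. The names PerQueryTypeAnalyzer and CachingTypeAnalyzer are hypothetical, the expression and type parameters are generic placeholders rather than Trino's Expression and Type, and the analysis function stands in for whatever does the real type derivation. The idea shown is only that an instance scoped to a single query can hold its own cache, so no shared cache keyed by query id is required and the cached entries become garbage together with the instance.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical interface; not part of the Trino code base.
interface PerQueryTypeAnalyzer<E, T>
{
    T getType(E expression);
}

// One instance is meant to be created per query, e.g. at the start of planning.
// It is not thread-safe, on the assumption that planning of one query is single-threaded.
final class CachingTypeAnalyzer<E, T>
        implements PerQueryTypeAnalyzer<E, T>
{
    private final Function<E, T> analysis;
    // Keyed by reference, mirroring a NodeRef-style identity key.
    private final Map<E, T> typesByReference = new IdentityHashMap<>();
    private int analysisCount;

    CachingTypeAnalyzer(Function<E, T> analysis)
    {
        this.analysis = Objects.requireNonNull(analysis, "analysis is null");
    }

    @Override
    public T getType(E expression)
    {
        T cached = typesByReference.get(expression);
        if (cached != null) {
            return cached;
        }
        analysisCount++;
        T type = Objects.requireNonNull(analysis.apply(expression), "type is null");
        typesByReference.put(expression, type);
        return type;
    }

    // Number of times the underlying analysis actually ran.
    int analysisCount()
    {
        return analysisCount;
    }
}
```

Extracting the interface also lets tests substitute a trivial implementation without constructing a full analyzer.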
@Override
protected Object[][] largeInValuesCountData()
{
return new Object[][] {
We also need to add a case where the IN list is in a subexpression, e.g.: x IS NULL or y IN (...)
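A query for the requested case could be generated along these lines. This is only a sketch: the class name, the method name, and the inline VALUES relation with columns x and y are assumptions made for illustration and are not taken from the PR's test.

```java
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class LargeInListQueries
{
    private LargeInListQueries() {}

    // Builds a query whose IN list is nested under an OR, so the IN predicate
    // is a subexpression of the filter rather than the whole filter.
    static String inListInSubexpression(int valuesCount)
    {
        String values = IntStream.range(0, valuesCount)
                .mapToObj(Integer::toString)
                .collect(Collectors.joining(", "));
        return "SELECT * FROM (VALUES (1, 2)) t(x, y) WHERE x IS NULL OR y IN (" + values + ")";
    }
}
```

For example, inListInSubexpression(3) produces a filter of the form x IS NULL OR y IN (0, 1, 2); a test would call it with the same large counts as the existing cases.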
.getExpressionTypes();
try {
ExpressionAnalyzer analyzer = analyzerCache.get(new CacheIdentityKey(getIdFromSession(session), expressions), () -> createExpressionAnalyzer(session, plannerContext, statementAnalyzerFactory, inputTypes));
return analyzeExpressions(analyzer, expressions).getExpressionTypes();
Just call ExpressionAnalyzer#analyze directly; no need to go through ExpressionAnalysis.
Wouldn't caching
Type for a (sub)expression would be there in the cache.
Why? Is that to avoid re-evaluating constant expressions? We could cache
Why would it be in cache? It can be a new instance of expression (e.g. symbol changes). Bringing new instance of
Yes. We do that repetitively in planning (e.g. with
In my opinion, keeping the ExpressionAnalyzer is much more acceptable than creating and caching many ExpressionAnalyzers. However, I still consider it a kind of "abstraction leak". The ExpressionAnalyzer is here to perform correctness checks, determine coercions, etc. in the pre-planning phase. Computing types is a kind of "implementation detail". We learned to reuse this capability throughout the Planner because we found that we need to know the types over and over. The new IR should definitely be enhanced with types (and also with pre-computed constants). While we currently reuse the ExpressionAnalyzer for getting the expression types, we should try to isolate this capability from all the other work that ExpressionAnalyzer does. For that reason, we use the public methods rather than the constructor, which also involves irrelevant parts, like the Analysis.
I'm OK with having a persistent (per query) ExpressionInterpreter with caching.
Creating a new instance of InPredicate would cause a miss in the expression type cache, which uses the node reference as the cache key.
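The cache-miss behaviour described here can be reproduced with a small identity-keyed map. In the sketch below, Literal is a hypothetical stand-in for an AST node with value-based equals, and TypeCache is a hypothetical stand-in for a cache keyed by node reference; neither is Trino code, and the fixed "bigint" result only replaces real type analysis.

```java
import java.util.IdentityHashMap;
import java.util.Map;

final class IdentityCacheDemo
{
    private IdentityCacheDemo() {}

    // Hypothetical AST node: two instances with the same value are equal().
    static final class Literal
    {
        private final long value;

        Literal(long value)
        {
            this.value = value;
        }

        @Override
        public boolean equals(Object other)
        {
            return other instanceof Literal && ((Literal) other).value == value;
        }

        @Override
        public int hashCode()
        {
            return Long.hashCode(value);
        }
    }

    // Hypothetical cache keyed by reference identity, like a NodeRef-keyed map.
    static final class TypeCache
    {
        private final Map<Literal, String> types = new IdentityHashMap<>();
        private int misses;

        String typeOf(Literal expression)
        {
            String type = types.get(expression);
            if (type == null) {
                misses++;
                type = "bigint"; // placeholder for the real analysis
                types.put(expression, type);
            }
            return type;
        }

        int misses()
        {
            return misses;
        }
    }
}
```

Looking up the same Literal instance twice records one miss, while looking up a freshly created but equal Literal records another. This is why preserving node instances across rewrites matters for any cache keyed by reference.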
@kasiafi What does it mean in practice? You mean you would like to have something like
Yes, that was my thinking. However, I still think that caching Expression -> Type should be sufficient, and occasional creation of a new ExpressionAnalyzer is better than "pulling" it throughout the Optimizer. And to make it as "occasional" as possible, we should consider preserving the NodeRefs whenever possible (mostly in ExpressionInterpreter) instead of creating identical copies of Expressions.
Why would it be better? You end up performing all these correctness checks anyway, so why pretend we don't use
I don't like that approach mostly because
I dismissed my review. @martint ptal if you happen to have time.
👋 @wendigo - this PR has become inactive. If you're still interested in working on it, please let us know. We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks.