-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize aggregation analyzer #16198
Optimize aggregation analyzer #16198
Conversation
// fields and expressions in the group by clause | ||
private final Set<FieldId> groupingFields; | ||
@Nullable | ||
private Set<FieldId> groupingFields; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Supplier<Set<FieldId>>
with memoization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Too many fancy words for a performance-related change.
Simple if (x==null)
will provide the best branch prediction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if com.google.common.base.Suppliers#memoize
wouldn't be worse, we should use that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guava's memoized supplier is thread-safe, meaning it does some sync logic before any access. This code does not need to be thread safe so there is no need for it IMO. With a simple nullable field we can pretty-much guarantee there will be no performance regression as it is effectively a no-op. With some additional logic, I am not that certain
"Grouping field %s should originate from %s", fieldId, sourceScope.getRelationType()); | ||
}); | ||
this.groupingFields.forEach(fieldId -> { | ||
checkState(isFieldFromScope(fieldId, sourceScope), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But now we are applying this check lazily, would it fail at a different place post Analysis
phase or a with a different error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a sanity check related strictly to this calculations, so it will not fire unless groupingFields
fields is actually accessed.
And it will never fire unless somebody screws up the code is some other place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That last sentence turned out to be false
groupingFieldsBuilder.add(resolvedField.getFieldId()); | ||
private Set<FieldId> getGroupingFields() | ||
{ | ||
if (groupingFields == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we evaluate them lazily would there be a chance - where we won't use the groupingFIelds
information ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only invocations are getGroupingFields().contains(fieldId)
so it is always used, after init.
core/trino-main/src/main/java/io/trino/sql/analyzer/AggregationAnalyzer.java
Show resolved
Hide resolved
f6a86eb
to
b8cf329
Compare
// fields and expressions in the group by clause | ||
private final Set<FieldId> groupingFields; | ||
@Nullable | ||
private Set<FieldId> groupingFields; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if com.google.common.base.Suppliers#memoize
wouldn't be worse, we should use that
core/trino-main/src/main/java/io/trino/sql/analyzer/AggregationAnalyzer.java
Outdated
Show resolved
Hide resolved
@@ -588,7 +596,7 @@ private boolean isGroupingKey(Expression node) | |||
return true; | |||
} | |||
|
|||
return groupingFields.contains(fieldId); | |||
return getGroupingFields().contains(fieldId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How common are queries when aggregation analyzer is constructed but getGroupingFields is not invoked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have answer to that question.
It happens for "my query", which is absurdly complicated and I cannot share it here. I also struggle to make up any query that will suffer the same problems on the same scale.
That is why my approach here is to help in "my case" which may or may not be common and, what's more important, not add any regression.
`groupingFields` set is expensive to construct for queries with a lot of group by clauses and might not be needed during analysis. We still need to iterate over `groupByExpressions` to carry out validation, but the results do not need to be stored in a set. This commit calculates it only when asked for, reducing analysis time for some huge queries. Validation is done at the end, and it reuses already calculated set, if it was created.
b8cf329
to
278ba8b
Compare
@Praveen2112 |
Closing this for now. I'll return to this topic, once i get a working benchmark repro. |
Description
Some performance tweaks to make analysis faster for certain type of large queries.
This is a sort-of follow-up to #15292
Additional context and related issues
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text: