Use IdentityHashMap in AggregationAnalyzer #14983
Use IdentityHashMap instead of wrapping keys in the NodeRef class. This should speed up building the map in the constructor, which can be slow for certain complicated queries.
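To make the trade-off concrete, here is a minimal sketch comparing the two approaches. `Ref` is a hypothetical stand-in for NodeRef, assumed to compare and hash by the identity of the wrapped object; the class and method names are illustrative and are not taken from the Trino code base. Both variants key the map by reference; the wrapper variant allocates one extra object per key, and whether that matters in practice would need measuring.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

final class IdentityKeyDemo {
    // Hypothetical stand-in for NodeRef: equality and hashing follow the
    // identity of the wrapped object, not its equals()/hashCode().
    static final class Ref<T> {
        private final T node;

        Ref(T node) {
            this.node = node;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof Ref && ((Ref<?>) other).node == node;
        }

        @Override
        public int hashCode() {
            return System.identityHashCode(node);
        }
    }

    // Wrapper approach: one Ref allocation per key on top of the map entry.
    static int sizeWithWrapperKeys(String a, String b) {
        Map<Ref<String>, Integer> map = new HashMap<>();
        map.put(new Ref<>(a), 1);
        map.put(new Ref<>(b), 2);
        return map.size();
    }

    // IdentityHashMap approach: keys are compared with == directly.
    static int sizeWithIdentityMap(String a, String b) {
        Map<String, Integer> map = new IdentityHashMap<>();
        map.put(a, 1);
        map.put(b, 2);
        return map.size();
    }

    // Plain HashMap for contrast: equal keys collapse into one entry.
    static int sizeWithPlainHashMap(String a, String b) {
        Map<String, Integer> map = new HashMap<>();
        map.put(a, 1);
        map.put(b, 2);
        return map.size();
    }

    public static void main(String[] args) {
        String a = new String("x");
        String b = new String("x"); // equal to a, but a distinct object
        System.out.println(sizeWithWrapperKeys(a, b));  // 2
        System.out.println(sizeWithIdentityMap(a, b));  // 2
        System.out.println(sizeWithPlainHashMap(a, b)); // 1
    }
}
```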
Why would wrapping in NodeRef make a big perf difference?
The issue is that we use NodeRef in a lot of places. Why is this particular place special?
Why this particular place is special?
It is special in my case because it seems to be a performance bottleneck.
The issue is we use NodeRef in a lot of places
The usage in this particular class is encapsulated, so IMHO this will not cause any issues.
The question is why we were using NodeRef so extensively in the first place (I found 681 usages). Maybe I am simply missing something and it does indeed bring some benefits.
cc @findepi
Before introduction of |
Can you elaborate?
yet. Ticking bomb.
Check planner dev history from 2017.
I think it's OK, assuming that it works, modulo comments.
this.columnReferences = analysis.getColumnReferenceFields()
        .entrySet().stream()
        .collect(toImmutableMap(Map.Entry::getKey, entry -> entry.getValue().getFieldId()));
Map<Expression, FieldId> columnReferencesLocal = new IdentityHashMap<>();
I feel this needs a comment to justify why using identity is OK in this context and why it was chosen.
I was feeling the same, but it's true that it's internal. Maybe with a proper comment in the code and in the commit it would be good enough to use?
Perhaps a bit more precision?
Multiple changes. I can try to dig through the history, or someone else can do that.
Everything is internal in some form.
Maybe. Error-proneness is definitely not a binary thing. Maybe we need a new collection type, say "ExpressionMap". It will internally use IdentityHashMap for performance reasons, but will NOT be a |
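One possible reading of the "ExpressionMap" idea, sketched under assumptions: the type below keeps an IdentityHashMap as a private delegate and deliberately does not implement `java.util.Map`, so it cannot be passed to code that expects `equals`-based semantics. The class name `ExpressionMapSketch`, its generic parameters, and its method set are hypothetical; nothing in the thread specifies the intended API.

```java
import java.util.IdentityHashMap;
import java.util.Objects;
import java.util.Optional;

// Hypothetical sketch: an identity-keyed lookup that is intentionally NOT a
// java.util.Map, so callers cannot mistake it for an equals()-based map.
final class ExpressionMapSketch<K, V> {
    private final IdentityHashMap<K, V> delegate = new IdentityHashMap<>();

    // Associates the value with this exact key instance.
    void put(K key, V value) {
        delegate.put(Objects.requireNonNull(key, "key is null"),
                Objects.requireNonNull(value, "value is null"));
    }

    // Finds a value only if the very same key instance was stored.
    Optional<V> get(K key) {
        return Optional.ofNullable(delegate.get(key));
    }

    boolean containsKey(K key) {
        return delegate.containsKey(key);
    }

    int size() {
        return delegate.size();
    }

    public static void main(String[] args) {
        ExpressionMapSketch<String, Integer> fields = new ExpressionMapSketch<>();
        String stored = new String("a + b");
        String lookalike = new String("a + b"); // equal text, different object
        fields.put(stored, 7);
        System.out.println(fields.get(stored).isPresent());    // true
        System.out.println(fields.get(lookalike).isPresent()); // false
    }
}
```

Keeping the delegate private means the identity semantics are visible in the type name rather than hidden behind the general Map interface.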
@skrzypo987 what kind of queries would this be an improvement for?
I just found you changing this in a lot of places, apparently in a hurry, but with no precise reason given.
That is an interesting idea. Quite crazy, but at least my kind of crazy.
I am in possession of a 1.2k-line query that causes this. I don't know exactly what is going on there yet. At some point I will trim it down to something that a human being can understand and analyze.
As @findepi pointed out, using an IdentityHashMap as a Map is error-prone. IdentityHashMap violates the Map contract, which stipulates, for instance, that:
Map.containsKey returns true if and only if this map contains a mapping for a key k such that Objects.equals(key, k)
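The contract difference quoted above can be shown with a small self-contained example. The class and method names are illustrative only; the point is that a caller holding a `Map` reference gets a different answer from `containsKey` depending on which implementation sits behind it.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

final class MapContractDemo {
    // Stores one key instance, then probes with an equal but distinct instance.
    static boolean findsEqualKey(Map<String, Integer> map) {
        String stored = new String("col");
        String probe = new String("col"); // Objects.equals(stored, probe) is true
        map.put(stored, 1);
        return map.containsKey(probe);
    }

    public static void main(String[] args) {
        // HashMap follows the Map contract and finds the equal key.
        System.out.println(findsEqualKey(new HashMap<>()));         // true
        // IdentityHashMap compares with ==, so the equal key is not found.
        System.out.println(findsEqualKey(new IdentityHashMap<>())); // false
    }
}
```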
Before we make any changes, I would like to understand why AggregationAnalyzer is slow for the query in question, what performance we're talking about in absolute terms, and whether there are other things that might be a factor, such as this class doing unnecessary repeated work.
Closing in favor of #15292
Description
See commit description.
Non-technical explanation
performance tweaks to planning phase
Release notes
(x) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text: