Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use IdentityHashMap in AggregationAnalyzer #14983

Conversation

skrzypo987
Copy link
Member

Description

See commit description.

Non-technical explanation

performance tweaks to planning phase

Release notes

(x) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

instead of wrapping keys in NodeRef class. This should increase the
performance of building map in a constructor which could be slow for
certain, complicated queries.
@@ -79,6 +79,7 @@

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would wrapping in NodeRef make a big perf difference?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is what JFR profile tells me. Unfortunately I cannot share the exact query here.
But positions 2-5 in this profile:
obraz
is associated with constructing this map

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is we use NodeRef in a lot of places. Why this particular place is special?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this particular place is special?

It is special for my case as it seems to be performance bottleneck.

The issue is we use NodeRef in a lot of places

The usage in this particular class in encapsulated, so IMHO this will not cause any issues.

The question is why we were using NodeRef so extensively in the first place (I found 681 usages). Maybe I am simply missing something and it does indeed bring some benefits.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sopel39 sopel39 requested review from kasiafi and martint November 10, 2022 09:54
@findepi findepi changed the title Ude IdentityHashMap in AggregationAnalyzer Use IdentityHashMap in AggregationAnalyzer Nov 10, 2022
@findepi
Copy link
Member

findepi commented Nov 10, 2022

Ude IdentityHashMap in AggregationAnalyzer

instead of wrapping keys in NodeRef class. This should increase the
performance of building map in a constructor which could be slow for
certain, complicated queries.

Before introduction of NodeRef, we were using IdentityHashMaps for same purpose.
This was insanely error-prone. I don't think we should go back there.

@skrzypo987
Copy link
Member Author

Ude IdentityHashMap in AggregationAnalyzer
instead of wrapping keys in NodeRef class. This should increase the
performance of building map in a constructor which could be slow for
certain, complicated queries.

Before introduction of NodeRef, we were using IdentityHashMaps for same purpose. This was insanely error-prone. I don't think we should go back there.

Can you elaborate?
I understand using them in API as just Map<X,Y> does not force identity based comparison and using IdentityHashMap is a leak of implementation. However, in cases when its an implementation detail I see no risk (yet)

@findepi
Copy link
Member

findepi commented Nov 10, 2022

However, in cases when its an implementation detail I see no risk (yet)

yet. Ticking bomb.

Can you elaborate?

Check planner dev history from 2017.
(There were identity-related problems discovered and fixed later on as well)

Copy link
Member

@s2lomon s2lomon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's ok, assuming that it works % comments.

this.columnReferences = analysis.getColumnReferenceFields()
.entrySet().stream()
.collect(toImmutableMap(Map.Entry::getKey, entry -> entry.getValue().getFieldId()));
Map<Expression, FieldId> columnReferencesLocal = new IdentityHashMap<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of feel that this requires a comment maybe? To justify why using identity is ok in this context and why it has been used.

@s2lomon
Copy link
Member

s2lomon commented Nov 10, 2022

yet. Ticking bomb.

I kind of was feeling the same, but it's true that its internal - maybe with proper comment in code and in commit it would be good enough to be used?

@skrzypo987
Copy link
Member Author

Check planner dev history from 2017.

Perhaps a bit more precision?

@findepi
Copy link
Member

findepi commented Nov 14, 2022

Check planner dev history from 2017.

Perhaps a bit more precision?

Multiple changes. I can try dig history, or someone else does that

yet. Ticking bomb.

I kind of was feeling the same, but it's true that its internal

Everything is internal in some form.

maybe with proper comment in code and in commit it would be good enough to be used?

Maybe. Error-proneness is definitely not a binary thing.
With a comment we're safer than without.
With NodeRef we're safer than with IdentityHashMap.

Maybe we need a new collection type, say "ExpressionMap". It will internally use IdentityHashMap for performance reasons, but will NOT be a Map<...>, so it will be impossible to mis-use it (you won't be eg able to ImmutableMap.copyOf(expressionMap)).

@findepi
Copy link
Member

findepi commented Nov 14, 2022

@skrzypo987 what kind of queries this would be an improvement for?

@skrzypo987
Copy link
Member Author

Multiple changes. I can try dig history, or someone else does that

I just found You changing this in a lot of places, looking like you were in a hurry. But no precise reason.

Maybe we need a new collection type, say "ExpressionMap".

That is an interesting idea. Quite crazy, but at least my kind of crazy.
It can actually work. I believe (just an elaborate guess) that the perf problem is caused by the objects (NodeRefs) that need to actually created and stored on heap. With only a method delegation to IdentityHashMap it can be inlined easier.

@skrzypo987 what kind of queries this would be an improvement for?

I am in a possession of a 1,2k line query that causes this. I don't know exactly what is going on there yet. I will at some point trim it down to something that a human being can understand and analyze.

Copy link
Member

@martint martint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @findepi pointed out, using an IdentityHashMap as a Map is error prone. IdentityHashMap violates the Map contract that stipulates, for instance, that:

Map.containsKey returns true if and only if this map contains a mapping for a key k such that Objects.equals(key, k)

Before we make any changes, I would like to understand why AggregationAnalyzer is slow for the query in question, what performance we're talking about in absolute terms, and whether there are other things that might be a factor, such as this class doing unnecessary repeated work.

@skrzypo987
Copy link
Member Author

Closing in favor of #15292

@skrzypo987 skrzypo987 closed this Dec 5, 2022
@skrzypo987 skrzypo987 deleted the skrzypo/129-replace-noderef-with-identityhashmap branch December 5, 2022 11:59
@skrzypo987 skrzypo987 restored the skrzypo/129-replace-noderef-with-identityhashmap branch December 5, 2022 11:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

5 participants