Use IdentityHashMap in AggregationAnalyzer #14983

skrzypo987 · 2022-11-10T09:42:21Z

Description

See commit description.

Non-technical explanation

Performance tweaks to the planning phase.

Release notes

(x) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

Use IdentityHashMap instead of wrapping keys in the NodeRef class. This should speed up building the map in the constructor, which can be slow for certain complicated queries.

sopel39 · 2022-11-10T09:54:16Z

core/trino-main/src/main/java/io/trino/sql/analyzer/AggregationAnalyzer.java

@@ -79,6 +79,7 @@



Why would wrapping in NodeRef make a big perf difference?

That is what the JFR profile tells me. Unfortunately I cannot share the exact query here.
But positions 2-5 in this profile are associated with constructing this map.

The issue is we use NodeRef in a lot of places. Why is this particular place special?

Why is this particular place special?

It is special for my case, as it seems to be the performance bottleneck.

The issue is we use NodeRef in a lot of places

The usage in this particular class is encapsulated, so IMHO this will not cause any issues.

The question is why we were using NodeRef so extensively in the first place (I found 681 usages). Maybe I am simply missing something and it does indeed bring some benefits.

cc @findepi
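For readers unfamiliar with the two approaches being compared, here is a minimal sketch. `IdentityKey` is a hypothetical, simplified stand-in for a NodeRef-style wrapper and is not Trino's actual class; `IdentityKeyDemo` and its method names are likewise assumptions made for illustration. The wrapped variant allocates one extra object per key, while the `IdentityHashMap` variant stores the raw objects directly.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a NodeRef-style wrapper; not Trino's actual class.
final class IdentityKey<T> {
    private final T value;

    private IdentityKey(T value) {
        this.value = Objects.requireNonNull(value, "value is null");
    }

    static <T> IdentityKey<T> of(T value) {
        return new IdentityKey<>(value);
    }

    T get() {
        return value;
    }

    @Override
    public boolean equals(Object other) {
        // Two wrappers are equal only if they wrap the very same object.
        return other instanceof IdentityKey<?> && ((IdentityKey<?>) other).value == value;
    }

    @Override
    public int hashCode() {
        return System.identityHashCode(value);
    }
}

final class IdentityKeyDemo {
    // Regular HashMap keyed by wrappers: identity semantics, one wrapper allocated per key.
    static Map<IdentityKey<String>, Integer> wrapped(String a, String b) {
        Map<IdentityKey<String>, Integer> map = new HashMap<>();
        map.put(IdentityKey.of(a), 1);
        map.put(IdentityKey.of(b), 2);
        return map;
    }

    // IdentityHashMap keyed by the raw objects: identity semantics, no wrappers.
    static Map<String, Integer> raw(String a, String b) {
        Map<String, Integer> map = new IdentityHashMap<>();
        map.put(a, 1);
        map.put(b, 2);
        return map;
    }
}
```

Both maps keep two entries for two equal-but-distinct keys. The difference is that the wrapped map is an ordinary, contract-respecting `Map`, whereas the raw one is only safe as long as it is never treated as one.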

findepi · 2022-11-10T12:02:03Z

Use IdentityHashMap in AggregationAnalyzer

instead of wrapping keys in NodeRef class. This should increase the
performance of building map in a constructor which could be slow for
certain, complicated queries.

Before the introduction of NodeRef, we were using IdentityHashMaps for the same purpose.
This was insanely error-prone. I don't think we should go back there.

skrzypo987 · 2022-11-10T12:12:45Z

Use IdentityHashMap in AggregationAnalyzer
instead of wrapping keys in NodeRef class. This should increase the
performance of building map in a constructor which could be slow for
certain, complicated queries.

Before the introduction of NodeRef, we were using IdentityHashMaps for the same purpose. This was insanely error-prone. I don't think we should go back there.

Can you elaborate?
I understand that using them in an API as a plain Map<X,Y> does not enforce identity-based comparison, and that exposing an IdentityHashMap leaks an implementation detail. However, in cases when it's an implementation detail I see no risk (yet).

findepi · 2022-11-10T12:29:35Z

However, in cases when it's an implementation detail I see no risk (yet)

yet. Ticking bomb.

Can you elaborate?

Check planner dev history from 2017.
(There were identity-related problems discovered and fixed later on as well)

s2lomon

I think it's OK, assuming that it works, modulo comments.

s2lomon · 2022-11-10T13:41:30Z

core/trino-main/src/main/java/io/trino/sql/analyzer/AggregationAnalyzer.java

-        this.columnReferences = analysis.getColumnReferenceFields()
-                .entrySet().stream()
-                .collect(toImmutableMap(Map.Entry::getKey, entry -> entry.getValue().getFieldId()));
+        Map<Expression, FieldId> columnReferencesLocal = new IdentityHashMap<>();


I kind of feel that this requires a comment maybe? To justify why using identity is ok in this context and why it has been used.

s2lomon · 2022-11-10T13:47:14Z

yet. Ticking bomb.

I kind of was feeling the same, but it's true that it's internal - maybe with proper comment in code and in commit it would be good enough to be used?

skrzypo987 · 2022-11-10T14:14:44Z

Check planner dev history from 2017.

Perhaps a bit more precision?

findepi · 2022-11-14T16:32:13Z

Check planner dev history from 2017.

Perhaps a bit more precision?

Multiple changes. I can try to dig through the history, or someone else can do that.

yet. Ticking bomb.

I kind of was feeling the same, but it's true that it's internal

Everything is internal in some form.

maybe with proper comment in code and in commit it would be good enough to be used?

Maybe. Error-proneness is definitely not a binary thing.
With a comment we're safer than without.
With NodeRef we're safer than with IdentityHashMap.

Maybe we need a new collection type, say "ExpressionMap". It will internally use IdentityHashMap for performance reasons, but will NOT be a Map<...>, so it will be impossible to misuse it (e.g. you won't be able to call ImmutableMap.copyOf(expressionMap)).
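A rough sketch of what such a collection could look like follows. The class name `IdentityLookup` and every method on it are assumptions for illustration only; nothing like this is confirmed to exist in the codebase. The point is that the type delegates to an `IdentityHashMap` but does not implement `java.util.Map`, so it cannot be handed to code that expects equals-based semantics.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Hypothetical sketch of the "ExpressionMap" idea; name and API are assumptions.
// It deliberately does NOT implement java.util.Map.
final class IdentityLookup<K, V> {
    // Identity-based storage stays private and is never exposed as a Map.
    private final Map<K, V> delegate = new IdentityHashMap<>();

    void put(K key, V value) {
        delegate.put(Objects.requireNonNull(key, "key is null"), Objects.requireNonNull(value, "value is null"));
    }

    // Looks up by reference identity (==), not by equals().
    Optional<V> get(K key) {
        return Optional.ofNullable(delegate.get(key));
    }

    boolean containsKey(K key) {
        return delegate.containsKey(key);
    }

    int size() {
        return delegate.size();
    }
}
```

Because `IdentityLookup` is not a `Map`, a call such as `ImmutableMap.copyOf(lookup)` would not compile, which is the kind of misuse the suggestion aims to rule out.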

findepi · 2022-11-14T16:32:50Z

@skrzypo987 what kind of queries would this be an improvement for?

skrzypo987 · 2022-11-14T16:43:47Z

Multiple changes. I can try to dig through the history, or someone else can do that.

I only found you changing this in a lot of places, seemingly in a hurry, but with no precise reason given.

Maybe we need a new collection type, say "ExpressionMap".

That is an interesting idea. Quite crazy, but at least my kind of crazy.
It can actually work. I believe (just an educated guess) that the perf problem is caused by the objects (NodeRefs) that actually need to be created and stored on the heap. With only a method delegation to IdentityHashMap, the calls can be inlined more easily.

@skrzypo987 what kind of queries would this be an improvement for?

I am in possession of a 1.2k-line query that causes this. I don't know exactly what is going on there yet. At some point I will trim it down to something that a human being can understand and analyze.

martint

As @findepi pointed out, using an IdentityHashMap as a Map is error-prone. IdentityHashMap violates the Map contract that stipulates, for instance, that:

Map.containsKey returns true if and only if this map contains a mapping for a key k such that Objects.equals(key, k)

Before we make any changes, I would like to understand why AggregationAnalyzer is slow for the query in question, what performance we're talking about in absolute terms, and whether there are other things that might be a factor, such as this class doing unnecessary repeated work.
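The contract violation described above can be seen in a small, self-contained sketch. The class and method names here are hypothetical, and `String` keys stand in for AST nodes; the sketch only shows plain JDK behavior, not anything specific to AggregationAnalyzer.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical demo class; String keys stand in for expression nodes.
final class IdentityContractDemo {
    // Stores one key and probes with another; IdentityHashMap compares keys with ==.
    static boolean identityContains(String stored, String probe) {
        Map<String, Integer> map = new IdentityHashMap<>();
        map.put(stored, 1);
        return map.containsKey(probe);
    }

    // Same scenario with an equals-based map, for comparison.
    static boolean hashContains(String stored, String probe) {
        Map<String, Integer> map = new HashMap<>();
        map.put(stored, 1);
        return map.containsKey(probe);
    }

    // Copying an identity map into an equals-based map merges keys that are equal.
    static int sizeAfterCopy(String first, String second) {
        Map<String, Integer> identity = new IdentityHashMap<>();
        identity.put(first, 1);
        identity.put(second, 2);
        return new HashMap<>(identity).size();
    }
}
```

With two distinct `String` instances that are `equals`, `identityContains` returns false where `hashContains` returns true, and `sizeAfterCopy` shrinks from two entries to one. Any code that receives the map typed as `Map` has no way to know which behavior it is getting.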

skrzypo987 · 2022-12-05T11:59:07Z

Closing in favor of #15292

Ude IdentityHashMap in AggregationAnalyzer

767f6af

instead of wrapping keys in NodeRef class. This should increase the performance of building map in a constructor which could be slow for certain, complicated queries.

cla-bot bot added the cla-signed label Nov 10, 2022

skrzypo987 requested review from Praveen2112, s2lomon, huberty89 and sopel39 November 10, 2022 09:42

sopel39 reviewed Nov 10, 2022

sopel39 requested review from kasiafi and martint November 10, 2022 09:54

findepi changed the title ~~Ude IdentityHashMap in AggregationAnalyzer~~ Use IdentityHashMap in AggregationAnalyzer Nov 10, 2022

s2lomon approved these changes Nov 10, 2022

martint requested changes Nov 16, 2022

skrzypo987 mentioned this pull request Dec 5, 2022

Prevent redundant collection rewriting in AggregationAnalyzer #15292

Merged

skrzypo987 closed this Dec 5, 2022

skrzypo987 deleted the skrzypo/129-replace-noderef-with-identityhashmap branch December 5, 2022 11:59

skrzypo987 restored the skrzypo/129-replace-noderef-with-identityhashmap branch December 5, 2022 11:59
