
[Performance] Full build takes more time since 2024-09 (type inference) #3387

Merged

1 commit merged into eclipse-jdt:master on Dec 4, 2024

Conversation

stephan-herrmann
Contributor

optimize 3:

  • never visit the same super type more than once

Fixes #3327

(eclipse-jdt#3384)

@stephan-herrmann
Contributor Author

stephan-herrmann commented Dec 3, 2024

For a moment I was tricked by the discussion about O(n^2) vs O(n) without saying what n is :)

The requirement put forward by the JLS (and thus not subject to our choosing) is similar to O(n^2), in that all pairs of supertypes of types s and t must be inspected. The implementation actually cuts this down to two linear-ish searches: note that t never changes across the recursive invocations; that part of the equation is always handled via findSuperTypeOriginatingFrom(), which performs a limited search in that hierarchy (limited in that it stops when it finds a "matching" type).

What still looks like some combinatorial explosion is in fact the difference between the number of paths and the number of nodes in the inheritance hierarchy: the original implementation traversed all paths from t up to j.l.Object, whereas visiting each node in that graph is fully sufficient. This is less an issue of the depth of a hierarchy than of its "volume"; it must be deep and wide, with lots of joining edges in the graph, for the effect to show.

Limiting that traversal is what the cache achieves. In contrast to #3333 (comment), no cache of pairs is needed, since s is constant during a traversal :)

Additionally, by using prototype() I ensure that differently annotated variants of the same type are still recognized as the same.

I'm still a bit uneasy about using TypeBinding in a hashed collection, but looking at the comment in ReferenceBinding.hashCode() this could still be safe as long as no UnresolvedReferenceBindings are in the lot (because those intentionally break the uniqueness of hashCode()).
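For illustration only, here is a minimal sketch of the idea: a recursive walk over the supertypes of s that records each visited node, keyed on prototype(). The method name, the recursion shape and the superclass()/superInterfaces() calls are assumptions of this sketch, not a claim about the actual patch:

```java
// Hypothetical sketch of "never visit the same super type more than once".
// The visited set turns a walk over all *paths* up to j.l.Object into a walk
// over the *nodes* of the hierarchy graph; keying on prototype() lets
// differently annotated variants of a type count as the same node.
void collectSuperPairs(TypeBinding s, TypeBinding t,
        Set<TypeBinding> visited, List<Pair<TypeBinding>> result) {
    if (s == null || !visited.add(s.prototype()))
        return; // this node of the hierarchy graph was already handled
    TypeBinding tSuper = t.findSuperTypeOriginatingFrom(s); // t never changes
    if (tSuper != null && s.isParameterizedType() && tSuper.isParameterizedType())
        result.add(new Pair<>(s, tSuper));
    collectSuperPairs(s.superclass(), t, visited, result);
    if (s.superInterfaces() != null)
        for (TypeBinding superInterface : s.superInterfaces())
            collectSuperPairs(superInterface, t, visited, result);
}
```

With such a guard, each supertype contributes at most one findSuperTypeOriginatingFrom() call, no matter how many inheritance paths lead to it.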

@stephan-herrmann
Contributor Author

FWIW, using the reproducer, my poor man's measurements yield:

       min        max
real   0m4,050s   0m4,656s
user   0m8,669s   0m10,040s
sys    0m0,355s   0m0,423s

which is not such a significant gain over results from #3384:

       min        max
real   0m4,210s   0m6,360s
user   0m8,628s   0m9,131s
sys    0m0,316s   0m0,393s

@stephan-herrmann stephan-herrmann marked this pull request as draft December 3, 2024 18:21
@stephan-herrmann
Contributor Author

@fedejeanne I marked this PR as draft because I'd like to get feedback on results from #3384 in real life before stacking more speculative optimization on top of the previous one. I mean, we've seen the differences in counts, but not so much in compile time.

@stephan-herrmann
Contributor Author

> which is not such a significant gain over results from #3384:

The reason is: while we still visit a significant number of types, we don't perform any actual work. The list of pairs already has length zero thanks to #3384 :)

@fedejeanne
Contributor

@stephan-herrmann understood, I was about to check the performance gain anyway when I read your comments :-)

I am setting up my IDE right now and I will measure performance with #3384 right away.

@jukzi
Contributor

jukzi commented Dec 4, 2024

> I'm still a bit uneasy about using TypeBinding in a hashed collection, but looking at the comment in ReferenceBinding.hashCode() this could still be safe as long as no UnresolvedReferenceBindings are in the lot (because those intentionally break the uniqueness of hashCode()).

  1. This PR uses hashCode()/equals() only during the search, and the cache is not kept on the heap. The search itself should not (but might it? that would be bad) modify the hash code during the search.
  2. The hashCode() implementation looks horribly bad: equals() and hashCode() should always be in sync. I wonder where this hash code is already used and whether it would not be better fixed.
  3. As a workaround, this PR could use ReferenceBindingSetWrapper, which already tries to work around that problem.
  4. The implementation of org.eclipse.jdt.internal.compiler.lookup.ReferenceBinding.identityHashCode() is bad, as it relies on the fact that super does not override hashCode(). ReferenceBindingSetWrapper should use System.identityHashCode instead (see the sketch below).
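A minimal sketch of what an identity-based wrapper along the lines of point 4 could look like (the class name is hypothetical; the existing ReferenceBindingSetWrapper may differ):

```java
// Hypothetical wrapper that hashes and compares purely by object identity,
// so it neither depends on nor disturbs the semantics of
// ReferenceBinding.hashCode()/equals().
final class IdentityBindingWrapper {
    final ReferenceBinding binding;

    IdentityBindingWrapper(ReferenceBinding binding) {
        this.binding = binding;
    }

    @Override
    public int hashCode() {
        // stable even if a subclass overrides hashCode()
        return System.identityHashCode(this.binding);
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof IdentityBindingWrapper
                && ((IdentityBindingWrapper) other).binding == this.binding;
    }
}
```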

@fedejeanne
Contributor

FTR, even in its current state and with the bad implementations of hashCode() / equals(), I rebased this PR on master (so #3384 is also applied) and ran a full build: it took ~12min 👏

I just happened to have VisualVM still open, so I am also adding a screenshot of the Monitor view. FYI, the build ran from ~9:23 to ~9:35 (sorry I can't be more precise, I wasn't watching the clock).

[screenshot of the VisualVM Monitor view omitted]

This is the sample (sampled every 20ms): v20241203-1907 and 3387-Full build (clean workspace).zip

And one can see that the hotspots changed: allSuperPairsWithCommonGenericType and allSuperPairsWithCommonGenericTypeRecursive are way down in the hotspots view.

[screenshots of the hotspots view omitted]

@jukzi
Contributor

jukzi commented Dec 4, 2024

Good, that's what I expected.

BTW: you should not trust the Heap/Memory view while you are sampling. Sampling adds too much memory overhead.
You may also want to investigate in new issues why you have hotspots in indexOfChildren() and findSuperTypeOriginatingFrom() [the latter may be a problem similar to this one]. Never ever should you have any CPU hotspot during compile expect File IO, as compiling is not really a computationally intensive problem but should be mostly lookups.

@fedejeanne
Contributor

> BTW: you should not trust the Heap/Memory view while you are sampling. Sampling adds too much memory overhead.

I wasn't aware of that, thank you for the tip!

> You may also want to investigate in new issues why you have hotspots in indexOfChildren() and findSuperTypeOriginatingFrom() [the latter may be a problem similar to this one]. Never ever should you have any CPU hotspot during compile expect File IO, as compiling is not really a computationally intensive problem but should be mostly lookups.

Hm, I didn't know that either. Do you have any hints about what I should look for when debugging these methods? I honestly assumed that these hotspots were "normal" since they were already there in the sampling I did with the good old fast 2024-06 in the original issue (#3327 (comment))

[screenshot of the 2024-06 sampling from #3327 omitted]

Also, this PR brings the number of invocations of those 2 methods almost down to those of 2024-06, so I don't know how much more there is to improve there. I can certainly take a look at them after merging this PR, though (since this PR improves both of them, I mean). I am always happy to help track down and improve performance bottlenecks :-)

@jukzi
Contributor

jukzi commented Dec 4, 2024

> Do you have any hints about what I should look for when debugging these methods?

Same procedure: the first step is to gather data on why it is slow, typically a breakpoint where it takes the longest time (i.e. most elements in the loop) or statistics of the arguments used. Second: create a reproducer. At best you could even write a JUnit test that triggers the ill behavior. At that point others can easily jump in to find a solution.
The potential gain is a reduction of up to another 1.5 minutes (total CPU time):
[screenshot of the hotspot CPU times omitted]

@stephan-herrmann
Contributor Author

> I'm still a bit uneasy about using TypeBinding in a hashed collection, but looking at the comment in ReferenceBinding.hashCode() this could still be safe as long as no UnresolvedReferenceBindings are in the lot (because those intentionally break the uniqueness of hashCode()).

> 1. This PR uses hashCode()/equals() only during the search, and the cache is not kept on the heap. The search itself should not (but might it? that would be bad) modify the hash code during the search.

no modification in sight

> 2. The hashCode() implementation looks horribly bad: equals() and hashCode() should always be in sync. I wonder where this hash code is already used and whether it would not be better fixed.

Please let's be extremely careful about "fixing". The current implementation is necessary for semantic correctness and has served this purpose without any reported issues for 20 years.

In fact, scanning the compiler code, there are several locations using TypeBinding as keys in a HashMap. Again with no reported anomalies.

Given those precedents, I'll stop worrying at this particular location. If desired, the general topic can be picked up in a follow-up ticket.

@stephan-herrmann
Contributor Author

> You may also want to investigate in new issues why you have hotspots in indexOfChildren() and findSuperTypeOriginatingFrom() [the latter may be a problem similar to this one].

findSuperTypeOriginatingFrom() is part of the algorithm being optimized here, so no surprise if you ask me.

> Never ever should you have any CPU hotspot during compile expect File IO, as compiling is not really a computationally intensive problem but should be mostly lookups.

I'm not sure I'm following the reasoning here. Perhaps there's an unintended "not" involved? Or should it read except instead of expect? As you see I'm clueless :)

@stephan-herrmann
Contributor Author

> FTR, even in its current state and with the bad implementations of hashCode() / equals(), I rebased this PR on master (so #3384 is also applied) and ran a full build: it took ~12min 👏

So, in the real-world case there seems to be something that is not captured in the reproducer. With this gain I agree that there is value in the 3rd level optimization and will merge it, so it will get thorough field testing before release.

@stephan-herrmann stephan-herrmann marked this pull request as ready for review December 4, 2024 23:42
@stephan-herrmann stephan-herrmann added this to the 4.35 M1 milestone Dec 4, 2024
@stephan-herrmann stephan-herrmann merged commit 7ebcecc into eclipse-jdt:master Dec 4, 2024
10 checks passed
@stephan-herrmann stephan-herrmann deleted the issue3327_2 branch December 4, 2024 23:44
@fedejeanne
Contributor

> FTR, even in its current state and with the bad implementations of hashCode() / equals(), I rebased this PR on master (so #3384 is also applied) and ran a full build: it took ~12min 👏
>
> So, in the real-world case there seems to be something that is not captured in the reproducer. With this gain I agree that there is value in the 3rd level optimization and will merge it, so it will get thorough field testing before release.

Great, thank you!

I'm looking forward to tomorrow's I-BUILD and I will use it ASAP.

@@ -1277,11 +1283,11 @@ protected List<Pair<TypeBinding>> allSuperPairsWithCommonGenericType(TypeBinding
if (tSuper != null && s.isParameterizedType() && tSuper.isParameterizedType()) { // optimization #1 again

Assuming s.isParameterizedType() is a very cheap operation and findSuperTypeOriginatingFrom() is not, the lookup of tSuper can be guarded by s.isParameterizedType():

// cheap isParameterizedType() check first, expensive hierarchy lookup only if needed
TypeBinding tSuper = s.isParameterizedType() ? t.findSuperTypeOriginatingFrom(s) : null;
if (tSuper != null && tSuper.isParameterizedType()) {
  result.add(new Pair<>(s, tSuper));
}


I can try this tomorrow and report back. Thank you for the hint!


Incredible... I had to sample twice because I couldn't believe my eyes 😄

Thank you, @szarnekow! --> #3411
