Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make LocalCache not use synchronized to detect recursive loads. #6857

Merged
1 commit merged into from
Dec 7, 2023

Conversation

copybara-service[bot]
Copy link
Contributor

Make LocalCache not use synchronized to detect recursive loads.

Fixes #6851
Fixes #6845

RELNOTES=n/a

Comment on lines 2199 to 2203
// As of this writing, the only prod ValueReference implementation for which isLoading() is
// true is LoadingValueReference. However, that might change, and we already have a *test*
// implementation for which it doesn't hold. So we check instanceof to be safe.
Copy link
Contributor

@ben-manes ben-manes Dec 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To fix compute bugs, I added ComputingValueReference extends LoadingValueReference where isLoading is false (42bf4f4). You'd still want recursive detection, so I think checking by the type instead of the flag is better here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoa, this is complicated! Thanks.

Hmm, does the right thing even currently happen if someone starts a load when compute is already running? If ComputingValueReference returns false from isLoading, then would we call valueReference.get()... and then either get back null (if there was no previous value) and say an entry was COLLECTED (which seems iffy) or get back the previous value (which might be fine but which I haven't thought about)? We probably never get to waitForLoadingValue and the deadlock detection at all? (If we did, we'd trigger the AssertionError above!)

Maybe there's sketchy stuff going on but this PR at least preserves existing behavior?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#5439? #3719? I am going to try not to get sucked much deeper into this. Use Caffeine, everyone! (And hopefully we can still make this "simple" virtual-thread fix :))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, there are lots of little gotchas and I may have introduced some in my fix attempts too. I'd only suggest modifying that comment and certainly not expand the scope. Even Doug had challenges with recursive computes and there are so many interactions that I couldn't possibly keep them all in my head (hence 10M test executions per build in hopes to catch regressions).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I have tweaked the comment to avoid implying that instanceof LoadingValueReference implies isLoading().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, does the right thing even currently happen if someone starts a load when compute is already running?

oh, to answer your question this cannot happen because the compute methods are implemented very coarsely. They hold the segment lock during the computation which blocks all other writes to that segment, meaning a loader would wait to obtain the lock and by then the computing value reference would be gone.

The alternative would be to use retry loops on all write paths by taking the segment lock, if loading then waiting on the entry to complete outside of the segment lock, or else perform its entry action accordingly. That would have been far more invasive and may not have maintained linearizability, so I believe simpler approach was used just to avoid accidents as the default method version is non-atomic.

ConcurrentHashMap uses fine-grained segments that are dynamically added, so it similarly locks rather than using retry loops. Caffeine delegates loading to computeIfAbsent as merely an alias, where the only negative effect is long-running computes could impact siblings. Thus, AsyncCache mitigates that when an issue since it sacrifices linearization of the loading value (only of the future mapping) so that the map operations are fast, and the synchronous view uses the retry loops approach as best-effort (not linearizable as it fails Lincheck).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, good. I'm glad that we at least do not have an additional bug on top of the one you already reported... :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, can you give me an update regarding the current status of this PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, sorry.

The testing results are good. My reviewer for the internal change had some other things come up, so I'll reassign the review to someone else.

While we're here: How important is it to you for us to publish a release that contains this change? I'm hoping to publish one soonish no matter what, but it would be useful to know specifically about this change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem,

Ok 👍 .

I don't need one right away, if that's what you mean. I have built a version of guava from this branch locally which I am already using for some tests, and that's totally fine for now. It would of course be cool if it were already included in the next release, but it is not a necessity for me.

@copybara-service copybara-service bot force-pushed the test_586666218 branch 2 times, most recently from 4117d35 to 8980fca Compare December 7, 2023 14:38
Fixes #6851
Fixes #6845

RELNOTES=n/a
PiperOrigin-RevId: 588778069
@copybara-service copybara-service bot closed this pull request by merging all changes into master in 1512730 Dec 7, 2023
@copybara-service copybara-service bot closed this Dec 7, 2023
@copybara-service copybara-service bot deleted the test_586666218 branch December 7, 2023 15:00
// Synchronizes on the entry to allow failing fast when a recursive load is
// detected. This may be circumvented when an entry is copied, but will fail fast most
// of the time.
synchronized (e) {
return loadSync(key, hash, loadingValueReference, loader);
}
return loadSync(key, hash, loadingValueReference, loader);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for this post merge comment, I was expecting to find this change on current Guava release, however it looks like the current version on master is still using synchronized (see below). Was this change reverted?

https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/LocalCache.java#L2186-L2191

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem at all.

Indeed, we reverted this in https://github.com/google/guava/releases/tag/v33.1.0 because of a bug: #6851 (comment)

From what I understand, the JDK issue will be fixed in Java 24: openjdk/jdk#21565

But I know that's not much help now :/

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick reply! it looks like the suggested path is migrating to Caffeine, I'll look into that. Thank you!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're welcome.

To be clear, Caffeine also uses synchronized, so its users also need the JDK fix to eliminate the problem with virtual threads. It just offers plenty of other advantages over Guava :)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, due the the dependency with Java ConcurrentHashMap, however it looks like AsynCache is not affected ben-manes/caffeine#779

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

remove synchronized from LocalCache::get(key, loader) to allow for VirtualThread-friendly value-loading
4 participants