Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unnecessary fetches when using asynchronous bulk load #1409

Closed
mudiobada opened this issue Dec 20, 2023 · 5 comments · Fixed by #1550
Closed

Unnecessary fetches when using asynchronous bulk load #1409

mudiobada opened this issue Dec 20, 2023 · 5 comments · Fixed by #1550

Comments

@mudiobada
Copy link

mudiobada commented Dec 20, 2023

For a cache backed by AsyncCacheLoader that

  • loads values for keys that were not specifically requested ("pre-fetch")
  • and where getAll() is called in close successions.

Observed behavior:

Depending on timing, asyncLoadAll() may sometimes be called unnecessarily for already pre-fetched items.

A reproducer can be provided if needed (that injects artificial delays in the map returned by asyncLoadAll()).
The following notes on LocalAsyncCache should however explain the behavior:

  1. Completion of getAll() depends only on items that were actually requested
 return composeResult(futures);

https://github.com/ben-manes/caffeine/blob/v3.1.8/caffeine/src/main/java/com/github/benmanes/caffeine/cache/LocalAsyncCache.java#L149

  1. AsyncBulkCompleter completes futures in the following order - first requested items, then pre-fetched items:
        fillProxies(result);
        addNewEntries(result);

https://github.com/ben-manes/caffeine/blob/v3.1.8/caffeine/src/main/java/com/github/benmanes/caffeine/cache/LocalAsyncCache.java#L260

The above results in behavior subject to timing between completion of addNewEntries(result) and when the next getAll() is triggered.

If getAll() is triggered again after the Future returned by a previous getAll() completed, but before "pre-fetched" items are added to the cache by addNewEntries(result), then the second call of getAll() could re-trigger a fetch of items that were already fetched (but not yet added to the cache.

Assumption is that getAll() completion immediately after requested items are added to the cache (by fillProxies(result)) is an optimization.

For a scenario where asyncLoadAll() might be very expensive (as it is in my use case), this optimization is not the appropriate trade-off.

Question:

Would it be possible to add an option that allows completion of getAll() only after all fetched items have been added to the cache?
The effect of this should be the same as swapping the order of fillProxies(result); and addNewEntries(result); in AsyncBulkCompleter.

@ben-manes
Copy link
Owner

ben-manes commented Dec 20, 2023

I think it would be fine to reorder those lines or await on the whenComplete future as well, as a refinement to the behavior. It makes sense that a caller might think that all of those unrequested keys were stored by the time their future completes.

@mudiobada
Copy link
Author

It would be great if behavior could generally be changed to guarantee completion.
A wait on completion of both futures and proxies in getAll() would surely be easier to reason about and guard against regression in later releases.

@allanrodriguez
Copy link
Contributor

I ran into the same issue and thought I'd try my hand at the refinement with #1550.

@ben-manes
Copy link
Owner

Thanks! I'll take a look. There are a few things that I want to review while we're looking at this functionality.

  1. Can we write tests to assert this behavior to avoid regressions?
  2. What happens if the cancels, completes, or times out the returned future?
  3. Similarly, is there any error case due to the chaining that we need to be careful about?

I think your test case change was for 1. Offhand I don't recall if 2 & 3 were handled very well presently.

ben-manes pushed a commit that referenced this issue Feb 25, 2024
…l keys to be added to the cache before returning requested keys (#1550)

Fixes #1409.
@ben-manes
Copy link
Owner

Released in 3.2.0 (sorry for the delay)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants