Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some Expiry implementations return currentTime + desiredLength instead of desiredLength #1499

Closed
cpovirk opened this issue Feb 1, 2024 · 8 comments

Comments

@cpovirk
Copy link
Collaborator

cpovirk commented Feb 1, 2024

@cgdecker reports that every Google Expiry implementation either (a) doesn't use currentTime or (b) uses it wrong.

One of the wrong implementations is similar to #441. I see that the docs try to warn people about that misunderstanding, thanks to 1d6fa59, so we can hope that that won't come up again.

The other two wrong implementations do what I said in the bug title: They behave as if they're supposed to return the Ticker time at which the entry should expire, rather than the length of time from now.

I could imagine ways to try to prevent this if we were writing the interface from scratch, like maybe sneaking "duration" into the method name (though I would have hoped that "expireAfterCreate" would have already hinted at that) or (nowadays) actually using the Duration type.

At the moment, the best option is some combination of further documentation and maybe renames of parameters. (I don't know what I'd call the parameters, though... "unrelatedTickerTime?" :)) I don't have brilliant thoughts on the exact doc edits to make, either. I'm happy to review your proposal or put together one of my own if you'd like.

@ben-manes
Copy link
Owner

I didn't expect Expiry to be used that often since it did not come up regularly for Guava's Cache. I think we could find ways to be less error prone, but also I wonder if those usages made sense vs expireAfterWrite? Do you think people are too quickly reaching for advanced functionality?

Ideally a Duration would have been used, but that would have significantly harmed performance due to expireAfterRead and allocating on every access for both the parameters and return types. That meant until value types land it isn't feasible and even then, unfortunately, Duration's implementation is ugly as it does expensive arithmetic which should be avoided in perf sensitive, high throughput code (e.g. divisions and modulus). If our interface was only for writes (create or update) then a duration would have been perfect because there are so many other costs that dominate (e.g. see this api sketch for variable refresh).

At times I've wondered if there was any nice static factory or builder trick that we could provide to make it easier to do the right thing? For example like how Comparator evolved with comparingX and chained thenComparingX fluent builders. At worst that could be a private utility outside of this library, like Guava's ComparisonChain did for the jdk, until we figure out something better. I haven't seen enough Expiry implementations to have a strong gut feeling on what a better API or builder syntax should look like. The goal would be coerce users towards a less error prone api that may be a little slower, without shutting the door on users who can work lower level for the performance gains. Since expireAfterRead is rarely correct, I think there is some type of create/update builders that could be done.

@cpovirk
Copy link
Collaborator Author

cpovirk commented Feb 2, 2024

Oh, that's a great point about the simpler API. Both callers look like this:

.expireAfter(
    new Expiry<String, Long>() {
      @Override
      public long expireAfterCreate(String key, Long value, long currentTime) {
        return currentTime + EXPIRATION;
      }

      @Override
      public long expireAfterUpdate(
          String key, Long value, long currentTime, long currentDuration) {
        return currentDuration;
      }

      @Override
      public long expireAfterRead(
          String key, Long value, long currentTime, long currentDuration) {
        return currentDuration;
      }
    })

That certainly sounds like they could probably both be just expireAfterWrite(EXPIRATION). Or I guess it depends on whether they really don't want to reset expiration after update?

I glanced at a few of our other Expiry implementations (not all, but we don't have a ton, anyway) and saw some implementations based on the wall-clock time, one that returns the same value from expireAfterCreate and expireAfterUpdate (so definitely a candidate for expireAfterWrite?), and one that actually uses a different expiration for different values. I don't think I have enough information to say anything confidently about a builder approach.

And for Duration, I'm so used to thinking about Java 8 and Android than I'd forgotten about the performance....

@cgdecker
Copy link

cgdecker commented Feb 2, 2024

Having looked at all of the implementations we have (like Chris said there aren't a ton of them), almost all of them either only set expiration on create (slightly more of these I think), or set expiration on both create and update (in the same way). There's I think one that changes expiration on read, and it sets the expiration to Long.MAX_VALUE. Most of them do use some kind of logic involving the key or value to calculate the expiration, so a fixed expiry duration for create or write wouldn't work for them.

As far as currentTime, I've had a hard time imagining what a user could possibly need it for. Entirely possible that's just a lack of understanding on my part, but I was feeling like maybe it ends up mostly being a trap.

@ben-manes
Copy link
Owner

I think that I supplied currentTime just because it was a known value, but not being wall clock meant that I didn’t know if it could be useful either. I also think that I was so focused on the implementation (timer wheels were uncommon) that I probably hand waved it as just an ignorable parameter. The lack of Duration types frustrated me, though, and I sorely missed having a community to discuss and bake apis before cementing them.

We unfortunately allowed both expireAfterWrite and expireAfterAccess to be set in Guava, and I allowed that here to simplify migrations. We could add a fixed expieAfterCreation as exclusive, it’s just odd for users why some can be mixed as if there’s a good use-case. Similarly if we added a callback version of those builder methods instead of fixed durations. It might be a reasonable restriction to add even if a breaking change as I don’t think multiple expiration settings makes much sense.

I suppose my ideal would be lambda versions in the cache builder for one-liners (fixed/variable create, write, access). Then Expiry should almost never be used, but could be to avoid the Duration tax.

@cgdecker
Copy link

cgdecker commented Feb 2, 2024

Yeah, lambda equivalents to fixed duration builder methods was an approach I was thinking might be nice. Differing exclusivity of them does sound a bit awkward though.

(Aside: Just want to be sure this isn't coming off as "why did you design it this way?!". It's just an interesting (?) observation I made when looking at usages.)

@ben-manes
Copy link
Owner

ben-manes commented Feb 4, 2024

Unfortunately any evolution of the apis here has some drawback that makes it potentially confusing.

Cache builder

If we extended the cache builder to accept lambda expressions then we'd have to decide on some of the forms

Caffeine.newBuilder()
    .expireAfterCreate(Duration.ofMinutes(5))
    .expireAfterCreate((K key, V value) -> duration)
    .expireAfterCreate((Map.Entry<K, V> entry) -> duration)
    .expireAfterWrite((K key, V value, Duration duration) -> duration)
    .expireAfterWrite((Map.Entry<K, V> entry, Duration duration) -> duration)

where expireAfterWrite's duration argument on entry creation would be Duration.ofNanos(Long.MAX_VALUE).

The Map.Entry types would be to avoid needing to define a new triFunction type, but becomes inconsistent and noise. Often developers forget that they can inline the parameter types so they instead write the hideous
(BiFunction<Map.Entry<K, V>, Duration, Duration>) (key, value, duration) -> duration. That favors an explicit function type, e.g. stashing it as Expiry.ExpiryFunction to hide it somewhere.

Another flaw is that Cache.policy() allows for inspection, where expireAfterWrite() is a Policy.FixedExpiration type. Instead users would be expected to realize are using a convenience method for expireVariably() which returns a VarExpiration<K, V> type. We'd also need be inclined to add the fixed expireAfterCreate(Duration) method for convenience and therefore a corresponding fixed inspection.

This mess stems from the late addition of variable expiration as we required amortized O(1) policies. Therefore, we started with an eager fixed approach (time-bounded linked list) rather than provide a problematic alternatives like a priority queue or require size eviction to lazily discard. If we had the option of a redo then everything would be variable (use our timer wheel) and have the spectrum of convenient builder methods for quick configuration.

Thankfully Cache.policy() is meant as a dumping ground to unblock developers and not intended to be pristine. So we could just document and hope those few who need it don't get too annoyed by the inconsistencies. 🤷‍♂️

Static factories

A factory approach is less discoverable but side-steps some of that legacy api. That could be singular or chained, e.g.

Caffeine.newBuilder()
    .expireAfter(Expiry.creating((Key key, Value value) -> duration))
    .expireAfter(Expiry.creating((Key key, Value value) -> duration)
        .updating((key, value, currentDuration) -> duration)
        .reading((key, value, currentDuration) -> duration))
    .expireAfter(Expiry.writing((Key key, Value value, Duration currentDuration) -> duration))

This would also need the triFunction and use a chain-of-responsibility pattern to call its delegate function. Then the duration conversion would only be needed on read if actually defined, with warnings of the performance impact.

Builder

This would be the same as a static factory, but carry the noise of a full-on type. That would allow for new'ing it up with the type arguments so the triFunction would be an optional choice but probably still desirable.

Caffeine.newBuilder()
    .expireAfter(new Expiry.Builder<K, V>()
        .create((key, value) -> duration)
        .update((key, value, currentDuration) -> duration)
        .read((key, value, currentDuration) -> duration)
        .build())
    .expireAfter(new Expiry.Builder<K, V>()
        .write((key, value, currentDuration) -> duration)
        .build())
    .build();

It is not too awful to read, though internally a builder is a lot of boilerplate.

Preferences? Alternatives?

I personally like the discoverability on the cache builder, as it is pretty obvious and natural for users to gravitate towards. Unfortunately how the apis have aged makes me lean towards the other options.

The static factory approach is concise. A possible flaw is that often those are overlooked, like our CacheLoader.bulk(function) or Guava's CacheLoader.from(function). The generics can also be a little confusing to infer so we might observe a lot of type witness code, e.g. Expiry.<K, V>creating(...), which is not overly attractive. The JavaDoc would probably hint that the type parameters can go in the lambda expression. I would only slightly favor this one but would be hard pressed to explain why.

The builder is the most verbose for everyone, but also perhaps the most familiar. It is easiest to extend with more convenience methods if needed, as it gathers the state until construction.

thoughts?

@ben-manes
Copy link
Owner

Experimenting with the api a bit, and the triFunction to include a currentDuration parameter doesn't make sense. The reasons why I thought it might didn't pan out, which is a good reduction. I think the cleanest approach is to use a static factory for create, write, and access which take a biFunction to return a Duration.

...expireAfter(Expiry.creating((Key key, Graph graph) -> duration))
...expireAfter(Expiry.writing((Key key, Graph graph) -> duration))
...expireAfter(Expiry.accessing((Key key, Graph graph) -> duration))

This isn't that much worse than having it directly on the cache builder, though if the api was redone then I'd have preferred that and reworked the fixed vs variable distinction. Perhaps I am being a bit too anxious as few look at the policy() api, so we can always promote it later if this is somehow not discoverable enough.

ben-manes pushed a commit that referenced this issue Feb 20, 2024
* Add Expiry static factory methods (fixes #1499)

* Bump Wandalen/wretry.action from 1.3.0 to 1.4.4

Bumps [Wandalen/wretry.action](https://github.com/wandalen/wretry.action) from 1.3.0 to 1.4.4.
- [Release notes](https://github.com/wandalen/wretry.action/releases)
- [Commits](Wandalen/wretry.action@a163f62...62451a2)

---
updated-dependencies:
- dependency-name: Wandalen/wretry.action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
ben-manes pushed a commit that referenced this issue Feb 20, 2024
* Add Expiry static factory methods (fixes #1499)

* Bump gitleaks/gitleaks-action from 2.3.2 to 2.3.3

Bumps [gitleaks/gitleaks-action](https://github.com/gitleaks/gitleaks-action) from 2.3.2 to 2.3.3.
- [Release notes](https://github.com/gitleaks/gitleaks-action/releases)
- [Commits](gitleaks/gitleaks-action@1f2d10f...cb7149a)

---
updated-dependencies:
- dependency-name: gitleaks/gitleaks-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
@ben-manes
Copy link
Owner

Released in 3.2.0 (sorry for the delay)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants