-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expiry javadoc is a bit misleading #217
Comments
Sorry about that. We should add some unit tests against this then. The time calculation should be tolerant to overflows by following the advice in System.nanoTime. It should be using If I recall correctly, the Async situation is slightly different. It is used as a signal that the future is in-flight and should not be evaluated for expiration. Since the future can complete anytime, it is helpful to have that flag separated for querying the future's state. By capping the expiry that a user can set, it lets us differentiate between the states. But I don't recall enough if that safe-guard was necessary, as there is a lot of mental overhead to keep track of these details. Can you write a unit test and we can fix your issue? |
Hi @facboy, I did a quick sanity check and was unable to reproduce this issue. We'll keep digging and make sure there is test coverage. Did you observe this bug or was it a mistake when reading the JavaDoc? |
sorry i've not had a chance to write a unit test yet. i observed the bug. |
In case this helps, here is my passing test. Hopefully you can tweak it to fail and we can understand the problem. @Test
public void issue217() {
FakeTicker ticker = new FakeTicker();
ticker.advance(ThreadLocalRandom.current().nextLong());
Cache<Integer, Integer> cache = Caffeine.newBuilder()
.ticker(ticker::read)
.executor(Runnable::run)
.expireAfter(new Expiry<Integer, Integer>() {
@Override public long expireAfterCreate(Integer key, Integer value, long currentTime) {
return Long.MAX_VALUE;
}
@Override public long expireAfterUpdate(Integer key, Integer value, long currentTime,
long currentDuration) {
return Long.MAX_VALUE;
}
@Override public long expireAfterRead(Integer key, Integer value, long currentTime,
long currentDuration) {
return Long.MAX_VALUE;
}})
.removalListener((k, v, c) -> {
Assert.fail(k + " " + v + " " + c);
})
.build();
cache.put(1, 2);
assertThat(cache.getIfPresent(1), is(2));
assertThat(cache.estimatedSize(), is(1L));
ticker.advance(1, TimeUnit.MINUTES);
assertThat(cache.getIfPresent(1), is(2));
assertThat(cache.estimatedSize(), is(1L));
} |
well this is pretty annoying, i cannot reproduce it in a unit test so started wondering if i'd imagined the whole thing. back to our production code and when i use Long.MAX_VALUE, it definitely breaks, so i'll have to whittle back from there. |
ah, the simple approach: https://github.com/facboy/caffeine-test |
Thanks! That is very odd, but something we can debug with. Interestingly it takes 1.5M operations to fail, though this is using real time instead of a fake source so that may be understandable. Since my test suite primarily uses TestNG, here is a slight adaption to run there. @rash67 wants to get his hands dirty, so hopefully he can take a look. If not, I'll try to resolve it over the weekend. @Test
@SuppressWarnings({"unchecked", "RedundantStringConstructorCall", "InfiniteLoopStatement"})
void maxExpiryShouldNotExpireImmediately() {
LoadingCache<Integer, String> cache = Caffeine.newBuilder()
.expireAfter(new Expiry<Integer, String>() {
@Override
public long expireAfterCreate(Integer key, String value, long currentTime) {
return Long.MAX_VALUE;
}
@Override
public long expireAfterUpdate(Integer key, String value, long currentTime,
long currentDuration) {
return Long.MAX_VALUE;
}
@Override
public long expireAfterRead(Integer key, String value, long currentTime,
long currentDuration) {
return Long.MAX_VALUE;
}
})
.removalListener((k, v, c) -> {
//Assert.fail(k + " " + v + " " + c);
}).build(key -> new String("10"));
int counter = 0;
String expected = cache.get(10);
Stopwatch stopwatch = Stopwatch.createStarted();
while (stopwatch.elapsed(TimeUnit.MINUTES) < 1) {
String val = cache.get(10);
assertThat("Failed after " + counter + " iterations", val, sameInstance(expected));
counter++;
}
} |
I don't think it's related but is there also a bug with the Possibly the error is here in the generated
Should it append 'W' if |
To expire without any interaction with the cache would require a thread to schedule on. In Guava's Cache timeframe, there were some environments like JavaEE and AppEngine that disallowed creating threads. It is a fairly high cost when typically the promptness is not necessary. Java 9 added a JVM-wide scheduler (via |
oh i see, i thought that was what the timerWheel did but I didn't look at it in great detail. i guess the wheel only advances on interaction with the cache? EDIT: doesn't it schedule evictions etc using the ForkJoinPool? |
Yes, the timerwheel is a passive data structure. There is a nice article regarding Kafka's implementation that schedules on a thread. They combine the timerwheel with a |
would be interesting if this was added as an option :). for my current use-case i don't need them to expire in a particularly timely manner, i've just added an external background thread that invokes |
#195 is basically asking for this. It has definitely come up many times for Guava, etc. The cost of maintaining dedicated threads is high for a library, and while we can use FJP in JDK8 it doesn't support any scheduling semantics. The hope is that we can use optionally enable using CompletableFuture.html#delayedExecutor to trigger a That's the idea of how to do it properly, at least. I am pretty behind (on this bug fix and v3.0 release in general). |
@facboy Can you try testing with I think the issue might be that threads read different timestamps when calling |
hmm. so it fixes the test-case i gave you, but it doesn't fix the 'actual' problem i was having which is still in private code. if it helps this is the partial stack trace of the removalListener when it is expired (the reason
there might be an additional issue. the code where I first observed this (ie not the test-case) is not doing a lot of reads...probably in the low 10s at most as it's a small integration test. |
It’s likely that the async thread has an older time stamp and the overflow occurs since it’s a tight boundary. I just need to better accustom myself to this. I think that the solution you gave will be the right fix, though. |
…217) When the duration is set to the maximum length, Long.MAX_VALUE nanoseconds, the calcuation of expirationTime - currentTime > 0 may overflow and be negative. This will not occur if the same thread calculates both timestamps. It may occur across threads when the expirationTime is concurrently updated using a later base time than t1's reading of the currentTime. This can occur whenever the maintenance work is triggered to sweep expired entries and a user thread accesses the entry. The later timestamp plus the maximum duration results in an overflow, causing the remaining time to be negative, and therefore causes the cache to expire the entry. The internal maximum is now capped at Long.MAX_VALUE / 2 or ~150 years. This should give a broad safety net to avoid these concurrency-inducing overflows in normal code.
I generalized the test to a pass/fail scenario, showing that the concurrency due to threading was the cause. When the read buffer is full (~16 reads per stripe), the async maintenance is triggered. The entry continues to be read, bumping its expiration time beyond the overflow point. This means that the maintenance's currentTime read is less than the entry's currentTime read, so t1 - (t0 + duration) will overflow with Long.MAX_VALUE. If all operations are using a single thread, this cannot occur due to order of operations causing linear timestamps. Since the unit tests default to same-thread cleanup for simplicity, it was an oversight on my part. I updated the internals to always enforce a Long.MAX_VALUE / 2 as you suggested to provide enough leeway for the overflow to not occur in practical code. I'll make a release tomorrow and will let the CI do its thing tonight. Thanks for reporting the issue. |
…217) When the duration is set to the maximum length, Long.MAX_VALUE nanoseconds, the calcuation of expirationTime - currentTime > 0 may overflow and be negative. This will not occur if the same thread calculates both timestamps. It may occur across threads when the expirationTime is concurrently updated using a later base time than t1's reading of the currentTime. This can occur whenever the maintenance work is triggered to sweep expired entries and a user thread accesses the entry. The later timestamp plus the maximum duration results in an overflow, causing the remaining time to be negative, and therefore causes the cache to expire the entry. The internal maximum is now capped at Long.MAX_VALUE / 2 or ~150 years. This should give a broad safety net to avoid these concurrency-inducing overflows in normal code.
…217) When the duration is set to the maximum length, Long.MAX_VALUE nanoseconds, the calcuation of expirationTime - currentTime > 0 may overflow and be negative. This will not occur if the same thread calculates both timestamps. It may occur across threads when the expirationTime is concurrently updated using a later base time than t1's reading of the currentTime. This can occur whenever the maintenance work is triggered to sweep expired entries and a user thread accesses the entry. The later timestamp plus the maximum duration results in an overflow, causing the remaining time to be negative, and therefore causes the cache to expire the entry. The internal maximum is now capped at Long.MAX_VALUE / 2 or ~150 years. This should give a broad safety net to avoid these concurrency-inducing overflows in normal code.
ah great, thanks for that! |
…217) When the duration is set to the maximum length, Long.MAX_VALUE nanoseconds, the calcuation of expirationTime - currentTime > 0 may overflow and be negative. This will not occur if the same thread calculates both timestamps. It may occur across threads when the expirationTime is concurrently updated using a later base time than t1's reading of the currentTime. This can occur whenever the maintenance work is triggered to sweep expired entries and a user thread accesses the entry. The later timestamp plus the maximum duration results in an overflow, causing the remaining time to be negative, and therefore causes the cache to expire the entry. The internal maximum is now capped at Long.MAX_VALUE / 2 or ~150 years. This should give a broad safety net to avoid these concurrency-inducing overflows in normal code.
…217) When the duration is set to the maximum length, Long.MAX_VALUE nanoseconds, the calcuation of expirationTime - currentTime > 0 may overflow and be negative. This will not occur if the same thread calculates both timestamps. It may occur across threads when the expirationTime is concurrently updated using a later base time than t1's reading of the currentTime. This can occur whenever the maintenance work is triggered to sweep expired entries and a user thread accesses the entry. The later timestamp plus the maximum duration results in an overflow, causing the remaining time to be negative, and therefore causes the cache to expire the entry. The internal maximum is now capped at Long.MAX_VALUE / 2 or ~150 years. This should give a broad safety net to avoid these concurrency-inducing overflows in normal code.
Released. Thanks! |
The javadoc for
com.github.benmanes.caffeine.cache.Expiry
states:This isn't strictly accurate as if you use
Long.MAX_VALUE
the expiry (now + duration
) overflows and you end up with a negative eviction time, ie the entry gets evicted almost immediately.It looks like this scenario is guarded against in the Async code by restricting the effective duration to
(Long.MAX_VALUE >> 1)
but this isn't done incom.github.benmanes.caffeine.cache.BoundedLocalCache
.The text was updated successfully, but these errors were encountered: