Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JCache implementation not accepting value for maximumSize #171

Closed
nickrobison opened this issue Jul 4, 2017 · 24 comments
Closed

JCache implementation not accepting value for maximumSize #171

nickrobison opened this issue Jul 4, 2017 · 24 comments

Comments

@nickrobison
Copy link
Contributor

When configuring JCache to use the Caffeine implementation, I'm running into an issue where the value I've selected for maximum cache size is not being set in the configured cache. It looks like in the CacheFactory class, when a MutableConfiguration is passed in, only the copierFactory is copied from the default configuration.

When I modified the code to copy the maximumSize from the default, I can see the new value in the Cache config, but I'm not getting any evicted events when adding values beyond the maximum size.

@ben-manes
Copy link
Owner

Sorry that you are running into these issues. I probably relied too much on the TCK and use the native API in my projects.

It looks like the builder should be configured by the configureMaximumSize method. We should write a unit test.

@ben-manes
Copy link
Owner

ben-manes commented Jul 4, 2017

I see evictions occurring in the following test. Note that the processing is asynchronous, so we could add a configuration setting of the Executor (uses FJP.commonPool), which makes testing harder.

@Test(singleThreaded = true)
public final class JCacheMaximumSizeTest extends AbstractJCacheTest {
  private static final int MAXIMUM = 10;

  private final RemovalListener listener = new RemovalListener();

  @Override
  protected CaffeineConfiguration<Integer, Integer> getConfiguration() {
    CaffeineConfiguration<Integer, Integer> configuration = new CaffeineConfiguration<>();
    configuration.setMaximumSize(OptionalLong.of(MAXIMUM));
    CacheEntryListenerConfiguration<Integer, Integer> listenerConfiguration =
        new MutableCacheEntryListenerConfiguration<Integer, Integer>(() -> listener,
            /* filterFactory */ null, /* isOldValueRequired */ false, /* isSynchronous */ true);
    configuration.addCacheEntryListenerConfiguration(listenerConfiguration);
    return configuration;
  }

  @Test
  public void evict() {
    for (int i = 0; i < 2 * MAXIMUM; i++) {
      jcache.put(i, i);
    }
    Awaitility.await().until(() -> {
      for (Cache.Entry<Integer, Integer> entry : jcache) {
        jcache.get(entry.getKey()); // simulate usage to trigger pending evictions
      }
      return (listener.count.get() == MAXIMUM);
    });
  }

  private static final class RemovalListener
      implements CacheEntryRemovedListener<Integer, Integer> {
    final AtomicInteger count = new AtomicInteger();

    @Override
    public void onRemoved(Iterable<CacheEntryEvent<? extends Integer, ? extends Integer>> events) {
      count.incrementAndGet();
    }
  }
}

@ben-manes
Copy link
Owner

I updated the test. I think the confusion is because of the asynchronous call to the eviction listener. Instead of blocking on a global lock on every write, a tryLock is used instead with a state machine to coerce a drain if replaying a write is still needed. The operations are buffered in a write-ahead log approach, which is defers and batches the penalties which is how concurrency is achieved.

When you quickly write a bunch of values it triggers an eviction cycle and further writes move the state to REQUIRED. But that new write races with the async thread completing, which leaves it in the required state. When the next operation occurs, e.g. a read, then a new async cycle is scheduled. This means you can insert 20 items, but only 7 are evicted with the cache over capacity until the next read will evict the remaining 3. If the test doesn't do that it fails, which is why I simulate usage.

This high water mark generally works well in practice and the native APIs make it easier to test with, but the JCache API is more restrictive. If a same-thread executor was configurable then testing would be simpler, but without that option you have to do a little more work to coerce the state,

@ben-manes
Copy link
Owner

The only issue I can find is an unexpected merge behavior of on top of the default. If the default has the value set (e.g. maximum.size = 500) and the cache disables that (e.g. maximum.size = null), then on merge the result is 500. This fallback behavior bleeds through the underlying value when unset. To resolve this the query has to be more complex to identify that the configuration was disabled.

Initial prototyping shows this method seems to work,

private boolean isSet(String path) {
  return merged.hasPath(path) && customized.hasPathOrNull(path) && !customized.getIsNull(path);
}

where customized is the cache and merged has the fallback to default.

By chance is that related to your issue?

@nickrobison
Copy link
Contributor Author

nickrobison commented Jul 4, 2017

Regarding the 'no-evictions' issue, that's an error on my part. I didn't put together the fact that evictions were async, so my test code was flawed.

Regarding the max size, I can see your new changes, but the problem seems to be that the newly merged defaults are not getting returned from resolveConfigurationFor in CacheFactory. It looks like the configuration variable is of type MutableConfiguration which is a subclass of CompleteConfiguration` is thus stepping into the if block on line 95, which simply returns the configuration with only the copier factory added from the defaults.

If I modify the method as follows:

CaffeineConfiguration<K, V> defaults = TypesafeConfigurator.defaults(rootConfig);
    if (configuration instanceof MutableConfiguration<?, ?>) {
      defaults.setTypes(configuration.getKeyType(), configuration.getValueType());
      defaults.setStoreByValue(configuration.isStoreByValue());
      return defaults;

    }
    CaffeineConfiguration<K, V> config = new CaffeineConfiguration<>(
            (CompleteConfiguration<K, V>) configuration);
    config.setCopierFactory(defaults.getCopierFactory());
    return config;

Then I can see a new cache being created with the merged values from the config file, of course, that change breaks the TCK tests, as I'm not sure the difference between the two JCache config types.

@ben-manes
Copy link
Owner

Can you provide your unit test? Or if you think the changes are good enough for a PR then bundle the test with that. I'm too detached from the problem to fully grok the flow without a debugger.

The CaffeineConfiguration constructor does a lot of copying, including from MutableConfiguration, so my guess was that it was assumed to have done that work. But I don't see how this loses the maximum size yet, so failing code to dig into would be appreciated.

@ben-manes
Copy link
Owner

Oh, I think I see what your code change does. I guess before it wasn't merging the defaults at all properly, and since the TCK fails there must be a gap. We have to merge the fields from Configuration or CompleteConfiguration interfaces into the defaults, most likely.

I've been trying to fix travis and doing a bunch of forced pushes as I hack on the settings. If you can put a test failure together then I can help fix that between builds.

@ben-manes
Copy link
Owner

I think the assumption might have been that if you are not passing in a Configuration explicitly, then you are providing the full settings. But if you are resolving them from an external resource configuration, then any of those should be templated by the resource file's default type.

What it sounds like you want is to provide the standard JCache type, but also get any defaults from the configuration file applied? If so, then we'd want to merge them in fully in the resolve (and actually follow what the JavaDoc states). If we do that then the TCK is probably failing because we have settings like maximum size interfering with its tests.

Or if you think the two should continue to be independent, then defaults was a bad term and should have a pointer to a template. Then you could have multiple templates to customize on, rather than a global default.

I guess it comes down to best honors the principle of least surprise. I'm also willing to redesign the configuration file for v3 for clarity.

@nickrobison
Copy link
Contributor Author

The way I've been approaching my caching implementation, is to set as many properties as possible through the JCache API (into a MutableConfiguration) and then have caffeine specific implementation details be filled in via the reference.conf file. I'm also willing to modify my approach to what seems best to you. I'll admit to being a relatively new Java developer, with minimal production experience.

Here's a quick Unit test I put together, which is how I've implemented my JCache configuration, as well as how I've been thinking about the problem.

public class JCacheCaffeineConfigurationTest {
    private static final String PROVIDER_NAME = CaffeineCachingProvider.class.getName();
    private CacheManager cacheManager;
    private MutableConfiguration<String, String> cacheConfig;

    @BeforeClass
    public void beforeClass() {
        final CachingProvider provider = Caching.getCachingProvider(PROVIDER_NAME);
        cacheManager = provider.getCacheManager();
        cacheManager.destroyCache("cache-not-in-config-file");
        cacheConfig = new MutableConfiguration<>();
        cacheConfig
                .setTypes(String.class, String.class)
                .setStatisticsEnabled(true);
    }

    @Test
    public void validMaximumSize_createCache() {
        checkCaffeineConfiguration(() -> cacheManager.createCache("cache-not-in-config-file", cacheConfig), 500L);
        checkCaffeineConfiguration(() -> cacheManager.createCache("test-cache-2", cacheConfig), 1000L);
    }

    @Test
    public void validMaximumSize_getCache() {
        checkCaffeineConfiguration(() -> cacheManager.getCache("cache-not-in-config-file", String.class, String.class), 500L);
        checkCaffeineConfiguration(() -> cacheManager.getCache("test-cache-2", String.class, String.class), 1000L);
    }

    private void checkCaffeineConfiguration(Supplier<Cache<?, ?>> cacheSupplier, Long expectedValue) {
        final Cache<?, ?> cache = cacheSupplier.get();

        @SuppressWarnings("unchecked")
        final CaffeineConfiguration configuration = cache.getConfiguration(CaffeineConfiguration.class);
        assertThat(configuration.getMaximumSize(), is(expectedValue));
    }

}

@nickrobison
Copy link
Contributor Author

Also, major thanks for being so responsive on this issue! Sorry if I've interrupted other development plans.

@ben-manes
Copy link
Owner

This change passes the TCK and I think does what you wanted (and what I probably originally intended).

private <K, V> CaffeineConfiguration<K, V> resolveConfigurationFor(
    Configuration<K, V> configuration) {
  if (configuration instanceof CaffeineConfiguration<?, ?>) {
    return new CaffeineConfiguration<>((CaffeineConfiguration<K, V>) configuration);
  }

  CaffeineConfiguration<K, V> template = TypesafeConfigurator.defaults(rootConfig);
  if (configuration instanceof CompleteConfiguration<?, ?>) {
    CompleteConfiguration<K, V> complete = (CompleteConfiguration<K, V>) configuration;
    template.setReadThrough(complete.isReadThrough());
    template.setWriteThrough(complete.isWriteThrough());
    template.setManagementEnabled(complete.isManagementEnabled());
    template.setStatisticsEnabled(complete.isStatisticsEnabled());
    template.getCacheEntryListenerConfigurations()
        .forEach(template::removeCacheEntryListenerConfiguration);
    complete.getCacheEntryListenerConfigurations()
        .forEach(template::addCacheEntryListenerConfiguration);
    template.setCacheLoaderFactory(complete.getCacheLoaderFactory());
    template.setCacheWriterFactory(complete.getCacheWriterFactory());
    template.setExpiryPolicyFactory(complete.getExpiryPolicyFactory());
  }

  template.setTypes(configuration.getKeyType(), configuration.getValueType());
  template.setStoreByValue(configuration.isStoreByValue());
  return template;
}

@ben-manes
Copy link
Owner

I was planning on spending the 4th catching up on this project. Thanks for being patient and understanding regarding these issues. The JCache doesn't get as much attention as it is a quirky JSR, not very popular, and I don't use it myself.

The JSR really shouldn't have provided its own configuration classes, because all implementors have to extend them anyways. Since there is no bounds in the JSR's, its kind of useless. So my assumption was that almost everything would be defined in the configuration file since that is what the spec authors expected in their implementations, too.

Let me dig into your test and see how things match up.

@nickrobison
Copy link
Contributor Author

That makes sense. Well that's for being flexible with me. I think your changes look great. Ideally it would be nice to pull the named cache configs, if possible, and then fall back to the default. But that's something I can work on for a separate pull request.

@ben-manes
Copy link
Owner

I think this case is wrong,

checkCaffeineConfiguration(() -> cacheManager.createCache("test-cache-2", cacheConfig), 1000L);

From the JavaDoc it seems like either this should be honored as is and the declarative value ignored, or the declarative caches should be created immediately and an exception thrown. I'd have to dig into the other implementations to see if any eagerly resolve their caches rather than lazily as done now.

For the cache-not-in-config-file the changes pass your test.

@nickrobison
Copy link
Contributor Author

Not sure I'm following. Are you saying it's wrong to expect the cache to have a maximum size of 1000, as described in the typesafe config file?

@ben-manes
Copy link
Owner

Its not quite clear what the JSR expects. For createCache it states,

If a {@link Cache} with the specified name is unknown the {@link CacheManager}, 
one is created according to the provided {@link Configuration} after which it becomes
managed by the {@link CacheManager}.
...
There's no requirement on the part of a developer to call this method for each 
{@link Cache} an application may use.  Implementations may support the use of 
declarative mechanisms to pre-configure {@link Cache}s, thus removing the requirement
to configure them in an application.  In such circumstances a developer may simply call
either the {@link #getCache(String)} or {@link #getCache(String, Class, Class)} methods
to acquire a previously established or pre-configured {@link Cache}.
...
@throws CacheException if there was an error configuring the {@link Cache}, which
    includes trying to create a cache that already exists.

The configuration file is resolved lazily, so getCacheNames() is initially empty. At that point there is no cache with that name, so it would be created by the provided configuration.

Alternatively we could eagerly initialize the caches, which would cause the method to throw a CacheException. Since you can destroy a cache, then its not clear if a named one would be reloaded, getCache would be expected to load it lazily, or if the configuration shouldn't be queried after initialization.

@ben-manes
Copy link
Owner

It looks like Ehcache loads all of the caches at initialization and is thereafter is ignorant of their configuration file. So in your case that should throw a CacheException and you would use have to getCache. If you think its cleaner to follow that approach, then I could switch it over.

@nickrobison
Copy link
Contributor Author

nickrobison commented Jul 4, 2017

Ok, I think I'm tracking now. The EHCache approach seems to make the most sense. If we need to instantiate any new caches during the program execution (by calling createCache), they'll use the provided caffeine.jcache.default.* values, correct?

If so, that seems like the appropriate way to go.

@ben-manes
Copy link
Owner

And it looks like Hazelcast does it lazily.

@nickrobison
Copy link
Contributor Author

But regardless of when they instantiate the caches, if you have a named cache in the config file, you need to call getCache instead of createCache, correct?

@ben-manes
Copy link
Owner

If we merge the above change, then yes. This might help with the cache annotations which are very error prone because by default they are anonymously named, and the spec's configuration is unbounded. So a merge with caffeine.jcache.default could be helpful there.

If we make the config file resolved eagerly, then yes you would be forced to call getCache instead. But if you call destroyCache(name) then getCache(name) would return null and you would have to call createCache yourself.

Since the spec lead worked on both Ehcache's and Hazelcast's providers, its confusing that they differ here.

@nickrobison
Copy link
Contributor Author

Hah! And now you get to decide how your implementation will differ, fun for you!

I actually don't have a preference as when the caches are resolved (either eagerly or lazily), but clarifying that a named cached (in the config file) will result in a CacheException if you call createCache on it would probably be good, and resolving from the default configuration for any other caches does seem like a good practice.

Sorry if that's not a very helpful response.

@ben-manes
Copy link
Owner

I think the simplest refactoring is to disallow createCache when the name is defined externally. I could then refactor the CacheManager to eagerly resolve later. Then with the changes for defaulting, the exception should make the behavior clearer and you wouldn't have found it so surprising.

I'm planning on doing an overhaul of JCache when I can bump the major version. That's blocked on Java 9 since I'd like to replace the core library's usages of Unsafe with VarHandle.

One flaw is that the configuration doesn't have a key-type and value-type, so you have to use getCache(name) and not getCache(name, String.class, String.class). Is that a configuration option that you'd like me to add while I'm in here?

ben-manes added a commit that referenced this issue Jul 5, 2017
Previously defaults were only honored for resource configured caches. An
explicit cache configuration (via CacheManager#createCache) was not did
not use the defaults for non-JSR settings. This made default a poorly
named template only used by resource configurations.

It is now used in these cases as well, which reduces surprises. A nice
benefit is that anonymous caches created by the annotations are not
stuck on the JSR's defaults (unbounded, serialized). A safe threshold
can be set in the defaults to avoid memory leaks, etc.

Restricted CacheManager#createCache from overriding the externalized
configuration. As the caches are lazily loaded or destroyed, creating
a defined cache could be forced to a setting. That caused confusion on
when the layering, e.g. would the explicit layer on the externalized?

Added missing key-type and value-type configuration settings for getting
a cache by its name+types.
ben-manes added a commit that referenced this issue Jul 5, 2017
Previously defaults were only honored for resource configured caches. An
explicit cache configuration (via CacheManager#createCache) was not did
not use the defaults for non-JSR settings. This made default a poorly
named template only used by resource configurations.

It is now used in these cases as well, which reduces surprises. A nice
benefit is that anonymous caches created by the annotations are not
stuck on the JSR's defaults (unbounded, serialized). A safe threshold
can be set in the defaults to avoid memory leaks, etc.

Restricted CacheManager#createCache from overriding the externalized
configuration. As the caches are lazily loaded or destroyed, creating
a defined cache could be forced to a setting. That caused confusion on
when the layering, e.g. would the explicit layer on the externalized?

Added missing key-type and value-type configuration settings for getting
a cache by its name+types.
ben-manes added a commit that referenced this issue Jul 5, 2017
Previously defaults were only honored for resource configured caches. An
explicit cache configuration (via CacheManager#createCache) was not did
not use the defaults for non-JSR settings. This made default a poorly
named template only used by resource configurations.

It is now used in these cases as well, which reduces surprises. A nice
benefit is that anonymous caches created by the annotations are not
stuck on the JSR's defaults (unbounded, serialized). A safe threshold
can be set in the defaults to avoid memory leaks, etc.

Restricted CacheManager#createCache from overriding the externalized
configuration. As the caches are lazily loaded or destroyed, creating
a defined cache could be forced to a setting. That caused confusion on
when the layering, e.g. would the explicit layer on the externalized?

Added missing key-type and value-type configuration settings for getting
a cache by its name+types.
@ben-manes
Copy link
Owner

Released 2.5.3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants