
Notable part of caffeine.jar is consumed by generated classes #110

Closed
vlsi opened this issue Aug 9, 2016 · 19 comments

@vlsi
Contributor

vlsi commented Aug 9, 2016

caffeine-2.3.1.jar -- 968137 bytes
caffeine-2.3.1.jar without LocalCacheFactory$... and NodeFactory$... classes -- 248414 bytes

Are there any ideas/plans to reduce that?

@ben-manes
Owner

The intent is, of course, to reduce the in-memory overhead from the many configuration options, at the cost of a larger disk footprint (unloaded classes).

I think the large enum used by NodeFactory could be removed. The simplest approach would be for getFactory to compute a numeric code and return a new factory instance whose methods switch on that code. This would remove the bloat for a tiny runtime selection cost.
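
For illustration, a minimal sketch of that idea (the names below are hypothetical, not Caffeine's actual API): the factory computes a feature code once, and its newNode method switches on it instead of dispatching to one generated subclass per configuration.

interface Node<K, V> {}

final class StrongNode<K, V> implements Node<K, V> {
  final K key;
  final V value;

  StrongNode(K key, V value) {
    this.key = key;
    this.value = value;
  }
}

final class WeakKeyNode<K, V> implements Node<K, V> {
  final java.lang.ref.WeakReference<K> keyRef;
  final V value;

  WeakKeyNode(K key, V value) {
    this.keyRef = new java.lang.ref.WeakReference<>(key);
    this.value = value;
  }
}

final class NodeFactorySketch {
  static final int WEAK_KEYS = 1 << 0;

  private final int code;

  static NodeFactorySketch getFactory(boolean weakKeys) {
    return new NodeFactorySketch(weakKeys ? WEAK_KEYS : 0);
  }

  private NodeFactorySketch(int code) {
    this.code = code;
  }

  <K, V> Node<K, V> newNode(K key, V value) {
    // A single switch on the precomputed code replaces per-configuration factory classes
    switch (code) {
      case 0:         return new StrongNode<>(key, value);
      case WEAK_KEYS: return new WeakKeyNode<>(key, value);
      default:        throw new IllegalStateException("unknown code: " + code);
    }
  }
}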

LocalCacheFactory has a string switch statement. Using a numeric code would be slightly smaller by reducing the interned strings.

I tried to use inheritance to reduce duplicate code. I think Node was optimal and LocalCache was close, but not perfect. Perhaps there are some more tricks to optimize reuse, e.g. interfaces with default methods. We could also remove the modifiers, as perhaps the JVM is creating unnecessary bridge methods.

The biggest win would be to review LocalCacheFactory features to determine if they carry their weight. Perhaps that factory should be removed and all the fields present, but null'd if unused. The top-level cost is smaller than per-node, which was my original intent for using codegen.

jvassev added a commit to jvassev/caffeine that referenced this issue Nov 2, 2017
This patch expands on the suggestion in
ben-manes#110 (comment)

Instead of the proposed numeric code, the name of the factory is used as the class name to instantiate via reflection.
This reduces the class size by 21K, bringing the jar size to 786K.
@jvassev jvassev mentioned this issue Nov 2, 2017
@Maaartinus

The biggest win would be to review LocalCacheFactory features to determine if they carry their weight. Perhaps that factory should be removed and all the fields present, but null'd if unused. The top-level cost is smaller than per-node, which was my original intent for using codegen.

I could imagine that keeping the small factories small makes sense, but, for example, WISMW has tons of fields. Adding all the subclass fields to it would only grow it by a few percent. So you could generate only WISMWAWR (with some runtime tests) and use it in place of WISMW and all its subclasses. I might be using the wrong names; without being really familiar with the code, they're hard to tell apart.

But such optimizations are ugly. For this to be efficient, you may have to reorder the hierarchy so that the heaviest additions come first. Let's say you deal with eviction first: any evicting factory is already huge, so you can add all the fields and save half the hierarchy. Adapting the generator could be rather easy (nice code!), but changing the hierarchy could be confusing and the reward uncertain.

The size optimizations for nodes are clearly much more important, and in the long run you'll probably need something smart, as the code size doubles with every new feature. A funny optimization would be to declare field1, field2, etc. (so that classes would differ only by their count) and access them via methods using offsets, but even Unsafe is too high-level for this (and the fields are of different types).

@ben-manes
Owner

For variable expiration, I decided to avoid the extra codegen and encoded it onto the existing timestamp fields. Since it cannot be used with fixed expiration (expireAfterAccess or expireAfterWrite), but can be used with refresh (refreshAfterWrite), this was a bit of a dance. But the result was the minimum number of fields and no new classes, with just a few new methods.
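
As a rough illustration only (the field and method names are made up, not Caffeine's actual ones): because variable expiration is mutually exclusive with the fixed policies, the same long field can carry either the access timestamp or the entry's expiration time, so no new field or generated class is needed.

final class TimestampSketch {
  // Meaning depends on the configuration: under expireAfterAccess this holds the
  // last-access time; under variable expiration it holds the absolute expiration
  // time instead, since the two policies cannot be combined.
  volatile long accessTime;

  long getVariableTime() {
    return accessTime;
  }

  void setVariableTime(long expirationNanos) {
    accessTime = expirationNanos;
  }
}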

The enum in NodeFactory appears to account for 100kb compressed. That was determined by generating with just a single enum and comparing the jar sizes. That enum could be replaced by a factory (strong keys, subclassed for weak). If we use MethodHandles, we should avoid most of the cost of reflection and be near raw bytecode speed.

The unit tests run against every combination allowed by the @CacheSpec annotation. That brute force helps ensure that a change doesn't break a forgotten configuration. Any codegen size change has some assurance that it works if the CI passes.

@ben-manes
Owner

As advertised, the overhead of method handles is near noise compared to a direct call. So I think we can switch to that for the enum and any other reflective calls. The difference is that a direct call has much more stable throughput, whereas the handle varies more per iteration, making it JIT-sensitive. Since the difference is 1-2ns and only on an insertion, it's a nice win.

Benchmark                                   Mode  Cnt          Score          Error  Units
FactoryBenchmark.direct                    thrpt   10   94865111.969 ±  5923818.011  ops/s
FactoryBenchmark.methodHandle_invoke       thrpt   10  102297433.106 ± 29351617.853  ops/s
FactoryBenchmark.methodHandle_invokeExact  thrpt   10  102624426.326 ± 28142556.289  ops/s
FactoryBenchmark.reflectionFactory         thrpt   10   41791855.873 ± 25826017.256  ops/s

The subset of the code of interest is:

static final class MethodHandleFactory {
  private static final MethodHandles.Lookup LOOKUP = MethodHandles.lookup();
  private static final MethodType METHOD_TYPE = MethodType.methodType(void.class, int.class);

  private final MethodHandle methodHandle;

  MethodHandleFactory() {
    try {
      methodHandle = LOOKUP.findConstructor(Alpha.class, METHOD_TYPE);
    } catch (NoSuchMethodException | IllegalAccessException e) {
      throw new RuntimeException(e);
    }
  }

  Alpha invoke(int x) {
    try {
      return (Alpha) methodHandle.invoke(x);
    } catch (Throwable e) {
      throw new RuntimeException(e);
    }
  }

  Alpha invokeExact(int x) {
    try {
      return (Alpha) methodHandle.invokeExact(x);
    } catch (Throwable e) {
      throw new RuntimeException(e);
    }
  }
}

@jvassev
Contributor

jvassev commented Nov 4, 2017

Have a look at #199.
The trick is to collapse multiple responsibilities (node + factory) into the same class to spare us the metadata bloat. Also, once the enum is gone there is no need to instantiate all possible factories in advance.
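
A tiny sketch of that collapse (illustrative names only, not the actual classes from #199): the node type itself exposes the factory method, so a single prototype instance can serve as the factory, and no separate factory classes or eagerly instantiated enum constants are needed.

interface NodeAndFactory<K, V> {
  NodeAndFactory<K, V> newNode(K key, V value);  // the factory method lives on the node
}

final class StrongStrongNode<K, V> implements NodeAndFactory<K, V> {
  final K key;
  final V value;

  StrongStrongNode(K key, V value) {
    this.key = key;
    this.value = value;
  }

  // A prototype instance (e.g. looked up once by class name) acts as the factory
  StrongStrongNode() {
    this(null, null);
  }

  @Override public NodeAndFactory<K, V> newNode(K key, V value) {
    return new StrongStrongNode<>(key, value);
  }
}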

@ben-manes
Owner

Okay, let me take a look to understand what you did. :)

I was playing around with removing the enum in NodeFactory by using method handles. I had a partial prototype and am fine dropping it. Please also take a look to see if any of that is useful with your technique.
https://github.com/ben-manes/caffeine/tree/nodefactory

@ben-manes
Owner

I think we can close this thanks to @jvassev shaving 300kb off the jar. Do you agree, @jvassev, or are there more ideas to play with?

@johnou

johnou commented Nov 5, 2017

I'm sure @ben-manes checked this too, but I thought it would be worth adding a comment: I checked out master, built the jar locally, and used javap -l to verify that debug symbols are still present in the core classes.

@jvassev
Contributor

jvassev commented Nov 6, 2017

Last effort: #200

If we use non-nested classes, that's an extra 49KB in savings (shorter fully qualified names in the class files).

@ben-manes
Owner

Thanks @jvassev! We're at 648K now.

Is everyone satisfied or do we want to leave this open for future iterations?

@jvassev
Contributor

jvassev commented Nov 6, 2017

I think at this point only a minifier/obfuscator can help. The only low-hanging fruit left is to put the generated classes in a package with a shorter name, but I wouldn't like that (and it may not play well with Java modules).

If runtime code-gen were used, the total would be around ~400K (250K core + 90K ASM + 50K generator), but it's a whole new territory.

So probably this can be closed.

@ben-manes
Owner

Yeah, runtime codegen might be a bit too much magic. It's nice to know a lower bound, though, so thanks for that insight.

The only idea I have left is to review the LocalCache subtypes to see which pull their weight. For example, is special-casing removalListener a good tradeoff? It doubles the number of classes for a single field. I don't have a strong opinion, as the decisions are a bit arbitrary.

@ben-manes
Owner

Also, a note to myself: we should add a little JavaDoc to each generated class with the decoded configuration. Then debugging with the source jar (e.g. in an IDE) will be easier.

@ben-manes
Owner

Added the JavaDoc so that anyone reading the types doesn't need to decode the names; it states the generated and inherited features. That should make it a tiny bit easier for a user to debug.

Closing, thanks for all the help.

@jvassev
Contributor

jvassev commented Nov 8, 2017

Regarding your comment about removalListener: why wouldn't just a few (or even one) factory be enough? Is a factory on the hot path somehow? I got the idea that nodes need to be fast and have a small footprint.

I see BoundedLocalCache defines almost all of these methods as throw new UnsupportedOperationException().
Since the only code calling them is generated, it seems safe to back the methods by fields in the parent and not generate classes for the factories at all?

@ben-manes
Owner

Yes, the cache might have been overkill. I wrote the generator and iterated as features were implemented. There are a lot of fields only used for particular configurations, which could be null. We could drop that codegen at the small cost of many unused fields, which is probably okay.
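
For example, a minimal sketch of that trade-off (hypothetical names, with simplified listener/weigher interfaces rather than Caffeine's real ones): every optional field lives on the single base class and stays null when the feature is off, instead of being introduced by a generated subclass per configuration.

interface RemovalListenerSketch<K, V> { void onRemoval(K key, V value); }
interface WeigherSketch<K, V> { int weigh(K key, V value); }

class BoundedCacheSketch<K, V> {
  // Always present; null (or a sentinel value) when the feature was not configured.
  final RemovalListenerSketch<K, V> removalListener; // null if not configured
  final WeigherSketch<K, V> weigher;                 // null means "every entry weighs 1"
  final long expiresAfterWriteNanos;                 // 0 means "no write expiration"

  BoundedCacheSketch(RemovalListenerSketch<K, V> removalListener,
      WeigherSketch<K, V> weigher, long expiresAfterWriteNanos) {
    this.removalListener = removalListener;
    this.weigher = weigher;
    this.expiresAfterWriteNanos = expiresAfterWriteNanos;
  }

  boolean hasRemovalListener() {
    return removalListener != null;
  }

  boolean expiresAfterWrite() {
    return expiresAfterWriteNanos > 0;
  }
}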

@jvassev
Contributor

jvassev commented Nov 9, 2017

When I removed the NodeFactory classes (making a node a factory for itself), there were 90K in savings.
I think by removing the CacheFactory classes we can save about the same amount.
Would you feel like merging it if I try it out, or do you have concerns about performance?

@ben-manes
Owner

I think it's worth giving it a try, if you are up for it; worst case, it would be a worthwhile experiment. In this case I think we'd end up removing all of the LocalCacheFactory code generation, right? Then we'd see how many unused fields are introduced as the trade-off and discuss which is the better option. I don't have a strong opinion yet.

@ben-manes
Owner

Released. Thanks @jvassev!
