
Notable part of caffeine.jar is consumed by generated classes #110

Closed
vlsi opened this issue Aug 9, 2016 · 19 comments

@vlsi
Contributor

vlsi commented Aug 9, 2016

caffeine-2.3.1.jar -- 968137 bytes
caffeine-2.3.1.jar without LocalCacheFactory$... and NodeFactory$... classes -- 248414 bytes

Are there any ideas/plans to reduce that?

@ben-manes
Owner

The intent is, of course, to reduce the in-memory overhead from the many configuration options, at the cost of a larger disk footprint (unloaded classes).

I think the large enum used by NodeFactory could be removed. The simplest approach would be for getFactory to compute a numeric code and return a new factory instance whose methods switch on that code. This would remove the bloat for a tiny runtime selection cost.
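
For illustration, a minimal sketch of that idea (the names below are hypothetical, not Caffeine's actual API): the factory computes a feature code once, and its newNode method switches on it instead of dispatching to one generated subclass per configuration.

interface Node<K, V> {}

final class StrongNode<K, V> implements Node<K, V> {
  final K key;
  final V value;

  StrongNode(K key, V value) {
    this.key = key;
    this.value = value;
  }
}

final class WeakKeyNode<K, V> implements Node<K, V> {
  final java.lang.ref.WeakReference<K> keyRef;
  final V value;

  WeakKeyNode(K key, V value) {
    this.keyRef = new java.lang.ref.WeakReference<>(key);
    this.value = value;
  }
}

final class NodeFactorySketch {
  static final int WEAK_KEYS = 1 << 0;

  private final int code;

  static NodeFactorySketch getFactory(boolean weakKeys) {
    return new NodeFactorySketch(weakKeys ? WEAK_KEYS : 0);
  }

  private NodeFactorySketch(int code) {
    this.code = code;
  }

  <K, V> Node<K, V> newNode(K key, V value) {
    // A single switch on the precomputed code replaces per-configuration factory classes
    switch (code) {
      case 0:         return new StrongNode<>(key, value);
      case WEAK_KEYS: return new WeakKeyNode<>(key, value);
      default:        throw new IllegalStateException("unknown code: " + code);
    }
  }
}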

LocalCacheFactory has a string switch statement. Using a numeric code would be slightly smaller by reducing the interned strings.

I tried to use inheritance to reduce duplicate code. I think Node was optimal and LocalCache was close, but not perfect. Perhaps there are some more tricks to optimize reuse, e.g. interfaces with default methods. We could also remove the modifiers, as perhaps the JVM is creating unnecessary bridge methods.

The biggest win would be to review LocalCacheFactory features to determine if they carry their weight. Perhaps that factory should be removed and all the fields present, but null'd if unused. The top-level cost is smaller than per-node, which was my original intent for using codegen.

jvassev added a commit to jvassev/caffeine that referenced this issue Nov 2, 2017
This patch expands on the suggestion in
ben-manes#110 (comment)

Instead of the proposed numeric code, the name of the factory is used as the class name to instantiate via reflection.
This reduces the class size by 21K, bringing the jar size to 786K.
@jvassev jvassev mentioned this issue Nov 2, 2017
@Maaartinus

The biggest win would be to review LocalCacheFactory features to determine if they carry their weight. Perhaps that factory should be removed and all the fields present, but null'd if unused. The top-level cost is smaller than per-node, which was my original intent for using codegen.

I could imagine that keeping the small factories small makes sense, but, for example, WISMW has tons of fields. Adding all the subclass fields to it would only grow it by a few percent. So you could generate only WISMWAWR (with some runtime tests) and use it in place of WISMW and all its subclasses. I might be using the wrong names; without being really familiar with the code, they're hard to tell apart.

But such optimizations are ugly. For this to be efficient, you may have to reorder the hierarchy so that the heaviest additions come first. Let's say you deal with eviction first: any evicting factory is already huge, so you can add all the fields and save half the hierarchy. Adapting the generator could be rather easy (nice code!), but changing the hierarchy could be confusing and the reward uncertain.

The size optimizations for nodes are clearly much more important, and in the long run you'll probably need something smart, as the code size doubles with every new feature. A funny optimization would be to declare field1, field2, etc. (so that classes would differ only by their count) and access them via methods using offsets, but even Unsafe is too high-level for this (and the fields are of different types).

@ben-manes
Owner

For variable expiration, I decided to avoid the extra codegen and encoded it onto the existing timestamp fields. Since it cannot be used with fixed expiration (expireAfterAccess or expireAfterWrite), but can be used with refresh (refreshAfterWrite), this was a bit of a dance. But the result was the minimum number of fields and no new classes, with just a few new methods.
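
As a rough illustration only (the field and method names are made up, not Caffeine's actual ones): because variable expiration is mutually exclusive with the fixed policies, the same long field can carry either the access timestamp or the entry's expiration time, so no new field or generated class is needed.

final class TimestampSketch {
  // Meaning depends on the configuration: under expireAfterAccess this holds the
  // last-access time; under variable expiration it holds the absolute expiration
  // time instead, since the two policies cannot be combined.
  volatile long accessTime;

  long getVariableTime() {
    return accessTime;
  }

  void setVariableTime(long expirationNanos) {
    accessTime = expirationNanos;
  }
}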

The enum in NodeFactory appears to account for 100kb compressed. That was determined by generating with just a single enum and comparing the jar sizes. That enum could be replaced by a factory (strong keys, subclassed for weak). If we use MethodHandles, we should avoid most of the cost of reflection and be near raw bytecode speed.

The unit tests run against every combination allowed by the @CacheSpec annotation. That brute force helps ensure that a change doesn't break a forgotten configuration. Any codegen size change has some assurance that it works if the CI passes.

@ben-manes
Owner

As advertised, the overhead of method handles is near noise compared to a direct call. So I think we can switch to that for the enum and any other reflective calls. The difference is that a direct call has much more stable throughput, whereas the handle varies more per iteration, making it JIT-sensitive. Since the difference is 1-2ns and only on an insertion, it's a nice win.

Benchmark                                   Mode  Cnt          Score          Error  Units
FactoryBenchmark.direct                    thrpt   10   94865111.969 ±  5923818.011  ops/s
FactoryBenchmark.methodHandle_invoke       thrpt   10  102297433.106 ± 29351617.853  ops/s
FactoryBenchmark.methodHandle_invokeExact  thrpt   10  102624426.326 ± 28142556.289  ops/s
FactoryBenchmark.reflectionFactory         thrpt   10   41791855.873 ± 25826017.256  ops/s

The subset of the code of interest is:

static final class MethodHandleFactory {
  private static final MethodHandles.Lookup LOOKUP = MethodHandles.lookup();
  private static final MethodType METHOD_TYPE = MethodType.methodType(void.class, int.class);

  private final MethodHandle methodHandle;

  MethodHandleFactory() {
    try {
      methodHandle = LOOKUP.findConstructor(Alpha.class, METHOD_TYPE);
    } catch (NoSuchMethodException | IllegalAccessException e) {
      throw new RuntimeException(e);
    }
  }

  Alpha invoke(int x) {
    try {
      return (Alpha) methodHandle.invoke(x);
    } catch (Throwable e) {
      throw new RuntimeException(e);
    }
  }

  Alpha invokeExact(int x) {
    try {
      return (Alpha) methodHandle.invokeExact(x);
    } catch (Throwable e) {
      throw new RuntimeException(e);
    }
  }
}

@jvassev
Contributor

jvassev commented Nov 4, 2017

Have a look at #199.
The trick is to collapse multiple responsibilities (node + factory) into the same class to spare us the metadata bloat. Also, once the enum is gone there is no need to instantiate all possible factories in advance.
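
A tiny sketch of that collapse (illustrative names only, not the actual classes from #199): the node type itself exposes the factory method, so a single prototype instance can serve as the factory, and no separate factory classes or eagerly instantiated enum constants are needed.

interface NodeAndFactory<K, V> {
  NodeAndFactory<K, V> newNode(K key, V value);  // the factory method lives on the node
}

final class StrongStrongNode<K, V> implements NodeAndFactory<K, V> {
  final K key;
  final V value;

  StrongStrongNode(K key, V value) {
    this.key = key;
    this.value = value;
  }

  // A prototype instance (e.g. looked up once by class name) acts as the factory
  StrongStrongNode() {
    this(null, null);
  }

  @Override public NodeAndFactory<K, V> newNode(K key, V value) {
    return new StrongStrongNode<>(key, value);
  }
}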

@ben-manes
Owner

Okay, let me take a look to understand what you did. :)

I was playing around with removing the enum in NodeFactory by using method handles. I had a partial prototype and am fine dropping it. Please also take a look to see if any of that is useful with your technique.
https://github.com/ben-manes/caffeine/tree/nodefactory

@ben-manes
Owner

I think we can close this thanks to @jvassev shaving 300kb off the jar. Do you agree, @jvassev, or are there more ideas to play with?

@johnou

johnou commented Nov 5, 2017

I'm sure @ben-manes checked this too, but I thought it would be worth adding a comment: I checked out master, built the jar locally, and used javap -l to verify that debug symbols are still present in the core classes.

@jvassev
Contributor

jvassev commented Nov 6, 2017

Last effort: #200

If we use non-nested classes, that's an extra 49KB in savings (shorter fully qualified names in the class files).

@ben-manes
Owner

Thanks @jvassev! We're at 648K now.

Is everyone satisfied or do we want to leave this open for future iterations?

@jvassev
Contributor

jvassev commented Nov 6, 2017

I think at this point only a minifier/obfuscator can help. The only low-hanging fruit left is to put the generated classes in a package with a shorter name, but I wouldn't like that (and it may not play well with Java modules).

If runtime code-gen were used, the total would be around ~400K (250K core + 90K ASM + 50K generator), but it's a whole new territory.

So probably this can be closed.

@ben-manes
Owner

Yeah, runtime codegen might be a bit too much magic. It's nice to know a lower bound, though, so thanks for that insight.

The only idea I have left is to review the LocalCache subtypes to see which pull their weight. For example, is special-casing removalListener a good tradeoff? It doubles the number of classes for a single field. I don't have a strong opinion, as the decisions are a bit arbitrary.

@ben-manes
Owner

Also, a note to myself: we should add a little JavaDoc to each generated class with the decoded configuration. Then debugging with the source jar (e.g. in an IDE) will be easier.

@ben-manes
Owner

Added the JavaDoc so that anyone reading the types doesn't need to decode the names; it states the generated and inherited features. That should make it a tiny bit easier for a user to debug.

Closing, thanks for all the help.

@jvassev
Contributor

jvassev commented Nov 8, 2017

Regarding your comment about removalListener: why wouldn't just a few (or even one) factory be enough? Is a factory on the hot path somehow? I got the idea that nodes need to be fast and have a small footprint.

I see BoundedLocalCache defines almost all of these methods as throw new UnsupportedOperationException().
Since the only code calling them is generated, it seems safe to back the methods by fields in the parent and not generate classes for the factories at all?

@ben-manes
Owner

Yes, the cache might have been overkill. I wrote the generator and iterated as features were implemented. There are a lot of fields only used for particular configurations, which could be null. We could drop that codegen at the small cost of many unused fields, which is probably okay.
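
For example, a minimal sketch of that trade-off (hypothetical names, with simplified listener/weigher interfaces rather than Caffeine's real ones): every optional field lives on the single base class and stays null when the feature is off, instead of being introduced by a generated subclass per configuration.

interface RemovalListenerSketch<K, V> { void onRemoval(K key, V value); }
interface WeigherSketch<K, V> { int weigh(K key, V value); }

class BoundedCacheSketch<K, V> {
  // Always present; null (or a sentinel value) when the feature was not configured.
  final RemovalListenerSketch<K, V> removalListener; // null if not configured
  final WeigherSketch<K, V> weigher;                 // null means "every entry weighs 1"
  final long expiresAfterWriteNanos;                 // 0 means "no write expiration"

  BoundedCacheSketch(RemovalListenerSketch<K, V> removalListener,
      WeigherSketch<K, V> weigher, long expiresAfterWriteNanos) {
    this.removalListener = removalListener;
    this.weigher = weigher;
    this.expiresAfterWriteNanos = expiresAfterWriteNanos;
  }

  boolean hasRemovalListener() {
    return removalListener != null;
  }

  boolean expiresAfterWrite() {
    return expiresAfterWriteNanos > 0;
  }
}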

@jvassev
Contributor

jvassev commented Nov 9, 2017

When I removed the NodeFactory classes (making a node a factory for itself), there were 90K in savings.
I think by removing the CacheFactory classes we can save about the same amount.
Would you feel like merging it if I try it out, or do you have concerns about performance?

@ben-manes
Owner

I think it's worth giving it a try, if you are up for it; worst case, it would be a worthwhile experiment. In this case I think we'd end up removing all of the LocalCacheFactory code generation, right? Then we'd see how many unused fields are introduced as the trade-off and discuss which is the better option. I don't have a strong opinion yet.

@ben-manes
Owner

Released. Thanks @jvassev!
