Optimise boilerplate generators, use instance constructors #3871

Merged 10 commits on Jul 15, 2022

@joroKr21
Member

joroKr21 commented Apr 26, 2021

Make it a bit more readable by formatting constraints on a new line.

See #3870

This change reduces the jar size of cats-kernel by around 40% on average 😲

| Artifact | This PR | main | Reduction |
|---|---:|---:|---:|
| cats-kernel_2.12.jar | 3068152 | 5034406 | 39.0% |
| cats-kernel_2.13.jar | 3299524 | 5262009 | 37.3% |
| cats-kernel_3.0.0-RC3.jar | 1418767 | 2491382 | 43.0% |
| cats-kernel_sjs1_2.12.jar | 6336434 | 10614940 | 40.3% |
| cats-kernel_sjs1_2.13.jar | 6883734 | 10614940 | 35.1% |
| cats-kernel_sjs1_3.0.0-RC3.jar | 2197991 | 4255929 | 48.3% |
| cats-kernel_native0.4_2.12.jar | 6330092 | 10796161 | 41.4% |
| cats-kernel_native0.4_2.13.jar | 6862749 | 11322497 | 39.4% |

I've also given cats-core a similar treatment, but the savings will be much smaller there.

@joroKr21
Member Author

Hmm I still see classes generated - is that expected?

@joroKr21
Member Author

joroKr21 commented Apr 27, 2021

TLDR - it's good for Scala 3, Scala 2 is deceiving itself into generating a class file 😭 See scala/scala3#5928

@joroKr21
Member Author

As an alternative - use instance constructors that take functions. The externally provided functions will be delambdafied in both Scala 2 and Scala 3.
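A minimal sketch of what "instance constructors that take functions" could look like. `Semigroup` here is a local stand-in trait, not the real cats-kernel definition, and `InstanceSketch` and `tuple2Semigroup` are hypothetical names chosen for illustration.

```scala
object InstanceSketch {
  // Stand-in for cats.kernel.Semigroup; not the real definition.
  trait Semigroup[A] {
    def combine(x: A, y: A): A
  }

  object Semigroup {
    // Instance constructor: the anonymous class is written once, here.
    def instance[A](cmb: (A, A) => A): Semigroup[A] =
      new Semigroup[A] {
        def combine(x: A, y: A): A = cmb(x, y)
      }

    implicit val intAdd: Semigroup[Int] = instance[Int](_ + _)
    implicit val strConcat: Semigroup[String] = instance[String](_ + _)
  }

  // Hypothetical generated tuple instance: it passes a function to
  // `instance` instead of declaring its own anonymous class.
  def tuple2Semigroup[A, B](implicit sa: Semigroup[A], sb: Semigroup[B]): Semigroup[(A, B)] =
    Semigroup.instance[(A, B)]((x, y) => (sa.combine(x._1, y._1), sb.combine(x._2, y._2)))
}
```

With this shape, each generated tuple instance contributes a lambda rather than a new class definition of its own.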

@joroKr21 changed the title from "Refactor KernelBoiler, use SAM instances when possible" to "Refactor KernelBoiler, use instance constructors" on Apr 28, 2021
@joroKr21 changed the title from "Refactor KernelBoiler, use instance constructors" to "Refactor boilerplate, use instance constructors" on May 8, 2021
@joroKr21 changed the title from "Refactor boilerplate, use instance constructors" to "Optimise boilerplate generators, use instance constructors" on May 8, 2021
Contributor

@johnynek johnynek left a comment

I think this is worth doing but I do wonder if we can make it a bit more principled by using InvariantFunctor, ContravariantCartesian, or similar typeclasses on these kernel typeclasses (typeclasses can have typeclasses too!)

/**
* Create a `CommutativeGroup` instance from the given inverse and combine functions and empty value.
*/
@inline def instance[A](emp: A, inv: A => A, cmb: (A, A) => A): CommutativeGroup[A] =
Contributor

What if instead we make an instance of InvariantFunctor[CommutativeGroup] and then use the InvariantFunctor instances in the tuple codegen?

Member Author

@joroKr21 joroKr21 Jul 9, 2021

You probably mean InvariantSemigroupal so that we can use tupledN, right? That's doable, but it would mean more allocations and reduced performance at runtime (going from (a, b, c, d) to (a, (b, (c, d))) and back), so I'm not sure that would be acceptable for cats-kernel, which is also used by algebra.
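To illustrate the allocation concern, here is a rough sketch of deriving a tuple instance through pairwise products and an invariant map. It does not use the real cats `InvariantSemigroupal` API: `Semigroup`, `product`, `imap` and `tuple3` are simplified stand-ins written for this example.

```scala
object NestedTupleSketch {
  // Stand-in for cats.kernel.Semigroup; not the real definition.
  trait Semigroup[A] { def combine(x: A, y: A): A }

  val intAdd: Semigroup[Int] = new Semigroup[Int] {
    def combine(x: Int, y: Int): Int = x + y
  }

  // Pairwise product, the building block of a semigroupal-style derivation.
  def product[A, B](sa: Semigroup[A], sb: Semigroup[B]): Semigroup[(A, B)] =
    new Semigroup[(A, B)] {
      def combine(x: (A, B), y: (A, B)): (A, B) =
        (sa.combine(x._1, y._1), sb.combine(x._2, y._2))
    }

  // Invariant map: convert to the underlying representation and back.
  def imap[A, B](sa: Semigroup[A])(f: A => B)(g: B => A): Semigroup[B] =
    new Semigroup[B] {
      def combine(x: B, y: B): B = f(sa.combine(g(x), g(y)))
    }

  // Tuple3 derived by nesting: every combine converts both arguments to
  // (a, (b, c)), combines the nested pairs, then flattens the result again.
  def tuple3[A, B, C](sa: Semigroup[A], sb: Semigroup[B], sc: Semigroup[C]): Semigroup[(A, B, C)] =
    imap[(A, (B, C)), (A, B, C)](product(sa, product(sb, sc))) {
      case (a, (b, c)) => (a, b, c)
    } {
      case (a, b, c) => (a, (b, c))
    }
}
```

A hand-written or generated Tuple3 instance would build only the result tuple per call, whereas this route also builds the intermediate nested tuples.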

@joroKr21
Member Author

Also optimized AlgebraBoilerplate now that #3918 was merged

@armanbilge
Member

Thanks for being on top of this!

johnynek previously approved these changes Aug 26, 2021
@@ -115,4 +115,13 @@ trait RingFunctions[R[T] <: Ring[T]] extends AdditiveGroupFunctions[R] with Mult

object Ring extends RingFunctions[Ring] {
@inline final def apply[A](implicit ev: Ring[A]): Ring[A] = ev

private[algebra] def instance[A](z: A, o: A, neg: A => A, add: (A, A) => A, mul: (A, A) => A): Ring[A] =
Contributor

Why are these private[algebra] while the others seem to be @inline? Is there a reason they aren't all the same?

Member Author

Cats already has public instance methods for some (but not all) type classes and they are @inline whereas Algebra doesn't have any. I just tried to follow the conventions of each project, but I think it doesn't hurt to add the annotation in Algebra too.

@joroKr21
Member Author

I don't know why the "Microsite" job is failing but I'm quite sure it's not related to this PR

rossabaker previously approved these changes Aug 27, 2021
Member

@rossabaker rossabaker left a comment

The build failure could be related to #3974. I'll bring it up there.

Make it a bit more readable by formatting constraints on a new line.
Due to limitations in Scala 2 SAM types often end up generating
classes after all. `instance` constructors don't suffer from this
issue and also let us handle type classes with multiple abstract
methods.
johnynek previously approved these changes Aug 29, 2021
rossabaker previously approved these changes Nov 18, 2021
@armanbilge
Member

armanbilge commented Nov 26, 2021

Personally, I'm not convinced we need or even want this change. Apologies if I'm missing the point.

  • In Move typelevel/algebra into cats repo #3877 (comment) @joroKr21 mentioned that classes are loaded/unloaded dynamically.
  • Scala.js and Native involve a static linking step that removes all unused code from the final generated JS/binary.
  • I understand the relative numbers are significant, but in absolute terms we're basically talking megabytes, right?

Additionally, isn't this de-optimizing the current implementations by replacing dedicated classes with lambdas, thus adding a level of indirection and a larger per-instance footprint? Like, these are micro-optimizations, but isn't saving a few megabytes one too? :)
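The indirection in question can be pictured with a small sketch that compares a dedicated anonymous class against an instance backed by stored functions. `Group` is a local stand-in for the cats-kernel type class, and `direct` and `viaFunctions` are hypothetical names; the `instance` signature mirrors the one shown earlier in this review.

```scala
object IndirectionSketch {
  // Stand-in for cats.kernel.Group; not the real definition.
  trait Group[A] {
    def empty: A
    def inverse(a: A): A
    def combine(x: A, y: A): A
  }

  // Function-backed constructor: one shared class, but every method call
  // forwards to a stored function object.
  def instance[A](emp: A, inv: A => A, cmb: (A, A) => A): Group[A] =
    new Group[A] {
      def empty: A = emp
      def inverse(a: A): A = inv(a)
      def combine(x: A, y: A): A = cmb(x, y)
    }

  val intAdd: Group[Int] = instance[Int](0, x => -x, _ + _)

  // Dedicated anonymous class: method bodies call the component groups directly.
  def direct[A, B](ga: Group[A], gb: Group[B]): Group[(A, B)] =
    new Group[(A, B)] {
      def empty: (A, B) = (ga.empty, gb.empty)
      def inverse(a: (A, B)): (A, B) = (ga.inverse(a._1), gb.inverse(a._2))
      def combine(x: (A, B), y: (A, B)): (A, B) =
        (ga.combine(x._1, y._1), gb.combine(x._2, y._2))
    }

  // Same behaviour through `instance`: the result holds two function objects,
  // each of which captures `ga` and `gb`.
  def viaFunctions[A, B](ga: Group[A], gb: Group[B]): Group[(A, B)] =
    instance[(A, B)](
      (ga.empty, gb.empty),
      a => (ga.inverse(a._1), gb.inverse(a._2)),
      (x, y) => (ga.combine(x._1, y._1), gb.combine(x._2, y._2))
    )
}
```

Both variants compute the same results; the difference is only in how many objects sit between the caller and the component instances, which is the cost being asked about.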

@joroKr21
Member Author

> In Move typelevel/algebra into cats repo #3877 (comment) @joroKr21 mentioned that classes are loaded/unloaded dynamically.

Yes, that's mostly true on the JVM these days unless you are using the CMS GC without class unloading enabled.

> Scala.js and Native involve a static linking step that removes all unused code from the final generated JS/binary.

Oh that's cool, I didn't think much about that. Do you know how fine-grained that is? Does it mean that the concerns for including java.time instances are unfounded (e.g. #3910)?

> I understand the relative numbers are significant, but in absolute terms we're basically talking megabytes, right?

Well yes, that's megabytes per download per jar. I don't know what the multiplier is, but there is some multiplier. I would prefer to fix that in the compiler to be honest, but we can't because of binary compatibility constraints. (Aside: at least that's what I've been told. Now that I think about it - those classes should be private anyway so why not? 🤔 Maybe I will give it a try but let's not discuss that here).

> Additionally, isn't this de-optimizing the current implementations by replacing dedicated classes with lambdas, thus adding a level of indirection and a larger per-instance footprint? Like, these are micro-optimizations, but isn't saving a few megabytes one too? :)

That's a good question - do we have any benchmarks on that? I can't say without trying.

@joroKr21
Member Author

Some shower thoughts - the biggest costs of these instances are probably boxing of primitives (it would be interesting to check in which cases it occurs) and tuple allocations. Lambda calls should not be that expensive because otherwise we would never get SAMs on the JVM. That leaves open the question about the cost of the additional indirection (which is necessary because of the compiler bug).

@armanbilge armanbilge mentioned this pull request Feb 8, 2022
@joroKr21
Member Author

joroKr21 commented Feb 9, 2022

Since we've reached the limit of our JVM knowledge - what about applying this only to the SAM type classes and leaving the multiple method type classes as they were? SAM conversion is supposed to be handled automatically by the compiler, but because of the bug that's not the case. Obviously that means our jar size savings will be much smaller, but there will be no danger of performance regressions.

@armanbilge
Member

👍 that sounds like a good way forward.

@joroKr21
Member Author

joroKr21 commented Feb 10, 2022

@armanbilge done - now only SAM type classes are optimised in this way. I converted multi method type class instances back to anonymous classes. There were not that many actually - only Hash, Group, CommutativeGroup, Ring, Rig, Rng and Semiring.

- def show[A](f: A => String): Show[A] =
-   new Show[A] {
-     def show(a: A): String = f(a)
-   }
+ def show[A](f: A => String): Show[A] = f(_)
Member

Is there a practical difference between these? e.g. no class is emitted.

Member Author

@joroKr21 joroKr21 Feb 10, 2022

Yes, indeed 👍 - because Show has only one abstract method

Member

Right, thanks. Is this affected by the aforementioned Scala 2 bug?

Also, can Semigroup etc. get the same treatment?

@inline def instance[A](cmb: (A, A) => A): Semigroup[A] =
  new Semigroup[A] {
    override def combine(x: A, y: A): A = cmb(x, y)
  }

Member Author

I think for Semigroup it doesn't matter because it has other (non-abstract) methods. So then the bug applies and an anonymous class would be generated even if we use the SAM syntax. So the only benefit would be aesthetics.

Member

I see, so it's not just that Show has one abstract method, it's that it has no other methods.
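The two shapes being distinguished here can be sketched as follows. Both traits are local stand-ins rather than the real cats definitions, and `combineTwice` is a hypothetical concrete member added only to make the second trait more than a bare SAM.

```scala
object SamSketch {
  // Exactly one abstract method and nothing else.
  trait Show[A] {
    def show(a: A): String
  }

  // One abstract method plus a concrete one.
  trait Semigroup[A] {
    def combine(x: A, y: A): A
    def combineTwice(a: A): A = combine(a, a) // hypothetical concrete member
  }

  // SAM syntax is accepted for both shapes (Scala 2.12+ and Scala 3).
  val showInt: Show[Int] = (a: Int) => s"#$a"
  val addInt: Semigroup[Int] = (x: Int, y: Int) => x + y
}
```

The sketch only shows that the syntax compiles in both cases. According to the comments in this thread, what differs on Scala 2 is whether the compiler still emits an anonymous class behind the scenes for the second shape.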

I agree it's only aesthetics on Scala 2, but does the bug apply to Scala 3 as well?

Member Author

> I agree it's only aesthetics on Scala 2, but does the bug apply to Scala 3 as well?

I don't remember - that's a good question.

Member

@armanbilge armanbilge Feb 10, 2022

The reason I ask is, I don't care so much about the handful of Semigroup.instance etc. that can be re-written.

But I'm wondering if in Scala 3 the boilerplate instances themselves could be written directly like this instead of relying on instance. So we get the win in terms of jar size without introducing any indirection at all.

Member Author

Even if that's possible it would be quite hard to split like this 😄

Member Author

@joroKr21 joroKr21 Feb 10, 2022

FTR Scala 3 doesn't have this bug - but again I'm sceptical about version-specific boilerplate generators 🤔

Member

If there's interest it can be a followup PR. I think it shouldn't be too hard (famous last words): currently all the instances are in the same file I think, when they could easily be split among a few files. And some of those files can go into the scala-2 and scala-3 srcs.

Member

@armanbilge armanbilge left a comment

I didn't review the boilerplate code in detail, but I did spend some time studying the decompiled bytecode of the generated classes. The extra indirection is unfortunate but quite possibly not a big deal (and potentially avoidable for Scala 3 in follow-up work). Given the various constraints I think this is good 👍

@DavidGregory084
Member

This inv lambda must also be storing references to A1 and A2.

Sorry for coming to this late but there is a great page about the lambda translation process here and a good blog post with some further explanation of what happens at runtime here.

Scalac uses the LambdaMetafactory provided in java.lang.invoke and it works in a similar way.

In the example that you provide:

val cmb = (x, y) => (A1.cmb(x._1, y._1), A2.cmb(x._2, y._2))
val inv = x => (A1.inv(x._1), A2.inv(x._2))

These lambdas are capturing lambdas, but they are not instance-capturing lambdas (they do not use this or super because A1 and A2 are constructor arguments of TupleGroup), so they are encoded as static methods. That does make me wonder if it behaves differently when you use an implicit val argument instead of using a context bound or just an implicit argument though!

The static methods will be declared with the lambda parameters with any captured variables prepended to the parameter list. You can actually see this in the bytecode for the INVOKEDYNAMIC call that @joroKr21 posted above:

// handle kind 0x6 : INVOKESTATIC
cats/kernel/instances/TupleMonoidInstances.$anonfun$catsKernelMonoidForTuple2$1(Lcats/kernel/Monoid;Lcats/kernel/Monoid;Lscala/Tuple2;Lscala/Tuple2;)Lscala/Tuple2; itf, 

The first two parameters are cats.kernel.Monoid instances. The ALOAD 1 and ALOAD 2 calls just before the INVOKEDYNAMIC bytecode stack those "static arguments" for the bootstrap method that creates the lambda object.
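The encoding described above can be imitated by hand. This is only an illustrative sketch: `anonfun` and `Captured` are hypothetical names, `Monoid` is a local stand-in trait, and the real lambda object is produced at run time via LambdaMetafactory rather than written out as a class.

```scala
object LambdaLiftSketch {
  // Stand-in for cats.kernel.Monoid; not the real definition.
  trait Monoid[A] {
    def empty: A
    def combine(x: A, y: A): A
  }

  val intAdd: Monoid[Int] = new Monoid[Int] {
    def empty: Int = 0
    def combine(x: Int, y: Int): Int = x + y
  }

  val strConcat: Monoid[String] = new Monoid[String] {
    def empty: String = ""
    def combine(x: String, y: String): String = x + y
  }

  // Hand-written analogue of the lifted static method: the captured values
  // (a1, a2) come first, followed by the lambda's own parameters (x, y).
  def anonfun[A, B](a1: Monoid[A], a2: Monoid[B], x: (A, B), y: (A, B)): (A, B) =
    (a1.combine(x._1, y._1), a2.combine(x._2, y._2))

  // Hand-written analogue of the lambda object: it stores the captured
  // values and forwards each call to the static method.
  final class Captured[A, B](a1: Monoid[A], a2: Monoid[B])
      extends Function2[(A, B), (A, B), (A, B)] {
    def apply(x: (A, B), y: (A, B)): (A, B) = anonfun(a1, a2, x, y)
  }
}
```

The parameter order of `anonfun` mirrors the descriptor in the bytecode excerpt: two `Monoid` arguments followed by two `Tuple2` arguments, returning a `Tuple2`.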

The Java Language Specification (§15.27.4) says that:

At run time, evaluation of a lambda expression is similar to evaluation of a class instance creation expression, insofar as normal completion produces a reference to an object. Evaluation of a lambda expression is distinct from execution of the lambda body.

Either a new instance of a class with the properties below is allocated and initialized, or an existing instance of a class with the properties below is referenced. If a new instance is to be created, but there is insufficient space to allocate the object, evaluation of the lambda expression completes abruptly by throwing an OutOfMemoryError.

So like @armanbilge suggests, these lambda expressions will capture A1 and A2 separately as the static arguments to those lambda expression objects. Each static lambda expression will probably result in the creation of one lambda object with a reference to each of those static arguments, although the behaviour is deliberately left completely up to the VM implementors so it's hard to be sure that different VMs will even behave the same:

These rules are meant to offer flexibility to implementations of the Java programming language, in that:

A new object need not be allocated on every evaluation.

Objects produced by different lambda expressions need not belong to different classes (if the bodies are identical, for example).

Every object produced by evaluation need not belong to the same class (captured local variables might be inlined, for example).

If an "existing instance" is available, it need not have been created at a previous lambda evaluation (it might have been allocated during the enclosing class's initialization, for example).

As of relatively recent versions of the JDK it seems that:

In the current implementation, the metafactory delegates to code that uses an internal, shaded copy of the ASM bytecode libraries to spin up an inner class that implements the target type.

@joroKr21
Member Author

joroKr21 commented May 3, 2022

Thanks for the detailed explanation @DavidGregory084 - I think that supports our current approach of applying this change to SAM type classes only

@armanbilge armanbilge mentioned this pull request May 16, 2022
@joroKr21
Member Author

Hey, is there any reason not to get this merged? 🙏

@armanbilge
Member

I'm happy to move forward with this as of #3871 (review). I feel like I remember someone (you?) saying we should still benchmark it or something before merging, but maybe I'm confused.

@joroKr21
Member Author

> I feel like I remember someone (you?) saying we should still benchmark it or something before merging, but maybe I'm confused.

I thought that referred to the instances which have more than one abstract method which are now removed.
I personally won't have time to benchmark the rest 😄

Member

@danicheg danicheg left a comment

This is tremendous.

@armanbilge armanbilge added this to the 2.9.0 milestone Jul 15, 2022
@armanbilge
Member

We have previous approvals from Ross and Oscar as well, let's go ahead 🚀

@armanbilge armanbilge merged commit fbad4be into typelevel:main Jul 15, 2022
@joroKr21 joroKr21 deleted the sam-boiler branch July 16, 2022 09:55
@armanbilge
Member

In unrelated research, I found out that SAMs still generate classes on JS (unsurprisingly). So this PR actually didn't help things there, although I don't think it made things worse.

Not sure about the situation on Native, but if I had to guess it would be the same as JS.

@joroKr21
Member Author

I assume both JS and Native have functions though - otherwise I can't explain the reduction of jar size I observed in both. Note that we use functions here, not SAM syntax directly to work around the Scala 2 bug.

@armanbilge
Member

Ah, sorry, to clarify I'm not talking about the bytecode/SJSIR size. I'm talking about the size of the final generated JavaScript. Since JS is delivered into browsers this is a sensitive subject 😉

@armanbilge
Member

> Note that we use functions here, not SAM syntax directly to work around the Scala 2 bug.

Right, sorry, I forgot this :)
