Topic/jit optimize #5348

dolio · 2024-09-13T22:05:10Z

This PR makes a few quick changes that speed up some arithmetic on the JIT. There's more work to do, but this is a first step at least.

One change: using `(max 0 ...)` in the implementation of `Nat.drop` was slow, so any loop that calls `Nat.drop` lost performance. It's been replaced with an operation that should be equivalent on natural numbers, but faster.
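To make the intended equivalence concrete, here is a minimal sketch in Python rather than the Racket the JIT actually emits. The name `max0` comes from the commit title; its body here and the helper `nat_drop` are assumptions for illustration, not the code in this PR.

```python
def max0(n: int) -> int:
    """Clamp an integer at zero using a single comparison.

    Assumed shape of a specialized replacement for a general two-argument max.
    """
    return n if n >= 0 else 0


def nat_drop(m: int, n: int) -> int:
    """Truncated subtraction on naturals: m - n, but never below zero."""
    return max0(m - n)
```

The point of the sketch is only that a dedicated clamp agrees with `max(0, n)` on every input, so swapping one for the other should not change what `Nat.drop` returns.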

I've also changed the way that Unison definitions get 'curried.' It's back to the original strategy of generating case-lambda expressions for every definition. My experiments suggest that this optimizes better in various cases. I've also added machinery to apply this behavior selectively, because it makes compilation a lot slower.

According to my tests, it shouldn't be necessary for every definition to use this strategy. It's mostly recursive functions that the optimizer refuses to handle well with pre-defined currying functions. But I also couldn't get the optimizer to optimize builtins properly in actual code without them also using this sort of currying. At this point I'm unsure of what the difference between my test cases and the actual code is, so I thought I'd just push this to get the optimization out, and try to figure out how to be more intelligent about it later.
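The two currying strategies discussed above can be sketched roughly as follows. This is a Python illustration, not the Racket macro or helpers from this PR; `curry_generic`, `add3`, and `add3_curried` are hypothetical names. The first function plays the role of a predefined currying helper shared by every definition of a given arity; the second spells out one branch per argument count for a single definition, in the spirit of a case-lambda.

```python
from functools import partial


def curry_generic(arity, f):
    """Shared helper: wraps any function of the given arity (hypothetical)."""
    def wrapper(*args):
        if len(args) == arity:
            return f(*args)                      # saturated call
        if len(args) < arity:
            # partial application: remember args, wait for the rest
            return curry_generic(arity - len(args), partial(f, *args))
        # over-application: apply the result to the leftover arguments
        return f(*args[:arity])(*args[arity:])
    return wrapper


def add3(a, b, c):
    return a + b + c


def add3_curried(*args):
    """Per-definition dispatch: the body appears directly in the saturated branch."""
    n = len(args)
    if n == 3:
        a, b, c = args
        return a + b + c                         # body is visible at the call site
    if n < 3:
        return lambda *more: add3_curried(*args, *more)
    return add3_curried(*args[:3])(*args[3:])
```

Both versions behave the same: `add3_curried(1)(2)(3)` and `curry_generic(3, add3)(1)(2)(3)` each return 6. The difference is that the per-definition form puts the function body directly in the saturated branch, which may be easier for an optimizer to see through than a call routed via a shared helper, at the cost of generating more code per definition.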

With this, counting up to 1 billion takes around 1.5s on my machine, which matches a loop written directly in Racket. This only tests a couple of operations, though, so there may be other things like the `(max 0 ...)` situation out there that I haven't looked at yet.

- Apparently `(max 0 n)` used in `Nat.drop` was slow, so it's been replaced with something that should act the same on natural numbers.
- Switched back to the original currying macro behavior. This seems to optimize better in various ways. According to my tests, it should only really be necessary for recursive functions, and so I've added some capabilities to only apply the full macro locally on those. But the racket optimizer also seems very fickle, so using predefined curry functions on various builtins seems to _not_ optimize properly like they do in my localized tests, even when various inlining suggestions are enabled. Hopefully this can be fixed in the future as it makes compile times significantly worse.

  This also fixes a latent bug where there wouldn't be enough pre-defined currying functions for procedures that take more than 20 arguments. I've instead lowered the predefined functions to a maximum of 9 arguments, and made anything over that just use the macro directly, since those are presumably rare. None of the currying functions are currently used, but hopefully they can be in the future.

pchiusano

Nice!!

dolio added 2 commits September 11, 2024 15:02

Switch to custom max0 operation in Nat.drop

289a3b6

dolio requested review from pchiusano and aryairani September 13, 2024 22:06

pchiusano approved these changes Sep 14, 2024

pchiusano merged commit eda2f0e into trunk Sep 14, 2024
32 checks passed

pchiusano deleted the topic/jit-optimize branch September 14, 2024 03:17

This pull request was closed.
