
Proposal: IL optimization step #15929

Open
benaadams opened this issue Dec 15, 2016 · 67 comments
Labels: Area-External, Code Gen Quality (room for improvement in the quality of the compiler's generated code), Discussion, Feature Request

Comments

@benaadams
Member

benaadams commented Dec 15, 2016

The JIT can only apply so many optimizations, as it is time-constrained at runtime.

AOT/NGen, which has more time, produces native code ahead of time but loses some optimizations the JIT can do at runtime, since it has to be conservative about the target CPU architecture; folding static readonly fields to consts, etc.
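(To illustrate the kind of runtime-only folding meant here, a minimal sketch with hypothetical names; the JIT can treat an initialized static readonly as a constant, while an AOT compiler cannot:)

static readonly bool s_useFastPath =
    Environment.GetEnvironmentVariable("USE_FAST_PATH") == "1";

static int Double(int x) {
    // Once the static constructor has run, s_useFastPath has a fixed value,
    // so a JIT compiling this at runtime can fold it to a constant and
    // delete the dead branch. An AOT compiler must keep both branches,
    // since the value only becomes known when the process starts.
    if (s_useFastPath)
        return x << 1;
    return x * 2;
}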

The compile-to-IL compilers (C#/VB.NET/etc.) aren't as time-constrained, but they treat a lot of optimizations as out of scope.

Do we need a third compiler, between Roslyn and the JIT, that optimizes the IL as part of the regular compile or as part of a "publish" compile?

Could this be a collaboration between the JIT and Roslyn teams?

I'm sure there is plenty of low-hanging fruit between the two: optimizations the JIT would like to perform but that are too expensive to do at runtime.

There is also whole-program optimization, or linking + tree shaking, which is another component in this picture (e.g. the Mono/Xamarin linker). Likely also partial linking (e.g. NuGet packages and other non-platform runtime libs).

From #15644 (comment)

/cc @migueldeicaza @gafter @jkotas @AndyAyersMS

@HaloFour

Does something like rewriting LINQ as loops fit into a process like this?
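(For reference, roughly the rewrite being asked about, as an illustrative sketch; assume values is an int[] and using System.Linq is in scope:)

// As written, with LINQ:
int sum = values.Where(v => v > 0).Sum();

// What a rewriter might emit instead: no enumerator and no delegate
// allocations, just a plain loop.
int sum2 = 0;
foreach (int v in values)
    if (v > 0)
        sum2 += v;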

@alrz
Member

alrz commented Dec 15, 2016

Speaking of optimizations,

int square(int num) {
  int a = 0;
  for (int x = 0; x < num; x += 2) {
    if (x % 2) {  // x is always even here, so this branch never executes
      a += x;
    }
  }
  return a;
}

gcc changes the above function to return zero. 😄

@iam3yal

iam3yal commented Dec 16, 2016

And while we're at it, maybe constexpr could be taken into account, when and if it ever gets implemented. 😄

I don't know, but maybe this would be the right phase for handling constexpr. Just a thought.

@fanoI

fanoI commented Dec 16, 2016

I think IL rewrite and constexpr are two separate things.

@iam3yal

iam3yal commented Dec 16, 2016

@fanoI They are two different things, but what led me to write this is really this part:

a "publish" compile?

Maybe during debug compilation the compiler wouldn't evaluate functions as constexpr, to reduce compilation time, and then when you publish or build for release it would evaluate them at compile time.

Just because they are two different things doesn't mean it's not the right phase to handle it.

@AndyAyersMS
Member

Part of the challenge in building an IL-IL optimizer is that in general the scope of optimizations is limited to a method or perhaps a single assembly. One can do some cool things at this scope but quite often the code one would like to optimize is scattered across many methods and many assemblies.

To broaden the scope, you can gather all the relevant assemblies together sometime well before execution and optimize them all together. But by doing this you inevitably build in strong dependence on the exact versions of the assemblies involved, and so the resulting set of optimized IL assemblies must now be deployed and versioned and serviced together. For some app models this is a natural fit; for others it is not.

Engineering challenges abound. @mikedn touched on some of them over in #15644, but there are many more. Just to touch on a few:

  • The optimizer may not have enough context to build a coherent call graph, which can severely limit interprocedural fact propagation.
  • For a given method the optimizer typically can't know all callers. To take advantage of caller-specific information the optimizer must clone or inline the callee. This in turn can cause significant code growth. Linking or tree-shaking can counteract this but typically needs some extra information to help limit the viral impact of reflection and/or interop.
  • Determining which generic instantiations will be required at runtime is generally intractable.
  • The number of possible generic instantiations can be huge (in fact, unbounded). The optimizer must be designed to carefully balance between the added precision that comes from knowing instantiated types and the need to describe them all concisely.
  • Instantiated generics cannot be described directly in IL, so if the optimizer instantiates generics to further optimize them then there needs to be some extra mechanism in the output IL to represent them, and downstream components must be updated to recognize them.
  • The optimizer must anticipate the generic sharing strategy (if any) employed by the runtime, and/or implement its own sharing. Shared generic instantiations also cannot be described directly in IL.
  • Runtime features like generic virtual methods, covariance and contravariance, reflection, reflection emit, and assembly loading complicate and limit the applicability of many interprocedural analyses.
  • It's not always clear what "optimal" IL looks like. One must anticipate the capabilities and limitations of the downstream code generator.

I'm not saying that building an IL-IL optimizer is a bad idea, mind you -- but building one that is broadly applicable is a major undertaking.

@iam3yal

iam3yal commented Dec 16, 2016

@AndyAyersMS

To broaden the scope, you can gather all the relevant assemblies together sometime well before execution and optimize them all together. But by doing this you inevitably build in strong dependence on the exact versions of the assemblies involved, and so the resulting set of optimized IL assemblies must now be deployed and versioned and serviced together. For some app models this is a natural fit; for others it is not.

In C++ we face the exact same problem, don't we? Static linking vs. dynamic linking? There are trade-offs, and yet many people choose static linking for their applications.

All this can be part of a compiler flag, I guess it needs to be an opt-in feature.

@mikedn

mikedn commented Dec 17, 2016

In C++ we face the exact same problem, don't we? Static linking vs. dynamic linking? There are trade-offs, and yet many people choose static linking for their applications.

What problem? C++ has headers and many C++ libraries are header-only, starting with the C++ standard library (with a few exceptions). Besides, many use static linking for convenience, not for performance reasons. And static linking is certainly not the norm; for example, you can find many games where the code is spread across many dlls.

All this can be part of a compiler flag, I guess it needs to be an opt-in feature.

Such a feature is being worked on. It's called CoreRT.

@iam3yal

iam3yal commented Dec 17, 2016

@mikedn

What problem? C++ has headers and many C++ libraries are header-only, starting with the C++ standard library (with a few exceptions). Besides, many use static linking for convenience, not for performance reasons. And static linking is certainly not the norm; for example, you can find many games where the code is spread across many dlls.

I think you missed my point... my point is that people who do choose static linking face the following problems, which he mentions in his post:

  1. strong dependence on the exact versions of the assemblies involved

  2. deployed and versioned and serviced together

It certainly isn't the norm for games but for many other applications static linking is fairly common.

I wasn't speaking about performance at all, and I didn't mean to imply that the analogy has anything to do with the discussed feature itself. I merely wanted to point out that in the C++ world some people accept these trade-offs, and if people do that out of convenience, then they would surely do it if it can increase performance, especially in the .NET world. They might pay the price of bundling everything together, but they will get optimized code, which might be worth it for them.

Such a feature is being worked on. It's called CoreRT.

As far as I understand, CoreRT is mostly designed for AOT scenarios and the emitted code isn't really IL. It isn't part of Roslyn and there's no compiler flag, so how is it related, or, as you said, "being worked on"? Maybe I'm missing something.

@mikedn

mikedn commented Dec 17, 2016

I think you missed my point...

I merely wanted to point out that in the C++ world some people accept these trade-offs, and if people do that out of convenience

Could be. It's somewhat unavoidable when apples and oranges comparisons are used. The C++ world is rather different from the .NET world and attempts to make decisions in one world based on what happens in the other may very well result in failure.

then they would surely do it if it can increase performance, especially in the .NET world. They might pay the price of bundling everything together, but they will get optimized code, which might be worth it for them

Why do you think that bundling everything together would increase performance?

Do you even know that you have all the code after bundling? No, there's always Assembly.Load & friends.

Can you transform a virtual call into a direct call because you can see that no class has overridden the called method? No, you can't because someone may create a class at runtime and override that method.

Can you look at all the List<T> code, conclude that count <= array.Length is always true and eliminate a range check? No, you can't because someone might use reflection to set the count field and break the invariant.
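(A sketch of that last hazard; the private field name _size matches the CoreCLR implementation of List&lt;T&gt;, but any private-state mutation makes the same point:)

using System;
using System.Collections.Generic;
using System.Reflection;

class BreakTheInvariant {
    static void Main() {
        var list = new List<int> { 1, 2, 3 };

        // No method of List<T> itself ever lets the count exceed the
        // backing array's length, yet:
        typeof(List<int>)
            .GetField("_size", BindingFlags.NonPublic | BindingFlags.Instance)
            .SetValue(list, 100);   // invariant broken from the outside

        // The indexer's count check now passes for index 50, so only the
        // array's own bounds check stands between this and reading past
        // the buffer. An optimizer that removed that check would be unsound.
        int x = list[50];
        Console.WriteLine(x);
    }
}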

As far as I understand, CoreRT is mostly designed for AOT scenarios and the emitted code isn't really IL. It isn't part of Roslyn and there's no compiler flag, so how is it related, or, as you said, "being worked on"? Maybe I'm missing something.

Well, do you want better compiler optimizations? Are you willing to merge all the code into a single binary and give up runtime code loading? Are you willing to give up runtime code generation? Are you willing to give up full reflection functionality? Then it sounds to me that CoreRT is exactly what you're looking for.

Do you want it to emit IL? Why would you want that? I thought you wanted performance. Do you want it to be part of Roslyn as a compiler flag? Again, I thought you wanted performance.

The problem with this proposal is that it is extremely nebulous. Most people just want better performance and don't care how they get it. Some propose that we might get better performance by doing certain things but nobody knows exactly what and how much performance we can get by doing that.

And then we have the Roslyn team, which had to cut a lot of stuff from C# 7 to deliver it. And the JIT team, which also has a lot of work to do. And the CoreRT team, which for whatever reason moves rather slowly. And the proposition is to embark on this new nebulous project that overlaps existing projects and delivers unknowns. Hmm...

@iam3yal

iam3yal commented Dec 17, 2016

@mikedn

Could be. It's somewhat unavoidable when apples and oranges comparisons are used. The C++ world is rather different from the .NET world and attempts to make decisions in one world based on what happens in the other may very well result in failure.

Okay, first of all, I didn't just compare two worlds; I made an analogy between them, and I merely pointed out that people choose one thing over the other regardless of the trade-offs in one world, so they might do the same in a different world where the impact might be greater...

Why do you think that bundling everything together would increase performance?

I never said it will increase performance! I said that if it can increase performance then people will use it just for this fact alone and will accept the trade-offs he pointed out in his post.

I'm not an expert on the subject, so I'm not going to pretend my words are made of steel, but based on what @AndyAyersMS wrote it seems like bundling assemblies can eliminate some challenges.

By "everything" I refer only to the assemblies you own, not 3rd-party ones.

Do you even know that you have all the code after bundling? No, there's always Assembly.Load & friends.

No, I don't know.

Well, do you want better compiler optimizations? Are you willing to merge all the code into a single binary and give up runtime code loading? Are you willing to give up runtime code generation? Are you willing to give up full reflection functionality?

ATM I don't know what the trade-offs would be; there's no proposal about it or design work that delves into the limitations, so how did you decide what the limitations of such a feature would be?

Then it sounds to me that CoreRT is exactly what you're looking for.

Maybe... but I don't think so.

Do you want it to emit IL? Why would you want that? I thought you wanted performance. Do you want it to be part of Roslyn as a compiler flag? Again, I thought you wanted performance.

Just because I want to optimize existing IL for performance, to reduce run-time overhead, does that mean I now want to compile it to native code?

The problem with this proposal is that it is extremely nebulous. Most people just want better performance and don't care how they get it. Some propose that we might get better performance by doing certain things but nobody knows exactly what and how much performance we can get by doing that.

I agree, but how would you get to something without discussing it first? Don't you think we need to have an open discussion about it, and about what we really want to get out of it?

This isn't really a proposal, this is the reason it was marked as Discussion.

And then we have the Roslyn team, which had to cut a lot of stuff from C# 7 to deliver it. And the JIT team, which also has a lot of work to do. And the CoreRT team, which for whatever reason moves rather slowly. And the proposition is to embark on this new nebulous project that overlaps existing projects and delivers unknowns. Hmm...

You're right, but I don't think it's our job as a community to get into their schedules; we can still discuss things and make proposals without stepping into their shoes. When and if these things see the light of day, that's a different story, so to that I'd say patience is a virtue.

@AndyAyersMS
Member

@eyalsk we're always happy to take in feedback and consider new ideas. And yes, we have a lot of work to do, but we also have the responsibility to continually evaluate whether we're working on the right things and adapt and adjust. So please continue to raise issues and make proposals. We do read and think about these things.

I personally find it easier to kickstart a healthy discussion by working from the specific towards the general. When someone brings concrete ideas for areas where we can change or improve, or points out examples from existing code, we are more readily able to brainstorm about how to best approach making things better. When we see something broad and general we can end up talking past one another as broad proposals often create more questions than answers.

The discussion here has perhaps generalized a bit too quickly and might benefit from turning back to discussing specifics for a while. So I'm curious what you have in mind for a constexpr-type optimization, and if you see examples today where you think we should improve.

Likewise for some of the other things mentioned above: LINQ rewriting, things too expensive to do at JIT time, etc.

@iam3yal

iam3yal commented Dec 17, 2016

@AndyAyersMS

The discussion here has perhaps generalized a bit too quickly and might benefit from turning back to discussing specifics for a while. So I'm curious what you have in mind for a constexpr-type optimization, and if you see examples today where you think we should improve.

I'm still thinking about constexpr and the specific scenarios that would make sense in C#. I opened a discussion about it (#15079), but it hasn't yielded anything specific just yet.

Personally, I want constexpr mostly for some mathematical functions or more generally pure functions that can be evaluated at compile-time; I already made a proposal for raising a number to a power, so maybe it's worth adding constexpr rather than adding language-specific operators for it like **.

@mikedn

mikedn commented Dec 17, 2016

I never said it will increase performance! I said that if it can increase performance then people will use it just for this fact alone and will accept the trade-offs he pointed out in his post.

Well, it was a rhetorical question.

I'm not an expert on the subject, so I'm not going to pretend my words are made of steel, but based on what @AndyAyersMS wrote it seems like bundling assemblies can eliminate some challenges.
By "everything" I refer only to the assemblies you own, not 3rd-party ones.

Some but not all. As already mentioned, even if you have all the code there are problems with reflection and runtime code generation. And if you limit yourself only to your own code then the optimization possibilities will likely be even more limited.

I agree, but how would you get to something without discussing it first?

The trouble is that I have no idea what we are discussing. As I already said, this whole thing is nebulous.

Personally, I want constexpr mostly for some mathematical functions or more generally pure functions that can be evaluated at compile-time,

Dragging constexpr into this discussion only serves to muddy the waters. Constexpr, as defined in C++, has little to do with performance. C++ got constexpr because it has certain contexts where compile time evaluation is required - e.g. array sizes, attributes, template non-type arguments etc. C# too has such contexts (e.g. attributes, const) but fewer and in particular it lacks template non-type arguments and more generally meta-programming.

It may be useful to be able to write [DefaultValue(Math.Cos(42.0))], but what is being discussed here has nothing to do with that. C++-like constexpr needs to be implemented by the C# compiler; it cannot be done by some sort of IL rewriter.

@benaadams
Member Author

benaadams commented Dec 17, 2016

The scenario I'm thinking of in particular:

ASP.NET app publish

  1. AOT -> IL for artifacts to publish (e.g. pre-compiling MVC views, rather than compiling them to IL at runtime).
  2. Copy dependencies to publish (as per current publish step)
  3. Link and optimize everything in publish <- this step

It would be a partial link step covering everything above the .NET Standard/platform runtime, i.e. app + dependencies, which is still quite a lot; e.g. these are the ASP.NET dependencies brought into the publish directory:

[collapsed list: ASP.NET dependency assemblies in the publish directory]

So it would still use the platform/globally-versioned .NET Standard dlls (mscorlib and friends) with the PGO-compiled System.Private.CoreLib (as now, when not building a standalone app), and allow for emergency patching at that level, but link and optimize your local libraries.

Sort of the inverse of what Xamarin does for Android and iOS; where it focuses on mscorlib (discarding unused code), then optionally your own libraries.

So you'd end up with a single dll + pdb for deploying your app, which would remain platform-independent and portable and be JITted at runtime, with Windows or Linux, 32-bit or 64-bit remaining a deferred choice and operational decision rather than a pre-baked developer decision.

Then it sounds to me that CoreRT is exactly what you're looking for.

No, because I want the benefits the runtime JIT provides. I want Vector acceleration enabled for the specific CPU architecture it's deployed on; I want readonly statics converted to consts at runtime, and branch elimination because of it; I want a portable, cross-platform executable; etc.

@mikedn

mikedn commented Dec 17, 2016

So you'd end up with a single dll + pdb for deploying your app

So far this sounds just like an ordinary IL merge tool.

and be JITted at runtime

Why at runtime? What's wrong with crossgen?

No, because I want the benefits the runtime JIT provides. I want Vector acceleration enabled for the specific CPU architecture it's deployed on; I want readonly statics converted to consts at runtime, and branch elimination because of it; I want a portable, cross-platform executable; etc.

Why not add such missing features to CoreRT/Crossgen?

@benaadams
Member Author

benaadams commented Dec 17, 2016

So far this sounds just like an ordinary IL merge tool.

And this is the point: I'm suggesting that optimizations can be made that Roslyn won't (or can't) do and the JIT can't do, due either to the lack of a whole-program view or to time.

Why at runtime? What's wrong with crossgen?

Crossgen doesn't work for VM-deployed apps, which can be resized and moved between physical hardware while remaining the same fully deployed OS image, unless the first start-up step is to recheck every assumption made and potentially re-crossgen.

E.g. on Azure, if I switch between an A-series VM and an H-series: if it's compiled for the H series it will fail on the A series; if it's compiled for the A series it won't take advantage of the H series' improved CPUs.

@jkotas
Member

jkotas commented Dec 17, 2016

if it's compiled for the H series it will fail on the A series

This is not correct. Crossgen will either not generate the native code at all, leaving a hole in the native image so the method gets JITted at runtime from IL, or it will generate code that works for all supported CPUs. System.Runtime.BypassNGenAttribute or System.Runtime.BypassReadyToRunAttribute can be used to force specific methods to not be crossgened and to be JITted at runtime.

We are actually too conservative in some cases and leave too many holes in the native images; for example, https://github.com/dotnet/coreclr/issues/7780

@benaadams
Member Author

benaadams commented Dec 17, 2016

Crossgen will either not generate the native code at all, leaving a hole in the native image so the method gets JITted at runtime from IL, or it will generate code that works for all supported CPUs.

Good to know; I take that back 😁 Will the code it generates also be more conservative in the other direction, e.g. take less advantage of newer procs, or will it only generate code that is the same everywhere?

Maybe a better example then would be ifs based on readonly statics read from a config file, or based on IntPtr.Size or ProcessorCount, where the JIT would eliminate entire branches of code whereas crossgen would have to leave them in. Might be getting a little esoteric now...

@mikedn

mikedn commented Dec 17, 2016

And this is the point: I'm suggesting that optimizations can be made that Roslyn won't (or can't) do and the JIT can't do, due either to the lack of a whole-program view or to time.

The whole program view has been mentioned before and it is problematic. The lack of time is a bit of a red herring. Crossgen doesn't lack time. And one can imagine adding more JIT optimizations that are opt-in if they turn out to be slow. And the fact that you do some IL optimizations doesn't mean that the JIT won't benefit from having more time.

unless the first start-up step is to recheck every assumption made and potentially re-crossgen.

That sounds like a reasonable solution unless the crossgen time is problematic.

This is not correct. Crossgen will either not generate the native code at all, leaving a hole in the native image so the method gets JITted at runtime from IL, or it will generate code that works for all supported CPUs.

Yes, but sometimes that isn't exactly useful. If I know that all the CPUs the program is going to run on have AVX2, then I see no reason why crossgen can't be convinced to generate AVX2 code.

@iam3yal

iam3yal commented Dec 17, 2016

Well, it was a rhetorical question.

I wasn't trying to answer any question but to clarify what I wrote.

Some but not all. As already mentioned, even if you have all the code there are problems with reflection and runtime code generation. And if you limit yourself only to your own code then the optimization possibilities will likely be even more limited.

Sometimes some optimization is better than none; again, we're not speaking about details here, so I don't know whether it makes sense or not.

I know it's going to be limited, but that limit is undefined, and maybe, just maybe, it would be enough to warrant this feature. I really don't know.

Dragging constexpr into this discussion only serves to muddy the waters.

I guess I can respect that, but I wasn't speaking about constexpr per se; I said that maybe it would fit into the same phase, that is, after the compiler has compiled the code.

Constexpr, as defined in C++, has little to do with performance.

That's your own interpretation and opinion, not a fact, because many people would tell you a different story.

However, I did not imply that constexpr has anything to do with performance. Like I said above, I just thought that if a tool were developed to optimize IL after the code is compiled, then maybe this would be the right phase to take constexpr into account: instead of evaluating it at compile time, it would be evaluated at post-compile time.

C++ got constexpr because it has certain contexts where compile time evaluation is required - e.g. array sizes, attributes, template non-type arguments etc.

Yeah? The moment you start asking questions like "why is it required?" you soon realize that it has a lot to do with performance, not as little as you think. But really, performance is a vague word without context, so it's useless to speak about it.

It may be useful to be able to write [DefaultValue(Math.Cos(42.0))], but what is being discussed here has nothing to do with that. C++-like constexpr needs to be implemented by the C# compiler; it cannot be done by some sort of IL rewriter.

Yeah, okay I won't derail the discussion any further about it.

@mikedn

mikedn commented Dec 17, 2016

Maybe a better example then would be ifs based on readonly statics read from a config file, or based on IntPtr.Size or ProcessorCount, where the JIT would eliminate entire branches of code whereas crossgen would have to leave them in. Might be getting a little esoteric now...

LOL, it is esoteric. You want some kind of IL optimizer but at the same time you want CPU dependent optimizations 😄

@benaadams
Member Author

benaadams commented Dec 17, 2016

LOL, it is esoteric. You want some kind of IL optimizer but at the same time you want CPU dependent optimizations

I was saying why I wanted an IL optimizer and also runtime-JITted code.

@benaadams
Member Author

@AndyAyersMS for LINQ optimization I assume the kind of things done by LinqOptimizer or roslyn-linq-rewrite, which essentially convert LINQ to procedural code, dropping interfaces for concrete types, etc.

@mikedn

mikedn commented Dec 18, 2016

I was saying why I wanted an IL optimizer and also runtime-JITted code.

Yeah, I know. But it's funny because the IL optimizer is supposed to do some optimizations without knowing the context and then the JIT has to sort things out.

@jkotas
Member

jkotas commented Dec 18, 2016

The CoreRT compiler and crossgen are focused on transparent, non-breaking optimizations. They have limited opportunities to change the shape of the program, because such changes are potentially breaking for the reasons mentioned here: anything can potentially be inspected and accessed via reflection, etc.

I do like the idea of linking or optimizing a set of assemblies (not the entire app) together at the IL level. It may be interesting to look at certain optimizations discussed here as plugins for https://github.com/mono/linker. The linker changes the shape of the program already, so it is a potentially breaking, non-transparent optimization. If your program depends on reflection, you have to give the linker hints about what it can and cannot assume.

If the IL optimizer can see a set of assemblies and can be assured that nothing (or only a subset) is accessed externally or via reflection, that opens opportunities for devirtualization and more aggressive tree shaking at the IL level. The promised assumptions can be enforced at runtime; DisablePrivateReflectionAttribute is prior art in this space.
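(A sketch of the devirtualization case, with hypothetical types; it assumes the optimizer has been promised it can see every assembly and that private reflection is off:)

public interface IHasher {
    int Hash(int value);
}

public sealed class DefaultHasher : IHasher {
    public int Hash(int value) => value * 31;
}

public static class Demo {
    public static int HashAll(IHasher h, int[] data) {
        int acc = 0;
        foreach (int v in data)
            acc ^= h.Hash(v);   // an interface call (callvirt) today
        return acc;
    }

    // If whole-program analysis proves DefaultHasher is the only IHasher
    // that can ever flow into HashAll, the call above can be rewritten as
    // a direct call to DefaultHasher.Hash and then inlined to: acc ^= v * 31;
}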

Similarly, if you give the optimizer hints that your LINQ expressions are reasonable (functional, and not dependent on exact side effects), that opens opportunities for optimizing LINQ expressions, like what is done by the LINQ optimizers mentioned by @benaadams.

Or if you are interested in optimizing for size, you can instruct the linker that you do not care about the error messages in ArgumentExceptions, and it can strip all the error messages. This optimization is actually done, in a custom way, in the .NET Native for UWP toolchain.
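(Roughly what that looks like at a single throw site; an illustrative sketch, and observably different behavior, which is why it has to be opt-in:)

// Before the size-focused pass; the message string lives in the assembly:
if (index < 0)
    throw new ArgumentOutOfRangeException(nameof(index),
        "Non-negative number required.");

// After it:
if (index < 0)
    throw new ArgumentOutOfRangeException();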

@tannergooding
Member

I think the benefit is that a general-purpose tool applies to all IL-based languages and can include all optimizations.

Things like stripping the init flag from the .locals directive, so that locals don't have to be zeroed first, also apply (this is a big perf boost for some stackalloc scenarios, and it also applies to methods with a lot of locals).
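(A sketch of the cost being described; Roslyn emits the .locals init flag for methods like this, which forces the runtime to zero the stackalloc'd memory on entry:)

static unsafe int Checksum(byte* data, int len) {
    // The runtime zeroes all 1024 bytes here on every call because of the
    // init flag, even though the code below writes each byte it later
    // reads. An IL step that proves that and strips the flag removes the
    // zeroing entirely.
    byte* buffer = stackalloc byte[1024];
    int n = len < 1024 ? len : 1024;
    for (int i = 0; i < n; i++)
        buffer[i] = data[i];

    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += buffer[i];
    return sum;
}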

@tannergooding
Member

Other optimizations, like replacing multiple comparisons with a single compare, work as well:
if (x < 0 || x > 15) can be transformed into if ((uint)x > 15).
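(Spelled out, assuming x is a signed int:)

static void Check(int x) {
    // Two compares, two branches:
    if (x < 0 || x > 15)
        throw new ArgumentOutOfRangeException(nameof(x));
}

static void CheckFolded(int x) {
    // One compare: a negative x wraps to a huge uint, so a single
    // unsigned comparison covers both bounds at once.
    if ((uint)x > 15)
        throw new ArgumentOutOfRangeException(nameof(x));
}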

There are tens (if not hundreds) of these micro-optimizations that native compilers do, and things like the AOT could do, but which the JIT may not be able to do (due to time constraints).

@mikedn

mikedn commented Oct 10, 2017

Other optimizations, like replacing multiple comparisons with a single compare, work as well:
if (x < 0 || x > 15) can be transformed into if ((uint)x > 15).

That happens to work, but another similar one, if (x == 2 || x == 4 || x == 8 || x == 33), doesn't work so well, because it depends on the target bitness. Sure, you could tell the IL optimizer to assume a particular target bitness, but then things start to become a bit blurry; we have IL precisely so we don't need to bother (too much) with that kind of stuff.
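(A sketch of why bitness matters here; the four compares can collapse into one bit test, but only cheaply on a 64-bit target:)

static bool IsSpecial(int x) {
    // One shift plus one test replaces four compares. Because the value 33
    // forces a 64-bit mask, a 32-bit target must expand the ulong shift
    // into several instructions, and the "optimization" can be a net loss;
    // that call is better left to a compiler that knows the target.
    const ulong mask = (1UL << 2) | (1UL << 4) | (1UL << 8) | (1UL << 33);
    return (uint)x < 64 && ((mask >> x) & 1) != 0;
}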

There are tens (if not hundreds) of these micro-optimizations that native compilers do

And, like the one above, many won't be possible in IL because they require knowledge of the target hardware, or because the optimization result is simply not expressible in IL. You end up with a tool that can't do its job properly but needs a lot of effort to be written. And it's duplicated effort, since you still need to optimize code in the JIT.

but which the JIT may not be able to do (due to time constraints).

Eh, the eternal story of time constraints. Except that nobody has attempted to implement such optimizations in the JIT to see how much time they actually consume. For example, in some cases the cost of recognizing a pattern might be amortized by generating smaller IR, fewer basic blocks, less work in the register allocator, and so on.

Besides, some people might very well be willing to put up with a slower JIT if it produces faster code.

Ultimately what people want is for code to run faster. Nobody really cares how that happens (unless it happens in such a cumbersome manner that it gives headaches). That MS never bothered too much with the JIT doesn't mean that there's no room for improvement or that IL optimizers need to be created.

Not to say that IL rewriters are completely pointless. But they're not a panacea for CQ (code quality) problems.

@JosephTremoulet

Thanks, @tannergooding, I realize that roslyn-linq-rewrite's scope is narrower than "everything". What I'm trying to ask is: for the sort of transforms that are in its scope, is there a benefit/desire to have an IL rewrite step perform those same ones? If so, why, and if not, what similar transforms have people had in mind when pointing to it as an example?

@tannergooding
Member

I would think that a general-purpose IL rewriter should be scoped to whatever optimizations are valid, and it would likely duplicate some of the ones the JIT/AOT already cover. As @mikedn pointed out, there are plenty that can't be done (or can only be partially done) without knowledge of the target bitness/endianness/etc.

However, I would think (at least from a primitive perspective) this could be made easier by building on top of what the JIT/AOT compilers already have.

That is, today the JIT/AOT compilers support reading and parsing IL, as well as generating and transforming the various trees they create from the IL.

If that code was generalized slightly, I would imagine it would be possible to perform machine-independent transformations and save them back to the assembly outside of the JIT itself.

This would mean that the JIT can effectively skip most of the machine-independent transformations (or prioritize them lower) for code that has gone through this pass and it can instead focus on the machine-dependent transformations/code-gen.

Not that it would be easy to do so, but I think it would be beneficial in the long run (and having the JIT/AOT code shared/extensible would also, theoretically, allow other better tools to be written as well).

@JosephTremoulet

Yes, I also realize that an IL rewrite step needn't be limited to things that are in scope for roslyn-linq-rewrite. This thread has many specific suggestions of those. I am trying to ask a question specifically about the linq-related suggestions that have been made on this thread, please don't interpret it as a statement on anything beyond that. Since my question keeps getting buried by answers to a more general one that wasn't what I was trying to ask, I'll repeat it: for the sort of transforms that roslyn-linq-rewrite performs, is there a benefit/desire to have an IL rewrite step perform those same ones? If so, why, and if not, what similar transforms have people had in mind when pointing to it as an example?

@tannergooding
Member

Sorry. I had misunderstood your question originally, my bad 😄 (if I still managed to derail your question below, just ping me and I'll remove).

for the sort of transforms that roslyn-linq-rewrite performs, is there a benefit/desire to have an IL rewrite step perform those same ones?

I don't (personally) think there is any major benefit to having the transforms roslyn-linq-rewrite performs also duplicated in an IL rewrite step.

However, I do think there are some minor benefits:

  • You only have to run one tool, instead of 2 or 3
  • Hypothetically, there could be some roslyn-linq-rewrite transformations that are only possible after some other optimization or analysis has been done first. Having all the logic in a single tool would make that easier

what similar transforms have people had in mind when pointing to it as an example?

I think LINQ is a big one just because it makes writing your code so much easier, but it can also slow your code down if not done carefully.

I think auto-vectorization and auto-parallelization would be other similar transformations (just thinking of the more complex optimizations a native compiler might do). I think both of these are generally considered machine-independent (but of course, there are exceptions).

@Pzixel

Pzixel commented Oct 11, 2017

@tannergooding

I don't (personally) think there is any major benefit to having the transforms roslyn-linq-rewrite

You're so wrong here :) It has a huge benefit.

I think LINQ is a big one just because it makes writing your code so much easier, but it can also slow your code down if not done carefully.

It slows down your code whether you care about it or not, because it involves tons of delegate callbacks instead of pure imperative code. If you are working with a DB then you don't care, because you're already on a slow path, but in-memory transformations via LINQ are so sweet, and so slow too, up to 100 times slower IIRC. For example, in my current project I use LINQ 1733 times across 8051 files. That seems to be a lot.

@mikedn

mikedn commented Oct 11, 2017

It has a huge benefit

Well, what is that benefit? That was the question.

@Pzixel

Pzixel commented Oct 11, 2017

@mikedn see the Steno project from MS Research:

[image: performance results from the Steno paper]

@mikedn

mikedn commented Oct 11, 2017

see the Steno project from MS Research:

The question was not about the effect of the optimization. I don't think anybody questions the fact that LINQ is not exactly efficient, and that replacing the zillions of calls and allocations it generates would speed things up.

The question is about the various optimization approaches: Roslyn rewriter, IL rewriter, and, why not, even a "JIT rewriter".

@Pzixel

Pzixel commented Oct 11, 2017

@mikedn

  • IL rewriter: very complicated, and provides almost the same possibilities as Roslyn. See the Code Contracts library. Why reconstruct code trees manually if we already have Roslyn? And if we do, why not perform it at compile time instead of transforming stuff back and forth?
  • Roslyn: guaranteed to generate valid IL, much more user-friendly, already has the Analyzer API, and is going to provide the replace/original API, which is ideal for this purpose.
  • JIT: it doesn't perform much simpler optimizations, so I don't expect its developers to even bother implementing something like this. Another problem is that in this case the compiler would have to know about System.Linq.Enumerable and all this stuff. That's OK for a custom analyzer, but not for a general-purpose compiler.

@mikedn

mikedn commented Oct 11, 2017

So if I understand correctly, you prefer the Roslyn approach. Not because you actually need such optimizations to be implemented this way, but because the other approaches seem more complicated.

Makes sense but at the same time it means that this implementation is tied to Roslyn and thus available to C# and VB only. Other languages will have to do their own thing.

@benaadams
Member Author

The way I see it is

  • JIT - can optimize beyond what is expressible in IL; however it is more focused on per-function optimizations (always applied)
  • Roslyn - converts C# and VB to verifiable IL; some optimization in release builds (always applied)
  • AOT IL rewriter - opt-in configuration; whole-program optimizations; linking/tree-shaking; transforms that may produce unverifiable IL; function splitting (inlinable fast path, non-inlined slow path; see the sketch below); etc.
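(A sketch of the function-splitting idea from the last bullet, with hypothetical names; the fast path stays small enough to inline while the cold path is forced out of line:)

using System;
using System.Runtime.CompilerServices;

class IntBuffer {
    private int[] _items = new int[4];
    private int _count;

    // Small enough for the JIT to inline at call sites.
    public void Add(int item) {
        if (_count < _items.Length) {
            _items[_count++] = item;
            return;
        }
        AddWithResize(item);   // rare path, kept out of the inlined body
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private void AddWithResize(int item) {
        Array.Resize(ref _items, _items.Length * 2);
        _items[_count++] = item;
    }
}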

.NET IL Linker is an example of linking/tree-shaking with whole-program analysis.

@JosephTremoulet

.NET IL Linker is an example of linking/tree-shaking with whole-program analysis.

Yes, exactly. My team is looking at expanding the rewrites available in .NET IL Linker. I'm currently trying to assess what rewrites people are interested in having made available, for planning/prioritization purposes, which naturally brought me to this issue where there's been much discussion of that. LINQ rewriting seems to generate a lot of interest, and for the various reasons already mentioned above doesn't seem like it will have a home in the JIT or in Roslyn proper (where by "proper" I mean excluding opt-in extensions). So that would make it a good candidate, except that AFAIK anybody who would benefit from having it available to opt into in .NET IL Linker could just as well opt into it by adding roslyn-linq-rewrite to their build process. But I've been wondering if I'm somehow glossing over something with that line of reasoning, hence my questions about it. The takeaway I'm getting from the responses is that no, there isn't any benefit to LINQ rewriting in .NET IL Linker over what's already available via roslyn-linq-rewrite, and that LINQ rewriting has come up on this thread simply as an example of something useful that has been done that doesn't fit in the JIT or Roslyn proper.

@jkotas
Member

jkotas commented Oct 11, 2017

anybody who would benefit from having it available to opt into in .NET IL Linker could just as well opt into it by adding roslyn-linq-rewrite to their build process.

The difference is credibility. roslyn-linq-rewrite is a one-man project, last updated a year ago. It is a custom build of the Roslyn compiler. It is hard to use for any serious project in its current form.

@mikedn

mikedn commented Oct 11, 2017

and for the various reasons already mentioned above doesn't seem like it will have a home in the JIT

Come to think of it, the real reason such optimizations may not belong in the JIT hasn't been mentioned. These LINQ "optimizations" aren't optimizations in the true sense: the rewriters don't understand the System.Linq code and optimize it; they assume that LINQ's methods do certain things and generate code that supposedly behaves identically. That is, they treat those methods as intrinsics.

That's something that the JIT could probably do too, except it requires generating significant amounts of IR and that may be cumbersome.

But probably the main problem with doing this in the JIT is that, for better or worse, there's not a single JIT. There's RyuJIT, there's .NET Native, there's Mono... Sheesh, one way or another some duplicate work will happen.

I'm currently trying to assess what rewrites people are interested in having made available, for planning/prioritization purposes, which naturally brought me to this issue where there's been much discussion of that

So where's the IL linker's repository? Let people create issues, discuss, vote on them, etc. That's better than "hijacking" an existing thread like this. Granted, you may end up with a bunch of noise, but that's life.

Here's a fancy idea. What if the IL linker CSE'd and hoisted everything it could, and the JIT then did some kind of rematerialization to account for target-architecture realities? Perhaps it would be cheaper for the JIT to do that instead of CSE. Granted... that's a bit beyond the idea of a "linker" :)

@jkotas
Member

jkotas commented Oct 11, 2017

That's something that the JIT could probably do too, except it requires generating significant amounts of IR and that may be cumbersome.

I believe it is hard for the more interesting LINQ optimizations to preserve all side effects. That should not matter for well-written LINQ queries, but it is a problem for poorly written ones.

It is OK for an opt-in build-time tool to change the behavior of poorly written LINQ queries. It is not OK for the JIT to do that at runtime.
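(A concrete case of such a behavior change, assuming data is an IEnumerable&lt;int&gt;, using System.Linq is in scope, and Noisy is a predicate with a visible side effect:)

static bool Noisy(int x) {
    Console.WriteLine(x);   // observable side effect
    return x > 10;
}

// As written: Noisy runs for every element before the count is compared.
bool slow = data.Where(Noisy).Count() > 0;

// As an optimizer might rewrite it: Noisy runs only until the first match.
bool fast = data.Where(Noisy).Any();

// Same answer, different console output; acceptable for an opt-in
// build-time tool, not for the JIT.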

@JosephTremoulet

So where's the IL linker's repository?

https://github.com/mono/linker

Let people create issues, discuss, vote on them, etc. That's better than "hijacking" an existing thread like this

Yes, I completely agree it will be more productive to discuss potential new rewrites over there, with separate issues for each. I wasn't trying to extend general discussion on this thread, just ask a clarifying question about a few specific comments on it (to which I now have the answer, thanks @jkotas).

@gafter gafter added this to the Unknown milestone Feb 9, 2018
@gafter
Member

gafter commented Feb 9, 2018

I think the conclusion here is that such work would likely not be part of Roslyn, but would more likely live in https://github.com/mono/linker. However, we'll leave this open so people can find this discussion.

@Pzixel

Pzixel commented Feb 10, 2018

But in that case it's tied to Mono, isn't it? What about Core, the full Framework, etc.?

@jkotas
Member

jkotas commented Feb 10, 2018

It is not tied to Mono: dotnet/announcements#30
