Linker Size Work Proposals #607
Technique: Dynamically breaking down Conditionals
Since the linker does not currently support dead code elimination, it cannot break down any conditionals inside method bodies. One trick that we use to work around this is to move those conditional pieces into separate methods; then we can give the linker a list of those methods and tell it to replace their bodies with exceptions. The common situation is that you have
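The original snippet is not shown here; as a hypothetical sketch (a Python analogue of the C#/IL situation, with all names invented), the first case looks roughly like this:

```python
# Hypothetical Python analogue of the first conditional: the guarded
# branch is the only reference to `heavy_feature`, yet a naive linker
# must keep that whole module because the call site survives in the body.
def do_work(use_feature: bool) -> str:
    if use_feature:
        import heavy_feature  # the only reference to this module
        return heavy_feature.run()
    return "fallback"
```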
and then
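The second snippet is likewise elided; a hypothetical analogue (invented logic, just to show the shape) would be a conditional that references nothing external but carries a fair amount of code:

```python
# Hypothetical analogue of the second conditional: no external
# references, just a sizeable block of logic that is dead code
# whenever `fancy` is False.
def format_value(value: int, fancy: bool) -> str:
    if fancy:
        digits = []
        while value > 0:
            digits.append(str(value % 10))
            value //= 10
        return ",".join(reversed(digits)) or "0"
    return str(value)
```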
In the example above, the first conditional calls some method in a type that has not been referenced yet. Since the linker cannot rewrite the IL code of that conditional, this external type will be referenced and thus preserved. The second conditional block doesn't reference any external types, but it is quite complex, so we would benefit from a size reduction by removing it. In both cases, we solve the problem by moving the code inside those conditionals into their own method because then the linker can remove that entire method. We already use this for See mono/corefx#298 for an example. This will also help us at a later time, when we might add a basic block scanner, flow analysis and dead code elimination. Conditionals of the sort
will be most easily removed by such an engine, even more so if that method is static and parameterless.
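The workaround described above can be sketched as follows (a Python analogue with invented names; the real linker rewrites IL method bodies, not Python functions): the conditional body is extracted into its own method, so a tool can later replace that entire method with a throwing stub.

```python
def run_feature() -> str:          # extracted conditional body
    return "feature result"

def do_work(use_feature: bool) -> str:
    if use_feature:
        return run_feature()       # the only call site
    return "fallback"

def stub_out(name: str, namespace: dict) -> None:
    """Replace a function's body with one that always throws,
    mimicking the linker's 'replace body with exception' rewrite."""
    def _stub(*args, **kwargs):
        raise NotImplementedError(f"{name} was removed by the linker")
    namespace[name] = _stub

stub_out("run_feature", globals())  # "link away" the feature
```

After the stub replacement, the happy path still works while the removed feature fails loudly instead of silently misbehaving.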
Filed upstream dotnet/corefx#38281 for the
The following is an older design document about my new linker module, which I call the "Linker Optimizer", that I wrote on April 4th. A lot of the things mentioned in this document have become obsolete in the meantime, but I'm including it here in full for future reference and to avoid losing it.
Historic: An Older Design Document about my "Linker Optimizer"
All new files are kept in a separate directory to keep changes to the original linker code to an absolute minimum. As of this writing, the diff against the linker code is very tiny:
In comparison, the newly added code is huge!
As you can see from the size of the newly added code, this is more like an additional module added on top of the linker than a set of changes to the existing linker. This new module consists of the following components:
Before we dive deep into those components, let me first give you a brief overview of the new configuration. At the moment, all the new configuration is in a separate section in the linker description file. I will give a more detailed overview in a separate section in this file, but as a brief overview, you can do the following:
Basic Block Scanner
When enabled, the new module replaces the The main entry point is First, the entire method body will be broken down into basic blocks and each block assigned both a In this regard, the An important part of the Basic Block Scanner is the This makes the higher-level code much simpler and cleaner as it doesn't have to worry about any of those low-level details. You can easily insert / modify / delete instructions in a block and the We currently support the following branch types:
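As a rough illustration of the block-splitting step, here is a Python sketch over a toy instruction encoding; the real scanner works on Cecil's IL instructions and handles many more branch kinds than the handful assumed here:

```python
# Toy basic-block splitter: each instruction is (offset, opcode, target),
# where `target` is a branch-target offset or None.  A new block starts
# at every branch target and right after every branch or return
# (the classic "leaders" algorithm).
BRANCHES = {"br", "brtrue", "brfalse", "ret"}

def split_blocks(instrs):
    leaders = {instrs[0][0]}                 # entry is always a leader
    for i, (off, op, target) in enumerate(instrs):
        if op in BRANCHES:
            if target is not None:
                leaders.add(target)          # branch target starts a block
            if i + 1 < len(instrs):
                leaders.add(instrs[i + 1][0])  # so does the fall-through
    blocks, current = [], []
    for ins in instrs:
        if ins[0] in leaders and current:
            blocks.append(current)
            current = []
        current.append(ins)
    if current:
        blocks.append(current)
    return blocks
```

For example, `ldc.i4.0; brfalse 4; ldstr; br 5; nop; ret` splits into four blocks starting at offsets 0, 2, 4 and 5.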
The branch instruction will always be the last instruction of the block and for each branch instruction with an explicit target, the target block will have a jump origin pointing back to us. For The scanner also looks at the target of each If no linker conditionals are found, then by default the scan result will be discarded and we continue with the normal linker's code-path by calling the This behavior can be overridden by the Scanning all method bodies is required to detect linker conditionals in them. Should performance be an issue, then we can use the new XML to explicitly tell the linker which classes need to be scanned. For the moment, I wanted to keep things as simple as possible and not require explicit registration via XML. If any linker conditionals have been found (or
Linker Conditionals
There are two kinds of linker conditionals:
Explicit Conditional Methods
We currently support the following conditionals. Please note that some of these have been made obsolete by newer design philosophy and functionality (namely the new, fully complete flow analysis and dead code elimination). This code was written before the dead code eliminator and with the old, now-obsolete design philosophy in mind that "features" should be dynamically detectable at runtime - we now require all features to be explicitly declared with link-time options (either in XML or on the command-line). I believe that all of them except for All of the following methods are The core part is:
The following are still in active use by the test suite (though we could remove them from corlib and only resolve from the test helper assembly):
And I believe all of those can go away:
Since each of these conditionals lives in its own class, it is really easy to remove some of them. The enum
We could in theory replace that enum with a non-enum based approach, but again this would make registration and lookup a bit more complex and I wanted to keep it simple for the moment. Adding an additional enum value is one line of code and literally takes me one minute because those names are also automatically resolved from the XML options. We do not have to spew these all over the place; in fact they can all be contained in one single file. Each of these features can be disabled via either the XML or the command-line, for instance
The We can of course remove them from corlib, but should reserve some values for those regression tests (since you have to use constant loads like I didn't hook up any command-line options yet, but there's an environment variable
Implicit Conditional Methods
As I mentioned above, An "implicit conditional method" currently has the following requirements: it must be parameterless, return Specifically, after all processing, the method body must be one of
or
or
We could of course hook up the XML and just "teach" the linker that an arbitrary method should be treated as such. If the The property does not need to be static and it may contain additional code besides the conditional, such as for instance
You can really put arbitrary code into that property getter (which allows the conditional to be added to existing properties!) as long as the dead code elimination component (which is actually quite advanced) can resolve it all into a boolean constant. Once a method is identified as such a "magic" one, all calls to it are treated as a linker conditional - the call will be put onto a basic block by itself with an instance of
Conditional Resolution
After basic block scanning is complete, all linker conditionals will be resolved. Each such conditional will be in a basic block by itself and the During resolution, that block will be rewritten to resolve the conditional; the Since some of the conditional methods may have arguments and we're removing the
This means that this mechanism will work with arbitrarily complex arguments, but only those which use simple loads in IL will fully benefit from dead code elimination. If you push something complex onto the stack, that something will stay - followed by a The basic block scanner already makes this distinction and will put such simple loads into the conditional's basic block. Similarly, the If anything else follows the So again, the IL can be arbitrarily complex, but only the simple cases will be treated specially. If the IL is too complex, then worst case the linker won't be able to detect some basic blocks as being "dead" - but it won't break, you'd still get the correct boolean value, just that value won't say "Hey, I'm a constant!".
Flow Analysis
Preliminary Thoughts
After all linker conditionals are resolved, flow analysis will be performed. Just having those conditionals alone won't do us any good - sure, the code would be skipped at runtime, but as long as it's still in the IL, everything it references must be kept in the IL as well. And you can easily pull in "the entire world" if you're not careful! The basic idea is that instead of explicitly declaring a bunch of types and methods as conditional either in XML or by some other means, we would like to have as much automation as possible. And the end result should as closely resemble what you would get had the linker conditional been replaced by a hard Or in other words - if you replaced every single And those conditionals should be in as few places as possible to make the most use of the automation. Ideally, this will also provide us with a fallback option because should this linker research project either fail or not finish in time, then we could still easily go back to compiler-based hard-conditionals. The main advantage of the automation is that it will allow our customers to selectively enable certain features for some of their projects.
And the idea is to make this as easy as the click of a button - you want globalization, non-western encodings, crypto, etc.? Check this box and you've got it. You don't need any of those? Well, here's your decreased code size. To do any of this, we need flow analysis.
The Code
This is actually the second implementation of the flow analysis code, a complete rewrite of the first version. And quite surprisingly, the implementation is quite small - as of this writing, the entire class is less than 150 lines of code. This is because the Basic Block Scanner already takes care of most of the work by computing and keeping track of Branch Types and Jump Origins. The flow analysis code also does not distinguish between "definitely reachable" and "conditionally reachable" - it doesn't need to, all that matters is whether or not a basic block could possibly be reached. And it also doesn't track variable assignments, lifetimes or anything like that. Again, because it doesn't need to.
Deep Dive
The algorithm is actually quite simple. We already have Branch Types and Jump Origins. The Branch Type determines whether or not the next block will be reachable (like for instance if you encounter a
One important thing I learned while playing around with this code is that for our purposes, we can actually make one very important assumption - an assumption which will simplify the algorithm quite significantly!
And, as mentioned above, the second and equally important optimization is
So now let's dive into the algorithm. While iterating over all the basic blocks, we know whether or not control can flow over from the previous block. We maintain a simple Then, we have to look at our Jump Origins. If we've already marked the origin block as reachable, then we can immediately mark the current block as reachable as well. Otherwise, we remember that jump origin on an If the current block is considered to be reachable, then once we have looked at all our Jump Origins, we also walk that list of And that is already the entire algorithm. The basic design philosophy is to be conservative - we would rather miss marking some block as unreachable than illegitimately remove a reachable one.
Dead Code Elimination
After Flow Analysis has been performed, Dead Code Elimination takes place. This actually comes in several different phases, each aiming at eliminating different things.
Dead Blocks
This is the most basic step - it will simply remove all blocks that have been marked as "dead" by Flow Analysis. This step also fully trusts the Flow Analysis - that is, it will actually remove blocks with active jump origins (removing those origins as well). It is actually the only component that will allow a basic block with active jump origins to be removed, using a low-level API call that bypasses those checks. Therefore, it has to be performed immediately after Flow Analysis with no intermediate steps. Here, special consideration needs to be given to exception blocks. Each exception block consists of several parts: the Each of these "code blocks" can actually consist of one or more basic blocks. While dead code elimination will be performed within each of those "code blocks" - each such "code block" can only be removed if the entire exception block has been removed. Or in other words, if there's some dead code within a Again, the design philosophy is to be on the conservative side - and we need to be a bit careful when it comes to placing conditionals inside exception blocks.
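The reachability marking described above can be sketched as follows. This Python sketch iterates to a fixed point instead of using the single-pass pending-origin list the document describes, but it embodies the same conservative rule: a block is reachable if control can fall into it from its predecessor or if any of its jump origins is reachable.

```python
def mark_reachable(blocks):
    """blocks: list of {"falls_through": bool, "origins": [block indices]}.
    Returns a conservative per-block reachability flag."""
    reachable = [False] * len(blocks)
    changed = True
    while changed:              # iterate until no block's status changes
        changed = False
        flow_in = True          # the entry block is always reachable
        for i, block in enumerate(blocks):
            if flow_in or any(reachable[o] for o in block["origins"]):
                if not reachable[i]:
                    reachable[i] = True
                    changed = True
            # control falls through only from a reachable block that
            # doesn't end in an unconditional branch / return
            flow_in = reachable[i] and block["falls_through"]
    return reachable
```

Dead block elimination then simply drops every block whose flag stayed False.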
Dead Jumps
This is mostly cosmetic, but it is required for full detection of "Implicit Conditional Methods" as described above. What this does is remove jumps to the immediately following instruction:
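The original example is not shown; a sketch of the idea, using the same toy instruction encoding as above (hypothetical offsets):

```python
# Drop unconditional branches whose target is the very next
# instruction: a `br IL_0005` sitting immediately before IL_0005
# does nothing and only obscures the pattern matchers.
def remove_dead_jumps(instrs):
    out = []
    for i, (off, op, target) in enumerate(instrs):
        next_off = instrs[i + 1][0] if i + 1 < len(instrs) else None
        if op == "br" and target == next_off:
            continue  # jump to the fall-through target: delete it
        out.append((off, op, target))
    return out
```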
Constant Jumps
Constant Jump Elimination detects and eliminates the pattern of a constant load followed by a boolean conditional jump, such as:
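The snippet is elided, but the pattern is presumably a constant load feeding a conditional branch, e.g. `ldc.i4.0` followed by `brfalse`. A sketch of the reduction (toy opcodes, hypothetical target label):

```python
# Fold "constant load + conditional branch": the pair either always
# branches (replace with an unconditional br) or never does (delete
# both instructions).
CONSTS = {"ldc.i4.0": 0, "ldc.i4.1": 1}

def reduce_constant_jump(load_op, branch_op, target):
    value = CONSTS[load_op]
    taken = (value == 0) if branch_op == "brfalse" else (value != 0)
    return [("br", target)] if taken else []
```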
These will be replaced with either an unconditional
Unused Variables
This component actually detects two things: unused variables and single assignment of a constant value. So for instance, if you have a single
(and optionally a bunch of And it being a constant means that it can be eliminated and directly replaced with that constant. We only detect This is especially handy when it comes to those "Implicit Conditional Methods" - methods that are known to return a boolean constant. So those will actually be turned into constant loads:
Restarting
At the moment, if any dead code has been eliminated, the entire process of Flow Analysis followed by Dead Code Elimination will be restarted. We might be able to make some optimizations in this regard, but at the moment the individual Dead Code Elimination components depend on each other to a certain degree.
New XML
This document is already quite long and it's also getting late, so I will only briefly touch on the XML format here, but extend this section later. Here's one example of a more complex XML:
This demonstrates some conditionals as well as the debugging functionality:
The interesting thing about those Both
The
on methods, it currently supports:
Please do not be discouraged by this seemingly huge amount of XML - only a tiny fraction of it will actually be required in the final product (namely the
New Tests
At the time of this writing, there are 42 new tests as well as 3 IL tests; 28 of these tests have their own XML file. All of those new tests are in a directory of their own with a special The To make this work, several of the NUnit tests needed to have additional categories added:
Final Thoughts
The current approach follows these design philosophies:
I invested both of the last weekends into this because I really believe in the power of the new tool and wanted to bring it to a state where I feel confident talking about it and demonstrating it to other people. There have been no major design changes since about late Thursday / early Friday - with the last weekend spent on actually using the tool and most of today writing this document. This means that the tool is still relatively "young" in its state of development - but at least it has received a few days of intensive use and testing without needing any larger refactorings.
(the following is from an older design document, with present-day additions and corrections)
Historic: Another Older Design Document on my Linker Optimizer
The problem we are trying to solve is that even when using the linker, the linked size of The initial plan was to start with conditionally removing the To do that, we need to somehow reliably identify whether the feature is being used by user code. In my initial proof-of-concept I tried to avoid any larger than necessary changes to the linker (so I did not modify or extend the XML description) – so I annotated a few APIs with custom attributes, one custom attribute to “mark” a feature as being used, another to mark a method for conditional elimination. I was able to complete this fairly quickly and it worked fine for SRE – but the concept had serious flaws and was abandoned. As a next step, I wrote a very simple size report tool that computed the size of individual namespaces and classes, to get an idea of what would be most useful to start with. One of the biggest chunks was globalization, and especially managed collation, and we had some discussions about whether globalization should always be disabled (similar to what invariant mode does in CoreFX) or whether this should somehow be done dynamically. The next phase of the project was to research whether removing it dynamically was even possible. But before I dive into that, let me explain what exactly is meant by “removing it dynamically” – because the meaning of that actually changed throughout the project. At the beginning, the idea was to detect at link-time whether the user code was using globalization and then decide whether or not to remove it. However, this can be quite tricky, and before putting too much effort into it, we wanted to know what the benefit would be. So we decided to start by explicitly telling the linker which "features" to remove - and eventually revisit the concept of the linker deciding that on its own at a later point.
Dynamically removing the globalization code is far from trivial because it is so closely interconnected with many conditional code paths that a simple mark and sweep linker could not remove. And since even a stubbed-out method would still have a certain size, both in IL as well as in metadata, doing so for too many tiny methods wouldn't be too effective. If you use a C# To do this, we need to break the IL code down into Basic Blocks, perform some basic Flow Analysis on them, then do Dead Code (including unused variables) elimination. Initially, there were some tricky situations where some code annotations were required, but those are not needed anymore. At first, I felt like this was an incredibly complicated task, but when I finally had my first proof-of-concept ready, I was extremely happy and proud of the seemingly insurmountable task that lay behind me. However, it still wasn't ready for production use and there were also concerns about stability and long-term maintainability. So I took the entire thing and refactored it into a stand-alone module, allowing me to move forward without breaking or blocking other people by not modifying the existing linker code base at all. I also added a regression test suite as well as a new XML-based configuration system (to avoid using any custom attributes at all). (end of the historic design document) The above document is probably a month or so old. We eventually adopted what I previously mentioned in the section called "Technique: Dynamically breaking down Conditionals". However, I believe that at some point in the future we should still revisit and complete the work that's been put into this module.
Thanks for all of the great details! I didn't have a chance today, but tomorrow I'll give a brain dump of what we have.
I added my code here: https://github.com/baulig/linker-optimizer.
Here we go. I'll start high level to get a few posts out quickly and then I'll follow up with more detailed information. I'll start by commenting on the pieces of your project and how they compare to ours.
Our implementation mainly operates at the Instruction level. I see similarities in your code to what we are doing. We just don't have this higher level abstraction. That said, we do parse try/catch/finally/fault into a higher level data structure.
I didn't look at your implementation of this yet, but based on your description we have something similar. Some aspects of our implementation may be slightly less fleshed out. We mostly lean on our inlining and compare reduction code to reduce branch loads down to values that can be known and thus allow us to remove the dead path. We do support all branch instructions (i.e
Ours is considerably more sophisticated. We can use overload information from Our stack evaluator can parse any instruction and provide much more detail about the loads. This allows us to remove arbitrarily deep To decide what is safe to remove, we have code for checking instructions and methods for side effects (it's relatively simple and conservative at the moment). We can check if something may be the first to trigger a cctor.
We have a
We have all of the same things.
We've had similar ideas and desires in the past but we have not implemented anything like this.
We have ~540 tests
There are another 100 or so unit tests checking the behavior of some of the lower level classes and helpers.
I would say our design philosophy is similar, although there may be some differences, so I'll elaborate on ours. Making a change to a body that results in valid IL but the wrong result is the worst case scenario. We don't want to have invalid IL bugs, but at least those won't run. We have taken a number of measures to ensure quality.
Our original motivating use case
Our original dog food scenario was to remove all traces of Step 1 was to have UnityLinker replace calls to Step 2 was a cheat we had to use. I think we are actually close to being able to remove it. We also set From there the general purpose editing logic we put together is able to inline, reduce comparisons, etc., and eventually remove the branches inside a couple of marked methods that were using registry code.
Recap of UnityLinker
Here are some more details about our design. First, a recap of how UnityLinker is set up. UnityLinker references monolinker. We use different command line parsing code. Other than that we
UnityLinker is essentially a superset of monolinker. We also have our own
Body Editing Design
Our body editing runs as a step before the Our body editing abilities are broken down into "Features". In the code each
Our features
EmptyNonReturningMethodRemoval: Removes calls to empty void-returning methods. This is one of the puzzle pieces that allows us to remove the try/finally/Dispose that is generated in certain
EmptyScopeRemoval: The other piece of the puzzle that lets us remove certain try/finally/Dispose patterns. This feature also supports some other try/finally removals.
UnreachableBranchRemoval: Attempts to figure out the path branches will take. If it can figure one out, it will follow all instructions in the body except for the path we know won't be taken, and then it removes all instructions that were not visited. It's essentially a Mark & Sweep for the IL in a
PlatformFixing: This is the step that will set the value of a couple of methods in the class libraries based on the platform we know is being targeted.
ArchitectureFixing: We don't make use of this currently because we prefer to keep the managed assemblies architecture agnostic, but it sets the value of architecture information to the value we know it will be.
Inlining: Inlines calls to methods that return a constant value.
InlineGetters: Inlines calls to methods that simply return a field. Can optionally respect scoping visibility rules such as private, protected, internal, public, or we can disregard them. When the code is headed to il2cpp, for example, we can violate visibility rules.
InlineSetters: Inlines calls to methods that simply set a field. Again, we can optionally choose to respect scoping visibility or disregard it.
ConstantCompareReduction: Can replace the various compare instructions such as
CleanupStrandedUnusedStores: Same as your
CleanupEmptyBranches: Similar to what you have. It removes empty branches.
CleanupUnusedLocals: Removes locals that are no longer needed.
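To illustrate the ConstantCompareReduction idea (a hypothetical Python sketch, not UnityLinker's actual code): when both operands of a compare instruction are known constants, the compare folds into a constant load.

```python
import operator

# IL comparison instructions push 1 for true and 0 for false.
COMPARES = {"ceq": operator.eq, "cgt": operator.gt, "clt": operator.lt}

def fold_compare(op, lhs, rhs):
    """Fold a compare over two known constants into a constant load."""
    return "ldc.i4.1" if COMPARES[op](lhs, rhs) else "ldc.i4.0"
```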
Behind the feature step implementations are various helper classes that help make this all possible.
SmartILProcessor: I've mentioned this one before. It wraps cecil's
StaticConstructorAnalyzer: Checks to see if a given instruction in a
SideEffectEvaluator: Checks to see if a given instruction could have a side effect. The primary cases that we've implemented are non-call instructions. For example
StackEvaluator: Parses what is on the stack for a given instruction. Extra details for dup, exceptions, calls, and loads are all made available.
CallEvaluator: Uses information provided by
BranchEvaluator: Attempts to figure out which path a branch will take.
Other: There are lots of other helper and extension methods to make operating with cecil types easier. There are a number of other helper classes to store data, handle diagnostics, etc.
To wrap up, my thoughts are.
We solved this with basic constant propagation and feature switches in .NET 5.
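For reference, the .NET 5 mechanism pairs a feature switch with an ILLink substitution that stubs a guard method to a constant, after which constant propagation removes the guarded branches. A sketch of a substitutions entry, following the ILLink.Substitutions.xml format (all assembly, type, and feature names here are invented for illustration):

```xml
<linker>
  <assembly fullname="MyLib">
    <type fullname="MyLib.FeatureGuards">
      <!-- When the hypothetical "MyLib.FancyFeature" switch is false,
           the linker rewrites this getter to return the constant false;
           branch elimination then removes all guarded code paths. -->
      <method signature="System.Boolean get_IsFancyFeatureSupported()"
              feature="MyLib.FancyFeature" featurevalue="false"
              body="stub" value="false" />
    </type>
  </assembly>
</linker>
```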
New meta issue to track and collect all the information about the ongoing BCL size reduction work.
Relevant PRs:
Allow some Globalization code to be dynamically removed. #596 - there is a lot of interesting discussion in here.
Add option to the XML input to rewrite methods to throw / stubs. #555 - this was a proposal that was abandoned.
Add custom logic for reflection-emit to RemoveFeaturesStep. #590 - this is the technique that we're currently using to dynamically remove the System.Reflection.Emit code, which will shortly also be used for some globalization stuff.
For Globalization (Japanese, Taiwan and Hebrew Calendars): Make DateTimeFormatInfo and DateTimeParse more linker friendly (mono/corefx#298), Make CalendarData more linker friendly (mono/mono#14825), Allow some Globalization code to be dynamically removed (#596), Bump CoreFX to pick up https://github.com/mono/corefx/pull/298 (mono/mono#14851).
Interesting design discussion: Add Link-time framework feature removal spec (designs#42).
Upstreaming and discussing with CoreFX: Disable some Calendars when using GlobalizationMode.Invariant. corefx#38281.
I will continually update this issue and document things in here while we move forward.