Improve testing related to tracer / dependency tracking #3078

jtschuster · 2022-10-19T21:13:42Z

As we create more tools for investigating size after trimming to answer "why is this kept/not kept," it's crucial to improve tracing / dependency tracking. Right now, we only have the most basic testing and there are lots of methods in MarkStep that start with if (Annotations.IsMarked(thing)) return; without tracking the dependency passed in. This early return helps performance of linking but will only mark with a single reason, and will lead people to believe that if they remove the first dependency, they'll be able to remove thing, even if it's not true.

My first thought is to replace KeptAttribute with KeptByAttribute(string MetadataProvider, DependencyKind) to indicate an item should be marked by a specific type/method and a specific reason. I've found a couple places where an item was marked as expected, but due to unrelated/buggy logic, and hopefully this would help catch it.

The text was updated successfully, but these errors were encountered:

marek-safar · 2022-10-20T08:33:56Z

I did a version of this in #1228

jtschuster · 2022-10-31T17:22:00Z

@agocke and I spoke a little about this and debated the benefits of recording every single dependency that marks a node vs. only recording the first dependency.

Marking first dependency:

Pros:
- Avoids cycles in the dependency graph
- Creates a minimum spanning tree of the dependencies that can be easier to reason about
Cons:
- Doesn't give a full picture of all the nodes that are marking a node (you could remove node S creating the dependency that keeps node T, re-link, and find another dependency keeping T)
- Makes it harder to test that a certain dependency is keeping a node (requires the test to be constructed so that there's only the one dependency keeping it)

Marking all dependencies:

Pros:
- Gives a more complete picture
- Makes it easier to test that a certain dependency is keeping a node
Cons;
- Can be very cluttered with lots of noise
- Cycles can make it harder to traverse up to find the "top level node" that is causing a node to be marked

I'm still in favor of marking all dependencies, and I think we could add a little extra information to be able to construct a tree from the graph and avoid cycles, like the order the linker came across the dependencies. We also could put the burden on the visualizers to come up with a solution for investigating a graph with cycles.

vitek-karas · 2022-11-01T08:11:05Z

I don't think linker should be the one to reason about what dependency is important - it doesn't know... and doesn't care.
The tool should simply dump all dependencies it knows about. The consumers of the output should then apply whatever logic makes sense to turn it into usable data shape.

We need the system to be:

Almost nonexistent if tracing is turned off (vast majority of linker runs will not have tracing turned on, so no perf cost)
Reasonably efficient if it's on, but it should not prefer speed over data completeness

If later on we find that it makes sense to look at the data with the "first dependency only" view we can create a helper library to process the dumped data into that shape. (If we have more than one user of the data, it would make sense to have such library anyway, to deal with parsing the file format and mapping the strings back onto some kind of object model - maybe even type system eventually).

agocke · 2022-11-01T20:04:05Z

That makes sense to me as a design goal, but if the system currently produces information that's good enough I'm not particularly inclined to improve it until we have a specific request to do so.

vitek-karas · 2022-11-01T20:39:49Z

I didn't mean to imply that we should do anything about it. Just a guiding principle. I agree that if it currently works, no need to change anything.

MichalStrehovsky · 2022-11-02T01:14:38Z

Almost nonexistent if tracing is turned off (vast majority of linker runs will not have tracing turned on, so no perf cost)

This is how tracing in ILCompiler.DependencyAnalysisFramework is implemented e.g. here's the first edge logging strategy: https://github.com/dotnet/runtime/blob/82aa87f9cdf9ac20f0f685016648c36247a76ff1/src/coreclr/tools/aot/ILCompiler.DependencyAnalysisFramework/FirstMarkLogStrategy.cs and here's a no-op logging strategy: https://github.com/dotnet/runtime/blob/82aa87f9cdf9ac20f0f685016648c36247a76ff1/src/coreclr/tools/aot/ILCompiler.DependencyAnalysisFramework/NoLogStrategy.cs

Notice these are structs - we then instantiate a generic type over these structs. Because of how generic specialization works, this means that all of this is 100% inlineable. The no-op logging strategy has zero impact (not even an if check anywhere but at the beginning, when we're creating the graph/analyzer).

jtschuster added the area-Linker: Tests label Oct 27, 2022

jtschuster mentioned this issue Oct 31, 2022

Add KeptByAttribute to validate an item was kept due to a specific dependency #3096

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve testing related to tracer / dependency tracking #3078

Improve testing related to tracer / dependency tracking #3078

jtschuster commented Oct 19, 2022 •

edited

Loading

marek-safar commented Oct 20, 2022

jtschuster commented Oct 31, 2022 •

edited

Loading

vitek-karas commented Nov 1, 2022

agocke commented Nov 1, 2022

vitek-karas commented Nov 1, 2022

MichalStrehovsky commented Nov 2, 2022

Improve testing related to tracer / dependency tracking #3078

Improve testing related to tracer / dependency tracking #3078

Comments

jtschuster commented Oct 19, 2022 • edited Loading

marek-safar commented Oct 20, 2022

jtschuster commented Oct 31, 2022 • edited Loading

vitek-karas commented Nov 1, 2022

agocke commented Nov 1, 2022

vitek-karas commented Nov 1, 2022

MichalStrehovsky commented Nov 2, 2022

jtschuster commented Oct 19, 2022 •

edited

Loading

jtschuster commented Oct 31, 2022 •

edited

Loading