Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve testing related to tracer / dependency tracking #3078

Open
jtschuster opened this issue Oct 19, 2022 · 6 comments
Open

Improve testing related to tracer / dependency tracking #3078

jtschuster opened this issue Oct 19, 2022 · 6 comments

Comments

@jtschuster
Copy link
Member

jtschuster commented Oct 19, 2022

As we create more tools for investigating size after trimming to answer "why is this kept/not kept," it's crucial to improve tracing / dependency tracking. Right now, we only have the most basic testing and there are lots of methods in MarkStep that start with if (Annotations.IsMarked(thing)) return; without tracking the dependency passed in. This early return helps performance of linking but will only mark with a single reason, and will lead people to believe that if they remove the first dependency, they'll be able to remove thing, even if it's not true.

My first thought is to replace KeptAttribute with KeptByAttribute(string MetadataProvider, DependencyKind) to indicate an item should be marked by a specific type/method and a specific reason. I've found a couple places where an item was marked as expected, but due to unrelated/buggy logic, and hopefully this would help catch it.

@marek-safar
Copy link
Contributor

I did a version of this in #1228

@jtschuster
Copy link
Member Author

jtschuster commented Oct 31, 2022

@agocke and I spoke a little about this and debated the benefits of recording every single dependency that marks a node vs. only recording the first dependency.

Marking first dependency:

  • Pros:
    • Avoids cycles in the dependency graph
    • Creates a minimum spanning tree of the dependencies that can be easier to reason about
  • Cons:
    • Doesn't give a full picture of all the nodes that are marking a node (you could remove node S creating the dependency that keeps node T, re-link, and find another dependency keeping T)
    • Makes it harder to test that a certain dependency is keeping a node (requires the test to be constructed so that there's only the one dependency keeping it)

Marking all dependencies:

  • Pros:
    • Gives a more complete picture
    • Makes it easier to test that a certain dependency is keeping a node
  • Cons;
    • Can be very cluttered with lots of noise
    • Cycles can make it harder to traverse up to find the "top level node" that is causing a node to be marked

I'm still in favor of marking all dependencies, and I think we could add a little extra information to be able to construct a tree from the graph and avoid cycles, like the order the linker came across the dependencies. We also could put the burden on the visualizers to come up with a solution for investigating a graph with cycles.

@vitek-karas
Copy link
Member

I don't think linker should be the one to reason about what dependency is important - it doesn't know... and doesn't care.
The tool should simply dump all dependencies it knows about. The consumers of the output should then apply whatever logic makes sense to turn it into usable data shape.

We need the system to be:

  • Almost nonexistent if tracing is turned off (vast majority of linker runs will not have tracing turned on, so no perf cost)
  • Reasonably efficient if it's on, but it should not prefer speed over data completeness

If later on we find that it makes sense to look at the data with the "first dependency only" view we can create a helper library to process the dumped data into that shape. (If we have more than one user of the data, it would make sense to have such library anyway, to deal with parsing the file format and mapping the strings back onto some kind of object model - maybe even type system eventually).

@agocke
Copy link
Member

agocke commented Nov 1, 2022

That makes sense to me as a design goal, but if the system currently produces information that's good enough I'm not particularly inclined to improve it until we have a specific request to do so.

@vitek-karas
Copy link
Member

I didn't mean to imply that we should do anything about it. Just a guiding principle. I agree that if it currently works, no need to change anything.

@MichalStrehovsky
Copy link
Member

Almost nonexistent if tracing is turned off (vast majority of linker runs will not have tracing turned on, so no perf cost)

This is how tracing in ILCompiler.DependencyAnalysisFramework is implemented e.g. here's the first edge logging strategy: https://github.com/dotnet/runtime/blob/82aa87f9cdf9ac20f0f685016648c36247a76ff1/src/coreclr/tools/aot/ILCompiler.DependencyAnalysisFramework/FirstMarkLogStrategy.cs and here's a no-op logging strategy: https://github.com/dotnet/runtime/blob/82aa87f9cdf9ac20f0f685016648c36247a76ff1/src/coreclr/tools/aot/ILCompiler.DependencyAnalysisFramework/NoLogStrategy.cs

Notice these are structs - we then instantiate a generic type over these structs. Because of how generic specialization works, this means that all of this is 100% inlineable. The no-op logging strategy has zero impact (not even an if check anywhere but at the beginning, when we're creating the graph/analyzer).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants