Optimize jit-analyze #306

kunalspathak · 2020-12-01T07:13:38Z

While working on #305 , I realized that we do "git diff" only to check if there are any files that has textual diff and no metric diff. This was added to not do the analysis of metric diff if there are no real diffs in base/diff. However, common case is that there are diffs that users analyze using jit-analyze. It is very rare scenario where there would be no diffs. Additionally, the code that was added to perform "git diff" so we can skip the actual analysis was broken until now and was fixed recently in #305. So far, we always did the analysis of base/diff regardless of "git diff" told that there were diffs or not. So overall, I do not find doing "git diff" in the beginning much useful and it consumes 20~30 seconds to complete. More common cases (where there are diffs), we start the analysis of various metrics once "git diff" completes. So, I have moved the part of doing "textual diff" towards the end with an option of -skiptextdiff to skip if user thinks it is not required. That speeds up getting the interesting report from jit-analyze faster.

Also, I have made the process that reads each file to extract metrics execute in parallel using .AsParallel() and that performs the analysis even faster than what it used to be before.

kunalspathak · 2020-12-01T07:13:57Z

@dotnet/jit-contrib

sandreenko · 2020-12-01T07:22:05Z

I realized that we do "git diff" only to check if there are any files that has textual diff and no metric diff. I never found it useful and it consumes 20~30 seconds to complete after which we start doing actual diffing of various metrics.

I thought the idea of getting textual diff before the actual analysis was to prevent running metric analysis if two files have no textual diffs.

kunalspathak · 2020-12-01T07:24:38Z

I realized that we do "git diff" only to check if there are any files that has textual diff and no metric diff. I never found it useful and it consumes 20~30 seconds to complete after which we start doing actual diffing of various metrics.

I thought the idea of getting textual diff before the actual analysis was to prevent running metric analysis if two files have no textual diffs.

Yes, sorry I forgot to mention that part. Until now the part of "getting textual diff before the actual analysis was to prevent running metric analysis if two files have no textual diffs." was broken and it used to always do the analysis. I fixed that part in #305. However, what I feel is that the common case is where there are diffs and that scenario becomes faster with this approach. It is rare that we run into diffs that are similar.

Edit: Updated the PR description to include this part.

BruceForstall · 2020-12-01T18:01:05Z

It is very rare scenario where there would be no diffs.

That's true, but it seems somewhat dangerous to not determine and display this by default, because you might then assume there are no diffs but in actuality there are some real diffs that don't show up in the metrics diffs.

AndyAyersMS · 2020-12-01T18:45:25Z

Until now the part of "getting textual diff before the actual analysis was to prevent running metric analysis if two files have no textual diffs." was broken

This surprises me -- when and why did it break?

As far as changing things around goes, I share Bruce's concerns -- especially since we don't have any tests for jit-analyze to confirm whether or not we're changing behavior.

kunalspathak · 2020-12-01T19:15:27Z

It is very rare scenario where there would be no diffs.

That's true, but it seems somewhat dangerous to not determine and display this by default, because you might then assume there are no diffs but in actuality there are some real diffs that don't show up in the metrics diffs.

"there are some real diffs" as in "text diffs" ? Currently, we just show total addition+deletion for textual diffs as seen here. With what I am proposing, we will continue to show it only if user passed -textdiff flag. If you prefer, I can change it to -hidediff so we show textual diff stats by default.

kunalspathak · 2020-12-01T19:18:29Z

Until now the part of "getting textual diff before the actual analysis was to prevent running metric analysis if two files have no textual diffs." was broken

This surprises me -- when and why did it break?

Earlier, we used to initialize Diff inside DiffInText() and returned if ExitCode == 0. Then the caller would do a check for null instead of diffCounts.Count == 0 and that condition was always false and we continued analyzing the metric diff.

As far as changing things around goes, I share Bruce's concerns -- especially since we don't have any tests for jit-analyze to confirm whether or not we're changing behavior.

Right, as I mentioned above, I can change the flag to -hidetextdiff so we show text diff stats by default.

AndyAyersMS · 2020-12-03T21:18:13Z

that condition was always false

Still not quite following. At one point this all worked as expected and the diff was an effective filter. When did that break?

AndyAyersMS · 2020-12-04T00:00:43Z

Ah, looks like I broke things in in #137.

src/jit-analyze/jit-analyze.cs

Co-authored-by: Andy Ayers <andya@microsoft.com>

kunalspathak added 3 commits November 30, 2020 22:55

Use parallel

481fd02

Use .AsParallel()

925ae3e

Just use .AsParallel()

3a8f607

kunalspathak requested a review from BruceForstall December 1, 2020 07:13

Convert showTextDiff to skipTextDiff

abfccb7

AndyAyersMS approved these changes Dec 4, 2020

View reviewed changes

src/jit-analyze/jit-analyze.cs Outdated Show resolved Hide resolved

Update src/jit-analyze/jit-analyze.cs

722720e

Co-authored-by: Andy Ayers <andya@microsoft.com>

kunalspathak merged commit bd80267 into dotnet:master Dec 4, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize jit-analyze #306

Optimize jit-analyze #306

kunalspathak commented Dec 1, 2020 •

edited

Loading

kunalspathak commented Dec 1, 2020

sandreenko commented Dec 1, 2020

kunalspathak commented Dec 1, 2020 •

edited

Loading

BruceForstall commented Dec 1, 2020

AndyAyersMS commented Dec 1, 2020

kunalspathak commented Dec 1, 2020

kunalspathak commented Dec 1, 2020

AndyAyersMS commented Dec 3, 2020

AndyAyersMS commented Dec 4, 2020

Optimize jit-analyze #306

Optimize jit-analyze #306

Conversation

kunalspathak commented Dec 1, 2020 • edited Loading

kunalspathak commented Dec 1, 2020

sandreenko commented Dec 1, 2020

kunalspathak commented Dec 1, 2020 • edited Loading

BruceForstall commented Dec 1, 2020

AndyAyersMS commented Dec 1, 2020

kunalspathak commented Dec 1, 2020

kunalspathak commented Dec 1, 2020

AndyAyersMS commented Dec 3, 2020

AndyAyersMS commented Dec 4, 2020

kunalspathak commented Dec 1, 2020 •

edited

Loading

kunalspathak commented Dec 1, 2020 •

edited

Loading