Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize jit-analyze #306

Merged
merged 5 commits into from
Dec 4, 2020
Merged

Conversation

kunalspathak
Copy link
Member

@kunalspathak kunalspathak commented Dec 1, 2020

While working on #305 , I realized that we do "git diff" only to check if there are any files that has textual diff and no metric diff. This was added to not do the analysis of metric diff if there are no real diffs in base/diff. However, common case is that there are diffs that users analyze using jit-analyze. It is very rare scenario where there would be no diffs. Additionally, the code that was added to perform "git diff" so we can skip the actual analysis was broken until now and was fixed recently in #305. So far, we always did the analysis of base/diff regardless of "git diff" told that there were diffs or not. So overall, I do not find doing "git diff" in the beginning much useful and it consumes 20~30 seconds to complete. More common cases (where there are diffs), we start the analysis of various metrics once "git diff" completes. So, I have moved the part of doing "textual diff" towards the end with an option of -skiptextdiff to skip if user thinks it is not required. That speeds up getting the interesting report from jit-analyze faster.

Also, I have made the process that reads each file to extract metrics execute in parallel using .AsParallel() and that performs the analysis even faster than what it used to be before.

@kunalspathak
Copy link
Member Author

@dotnet/jit-contrib

@sandreenko
Copy link

I realized that we do "git diff" only to check if there are any files that has textual diff and no metric diff. I never found it useful and it consumes 20~30 seconds to complete after which we start doing actual diffing of various metrics.

I thought the idea of getting textual diff before the actual analysis was to prevent running metric analysis if two files have no textual diffs.

@kunalspathak
Copy link
Member Author

kunalspathak commented Dec 1, 2020

I realized that we do "git diff" only to check if there are any files that has textual diff and no metric diff. I never found it useful and it consumes 20~30 seconds to complete after which we start doing actual diffing of various metrics.

I thought the idea of getting textual diff before the actual analysis was to prevent running metric analysis if two files have no textual diffs.

Yes, sorry I forgot to mention that part. Until now the part of "getting textual diff before the actual analysis was to prevent running metric analysis if two files have no textual diffs." was broken and it used to always do the analysis. I fixed that part in #305. However, what I feel is that the common case is where there are diffs and that scenario becomes faster with this approach. It is rare that we run into diffs that are similar.

Edit: Updated the PR description to include this part.

@BruceForstall
Copy link
Member

It is very rare scenario where there would be no diffs.

That's true, but it seems somewhat dangerous to not determine and display this by default, because you might then assume there are no diffs but in actuality there are some real diffs that don't show up in the metrics diffs.

@AndyAyersMS
Copy link
Member

Until now the part of "getting textual diff before the actual analysis was to prevent running metric analysis if two files have no textual diffs." was broken

This surprises me -- when and why did it break?

As far as changing things around goes, I share Bruce's concerns -- especially since we don't have any tests for jit-analyze to confirm whether or not we're changing behavior.

@kunalspathak
Copy link
Member Author

It is very rare scenario where there would be no diffs.

That's true, but it seems somewhat dangerous to not determine and display this by default, because you might then assume there are no diffs but in actuality there are some real diffs that don't show up in the metrics diffs.

"there are some real diffs" as in "text diffs" ? Currently, we just show total addition+deletion for textual diffs as seen here. With what I am proposing, we will continue to show it only if user passed -textdiff flag. If you prefer, I can change it to -hidediff so we show textual diff stats by default.

@kunalspathak
Copy link
Member Author

Until now the part of "getting textual diff before the actual analysis was to prevent running metric analysis if two files have no textual diffs." was broken

This surprises me -- when and why did it break?

Earlier, we used to initialize Diff inside DiffInText() and returned if ExitCode == 0. Then the caller would do a check for null instead of diffCounts.Count == 0 and that condition was always false and we continued analyzing the metric diff.

As far as changing things around goes, I share Bruce's concerns -- especially since we don't have any tests for jit-analyze to confirm whether or not we're changing behavior.

Right, as I mentioned above, I can change the flag to -hidetextdiff so we show text diff stats by default.

@AndyAyersMS
Copy link
Member

that condition was always false

Still not quite following. At one point this all worked as expected and the diff was an effective filter. When did that break?

@AndyAyersMS
Copy link
Member

Ah, looks like I broke things in in #137.

src/jit-analyze/jit-analyze.cs Outdated Show resolved Hide resolved
Co-authored-by: Andy Ayers <andya@microsoft.com>
@kunalspathak kunalspathak merged commit bd80267 into dotnet:master Dec 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants