Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch System.IO usage with Microsoft.IO usage to reduce string allocations during scanning directory (and some other path manipulation too.) #6075

Closed
Tracked by #6551
arkalyanms opened this issue Jan 22, 2021 · 12 comments · Fixed by #6771
Assignees
Labels
performance Performance-Scenario-Solution-Open This issue affects solution open performance. Priority:1 Work that is critical for the release, but we could probably ship without size:7 triaged
Milestone

Comments

@arkalyanms
Copy link
Member

Child of #6034

CPS switch is here - https://devdiv.visualstudio.com/DevDiv/_git/CPS?path=%2Fsrc%2FMicrosoft.VisualStudio.ProjectSystem%2FUtilities%2FPathHelper.cs&version=GBmain&line=22&lineEnd=23&lineStartColumn=1&lineEndColumn=1&lineStyle=plain&_a=contents

Highlights of the switch

  • The assembly Microsoft.IO.Redist.dll is 118KB in size and loaded in CPS based projects
  • Fixes globalization issues in CPU based projects (Eg: Previously could not open projects in a path containing "%C3%A7" or \wsl$)
  • It has memory and cpu performance benefits over System.IO and used in CPS PathHelper.
    • Solution load numbers below show 10-12% load time improvements in both cold and warm load scenarios.
    • API benchmarks show marked improvements of 2-3 times less CPU per API call and a big gain in the memory saved per API call.
      • The API benchmarks can also be used as data to propose wider usage of the Microsoft.IO assembly to replace System.IO across the org
      • Disclaimer: Scale tests may not see a marked improvement atleast in this PR since the benchmarks show no Gen 1 or Gen 2 impact before or after with memory reclaimed in Gen 0 GC. It is intended to relieve the Gen 0 GC pressure
VS build Solution Cold Load(First load in ms) Solution Warm Load(Second Load in ms)
Before master.30209.217 7878 7810
After master.30209.218 7236 6886
Method Mean Error StdDev Gen 0 Allocated
SystemIOMakeRelative (Short paths) 2,613.1 ns 23.79 ns 21.09 ns 0.6180 2608 B
MicrosoftIOMakeRelative (Short paths) 764.4 ns 2.41 ns 2.01 ns 0.0210 88 B
SystemIOMakeRelative (Long paths) 4,395.9 ns 19.68 ns 18.41 ns 1.4801 6219 B
MicrosoftIOMakeRelative (Long paths) 1,014.3 ns 2.98 ns 2.79 ns 0.0496 209 B
Method Mean Error StdDev Gen 0 Allocated
SystemIOSamePath (2 Short paths) 976.82 ns 19.226 ns 27.573 ns 0.1488 626 B
MicrosoftIOSamePath (2 Short paths) 537.74 ns 10.348 ns 9.680 ns 0.0134 56 B
SystemIOSamePath (Short and long paths) 48.93 ns 0.975 ns 1.855 ns - -
MicrosoftIOSamePath (Short and long paths) 54.19 ns 0.994 ns 0.881 ns - -
SystemIOSamePath (Long and short paths) 49.50 ns 0.813 ns 0.760 ns - -
MicrosoftIOSamePath (Long and short paths) 55.68 ns 1.134 ns 1.061 ns - -
SystemIOSamePath (2 Long paths) 1,903.02 ns 6.298 ns 7.001 ns 0.2632 1107 B
MicrosoftIOSamePath (2 Long paths) 687.57 ns 6.361 ns 5.639 ns 0.0229 96 B
Method Mean Error StdDev Gen 0 Allocated
SystemIOMakeRooted (Short paths) 508.2 ns 10.22 ns 12.55 ns 0.0839 353 B
MicrosoftIOMakeRooted (Short paths) 353.3 ns 1.33 ns 1.24 ns 0.0095 40 B
SystemIOMakeRooted (Long paths) 1,132.1 ns 4.14 ns 3.46 ns 0.1411 594 B
MicrosoftIOMakeRooted (Long paths) 533.7 ns 2.37 ns 2.22 ns 0.0095 40 B
Method Mean Error StdDev Median Gen 0 Allocated
SystemIOChangeFileName (Long path with filename) 2,949.0 ns 67.68 ns 194.19 ns 2,930.4 ns 0.7401 3105 B
MicrosoftIOChangeFileName (Long path with filename) 762.8 ns 15.10 ns 15.51 ns 763.0 ns 0.0668 281 B
SystemIOChangeFileName (Long with directory\filename) 2,841.2 ns 54.62 ns 105.23 ns 2,830.4 ns 0.7477 3145 B
MicrosoftIOChangeFileName (Long with directory\filename) 776.3 ns 13.98 ns 31.83 ns 771.0 ns 0.0706 297 B
SystemIOChangeFileName (Short path with filename) 1,123.9 ns 22.55 ns 51.80 ns 1,104.3 ns 0.2632 1107 B
MicrosoftIOChangeFileName (Long path with filename) 467.1 ns 9.29 ns 17.90 ns 462.9 ns 0.0210 88 B
SystemIOChangeFileName (Long path with directory\filename) 1,215.4 ns 24.04 ns 39.50 ns 1,200.6 ns 0.2747 1155 B
MicrosoftIOChangeFileName (Long path with directory\filename) 488.5 ns 9.03 ns 11.42 ns 484.0 ns 0.0248 104 B
Method Mean Error StdDev Gen 0 Allocated
SystemIOCombine (Short paths) 272.9 ns 4.23 ns 3.31 ns 0.0153 64 B
MicrosoftIOCombine (Short paths) 102.9 ns 2.12 ns 2.09 ns 0.0153 64 B
SystemIOCombine (Long paths) 407.1 ns 2.45 ns 2.05 ns 0.0439 185 B
MicrosoftIOCombine (Long paths) 118.6 ns 2.35 ns 6.15 ns 0.0439 185 B
@arkalyanms arkalyanms added performance needs-triage Have yet to determine what bucket this goes in. labels Jan 22, 2021
@AR-May AR-May added Performance-Scenario-Solution-Open This issue affects solution open performance. and removed needs-triage Have yet to determine what bucket this goes in. labels Jan 28, 2021
@panopticoncentral panopticoncentral added the Priority:1 Work that is critical for the release, but we could probably ship without label Mar 23, 2021
@marcpopMSFT marcpopMSFT added Epic Groups multiple user stories. Can be grouped under a theme. and removed Epic Groups multiple user stories. Can be grouped under a theme. labels Jun 11, 2021
@AR-May AR-May self-assigned this Jul 9, 2021
@rokonec
Copy link
Contributor

rokonec commented Jul 12, 2021

It is maybe not relevant here, but please be aware of past failed attempts:

@Therzok
Copy link
Contributor

Therzok commented Jul 17, 2021

@ladipro
Copy link
Member

ladipro commented Jul 19, 2021

Microsoft.IO.Redist brings some of the new .NET Core System.IO functionality to .NET Framework. As such it is Windows only. For a multi-targeted project like MSBuild, we should be able to use namespace #ifdef's (i.e. using Microsoft.IO only on Framework plus maybe also some type aliasing) so the builds consumes the right type -- from BCL on Core and from the Redist package on Framework.

@ladipro
Copy link
Member

ladipro commented Jul 19, 2021

It is maybe not relevant here, but please be aware of past failed attempts:

Since NGEN seems to have been a problem in the past attempt, I'm going to link #6666. Treating the new dependency just like the other system stuff there should be enough to generate a good native image.

@AR-May
Copy link
Member

AR-May commented Aug 2, 2021

Exit criteria & Estimate:

  • Include Microsoft.IO.Reddist without getting ngen errors. (1 dev days)
  • Find one location where it makes sense to use Microsoft.IO (0.5 dev days)
  • Measure initial perf. (0.5 dev days)
  • Implement changes (use API to enumerate files in the directory from Microsoft.IO). (3 days)
  • Measure (improved) perf. (0.5 dev days)

@AR-May AR-May added the size:7 label Aug 2, 2021
@Therzok
Copy link
Contributor

Therzok commented Aug 2, 2021

Microsoft.IO.Redist brings some of the new .NET Core System.IO functionality to .NET Framework. As such it is Windows only. For a multi-targeted project like MSBuild, we should be able to use namespace #ifdef's (i.e. using Microsoft.IO only on Framework plus maybe also some type aliasing) so the builds consumes the right type -- from BCL on Core and from the Redist package on Framework.

That will break everything running on Mono though.

@AR-May
Copy link
Member

AR-May commented Aug 2, 2021

Yeah, I saw about Mono in the PR comments for the previous attempt. At the moment my condition for Microsoft.IO usage is Condition="'$(TargetFrameworkIdentifier)' == '.NETFramework' and '$(MonoBuild)' != 'true'".

@AR-May
Copy link
Member

AR-May commented Aug 20, 2021

Status update: I have created a draft PR that replaces IO enumeration calls for the default implementation of IFileSystem.

The experimental insertion uncovered that there is a difference in behavior between enumeration API from System.IO and Microsoft.IO.

Measurements were done on my dev laptop for System.IO and Microsoft.IO enumeration (using ETW events).
I measured the incremental build of OrchardCore.

API IO enumeration calls Evaluation
System.IO 2800 ms 33000 ms
Microsoft.IO 1730 ms 25000 ms

So, here are two thoughts:

  • First, in this case IO calls takes ~8% of evaluation time. The improvement of IO enumeration calls is up to 40%. However these parameters can vary even on the same machine in different times.
  • Second, we can rely on the improvement of evaluation for ~1100 ms out of ~33000 ms (that is ~3,3%). Why the evaluation time improved for 24% instead (from 30000 to 25000)? Well, my guess that evaluation results given by APIs differs and that somehow affects the rest of evaluation code (it goes through another branch of the evaluation code or just processes less data). It may be that we do some unnecessary work in case of System.IO enumeration and our gain when we fix errors will be higher than 3,3%. But I think more probably this gain would be reduced back to 3,3% when we fix errors.

@arkalyanms
Copy link
Member Author

Thanks @AR-May. The bigger gains should be in memory in terms of gen 0 survival and allocated. That indirectly translates to perf gains which may be minimized slightly by 64 bit VS.

@AR-May
Copy link
Member

AR-May commented Aug 20, 2021

Hmm, right, memory wins should also translate to less time in evaluation. I computed the gains in memory when I did initial investigation, you can see them here (numbers that you see above is a second iteration of measurements, computed for the current code from PR).

Well, memory gains were there, but I have doubt it is a big part of these additional 20% of gain, i would be surprised if that is so. I am not really sure how memory gains translates to time gains though, but I do not expect them be that high...

@AR-May
Copy link
Member

AR-May commented Aug 20, 2021

The second part of status update: the risks.

Working on this issue we first broke the unit tests in msbuild repo and then broke >244 tests in the experimental insertion PR (required, optional & DDRITs, all of them). The reason is that similar API for enumeration seems to have different outputs in some cases. These cases were hit in unit tests and, more important, in experimental insertion tests.

Even when we fix all the test above, without additional investigation the risks are high.
To make the risks of this change low, we will need to

  • find out all the differences of API (note, some of the differences I already saw I have not found in docs)
  • figure out how they affect the code in all places where we use IFileSystem.Enumerate....
  • and to use opt-in or opt-out feature switches (which one we should use is still under the discussion).

@AR-May
Copy link
Member

AR-May commented Oct 5, 2021

The failures of DDRITs and other tests were addressed. I verified that there is no difference in the API for enumeration.
The root cause of the failures that I saw were dll problems. This problem was covered by throwing misleading errors, making us think there was some difference in behavior. To fix the failures we need add a binding redirect to the proper version of "System.Buffer" (one of Microsoft.IO.Redist dependency assemblies).

State: I fixed a problem & misleading errors. The PR is in review. One more problem was uncovered during the PR review: the code fails one of our sanity checks related to our projects' structure. This failure is actually not quite related to the changes, but rather the changes uncovered the existing problem with that. The milestone should not be affected by that, we have got a plan how to deal with that in the PR and upon that we will merge the PR.

ladipro pushed a commit that referenced this issue Oct 15, 2021
Fixes #6075

### Context
Microsoft.IO.Redist brings some of the new .NET Core System.IO functionality to .NET Framework. In particular, enumeration in Microsoft.IO was optimized comparing to System.IO: new enumeration API was added and old one was improved. We consider it is beneficial to switch default file system enumeration to the old API from Microsoft.IO.Redist.

### Changes Made

- Added Microsoft.IO.Redist
- Default file system enumeration uses Microsoft.IO instead of System.IO

### Testing
Unit tests & DDRITs

### Notes
We should be wary of possible regressions. There were differences in behavior, see `.NET Framework only` notes in [doc](https://docs.microsoft.com/en-us/dotnet/api/system.io.directory.enumeratefiles?view=net-5.0). The change therefore is under change wave 17_0.
@ladipro ladipro added this to the VS 17.1 milestone Dec 8, 2021
@AR-May AR-May added the triaged label Feb 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance-Scenario-Solution-Open This issue affects solution open performance. Priority:1 Work that is critical for the release, but we could probably ship without size:7 triaged
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants