Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Startup cost of XamlServices parsing is 27% slower on .NETCore than on .NET Framework #94

Open
dotMorten opened this issue Dec 5, 2018 · 44 comments
Labels
Bug Product bug (most likely) Performance Performance related issue rank20 Rank: Priority/rank on a scale of (1..100) regression status: This issue is a regression from a previous build or release
Milestone

Comments

@dotMorten
Copy link
Contributor

  • .NET Core Version: 3.0 Preview1
  • Windows version: 1803
  • Does the bug reproduce also in WPF for .NET Framework 4.8?: No

Problem description:
I expected/hoped that with the XmlReader hopefully being Span<T> based. the XAML parser would be faster in .NET Core than the .NET Framework equivalent. However that's not what I'm seeing with this benchmark:

    public class BenchmarkTests
    {
        static readonly string xamlString = @"<MyObject xmlns=""clr-namespace:XamlBenchmark;assembly=XamlBenchmark"" StringProperty=""Hello World"" Int32Property=""1234"" DoubleProperty=""123.4567890"" FloatProperty=""-0.9876"" />";

        [Benchmark]
        public object Test1()
        {
            object instance = System.Xaml.XamlServices.Parse(xamlString);
            return instance;
        }
    }
    public class MyObject
    {
        public string StringProperty { get; set; }
        public int Int32Property { get; set; }
        public double DoubleProperty { get; set; }
        public float FloatProperty { get; set; }
    }

Here are the results:

BenchmarkDotNet=v0.11.3, OS=Windows 10.0.17763.1 (1809/October2018Update/Redstone5)
Intel Xeon CPU E5-1620 v3 3.50GHz, 1 CPU, 8 logical and 4 physical cores
  [Host]     : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3190.0
  Job-FQKGZY : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.7.3190.0
  Job-YTBJBT : .NET Core 3.0.0-preview-27122-01 (CoreCLR 4.6.27121.03, CoreFX 4.7.18.57103), 64bit RyuJIT

Method Runtime Toolchain Mean Error StdDev Ratio RatioSD
Test1 Clr net472 363.3 us 7.139 us 7.639 us 1.00 0.00
Test1 Core netcoreapp3.0 461.6 us 9.078 us 8.048 us 1.27 0.03
@fabiant3-zz fabiant3-zz added the Bug Product bug (most likely) label Dec 6, 2018
@fabiant3-zz fabiant3-zz added this to the 3.0 milestone Dec 6, 2018
@fabiant3-zz
Copy link

Thanks @dotMorten. This is likely to be expected, as Core is not currently ngened. We'll address before Core3 ships.

@karelz
Copy link
Member

karelz commented Dec 6, 2018

cc @brianrob @billwert

@karelz
Copy link
Member

karelz commented Dec 6, 2018

Baiting @benaadams @AndreyAkinshin if they have interest to look for low-hanging fruit here 😉

@dotMorten
Copy link
Contributor Author

Also looking at allocations by running this parse method many many times, I see this hit over 40 times per parsing:
image

And here's some of the CPU analysis:
image

@0xd4d
Copy link

0xd4d commented Dec 6, 2018

Loading the assemblies' custom attributes takes a lot of time. The more assemblies that are loaded, the slower it will take. If you create a simple hello world app that tests this, .NET Framework loads 5 assemblies but .NET Core 3.0 preview loads 23 assemblies.

There's a XamlSchemaContext that could perhaps be reused since it's the class that loads all assemblies.

EDIT: If a XamlSchemaContext is used, .NET Core 3.0 preview is faster than .NET Framework 4.7.2 (tested 64-bit)

@richlander
Copy link
Member

I love seeing this kind of issue. Great to see the focus on fundamentals.

@karelz karelz added the Performance Performance related issue label Dec 6, 2018
@danmoseley
Copy link
Member

Re GetCustomAttributes, it was improved by @NickCraver in dotnet/coreclr#20779 but that change would already be in preview 1.

@0xd4d
Copy link

0xd4d commented Dec 6, 2018

@danmosemsft It's also 24 loaded assemblies (.NET Core 3.0 latest daily build) vs 5 loaded assemblies (.NET Framework 4.7.2).

If you patch System.Xaml to only check the first 5 returned assemblies (.NET Framework only loads 5 assemblies), then .NET Core 3.0 code executes faster.

So it seems the 5x number of assemblies is causing the slowdown. It has to execute more code.

Original .NET Core 3.0 daily build:

Name Exc % Exc Inc % Inc
module system.xaml <<system.xaml!XamlServices.Parse>> 24.0 1,073 91.6 4,090
module system.private.corelib <<system.private.corelib!Attribute.GetCustomAttributes>> 14.8 662 36.3 1,619
module coreclr <<coreclr!JIT_New>> 10.3 462 10.6 472
module clrjit <<clrjit!CILJit::compileMethod>> 5.4 239 5.4 243
module coreclr <<coreclr!JIT_NewArr1>> 5.4 239 5.7 253
module coreclr <<coreclr!ModuleHandle::ResolveType>> 5.2 234 5.2 234
module coreclr <<coreclr!MetaDataImport::GetCustomAttributeProps>> 5.2 231 5.6 248
module system.private.corelib <<system.private.corelib!CustomAttributeData.GetCustomAttributesInternal>> 4.9 219 22.9 1,023
module coreclr <<coreclr!Attribute::ParseAttributeArguments>> 4.1 183 5.4 241
module coreclr <<coreclr!MetaDataImport::Enum>> 3.4 152 4.0 178
module ntdll <<ntdll!LdrpDispatchUserCallTarget>> 2.6 116 2.6 116
module coreclr <<coreclr!MetaDataImport::GetParentToken>> 2.6 116 2.8 127

@brianrob
Copy link
Member

brianrob commented Dec 7, 2018

Finally had a chance to look into this. Really great to see folks engaged here.

I was able to reproduce the difference and captured traces of 10K iterations of the parse operation to compare.

As was mentioned above, the biggest issue that I see is significantly higher cost in and caused
by Attribute.GetCustomAttributes. The higher cost shows up in both CPU usage in Attribute.GetCustomAttributes and in allocation helpers that are zeroing memory to support increased allocations in this path.

The biggest CPU usage differences are:

  • Attribute.GetCustomAttributes: 560ms on Core vs. 265ms on Desktop = +111%
  • Allocation helpers (JIT_New and JIT_NewArr1): 907ms on Core vs. 597ms on Desktop = +52%

Breaking down the allocations that contribute to the additional allocation helper CPU (as well as memory cost), these are the top 3 allocators:

  • System.Reflection.CustomAttributeRecord[]
  • System.Xaml.MS.Impl.XmlNsInfo
  • Node[System.Reflection.Assembly.System.Xaml.MS.Impl.XmlNsInfo]

The first step here should be for this to be investigated from the reflection angle. @joshfree, can you get eyes on this from a reflection expert? I can share traces offline and help ramp someone up on the data.

@NickCraver
Copy link
Member

Having taken some steps here already, it definitely seems like we can eliminate CustomAttributeRecord[] in some way. These are allocated only to be iterated in the caller in all cases, but the way they're populated is non-trivial to change. I have a branch locally that tried to eliminate these but it creates some very odd null refs in ways I didn't readily understand (but someone else may quickly see why I'm an idiot there).

If you're curious, this is where they're generated for a great many attribute code paths.

@0xd4d
Copy link

0xd4d commented Dec 7, 2018

@brianrob

Attribute.GetCustomAttributes: 560ms on Core vs. 265ms on Desktop = +111%

This is probably because .NET Core loads 24 assemblies but .NET Framework only loads 5 assemblies. The code gets custom attributes of all loaded assemblies.

@brianrob
Copy link
Member

brianrob commented Dec 7, 2018

@0xd4d, you're right - the iteration over all assemblies happens here:

As @NickCraver points out, there is possibly some optimization that can happen in the reflection path, but it is also true that there may need to be some work done in Xaml parsing to either limit the number of assemblies that get processed, or cache the results. Without a clear understanding of the WPF technology, it's not clear to me what should be done here. @fabiant3, are you the right person to look at this?

@karelz
Copy link
Member

karelz commented Dec 7, 2018

cc @steveharter for Reflection

@benaadams
Copy link
Member

benaadams commented Dec 8, 2018

18k calls from Parse -> XamlXmlReader.Initalize -> NodeStreamSorter.ctor -> StartNewNodeStreamWithSettingsPreamble does become 14M calls to FilterCustomAttributeRecord and that's 44% of the time for Parsing.

image

image

@benaadams
Copy link
Member

or cache the results.

XamlSchemaContext has 10 ConcurrentDictionarys (instance rather than static) so I wonder if it had that in mind?

The main cost for Parsing (64%) happens under System.Xaml.XamlSchemaContext.TryGetCompatibleXamlNamespace(String, out String) where its still the same number of calls as Parse (18k in this case) and then begins to fan out.

@benaadams
Copy link
Member

Would it be valid to make those ConcurrentDictionaries static?

Sounds like the context is quite flexible on usage:

/// SchemaContexts can potentially be shared between multiple callers in an AppDomain, including
/// both full and partial trust callers. To be safe for sharing, the default implementation should
/// be idempotent and order-independent--i.e. functionally immutable.

Though I don't know if lookup should be unique in some way (e.g. different Xaml files having explicitly different contexts)

@NickCraver
Copy link
Member

There are multiple things we can yet do in reflection and I'd like to get to them, but I second @benaadams's approach here on an actual solution that would make the differences really desired here. The problem with reflection performance optimizations is they hit a hard wall with contracts: they must return a new attribute set on each call - we cannot return the same instances (this would break all sorts of existing code).

The only way we can cache instances (after deciding it's valid to do so) is well above the layer those allocations happen at, e.g. in a lookup dictionary such as those in play here.

@NickCraver
Copy link
Member

I just wanted to add here: in doing some benchmark down we discovered that while my optimizations went into 3.0 Preview 1, some steps backwards also landed in there. Details are in https://github.com/dotnet/coreclr/issues/21456 and a PR to fix that regression is in dotnet/coreclr#21462. That should help out here, but only marginally compared to not actually calling this path more than once as proposed above.

@joshfree
Copy link
Member

@steveharter can you assist in the investigation from the coreclr/Reflection side of the house?

@steveharter
Copy link
Member

Sure I'll take a look asap at the reflection side

@rladuca rladuca added the regression status: This issue is a regression from a previous build or release label May 20, 2019
@grubioe grubioe added the rank20 Rank: Priority/rank on a scale of (1..100) label Jul 16, 2019
@weltkante
Copy link

weltkante commented Aug 24, 2019

and I don't fully understand the consequences of keeping it alive "forever"

If thats the chosen path the cache might need weak references (making it even more expensive) to not block unloadable assemblies "forever", otherwise it would be a memory leak with apps having reloadable plugin architectures or are emitting generated code in collectible assemblies at runtime. We are doing the latter on desktop, I haven't used AssemblyLoadContext in Core yet so I don't know entirely the implications, but its something to keep in mind.

@grubioe grubioe modified the milestones: 3.1, 5.0 Oct 21, 2019
@SamBent SamBent changed the title XAML parsing is 27% slower on .NETCore than on .NET Framework Startup cost of XamlServices parsing is 27% slower on .NETCore than on .NET Framework Nov 21, 2019
@SamBent
Copy link
Contributor

SamBent commented Nov 21, 2019

Changed the title to reflect the real issue.

@bugproof
Copy link

bugproof commented Dec 4, 2019

executing HTTP requests in WPF command is much slower on .NET Core idk what causes this but it works fine in .NET Framework. (I'm using refit btw)

not sure if it's WPF problem or .NET Core problem or refit problem... https://github.com/dotnet/corefx/issues/23401

EDIT:

SOLVED. See this comment https://github.com/dotnet/corefx/issues/23401#issuecomment-561671803

@SamBent
Copy link
Contributor

SamBent commented Dec 4, 2019

For reference: @bugproof comment is unrelated to this issue or to WPF. (.NET Core implements HTTPRequest differently from .NET Framework.)

@rollsch
Copy link

rollsch commented Apr 13, 2020

Just FYI it is issues like this that have caused us to stop our port to netcore. I had hoped performance would be better. I hope these issues continue to see activity and improvement.

@SamBent
Copy link
Contributor

SamBent commented Apr 14, 2020

@rolandh - We determined that this issue only affects apps that load lots of small XAML snippets using XamlServices.Parse. Is that your case? (We believe it's not all that common, but we don't have any hard evidence to either support or refute that belief.)

Please continue to open issues, or upvote existing issues, about specific perf concerns. Comments about "issues like this" are not really actionable, as no one knows what you're referring to specifically.

@rollsch
Copy link

rollsch commented Apr 14, 2020

I haven't experienced any issues as we haven't ported to net core yet. I was just saying that seeing multiple issues like this has meant that we won't be considering porting ove from net 4.7 in the short term. I've seen quite a few benchmarks showing slow downs, I haven't seen any showing performance increases in WPF yet, when I do I'll look at porting. I thought that might be good information for MS on how to prioritise issues and development.

I know its not an actionable comment for you, but it might be for someone higher up who wants to see a higher percentage off people port their applications.

@Lxiamail
Copy link
Member

Lxiamail commented May 6, 2020

I was just saying that seeing multiple issues like this has meant that we won't be considering porting ove from net 4.7 in the short term.
@rolandh can you articulate the multiple issues that you have been and concerned you moving to .NET Core besides this one? That will help us to priorities perf investigation.

@rollsch
Copy link

rollsch commented May 7, 2020

@Lxiamail

Pretty much every major library's first iteration to netcore has had slowdowns. Sometimes dramatic, eg 10x slower with entity frameworks 2.1 release.

I see almost nothing showing performance benefits, only contrived small applications seem to benefit. Everything that is a large in production library seems to go backwards and introduce bugs.

There is no incentive for us to upgrade after seeing issues like this repeatedly occur after porting to netcore.

Here is a list of relevant cases I've seen when researching whether netcore is stable enough to upgrade yet.

#1549
#1748

https://supportcenter.devexpress.com/ticket/details/T824984/wpf-core-3-0-performance

This was nopcommerce after upgrading to netcore
https://www.nopcommerce.com/en/boards/topic/54986/nopcommerce-410-high-memory-usage-and-how-weve-handled-it

EF Core large slow downs vs the older entity framework
nopSolutions/nopCommerce#3419

@philipag
Copy link

philipag commented May 7, 2020

Just my 2 cents: I also ported a large WPF+EntityFramework app to core 3.0 and found that load times increased from <4 seconds for .Net Framework to 5+ seconds with .Net Core 3.0. This is with creating native images for both in release and debug builds. Runtime performance seemed largely unchanged so I abandoned this porting effort and will try this again with .Net 5 when it is released. There are numerous reports of slow-downs like in the post above and also this one: https://github.com/dotnet/wpf/issues/94

My impression is that some work is required to address all these already reported WPF + .Net Core performance issues in order to make it worthwhile to perform this migration for more complex applications with lots of assemblies and XAML resources.

Cheers
Philip

@danmoseley
Copy link
Member

@ajcvickers re EF feedback above.

@laggage
Copy link

laggage commented Jun 15, 2020

I seemed to encounter the performance problem too; The same wpf code target .net core 3.1 and another .net 4.8, the latter memory usage is about 20MB while the former is 32MB+ almost 60MB at the startup of program;
image
Read a same file as hex and show on a TextBox, the memory usage is above, still, net core cost more
memory than net48 ; I felt confused
image

@benaadams
Copy link
Member

@laggage your .NET 48 is running as 32-bit and .NET Core is running as 64-bit

Does adding <RuntimeIdentifier>win-x86</RuntimeIdentifier> to your .csproj help? (moving it to 32-bit)

@laggage
Copy link

laggage commented Jun 16, 2020

@benaadams I tried your suggestion, but still, core wpf just cost more memory than net4.x wpf;

image

@philipag
Copy link

@SamBent This issue was milestoned to core 5.0 but I don't see any updates even though 5.0 is being released. Is there reason to believe these performance regressions were fixed and the people on this forum should evaluate this again in their applications.

@SamBent
Copy link
Contributor

SamBent commented Oct 29, 2020

The original issue - startup cost of XamlServices.Parse - is not yet addressed in 5.0. It only comes up if the app explicitly asks for it, so it's not a factor in load times of most apps.

A number of other issues have been mentioned here, most of them outside WPF's scope. You can follow the links to check on the status with the appropriate owners.

@philipag
Copy link

philipag commented Nov 1, 2020

@SamBent In my case the problems may be DataTemplate instantiation via XamlReader.Parse which I assume internally calls XamlServices.Parse.

Will this issue be assigned to a new milestone or will WPF apps which make using of XAML parsing see permanent load time degradation if ported to .Net Core?

@ryalanms ryalanms modified the milestones: 5.0.0, 7.0.0 Aug 26, 2021
@ekalchev
Copy link

ekalchev commented Nov 11, 2022

This was milestoned to 7.0 but the startup time of WPF apps running .net7.0 are still very slow compared to .net 4.8

@TimexPeachtree
Copy link

Pls is there any updates on this from .net 8

@rollsch
Copy link

rollsch commented Aug 27, 2024

This is a very old thread but can someone give any benchmarks for net 8 vs net 4.7/4.8 ?

@ekalchev
Copy link

ekalchev commented Aug 30, 2024

This is a very old thread but can someone give any benchmarks for net 8 vs net 4.7/4.8 ?

The issue is still present. We recently migrated our net48 product to .net8. We have automated benchmarking after each build. We can clearly see how almost everything at startup is performing better except the xaml parsing. This negates all performance gains of the other parts of the code where thing are performing better. We end up with around 10% slower startup times, because wpf performance issues. Everyone is very disappointed because .net 8 promised performance but I hasn't been delivered in wpf apps.

@rollsch
Copy link

rollsch commented Aug 31, 2024

Thank you for saving us an immense amount of effort for little gain.

@ThomasGoulet73
Copy link
Contributor

ThomasGoulet73 commented Aug 31, 2024

I ran the benchmarks in the original post locally but I now compare .Net Framework 4.8 vs .Net 8.0 and here are my results:

BenchmarkDotNet v0.14.0, Windows 11 (10.0.22631.4037/23H2/2023Update/SunValley3)
AMD Ryzen 7 7745HX with Radeon Graphics, 1 CPU, 16 logical and 8 physical cores
.NET SDK 9.0.100-preview.7.24407.12
  [Host]     : .NET 8.0.7 (8.0.724.31311), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-XTDWJG : .NET 8.0.7 (8.0.724.31311), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-ELJHHU : .NET Framework 4.8.1 (4.8.9261.0), X64 RyuJIT VectorSize=256
Method Runtime Mean Error StdDev
Test1 .NET 8.0 150.6 us 3.01 us 6.08 us
Test1 .NET Framework 4.8 183.2 us 3.31 us 4.54 us

It looks like .Net 8.0 is faster than .Net Framework 4.8 for this specific benchmark on my machine. If others encounter any performance regression from .Net Framework, it would be great if they could share them and ideally include a repro.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Product bug (most likely) Performance Performance related issue rank20 Rank: Priority/rank on a scale of (1..100) regression status: This issue is a regression from a previous build or release
Projects
None yet
Development

No branches or pull requests