Allocations in FormReader #553
Addressing Task allocations in aspnet#553. Unfortunately this has a return type split in the public API between net451 and dotnet5.4.
Would it be better overall to just make FormReader a "push" parser? Using ValueTask might kill the allocs, but it's still a lot of code to run. We could also avoid a second pass on the "output" stream. Pseudocode:
I'll do some profiling this AM. I want to see where reading form data fits into the big picture in a sampling profile.
Some sampling data (same benchmark):

MVC at a high level

The 100% in this case is time spent inside MVC+Routing (only negligible work done in other parts of the pipeline). The 13.39%

Some extras:

There's a very small amount of overhead here between MVC and the form reader; I was surprised how small it was. So, reading and parsing the input is 13.29% of the request time. Let's drill in more.

The bottom-up breakdown provided by dotTrace is pretty instructive:

We're looking at a pretty significant amount of time spent in async/task overhead

11.63% spent in

Additionally, another big cost is hashing. In the common case, we hash each key twice (once for a
Is this pre or post change? ValueTask should also kill a lot of the AsyncMethodBuilder work.
Re: 18.75% of the total time being spent on hashing and multi-hashing. The cost is in KeyValueAccumulator for the Dictionary? @stephentoub proposed a Other approach would be to not use Also the Dictionary has had security mitigations applied for this scenario, so a different data structure would be a complicated choice :(
@benaadams - well we're using dictionary as a multi-map, so really the best choice for us would be a dedicated multi-map type, or something like Python's A
or
It has to be built in to Really for our usage, a sorted Example here: https://github.com/aspnet/Mvc/blob/dev/src/Microsoft.AspNetCore.Mvc.ViewFeatures/ViewFeatures/AttributeDictionary.cs Form-data isn't a good general serialization format, and it's not a replacement for JSON (or others).
This is without any of your changes, I think.
It is user input though, so we want to avoid a data structure that might behave like this did: http://www.troyhunt.com/2012/08/fixing-hash-dos-good-and-proper-and.html
Actually
@benaadams, @rynowak, if the concern is that we need to frequently double-hash, have you considered storing a:

    struct StringWithHash
    {
        public int HashCode;
        public string Value;
        ...
    }

as the key rather than a raw string? You can either compute the hash upfront and store it in the struct, and override GetHashCode to return it, or you can compute the hash lazily in GetHashCode if it wasn't already computed.
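A minimal sketch of the "compute the hash upfront" variant might look like the following. The choice of StringComparer.OrdinalIgnoreCase, the readonly fields, and the IEquatable implementation are assumptions made for illustration and are not taken from the thread. The sketch also does not address the randomized string hashing concern raised elsewhere in this discussion.

```csharp
using System;
using System.Collections.Generic;

// Illustrative completion of the StringWithHash idea: the hash is computed
// once in the constructor, so dictionary operations reuse the stored value.
public readonly struct StringWithHash : IEquatable<StringWithHash>
{
    public readonly int HashCode;
    public readonly string Value;

    public StringWithHash(string value)
    {
        // Assumption: keys are compared case-insensitively.
        Value = value ?? throw new ArgumentNullException(nameof(value));
        HashCode = StringComparer.OrdinalIgnoreCase.GetHashCode(value);
    }

    // Cheap integer comparison first, full string comparison only on a match.
    public bool Equals(StringWithHash other) =>
        HashCode == other.HashCode &&
        string.Equals(Value, other.Value, StringComparison.OrdinalIgnoreCase);

    public override bool Equals(object obj) =>
        obj is StringWithHash other && Equals(other);

    // Returns the stored hash instead of rehashing the string.
    public override int GetHashCode() => HashCode;
}
```

With this shape, a key built once can be used for both a lookup and a subsequent insert without hashing the string a second time, at the cost of exposing a non-string key type.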
@stephentoub - that introduces yet another tradeoff: on the common-case path we're directly building the dictionary that's going to be exposed to the user as A custom dictionary implementation would get us out of this mess, but I'd rather try literally everything else first. Do you have any thoughts on the
Where in
Depends where the costs are.
BTW @benaadams - this benchmark is here https://github.com/aspnet/Performance/tree/dev/testapp/BigModelBinding
In that case, yes, I'd expect @benaadams' PR at #556 to help.
@stephentoub some thoughts, not conclusions... Not sure storing the hash code is possible; doesn't Dictionary have randomised hashing for strings that was specifically introduced for aspnet and user input? Would hashing on an int cause issues? However, string does look like it uses I did briefly look previously as to whether string could cache its hash before realising the special relationship between Not changing the public signature (e.g. returning This could be hidden via implementing the interface
@rynowak it wasn't happy with the wacky date format ;) So I set it to 1st Jan, and have the tests working now.
Hitting disk, due to different Length behaviour with a fixed-size MemoryStream; correcting.
Re: Dictionary hashing on non-string types: https://github.com/dotnet/coreclr/issues/2279
Summary:
There's a lot of overhead right now in FormReader.

Data is from 3000 requests to https://github.com/aspnet/Performance/tree/dev/testapp/BigModelBinding (x64). The client is doing a form post of about 100 form fields. We kinda consider this the upper bound for the amount of data (and shape of data) that you'd want to put through MVC model binding.

Based on this profile, my napkin math shows about 77 MB of allocations from our underlying representation (string + System.Collections.Generic.Dictionary<String, StringValues> + etc) and about 122 MB coming from various overheads. I'm excluding from the 122 MB stuff that's in theory covered by @benaadams' current work (byte/char buffers, etc).
Breaking this down further:

Task<string>: 47 MB
Cost of being async

Task<Nullable<KeyValuePair<string, string>>>: 28 MB
Cost of being async

Dictionary<string, List<String>>, List<string>, string[]: 25 MB, 12 MB, 9 MB
Scratch data used in KeyValueAccumulator to build the final Dictionary<string, StringValues>

Note that 5 MB of string[] allocations are coming from a hashset in MVC, so I excluded it from this total.

Some thoughts
We should consider a design for FormReader where we pass the accumulator around or store values in fields/properties instead of returning them via Task<T>. Using async plus Task<T> at such a chatty level is what causes all of this overhead. Using Task, on the other hand, can avoid this issue.

If we're in a state where we know the entire body is buffered in memory, we might want to just optimize that path to do a synchronous read. That's probably a simpler fix, and it will result in running much more efficient code without any async overhead.
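As a rough illustration of both ideas, here is a sketch of a reader that exposes the current pair through properties and returns a plain Task. BufferedPairReader and its members are hypothetical names invented for this example and are not part of FormReader. It assumes the whole body is already in memory as a string, so every read completes synchronously and returns the cached Task.CompletedTask rather than allocating a Task<T> per pair.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical reader over a fully buffered, url-encoded form body.
public sealed class BufferedPairReader
{
    private readonly string _body;
    private int _pos;

    public BufferedPairReader(string body)
    {
        _body = body ?? throw new ArgumentNullException(nameof(body));
    }

    // Results are stored here instead of being returned via Task<T>.
    public string CurrentKey { get; private set; }
    public string CurrentValue { get; private set; }
    public bool HasCurrent { get; private set; }

    // Non-generic Task: the buffered path reuses Task.CompletedTask.
    public Task ReadNextPairAsync()
    {
        HasCurrent = TryReadNextPair();
        return Task.CompletedTask;
    }

    private bool TryReadNextPair()
    {
        if (_pos >= _body.Length) return false;

        int end = _body.IndexOf('&', _pos);
        if (end < 0) end = _body.Length;

        int eq = _body.IndexOf('=', _pos, end - _pos);
        string key = eq < 0 ? _body.Substring(_pos, end - _pos)
                            : _body.Substring(_pos, eq - _pos);
        string value = eq < 0 ? string.Empty
                              : _body.Substring(eq + 1, end - eq - 1);

        CurrentKey = Decode(key);
        CurrentValue = Decode(value);
        _pos = end + 1;
        return true;
    }

    // '+' means space in form encoding; percent escapes are decoded afterwards.
    private static string Decode(string s) =>
        Uri.UnescapeDataString(s.Replace('+', ' '));
}
```

A caller would await ReadNextPairAsync() in a loop and read CurrentKey and CurrentValue while HasCurrent is true. A real implementation would still need a genuinely asynchronous path for bodies that are not buffered, which this sketch leaves out.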
We should also consider changing KeyValueAccumulator to just operate on Dictionary<string, StringValues> directly. Having multiple values for the same field is the less-common case. If we wanted to be smart about it, we could basically build List<T>'s resizing behavior into a StringValues.Add method (returns a StringValues since they are immutable), and then all cases would be pretty optimal.

Right now we're making the worst case the common case by allocating a List<string> and string[] for the common case of a single value per key.