Improve XmlDictionaryWriter UTF8 encoding performance #73336
Conversation
…runtime into binary_xml_text
@HongGit who is the right reviewer for this PR?
@tannergooding and @stephentoub... can you guys take a peek at this one? This is an improvement we'd like to take if it looks good, but we wanted some more eyes on it with the use of Vector256.
@adamsitnik might be able to help review the vector stuff also.
src/libraries/System.Private.DataContractSerialization/src/System/Xml/XmlStreamNodeWriter.cs
int numRemaining = (int)(charsMax - chars);
int numAscii = charCount - numRemaining;

return numAscii + (_encoding ?? s_UTF8Encoding).GetByteCount(chars, numRemaining);
What are the possible values of _encoding? Can it be something other than UTF-8?
Note that it is better to call Encoding.UTF8.GetBytes directly without caching the encoding locally. Encoding.UTF8.GetBytes allows the devirtualization optimization to kick in, which eliminates the overhead of Encoding being an abstract type.
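A minimal sketch of the difference being described, assuming a field typed as the abstract Encoding (the class and member names here are illustrative, not the PR's actual code):

```csharp
using System.Text;

class Utf8LengthDemo
{
    // Field typed as the abstract Encoding: the JIT cannot see the
    // concrete type, so GetByteCount stays a virtual call.
    private readonly Encoding? _encoding;
    private static readonly Encoding s_utf8 = new UTF8Encoding(false, true);

    public Utf8LengthDemo(Encoding? encoding) => _encoding = encoding;

    // Call through the cached Encoding-typed reference: virtual dispatch.
    public int CountViaField(char[] chars) =>
        (_encoding ?? s_utf8).GetByteCount(chars);

    // Call through Encoding.UTF8 directly: the JIT knows the concrete
    // UTF8Encoding type and can devirtualize (and potentially inline).
    public int CountViaUtf8(char[] chars) =>
        Encoding.UTF8.GetByteCount(chars);
}
```

Both overloads return the same byte count; the difference is only in how the call is dispatched.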
It can be passed by the user when creating a text XmlDictionaryWriter, but it is only assigned to _encoding if the code page is the same as UTF-8's. So in theory it can be any Encoding subclass, even if that is unlikely.
For s_encoding it does not use the default constructor but passes (false, true), so I did not dare to make that change. If it does not change the behaviour, then that could be a simple follow-up fix.
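For reference, the behavioral difference that the (false, true) constructor arguments introduce can be shown directly: Encoding.UTF8 silently replaces invalid input, while a UTF8Encoding built with throwOnInvalidBytes: true rejects it. A small standalone sketch, not code from the PR:

```csharp
using System.Text;

// The cached encoding is constructed as new UTF8Encoding(false, true):
// no BOM, and throw on invalid sequences. Encoding.UTF8 instead
// *replaces* invalid data (e.g. a lone surrogate) with U+FFFD, so
// swapping one for the other is an observable behavior change.
var strict = new UTF8Encoding(
    encoderShouldEmitUTF8Identifier: false,
    throwOnInvalidBytes: true);

string loneSurrogate = "\ud800"; // invalid UTF-16 on its own

// Encoding.UTF8 emits the 3-byte replacement char U+FFFD (EF BF BD).
byte[] replaced = Encoding.UTF8.GetBytes(loneSurrogate);

bool threw = false;
try
{
    strict.GetBytes(loneSurrogate); // the validating encoding throws
}
catch (EncoderFallbackException)
{
    threw = true;
}
```

So any follow-up that swaps the cached instance for Encoding.UTF8 would need to confirm no caller depends on the throwing behavior.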
…tem/Xml/XmlStreamNodeWriter.cs Co-authored-by: Stephen Toub <stoub@microsoft.com>
As an aside, @Daniel-Svensson, how is the coverage of this code in dotnet/performance? I.e., are there scenarios there that will show the improvement, and thus protect it from future regression?
I've removed the improved UTF-8 encoding logic, so the 25% speedup is gone, but it is at least a bit faster for inputs with mixed ASCII / non-ASCII characters. While it is more than 3 times slower than the original proposal, it is still faster than the current version for larger strings, and for all cases where the input contains one or more non-ASCII characters. I do not expect any large performance changes for the normal all-ASCII cases, since many input strings fall in the range of 8-25 characters, and it should hopefully not change the performance of that case.

I have moved the vectorization code to a separate branch where it lives in Utf8Encoding; the results there would be somewhat better than here, and I might create a separate PR from it (or not).

The input to the method in question will be at most 170 (512/3) characters in length, but depending on the entry point the limit can also be 42 or 85.

BenchmarkDotNet=v0.13.5, OS=Windows 11 (10.0.22621.1413/22H2/2022Update/SunValley2)
AMD Ryzen 7 5800X, 1 CPU, 16 logical and 8 physical cores
.NET SDK=8.0.100-preview.1.23115.2
  [Host]     : .NET 8.0.0 (8.0.23.11008), X64 RyuJIT AVX2
  Job-XXUKJT : .NET 8.0.0 (8.0.23.11008), X64 RyuJIT AVX2
MaxRelativeError=0.01  IterationTime=250.0000 ms
ASCII-only case: (benchmark table not captured)
Mostly-ASCII case: (benchmark table not captured)
internal static SealedUTF8Encoding UTF8NoBom { get; } = new SealedUTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: false);
internal static SealedUTF8Encoding ValidatingUTF8 { get; } = new SealedUTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
This was originally a temporary part of moving the vector code to the Encoding class.
It does not seem to have any impact on DataContract serialization at the moment, so I can revert the changes if you want. From the code it looks like improvements would mainly come from classes calling into XmlConverter, which uses this encoding directly.
…or text based DataContractSerializer
while (true)
    // Fast path for small strings, use Encoding.GetBytes for larger strings since it is faster when vectorization is possible
    if ((uint)charCount < 25)
Calling into the encoding actually seems to be faster from 25 characters and up, but that is when we don't need to handle branch mispredictions, so I increased the cutoff to 32 to absorb mispredictions without too much effect on performance.
The microbenchmarks showed a >5% regression where a long class name was mixed with many short strings and names when calling the encoding from 25 chars and up, for the (text-based) DataContractSerializer. (In the same case the binary serializer was 10% faster.) Now it is maybe a 1% regression and 5% improvement, but other things might be different since there is no R2R or PGO for a local build.
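The cutoff being discussed can be sketched like this; the constant 32 and the helper names are illustrative, not the PR's actual code. Short strings take a hand-rolled per-char loop, while longer ones defer to Encoding.GetBytes, whose vectorized inner loop amortizes the call overhead:

```csharp
using System;
using System.Text;

static class Utf8Writer
{
    // Illustrative cutoff: below this, a simple scalar loop beats the
    // call overhead of Encoding.GetBytes; above it, the encoder's
    // vectorized path wins.
    private const int SmallStringCutoff = 32;

    public static int GetBytes(ReadOnlySpan<char> chars, Span<byte> bytes)
    {
        if ((uint)chars.Length < SmallStringCutoff && IsAscii(chars))
        {
            // Hand-rolled ASCII fast path for short strings:
            // a 1:1 char-to-byte narrowing copy.
            for (int i = 0; i < chars.Length; i++)
                bytes[i] = (byte)chars[i];
            return chars.Length;
        }

        // Larger or non-ASCII input: let the encoder do the work.
        return Encoding.UTF8.GetBytes(chars, bytes);
    }

    private static bool IsAscii(ReadOnlySpan<char> chars)
    {
        foreach (char c in chars)
            if (c >= 0x80) return false;
        return true;
    }
}
```

The mispredict concern above is about this branch: inputs hovering right at the cutoff flip between the two paths, which is why nudging the constant past the crossover point is cheap insurance.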
The test failure appears to be unrelated: #64227
Getting back to these serializer PRs... I was going to suggest that the vectorization stuff would be better handled at the encoding layer. I am sure the folks watching over the encoding classes would welcome the kind of improvement your initial testing was showing. But I see that's already been updated.
I would remove the sealed encoding classes. The calls to them generate callvirts anyway, so there isn't really a performance win there, as you've already noticed.
/azp run runtime-community |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run runtime-community |
Azure Pipelines successfully started running 1 pipeline(s). |
Vectorizing of UTF-8 was removed. This is now just tweaking the hand-rolled code that has already existed here for years.
Summary
Feedback wanted on which implementation to choose:
2a. the Vector256 version (in this PR)
* and if so, what cutoff to use before calling Encoding.GetByteCount (1024, 2048?)
2b. just call Encoding.GetByteCount (it is always faster than the current code)
2c. the Vector version (with the risk of "AVX downclocking" on older Intel hardware when AVX512 support is added)
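The Vector256 variant in option 2a boils down to scanning 16 chars at a time for any UTF-16 code unit at or above 0x80. A simplified sketch of that scan (not the PR's exact code; the class and method names are made up):

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

static class AsciiScan
{
    // True if every UTF-16 code unit is < 0x80, checking 16 chars per
    // Vector256<ushort> iteration, with a scalar loop for the tail.
    public static bool AllAscii(ReadOnlySpan<char> chars)
    {
        ReadOnlySpan<ushort> u = MemoryMarshal.Cast<char, ushort>(chars);
        int i = 0;

        if (Vector256.IsHardwareAccelerated && u.Length >= Vector256<ushort>.Count)
        {
            Vector256<ushort> max = Vector256.Create((ushort)0x7F);
            ref ushort start = ref MemoryMarshal.GetReference(u);
            for (; i <= u.Length - Vector256<ushort>.Count; i += Vector256<ushort>.Count)
            {
                // Any lane > 0x7F means a non-ASCII char in this chunk.
                if (Vector256.GreaterThanAny(Vector256.LoadUnsafe(ref start, (nuint)i), max))
                    return false;
            }
        }

        // Scalar tail for the remaining < 16 chars.
        for (; i < u.Length; i++)
            if (u[i] > 0x7F) return false;

        return true;
    }
}
```

The "AVX downclocking" concern in 2c applies to wide-vector loops like this on some older Intel parts, which is why the cutoff before handing off to Encoding.GetByteCount matters.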
Original UnsafeGetUTF8Chars benchmarks
UnsafeGetUTF8Chars benchmarks
Source: https://github.com/Daniel-Svensson/ClrExperiments/blob/master/BinaryXmlBenchmarks/ConsoleApp1/Utf8Benchmarks.cs
For non-ASCII input the speedup was up to ~3x when only a few characters were non-ASCII (achieved by not calling into Encoding.GetBytes multiple times), so those measurements are not shown below.
The gains from AVX are somewhat smaller on older hardware.
32bit results:
Original: UnsafeGetUTF8Length benchmarks: removed
Source: https://github.com/Daniel-Svensson/ClrExperiments/blob/master/BinaryXmlBenchmarks/ConsoleApp1/Utf8BenchmarksLength.cs
2023-03-26: Updated PR with vectorization removed.
For the latest results see the comment below.