Improve WebUtility.HtmlEncode / UrlEncode performance #103737
Merged
stephentoub (Member) commented on Jun 19, 2024
- For HtmlEncode, vectorize `IndexOfHtmlEncodingChars`. Using `SearchValues`, we can efficiently search for the first ASCII char that needs encoding or the first non-ASCII char, and only then fall back to a scalar loop.
- For HtmlEncode, reduce branching by using a more efficient check to determine whether an ASCII char needs to be encoded.
- For UrlEncode, rather than UTF-8-encoding into a new `byte[]`, %-encoding in place within that array, and then creating a string from it, use `string.Create` and do all of the encoding directly in the string's buffer.
- For UrlEncode, use `SearchValues` to vectorize the search for the first non-safe char. Also move the check for `' '` inside the branch that handles non-safe chars.
- For UrlEncode, use `SearchValues` to optimize the check for whether an individual char is in the set (via `Contains`).
- Simplify `IsUrlSafeChar`: rather than multiple checks, one of which is a bitmap, use just a bitmap.
- Remove some leading `IsNullOrEmpty` checks. Null/empty inputs should be rare, and they're now handled implicitly by the subsequent loops.
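The two-phase HtmlEncode search in the first bullet can be sketched roughly as follows. This is a minimal Python sketch, not the .NET code.

- A compiled `re` character class stands in for the SIMD search that `SearchValues` performs.
- `index_of_html_encoding_chars`, `_encode_char` and `html_encode` are hypothetical names.
- The non-ASCII rule is an assumption: only U+00A0 to U+00FF become numeric entities, and surrogate pairs are not handled.

```python
import re

# Phase-1 stop set: ASCII chars that always need encoding, plus ANY non-ASCII
# char (which then gets a precise scalar check). `re` stands in for the
# vectorized SearchValues search; it is not SIMD.
_ATTENTION = re.compile("[<>\"'&\u0080-\U0010ffff]")

_ENTITIES = {"<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;", "&": "&amp;"}


def index_of_html_encoding_chars(s: str) -> int:
    """Hypothetical stand-in for IndexOfHtmlEncodingChars: -1 if no candidate."""
    m = _ATTENTION.search(s)
    return m.start() if m else -1


def _encode_char(ch: str) -> str:
    """Phase 2 (scalar). Assumed rules: ASCII entities, U+00A0..U+00FF as &#N;."""
    if ch in _ENTITIES:
        return _ENTITIES[ch]
    if 0xA0 <= ord(ch) <= 0xFF:
        return f"&#{ord(ch)};"
    return ch  # other non-ASCII chars pass through unchanged


def html_encode(s: str) -> str:
    i = index_of_html_encoding_chars(s)
    if i < 0:
        return s  # fast path: input returned as-is (also covers "")
    # Copy the untouched prefix, then run the scalar loop from the first candidate.
    return s[:i] + "".join(_encode_char(ch) for ch in s[i:])
```

The fast search stops at any non-ASCII char. An input such as `לילה טוב` therefore drops into the scalar loop immediately, even though nothing ends up encoded. That may help explain why the Hebrew HtmlEncode row regresses (ratio 1.36) while the ASCII-heavy inputs improve.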
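With the bitmap simplification, `IsUrlSafeChar` becomes a single shift-and-mask probe per char. This is a minimal Python sketch under these assumptions:

- The safe set is ASCII letters, digits and `-_.!*()`.
- `is_url_safe_char` is a hypothetical name that mirrors the .NET helper but is not its code.

The same technique of precomputing a 128-bit mask over ASCII could also serve the "does this ASCII char need HTML encoding" check.

```python
import string

# Assumed URL-safe set: ASCII letters, digits, and - _ . ! * ( )
_SAFE = string.ascii_letters + string.digits + "-_.!*()"

# One 128-bit bitmap: bit n is set iff chr(n) is URL-safe.
# The chars are unique, so summing the bits is the same as OR-ing them.
_SAFE_BITMAP = sum(1 << ord(c) for c in set(_SAFE))


def is_url_safe_char(ch: str) -> bool:
    """One range check plus one bitmap probe, instead of several comparisons."""
    o = ord(ch)
    return o < 128 and ((_SAFE_BITMAP >> o) & 1) == 1
```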
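The `string.Create` change to UrlEncode amounts to computing the exact output length up front and writing every output char into one buffer. This replaces filling an intermediate `byte[]` and then copying it into a string. In the Python sketch below, a `bytearray` of the precomputed size plays the role of the `string.Create` buffer, and a linear scan stands in for the vectorized `SearchValues` search. The following are assumptions:

- the safe set;
- the uppercase hex digits;
- the hypothetical name `url_encode`;
- well-formed input, because `str.encode("utf-8")` rejects lone surrogates.

```python
import string

# Assumed safe set, as UTF-8 byte values: ASCII letters, digits, - _ . ! * ( )
_SAFE = frozenset((string.ascii_letters + string.digits + "-_.!*()").encode("ascii"))
_HEX = b"0123456789ABCDEF"


def url_encode(s: str) -> str:
    # Phase 1: find the first non-safe char (linear stand-in for SearchValues).
    first = next((i for i, ch in enumerate(s) if ord(ch) >= 128 or ord(ch) not in _SAFE), -1)
    if first < 0:
        return s  # all safe (or empty): no allocation

    data = s.encode("utf-8")
    # Exact output length: safe bytes and ' ' (emitted as '+') take 1 char; others take 3 ("%XX").
    size = sum(1 if b in _SAFE or b == 0x20 else 3 for b in data)
    out = bytearray(size)  # single exactly-sized buffer, filled in place

    j = 0
    for b in data:
        if b in _SAFE:
            out[j] = b
            j += 1
        elif b == 0x20:  # space is only checked once the safe-char test has failed
            out[j] = ord("+")
            j += 1
        else:
            out[j] = ord("%")
            out[j + 1] = _HEX[b >> 4]
            out[j + 2] = _HEX[b & 0xF]
            j += 3
    return out.decode("ascii")
```

An empty or all-safe input returns early without any explicit `IsNullOrEmpty`-style guard, which echoes the last bullet.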
Method | Toolchain | input | Mean | Ratio | Allocated | Alloc Ratio |
---|---|---|---|---|---|---|
HtmlEncode | \main\corerun.exe | <test(...)test> [25] | 81.576 ns | 1.00 | 96 B | 1.00 |
HtmlEncode | \pr\corerun.exe | <test(...)test> [25] | 80.366 ns | 0.97 | 96 B | 1.00 |
UrlEncode | \main\corerun.exe | <test(...)test> [25] | 118.515 ns | 1.00 | 160 B | 1.00 |
UrlEncode | \pr\corerun.exe | <test(...)test> [25] | 103.039 ns | 0.87 | 96 B | 0.60 |
HtmlEncode | \main\corerun.exe | How (...)ood. [185] | 164.359 ns | 1.00 | - | NA |
HtmlEncode | \pr\corerun.exe | How (...)ood. [185] | 10.142 ns | 0.06 | - | NA |
UrlEncode | \main\corerun.exe | How (...)ood. [185] | 537.945 ns | 1.00 | 664 B | 1.00 |
UrlEncode | \pr\corerun.exe | How (...)ood. [185] | 466.832 ns | 0.86 | 432 B | 0.65 |
HtmlEncode | \main\corerun.exe | https(...)e.com [23] | 21.442 ns | 1.00 | - | NA |
HtmlEncode | \pr\corerun.exe | https(...)e.com [23] | 4.732 ns | 0.22 | - | NA |
UrlEncode | \main\corerun.exe | https(...)e.com [23] | 103.053 ns | 1.00 | 136 B | 1.00 |
UrlEncode | \pr\corerun.exe | https(...)e.com [23] | 76.013 ns | 0.73 | 80 B | 0.59 |
HtmlEncode | \main\corerun.exe | short_name.txt | 13.068 ns | 1.00 | - | NA |
HtmlEncode | \pr\corerun.exe | short_name.txt | 4.876 ns | 0.38 | - | NA |
UrlEncode | \main\corerun.exe | short_name.txt | 14.324 ns | 1.00 | - | NA |
UrlEncode | \pr\corerun.exe | short_name.txt | 3.600 ns | 0.26 | - | NA |
HtmlEncode | \main\corerun.exe | this-(...)g.jpg [75] | 62.800 ns | 1.00 | - | NA |
HtmlEncode | \pr\corerun.exe | this-(...)g.jpg [75] | 6.975 ns | 0.11 | - | NA |
UrlEncode | \main\corerun.exe | this-(...)g.jpg [75] | 70.709 ns | 1.00 | - | NA |
UrlEncode | \pr\corerun.exe | this-(...)g.jpg [75] | 5.611 ns | 0.08 | - | NA |
HtmlEncode | \main\corerun.exe | לילה טוב | 8.436 ns | 1.00 | - | NA |
HtmlEncode | \pr\corerun.exe | לילה טוב | 11.427 ns | 1.36 | - | NA |
UrlEncode | \main\corerun.exe | לילה טוב | 100.852 ns | 1.00 | 184 B | 1.00 |
UrlEncode | \pr\corerun.exe | לילה טוב | 74.544 ns | 0.74 | 112 B | 0.61 |
Tagging subscribers to this area: @dotnet/ncl
This was referenced Jun 20, 2024
MihaZupan approved these changes on Jun 20, 2024
Always love to see SearchValues ratios like these
Method | Toolchain | input | Mean | Ratio | Allocated | Alloc Ratio |
---|---|---|---|---|---|---|
HtmlEncode | \main\corerun.exe | How (...)ood. [185] | 164.359 ns | 1.00 | - | NA |
HtmlEncode | \pr\corerun.exe | How (...)ood. [185] | 10.142 ns | 0.06 | - | NA |
src/libraries/System.Private.CoreLib/src/System/Net/WebUtility.cs (outdated, resolved)
Co-authored-by: Miha Zupan <mihazupan.zupan1@gmail.com>
rzikm pushed a commit to rzikm/dotnet-runtime that referenced this pull request on Jun 24, 2024.