Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add methods to convert between hexadecimal strings and bytes #37546

Merged
merged 13 commits into from
Jul 18, 2020
Merged

Add methods to convert between hexadecimal strings and bytes #37546

merged 13 commits into from
Jul 18, 2020

Conversation

tkp1n
Copy link
Contributor

@tkp1n tkp1n commented Jun 6, 2020

Closes #17837

This adds two methods (with overloads) ToHexString and FromHexString to System.Convert.

As discussed in the API review and Twitter, the implementation is kept minimal and straight-forward with major optimizations left for a potential follow-up PR.

On HexConverter

I tried to re-use and extend the (internal) HexConverter for the actual implementation.
Both HexConverter.ToString and HexConverter.TryDecodeFromUtf16 could easily be optimized using SIMD, but this is intentionally left for a future PR to focus on getting the feature in first. HexConverter.ToString could be substantially improved using "only" scalar code (see "Numbers" below). I assume, however, that it is written the way it is (branch-free) for a reason.

Numbers

For those interested in performance numbers: I've created some benchmarks comparing different implementations here (including results)

Consolidation

Across the libraries, one can find various different classes and methods handling hex in more or less creative ways. This PR moves a lot of those to the implementations to HexConverter.

@Dotnet-GitSync-Bot
Copy link
Collaborator

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@vcsjones
Copy link
Member

vcsjones commented Jun 6, 2020

I missed the API review for this one, was there any discussion or thought about ignoring whitespace / newlines like the To/FromBase64String APIs?

@tkp1n
Copy link
Contributor Author

tkp1n commented Jun 6, 2020

I only recall the decision to only output uppercase/no-whitespace... Nothing about the FromHexString side of things... that's why I'm bringing it up.

Yes, the Base64 From*-methods accept whitespace (to some extent), but that's likely because the respective To*-methods produce whitespace in their output.

@stephentoub
Copy link
Member

uppercase only, no white-space

It should definitely parse lowercase as well.

@tkp1n
Copy link
Contributor Author

tkp1n commented Jun 7, 2020

It should definitely parse lowercase as well.

Done: 71ee4e1

@GSPP
Copy link

GSPP commented Jun 8, 2020

If the check ToBase64_CalculateAndValidateOutputLength is required for correctness then maybe it should test for the maximum string/array length which is less than int.MaxValue.

@tkp1n
Copy link
Contributor Author

tkp1n commented Jun 9, 2020

@GrabYourPitchforks Your thoughts on the topic of supporting white-space input in FromHexString would be highly appreciated.

Also, do you (or someone else for that matter) know who to ask for support regarding the crossgen build issue?

@vcsjones
Copy link
Member

vcsjones commented Jun 9, 2020

/cc @nattress this PR is failing because crossgen errors on build. Are you the right person to tag for help?

@tkp1n tkp1n requested a review from stephentoub June 22, 2020 14:02
@tkp1n
Copy link
Contributor Author

tkp1n commented Jun 30, 2020

@GrabYourPitchforks, @stephentoub: could you please take a look again and help push this across the finish line?

public static byte[] FromHexString(System.ReadOnlySpan<char> chars) { throw null; }
public static string ToHexString(byte[] inArray) { throw null; }
public static string ToHexString(byte[] inArray, int offset, int length) { throw null; }
public static string ToHexString(System.ReadOnlySpan<byte> bytes) { throw null; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any places in the dotnet/runtime libraries where we're currently doing these operations manually such that they could instead be switched to call these APIs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Most byte -> string conversions have already consolidated to use the internal HexConverter as part of Unify int to hexadecimal char conversions #1273. Some places could however profit from the new APIs (e.g. src\libraries\Common\tests\System\Security\Cryptography\ByteUtils.cs)
  • There are a few string -> byte conversions across the libraries. Some (e.g. ActivityTraceId.HexDigitToBinary) are, however, case sensitive...

Would you like to see a cleanup/consolidation as part of this PR or as a follow-up?

Copy link
Member

@stephentoub stephentoub Jul 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question stems more from wanting to make sure the methods are actually useful (that's a question about the approved API design rather than about your work in the PR). If with all of the places we do hex conversions we don't actually have any places that would benefit from these APIs, it makes me question whether they're the right APIs to be adding. As such, I would like to see in this PR us switching call sites that could/should use them to do so, so that we can see that they are actually valuable and get it all done in one go.

cc: @bartonjs, @GrabYourPitchforks

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, my ByteArrayToHex() methods could certainly be replaced with this... if all of my compilation targets were net5+ (I don't remember if I have tests that run on netfx anymore).

There's some hex parsing and hex producing code in the crypto libraries themselves... the parser ignores any non-hex data (at this point, for compat; historically, because... uh... clearly it was whitespace?) so probably can't change. The hex producers, possibly... if they're only used in net5 targets (the Pkcs and Xml libraries are netstandard OOBs, but I feel like the hex producing code is in S.S.C.Encoding, which is in the framework, so it might be able to change, unless it does weird things with whitespace).

Copy link
Contributor Author

@tkp1n tkp1n Jul 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephentoub An analysis of the usefulness of the new API for use within the dotnet/runtime libraries

Could profit from new API (ignoring potential issues with compilation targets):

src\libraries\Common\tests\System\Security\Cryptography\ByteUtils.cs:25
src\libraries\Common\tests\System\Security\Cryptography\ByteUtils.cs:53
src\libraries\System.DirectoryServices.AccountManagement\src\System\DirectoryServices\AccountManagement\Utils.cs:87
src\libraries\System.Drawing.Common\tests\mono\System.Drawing\BitmapTests.cs:393
src\libraries\System.Net.NetworkInformation\src\System\Net\NetworkInformation\PhysicalAddress.cs:93
src\libraries\System.Reflection.MetadataLoadContext\tests\src\TestUtils\TestUtils.cs:108
src\libraries\System.Reflection.MetadataLoadContext\tests\src\TestUtils\TestUtils.cs:121
src\libraries\System.Runtime\tests\System\Text\Unicode\Utf8Tests.cs:64
src\libraries\System.Security.Cryptography.Encoding\src\Internal\Cryptography\AsnFormatter.cs:15
src\libraries\System.Security.Cryptography.X509Certificates\src\Internal\Cryptography\Helpers.cs:36
src\libraries\System.Windows.Extensions\tests\X509Certificate2UIManualTests.cs:73

Could profit from making access to the lookup table in HexConverter public:

src\libraries\System.Security.Cryptography.X509Certificates\src\Internal\Cryptography\Helpers.cs:98

Cannot profit from the new methods due to limitations of the new API

src\libraries\Common\tests\System\Net\Http\LoopbackServer.AuthenticationHelpers.cs:287 - Encode lowercase
src\libraries\System.Diagnostics.DiagnosticSource\src\System\Diagnostics\Activity.cs - Encode lowercase
src\libraries\System.DirectoryServices.AccountManagement\src\System\DirectoryServices\AccountManagement\Utils.cs:75 - Encode lowercase
src\libraries\System.Private.CoreLib\src\System\ApplicationId.cs:67 - Decode to Span<char>
src\libraries\System.Runtime.Numerics\tests\BigInteger\MyBigInt.cs:903 - Decode lowercase, space-separated
src\libraries\System.Security.Cryptography.Encoding\src\Internal\Cryptography\AsnFormatter.cs:22 - Encode space-separated
src\libraries\System.Security.Cryptography.Pkcs\src\Internal\Cryptography\PkcsHelpers.cs:348 - Decode (enforce uppercase)
src\libraries\System.Security.Cryptography.X509Certificates\src\Internal\Cryptography\Helpers.cs:24 - Decode to Span<char>
src\libraries\System.Security.Cryptography.X509Certificates\src\Internal\Cryptography\Helpers.cs:41 - Decode white space

It could be that I missed a few additional places...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the analysis!

@bartonjs, @GrabYourPitchforks, we still believe the APIs approved and implemented publicly in this PR are the right ones?

Personally, I'd want to see at least at least a couple of existing call sites switched in this PR, or else I question the real worthwhileness of the particular shapes employed here. We already have BitConverter.ToString which outputs at least one pattern; this (at least the ToString direction) is just outputting another form. I question whether this form is so much more useful that it's worth adding.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we still believe the APIs approved and implemented publicly in this PR are the right ones?

I don't remember why we didn't also do the span-writing versions, but these are definitely better than (leads into)

We already have BitConverter.ToString which outputs at least one pattern; this (at least the ToString direction) is just outputting another form.

While I use that one in a pinch, I generally don't want the hyphens. The top hit on StackOverflow for "BitConverter.ToString" is

public static string ByteArrayToString(byte[] ba) { return BitConverter.ToString(ba).Replace("-",""); } 

Suggesting that the form here is what many (if not most) callers want. So I feel there's definite goodness.

Copy link
Member

@stephentoub stephentoub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this.

@tkp1n
Copy link
Contributor Author

tkp1n commented Jul 1, 2020

@nattress sorry to bother you again with this PR...
Could you please have a look at the crossgen build error in this PR? Any help would be highly appreciated

@nattress
Copy link
Contributor

nattress commented Jul 1, 2020

Crossgen fails because it couldn't find a type while compiling System.Private.CoreLib:

Error compiling F:\workspace\_work\1\s\src\coreclr\..\..\artifacts\bin\coreclr\Windows_NT.x86.Checked\IL\System.Private.CoreLib.dll: Could not load type 'System.Collections.Generic.UInt32EnumComparer`1' from assembly 'System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e'.

That type was renamed two years ago (dotnet/coreclr@82e02f3) so it looks like we need to update

CorElementType et = elemTypeHnd.GetVerifierCorElementType();
accordingly. Not sure why this hasn't been a problem until now though.

@tkp1n
Copy link
Contributor Author

tkp1n commented Jul 1, 2020

@nattress

That type was renamed two years ago so it looks like we need to update SpecializeComparer accordingly

Thank you for your help. I attempted to fix this over at #38681

@tkp1n
Copy link
Contributor Author

tkp1n commented Jul 16, 2020

It appears to me as if the CI failures are not related to the changes in this PR. Please correct me if I'm wrong though!
So unless there is more review feedback this would be it from my side..

@stephentoub let me know if there is anything else I could do here and how you'd like to proceed with your concerns regarding the public API.

@bartonjs
Copy link
Member

On the one hand, I don't understand any of the test failures. On the other, they repeated on a rerun and I haven't seen them on other PRs. Maybe something internal had a weird hex parsing or formatting going on and it's causing a complicated test failure.

The only way I know to get a nice clean run is to do either push an empty commit or (more useful, probably) merge in the master branch and let it run with the new commit state.

@GrabYourPitchforks You work in this layer more than I do, do the test failures at https://dev.azure.com/dnceng/public/_build/results?buildId=730614&view=ms.vss-test-web.build-test-results-tab&runId=22724830&resultId=102547&paneView=debug make any sense to you?

@bartonjs bartonjs closed this Jul 17, 2020
@bartonjs bartonjs reopened this Jul 17, 2020
@bartonjs
Copy link
Member

/azp run runtime

@steveharter
Copy link
Member

/azp list

@steveharter
Copy link
Member

/azp run

@azure-pipelines
Copy link

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

@steveharter
Copy link
Member

/azp run runtime

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@bartonjs
Copy link
Member

/azp run dotnet-linker-tests,dotnet-runtime-perf

@bartonjs
Copy link
Member

/azp run dotnet-linker-tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@bartonjs
Copy link
Member

The failures look to be known: #39484.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add methods to convert between hexadecimal strings and bytes