-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refine Bin/Hex parsing of BigInteger #95543
Refine Bin/Hex parsing of BigInteger #95543
Conversation
Tagging subscribers to this area: @dotnet/area-system-numerics Issue DetailsInstead of counting by digits, the new algorithm parses with uint blocks. It also uses vectorized hex converting for large numbers. Introduces a new reference of S.R.Intrinsics into S.R.Numerics. I think it's expected if we start to use more SIMD operations for BigInteger. Performance is measured on different sizes and corner cases, to ensure there's no regression on small values:
Please run outer loop test to ensure more coverage of parsing.
|
static virtual bool TryParseSingleBlock(ReadOnlySpan<TChar> input, out uint result) | ||
=> TParsingInfo.TryParseUnalignedBlock(input, out result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This leaves the space for vectorized Vector128<char>
-> uint
conversion. This may or may not be necessary if such optimization is done in uint side.
if (Convert.FromHexString(MemoryMarshal.Cast<TChar, char>(input), MemoryMarshal.AsBytes(destiniation), out _, out _) != OperationStatus.Done) | ||
{ | ||
return false; | ||
} | ||
|
||
if (BitConverter.IsLittleEndian) | ||
{ | ||
MemoryMarshal.AsBytes(destiniation).Reverse(); | ||
} | ||
else | ||
{ | ||
destiniation.Reverse(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be improved if there's a vectorized path that parses in reverse byte order. Performance for huge numbers should be improved, however for the most interesting 96-256bit cases, I'd expect the performance comparison to be very complicated. Thus I'm choosing the simplest approach to depend on public API.
Convert.FromHexString has a slight overhead over HexConverter, but the latter is only vectorized in CoreLib.
@@ -1342,7 +1342,7 @@ static virtual bool TryParseWholeBlocks(ReadOnlySpan<TChar> input, Span<uint> de | |||
Debug.Assert(destiniation.Length * TParser.DigitsPerBlock == input.Length); | |||
ref TChar lastWholeBlockStart = ref Unsafe.Add(ref MemoryMarshal.GetReference(input), input.Length - TParser.DigitsPerBlock); | |||
|
|||
for (int i = 0; i < destiniation.Length - 1; i++) | |||
for (int i = 0; i < destiniation.Length; i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not something to handle in this PR, but I noticed this is destiniation
not destination
😆
(we can handle it separately after this goes in to avoid making it harder to review)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, never mind, this is net new code and I was looking at the wrong diff view
It'd be great to fix the minor type in this PR then.
Vector512<uint> vector = Vector512.LoadUnsafe(ref start, (nuint)offset); | ||
Vector512<uint> complement = Vector512.OnesComplement(vector); | ||
Vector512.StoreUnsafe(complement, ref start, (nuint)offset); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Vector512<uint> vector = Vector512.LoadUnsafe(ref start, (nuint)offset); | |
Vector512<uint> complement = Vector512.OnesComplement(vector); | |
Vector512.StoreUnsafe(complement, ref start, (nuint)offset); | |
Vector512<uint> vector = ~Vector512.LoadUnsafe(ref start, (nuint)offset); | |
vector.StoreUnsafe(ref start, (nuint)offset); |
offset += Vector256<uint>.Count; | ||
} | ||
|
||
while (Vector128.IsHardwareAccelerated && d.Length - offset >= Vector128<uint>.Count) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code here is correct, but it's also slightly pessimized as it's going to hit multiple mispredicted branches due to the loops and for small payloads.
We could easily get extra perf by refactoring it to be done a bit differently. That can always be done separately, however.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe TensorPrimitives can provide optimized code for this pattern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is TensorPrimitives.OnesComplement
. Do you think S.R.Numerics should start to take dependency on TensorPrimitives?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it has to wait at least until Tensor Primitives is in box
CC. @stephentoub, could you give this a secondary review since you helped with the primitive integer parsing logic as well? |
#95402 also touches the generic parser related pattern for BigInteger. Could you provide some insights as well? Thanks! |
Convert to draft to do more performance improvements. |
Fix Bin/Hex parsing of BigInteger for powers of 2
Latest performance numbers:
I'm not experienced about branch tuning. I think this is all what I can do now. |
The test failure looks unrelated now. |
blockCount = Math.DivRem(totalDigitCount, DigitsPerBlock, out int remainder); | ||
if (remainder == 0) | ||
uint leading = signBits; | ||
// First parse unanligned leading block if exists. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// First parse unanligned leading block if exists. | |
// First parse unaligned leading block if exists. |
Instead of counting by digits, the new algorithm parses with uint blocks. It also uses vectorized hex converting for large numbers.
Introduces a new reference of S.R.Intrinsics into S.R.Numerics. I think it's expected if we start to use more SIMD operations for BigInteger.
Performance is measured on different sizes and corner cases, to ensure there's no regression on small values:
Please run outer loop test to ensure more coverage of parsing.