Refine Bin/Hex parsing of BigInteger #95543

huoyaoyuan · 2023-12-02T10:36:16Z

Instead of counting by digits, the new algorithm parses in uint blocks. It also uses vectorized hex conversion for large numbers.

This introduces a new reference from S.R.Numerics to S.R.Intrinsics. I think that's expected if we start to use more SIMD operations for BigInteger.
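To illustrate what parsing in uint blocks means, here is a minimal sketch and not the PR's actual code. It assumes unsigned hex input and ignores sign handling; `BlockHexSketch`, `TryParseHexBlocks` and `HexValue` are hypothetical names. The text is consumed from its end in groups of 8 hex digits, so each group fills exactly one 32-bit block and only the leading group may be shorter.

```csharp
using System;

static class BlockHexSketch
{
    const int DigitsPerBlock = 8; // 8 hex digits fill one 32-bit block

    // Hypothetical helper: parses unsigned hex text into uint blocks,
    // least significant block first. Sign handling is deliberately omitted.
    public static bool TryParseHexBlocks(ReadOnlySpan<char> input, out uint[] blocks)
    {
        blocks = new uint[(input.Length + DigitsPerBlock - 1) / DigitsPerBlock];
        int end = input.Length;

        for (int i = 0; i < blocks.Length; i++)
        {
            // Whole blocks are taken from the end; the last iteration
            // picks up the (possibly shorter) leading block.
            int start = Math.Max(end - DigitsPerBlock, 0);
            uint block = 0;
            foreach (char c in input[start..end])
            {
                int digit = HexValue(c);
                if (digit < 0)
                {
                    return false;
                }
                block = (block << 4) | (uint)digit;
            }
            blocks[i] = block;
            end = start;
        }
        return true;
    }

    // Returns the digit value, or -1 for a non-hex character.
    static int HexValue(char c) => c switch
    {
        >= '0' and <= '9' => c - '0',
        >= 'a' and <= 'f' => c - 'a' + 10,
        >= 'A' and <= 'F' => c - 'A' + 10,
        _ => -1,
    };
}
```

For the input `123456789` this yields two blocks: `0x23456789` followed by `0x1`.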

Performance is measured on different sizes and corner cases, to ensure there's no regression on small values:

| Method | Job | Toolchain | input | Mean | Error | StdDev | Ratio |
|---|---|---|---|---|---|---|---|
| ParseHex | Job-FGNQZB | \1-main\corerun.exe | 123 | 15.59 ns | 0.133 ns | 0.118 ns | 1.00 |
| ParseHex | Job-AZXDDT | \5-vector-generic\corerun.exe | 123 | 10.67 ns | 0.048 ns | 0.045 ns | 0.68 |
| ParseHex | Job-FGNQZB | \1-main\corerun.exe | 123456789 | 19.80 ns | 0.158 ns | 0.148 ns | 1.00 |
| ParseHex | Job-AZXDDT | \5-vector-generic\corerun.exe | 123456789 | 17.39 ns | 0.098 ns | 0.082 ns | 0.88 |
| ParseHex | Job-FGNQZB | \1-main\corerun.exe | 1234567890ABCDEF | 25.11 ns | 0.208 ns | 0.195 ns | 1.00 |
| ParseHex | Job-AZXDDT | \5-vector-generic\corerun.exe | 1234567890ABCDEF | 20.42 ns | 0.164 ns | 0.153 ns | 0.81 |
| ParseHex | Job-FGNQZB | \1-main\corerun.exe | 12345(...)45678 [24] | 31.16 ns | 0.146 ns | 0.122 ns | 1.00 |
| ParseHex | Job-AZXDDT | \5-vector-generic\corerun.exe | 12345(...)45678 [24] | 19.53 ns | 0.233 ns | 0.194 ns | 0.63 |
| ParseHex | Job-FGNQZB | \1-main\corerun.exe | 1234(...)CDEF [315] | 274.36 ns | 3.454 ns | 3.062 ns | 1.00 |
| ParseHex | Job-AZXDDT | \5-vector-generic\corerun.exe | 1234(...)CDEF [315] | 41.58 ns | 0.367 ns | 0.307 ns | 0.15 |
| ParseHex | Job-FGNQZB | \1-main\corerun.exe | 80000000 | 19.08 ns | 0.092 ns | 0.077 ns | 1.00 |
| ParseHex | Job-AZXDDT | \5-vector-generic\corerun.exe | 80000000 | 14.83 ns | 0.198 ns | 0.176 ns | 0.78 |
| ParseHex | Job-FGNQZB | \1-main\corerun.exe | FEDCBA9876543210 | 26.67 ns | 0.180 ns | 0.160 ns | 1.00 |
| ParseHex | Job-AZXDDT | \5-vector-generic\corerun.exe | FEDCBA9876543210 | 21.57 ns | 0.109 ns | 0.091 ns | 0.81 |
| ParseHex | Job-FGNQZB | \1-main\corerun.exe | FEDC(...)4321 [315] | 281.83 ns | 2.357 ns | 2.205 ns | 1.00 |
| ParseHex | Job-AZXDDT | \5-vector-generic\corerun.exe | FEDC(...)4321 [315] | 46.24 ns | 0.912 ns | 0.976 ns | 0.16 |
| ParseHex | Job-FGNQZB | \1-main\corerun.exe | FFFE00000 | 21.26 ns | 0.162 ns | 0.144 ns | 1.00 |
| ParseHex | Job-AZXDDT | \5-vector-generic\corerun.exe | FFFE00000 | 16.44 ns | 0.060 ns | 0.056 ns | 0.77 |

Please run the outerloop tests to get broader coverage of parsing.

ghost · 2023-12-02T10:36:30Z

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

Author:	huoyaoyuan
Assignees:	-
Labels:	`area-System.Numerics`, `community-contribution`
Milestone:	-

huoyaoyuan · 2023-12-02T10:39:13Z

src/libraries/System.Runtime.Numerics/src/System/Number.BigInteger.cs

+        static virtual bool TryParseSingleBlock(ReadOnlySpan<TChar> input, out uint result)
+            => TParsingInfo.TryParseUnalignedBlock(input, out result);


This leaves room for a vectorized Vector128<char> -> uint conversion. It may not be necessary if that optimization is done on the uint parsing side.
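The hook above relies on static virtual interface members (C# 11). As a minimal sketch of that pattern, assuming a simplified non-generic-char interface: `IBlockParser`, `BinaryBlockParser` and `BlockParserDemo` are hypothetical names, not types from the PR. The default `TryParseSingleBlock` forwards to the general unaligned path, and an implementation could later override it with a vectorized version without touching callers.

```csharp
using System;

interface IBlockParser<TSelf> where TSelf : IBlockParser<TSelf>
{
    // General path: parses up to one block's worth of digits.
    static abstract bool TryParseUnalignedBlock(ReadOnlySpan<char> input, out uint result);

    // Default forwards to the general path; an implementation may
    // override this with a faster whole-block routine.
    static virtual bool TryParseSingleBlock(ReadOnlySpan<char> input, out uint result)
        => TSelf.TryParseUnalignedBlock(input, out result);
}

// Hypothetical binary parser that only supplies the general path.
// It assumes the caller passes at most 32 digits.
readonly struct BinaryBlockParser : IBlockParser<BinaryBlockParser>
{
    public static bool TryParseUnalignedBlock(ReadOnlySpan<char> input, out uint result)
    {
        result = 0;
        foreach (char c in input)
        {
            if (c is not ('0' or '1'))
            {
                return false;
            }
            result = (result << 1) | (uint)(c - '0');
        }
        return true;
    }
}

static class BlockParserDemo
{
    // Resolves to the interface default because BinaryBlockParser
    // does not provide its own TryParseSingleBlock.
    public static bool ParseOne<TParser>(ReadOnlySpan<char> input, out uint result)
        where TParser : IBlockParser<TParser>
        => TParser.TryParseSingleBlock(input, out result);
}
```

Calling `BlockParserDemo.ParseOne<BinaryBlockParser>("101", out uint value)` goes through the default member and sets `value` to 5.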

huoyaoyuan · 2023-12-02T10:42:29Z

src/libraries/System.Runtime.Numerics/src/System/Number.BigInteger.cs

+                if (Convert.FromHexString(MemoryMarshal.Cast<TChar, char>(input), MemoryMarshal.AsBytes(destiniation), out _, out _) != OperationStatus.Done)
+                {
+                    return false;
+                }
+
+                if (BitConverter.IsLittleEndian)
+                {
+                    MemoryMarshal.AsBytes(destiniation).Reverse();
+                }
+                else
+                {
+                    destiniation.Reverse();
+                }


This can be improved if there's a vectorized path that parses in reverse byte order. Performance for huge numbers should improve; however, for the most interesting 96-256 bit cases, I'd expect the performance comparison to be very complicated. Thus I'm choosing the simplest approach and depending on public API.

Convert.FromHexString has a slight overhead over HexConverter, but the latter is only vectorized in CoreLib.
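The reversal step can be seen in isolation with a small sketch. This is an illustration under assumptions, not the PR's code: it uses the allocating `Convert.FromHexString(ReadOnlySpan<char>)` overload (which throws on invalid input) instead of the span-based overload in the diff, and `HexBlocks.ParseWholeBlocks` is a hypothetical helper that requires the input length to be a multiple of 8 digits. Hex text is most significant byte first, while the blocks are stored least significant first, hence the reverse.

```csharp
using System;
using System.Runtime.InteropServices;

static class HexBlocks
{
    // Hypothetical helper: input must consist of whole 8-digit blocks.
    public static uint[] ParseWholeBlocks(ReadOnlySpan<char> hex)
    {
        if (hex.Length % 8 != 0)
        {
            throw new ArgumentException("Length must be a multiple of 8.", nameof(hex));
        }

        uint[] blocks = new uint[hex.Length / 8];
        Span<byte> destination = MemoryMarshal.AsBytes(blocks.AsSpan());

        // Decoded bytes come out in text order: most significant byte first.
        Convert.FromHexString(hex).AsSpan().CopyTo(destination);

        if (BitConverter.IsLittleEndian)
        {
            // Reversing every byte yields little-endian bytes with the
            // least significant block first.
            destination.Reverse();
        }
        else
        {
            // Bytes inside each uint are already in native order;
            // only the block order needs to be flipped.
            blocks.AsSpan().Reverse();
        }
        return blocks;
    }
}
```

For `0000000112345678` the result is `0x12345678` followed by `0x00000001`.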

src/libraries/System.Runtime.Numerics/tests/BigInteger/parse.cs

tannergooding · 2024-01-12T18:22:24Z

src/libraries/System.Runtime.Numerics/src/System/Number.BigInteger.cs

@@ -1342,7 +1342,7 @@ static virtual bool TryParseWholeBlocks(ReadOnlySpan<TChar> input, Span<uint> de
            Debug.Assert(destiniation.Length * TParser.DigitsPerBlock == input.Length);
            ref TChar lastWholeBlockStart = ref Unsafe.Add(ref MemoryMarshal.GetReference(input), input.Length - TParser.DigitsPerBlock);

-            for (int i = 0; i < destiniation.Length - 1; i++)
+            for (int i = 0; i < destiniation.Length; i++)


not something to handle in this PR, but I noticed this is destiniation not destination 😆
(we can handle it separately after this goes in to avoid making it harder to review)

oh, never mind, this is net new code and I was looking at the wrong diff view

It'd be great to fix the minor typo in this PR then.

tannergooding · 2024-01-12T18:28:56Z

src/libraries/System.Runtime.Numerics/src/System/Numerics/NumericsHelpers.cs

+                Vector512<uint> vector = Vector512.LoadUnsafe(ref start, (nuint)offset);
+                Vector512<uint> complement = Vector512.OnesComplement(vector);
+                Vector512.StoreUnsafe(complement, ref start, (nuint)offset);


Suggested change

-                Vector512<uint> vector = Vector512.LoadUnsafe(ref start, (nuint)offset);
-                Vector512<uint> complement = Vector512.OnesComplement(vector);
-                Vector512.StoreUnsafe(complement, ref start, (nuint)offset);
+                Vector512<uint> vector = ~Vector512.LoadUnsafe(ref start, (nuint)offset);
+                vector.StoreUnsafe(ref start, (nuint)offset);
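For context, the load, complement, store pattern from the suggestion can be sketched as a complete in-place loop. This is a reduced illustration, assuming only a Vector128 step plus a scalar tail rather than the 512/256/128 cascade in the PR; `ComplementSketch.OnesComplementInPlace` is a hypothetical name.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

static class ComplementSketch
{
    // Hypothetical helper: flips every bit of every element in place.
    public static void OnesComplementInPlace(Span<uint> d)
    {
        ref uint start = ref MemoryMarshal.GetReference(d);
        int offset = 0;

        // Vector step: 4 uints at a time while enough elements remain.
        while (Vector128.IsHardwareAccelerated && d.Length - offset >= Vector128<uint>.Count)
        {
            Vector128<uint> vector = ~Vector128.LoadUnsafe(ref start, (nuint)offset);
            vector.StoreUnsafe(ref start, (nuint)offset);
            offset += Vector128<uint>.Count;
        }

        // Scalar tail for the remaining elements (or for everything
        // when hardware acceleration is unavailable).
        for (; offset < d.Length; offset++)
        {
            d[offset] = ~d[offset];
        }
    }
}
```

The scalar tail keeps the result identical whether or not the vector path runs.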

tannergooding · 2024-01-12T18:30:10Z

src/libraries/System.Runtime.Numerics/src/System/Numerics/NumericsHelpers.cs

+                offset += Vector256<uint>.Count;
+            }
+
+            while (Vector128.IsHardwareAccelerated && d.Length - offset >= Vector128<uint>.Count)


The code here is correct, but it's also slightly pessimized: because of the chained loops, it's going to hit multiple mispredicted branches, particularly for small payloads.

We could easily get extra perf by refactoring it to be done a bit differently. That can always be done separately, however.

Maybe TensorPrimitives can provide optimized code for this pattern?

There is TensorPrimitives.OnesComplement. Do you think S.R.Numerics should start to take dependency on TensorPrimitives?

It has to wait at least until TensorPrimitives is in box.

tannergooding · 2024-01-12T18:30:50Z

CC. @stephentoub, could you give this a secondary review since you helped with the primitive integer parsing logic as well?

huoyaoyuan · 2024-01-13T10:04:25Z

#95402 also touches the generic parser related pattern for BigInteger. Could you provide some insights as well? Thanks!

huoyaoyuan · 2024-02-01T10:16:42Z

Converting to draft to make further performance improvements.

Fix Bin/Hex parsing of BigInteger for powers of 2

huoyaoyuan · 2024-02-02T17:26:54Z

Latest performance numbers:

| Method | Job | Toolchain | input | Mean | Error | StdDev | Ratio |
|---|---|---|---|---|---|---|---|
| Parse | Job-AJQWKE | \PR\corerun.exe | 123 | 11.49 ns | 0.052 ns | 0.046 ns | 0.88 |
| Parse | Job-FEBANE | \main\corerun.exe | 123 | 13.05 ns | 0.081 ns | 0.076 ns | 1.00 |
| Parse | Job-AJQWKE | \PR\corerun.exe | 123456789 | 17.91 ns | 0.118 ns | 0.110 ns | 0.89 |
| Parse | Job-FEBANE | \main\corerun.exe | 123456789 | 20.04 ns | 0.117 ns | 0.109 ns | 1.00 |
| Parse | Job-AJQWKE | \PR\corerun.exe | 1234567890ABCDEF | 21.18 ns | 0.097 ns | 0.091 ns | 0.86 |
| Parse | Job-FEBANE | \main\corerun.exe | 1234567890ABCDEF | 24.69 ns | 0.183 ns | 0.171 ns | 1.00 |
| Parse | Job-AJQWKE | \PR\corerun.exe | 12345(...)23456 [22] | 20.15 ns | 0.128 ns | 0.120 ns | 0.67 |
| Parse | Job-FEBANE | \main\corerun.exe | 12345(...)23456 [22] | 30.04 ns | 0.229 ns | 0.203 ns | 1.00 |
| Parse | Job-AJQWKE | \PR\corerun.exe | 80000000 | 13.76 ns | 0.039 ns | 0.035 ns | 0.70 |
| Parse | Job-FEBANE | \main\corerun.exe | 80000000 | 19.78 ns | 0.132 ns | 0.123 ns | 1.00 |
| Parse | Job-AJQWKE | \PR\corerun.exe | FEDCBA9876543210 | 23.59 ns | 0.102 ns | 0.090 ns | 0.92 |
| Parse | Job-FEBANE | \main\corerun.exe | FEDCBA9876543210 | 25.56 ns | 0.093 ns | 0.087 ns | 1.00 |
| Parse | Job-AJQWKE | \PR\corerun.exe | FFFE00000 | 18.34 ns | 0.092 ns | 0.082 ns | 0.94 |
| Parse | Job-FEBANE | \main\corerun.exe | FFFE00000 | 19.44 ns | 0.131 ns | 0.123 ns | 1.00 |

I'm not experienced with branch tuning. I think this is all I can do for now.

huoyaoyuan · 2024-02-03T06:48:05Z

The test failure looks unrelated now.

tannergooding · 2024-02-03T17:48:28Z

src/libraries/System.Runtime.Numerics/src/System/Number.BigInteger.cs

-            blockCount = Math.DivRem(totalDigitCount, DigitsPerBlock, out int remainder);
-            if (remainder == 0)
+            uint leading = signBits;
+            // First parse unanligned leading block if exists.


Suggested change

-            // First parse unanligned leading block if exists.
+            // First parse unaligned leading block if exists.
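The `uint leading = signBits;` line in the diff suggests the leading block is pre-filled with sign bits before its digits are shifted in. A minimal sketch of that idea follows, under a loudly stated assumption: `signBits` is taken to be all ones when the first hex digit is 8 or above (matching the documented two's-complement reading of hex input by `BigInteger.Parse`) and zero otherwise. `LeadingBlockSketch`, `ParseLeadingBlock` and `HexValue` are hypothetical names, and the real code may derive `signBits` differently.

```csharp
using System;

static class LeadingBlockSketch
{
    // Hypothetical helper: parses the 1 to 8 leading hex digits of a number
    // and sign-extends them to a full 32-bit block.
    public static uint ParseLeadingBlock(ReadOnlySpan<char> digits)
    {
        if (digits.IsEmpty || digits.Length > 8)
        {
            throw new ArgumentException("Expected 1 to 8 hex digits.", nameof(digits));
        }

        // Assumption: a first digit of 8..F marks a negative value.
        uint signBits = HexValue(digits[0]) >= 8 ? uint.MaxValue : 0u;

        // Start from the sign bits so that unused high bits stay sign-extended.
        uint leading = signBits;
        foreach (char c in digits)
        {
            leading = (leading << 4) | HexValue(c);
        }
        return leading;
    }

    static uint HexValue(char c) => c switch
    {
        >= '0' and <= '9' => (uint)(c - '0'),
        >= 'a' and <= 'f' => (uint)(c - 'a' + 10),
        >= 'A' and <= 'F' => (uint)(c - 'A' + 10),
        _ => throw new FormatException("Not a hex digit."),
    };
}
```

With the benchmark input `FFFE00000`, the leading block is the single digit `F`, which becomes `0xFFFFFFFF`; a leading `1` stays `0x00000001`.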

huoyaoyuan added 10 commits December 2, 2023 18:18

Introduce intrinsics to S.R.Numerics

1e88ab3

Vectorize DangerousMakeTwosComplement

45d1939

Rewrite hex parsing

12011fa

Port to binary parsing

5a7eca8

Move NumericsHelper processing

a533953

Use vectorization

499973c

Rewrite

9081d39

Port to binary

66d9d5d

Add regression test for binary

bdeaf99

Unify bin and hex

10351a6

ghost added the community-contribution label Dec 2, 2023

dotnet-issue-labeler bot added the area-System.Numerics label Dec 2, 2023

huoyaoyuan commented Dec 2, 2023

Use TParser to follow corelib naming

a3edd54

build-analysis bot mentioned this pull request Dec 2, 2023

Crash in Microsoft.Extensions.Logging.Generators.Roslyn4.0.Tests.WorkItemExecution #90019

Open

huoyaoyuan mentioned this pull request Jan 10, 2024

Fix assertion at System.Numerics.BigInteger.Parse #96746

Closed

kzrnm reviewed Jan 10, 2024

src/libraries/System.Runtime.Numerics/tests/BigInteger/parse.cs

Fix whole block counting

5d1fd71

This was referenced Jan 11, 2024

Checkout failure: "Git fetch failed with exit code 128" dotnet/arcade#9009

Open

Tracking issue for CI build timeouts #76454

Closed

tannergooding reviewed Jan 12, 2024

tannergooding approved these changes Jan 12, 2024

huoyaoyuan added 3 commits January 13, 2024 17:48

Merge branch 'main'

beecb31

Fix typo

08c214c

Simplify vector operation

c372811

Fix parsing power of 2

d3a2569

kzrnm mentioned this pull request Jan 25, 2024

Fix Bin/Hex parsing of BigInteger for powers of 2 huoyaoyuan/runtime#5

Merged

Merge branch 'main' into biginteger-hex-vectorize

346140a

This was referenced Jan 29, 2024

[wasi] System.Globalization.Tests AOT build fails with LLVM ERROR: out of memory on Windows #95365

Closed

Hitting OOM in LLVM for the wasi-wasm aot smoke tests #96630

Closed

huoyaoyuan marked this pull request as draft February 1, 2024 10:16

huoyaoyuan added 4 commits February 1, 2024 18:16

Merge pull request #5 from kzrnm/fix-bigint-hex-bin

07359c5

Fix Bin/Hex parsing of BigInteger for powers of 2

Optimize the small value case

883512b

Use ContainsAnyExcept

ecfac7a

Merge branch 'main' into biginteger-hex-vectorize

4cd1419

build-analysis bot mentioned this pull request Feb 1, 2024

Intermittent build failure in AfterSourceBuild: "Could not write state file" #76488

Open

Fix missing return

6a72ae0

build-analysis bot mentioned this pull request Feb 1, 2024

Failed USB connection via port 54050, error 61, in tvOS arm64 Release AllSubsets_Mono #82637

Open

huoyaoyuan marked this pull request as ready for review February 2, 2024 17:27

Fix bit hack

fd74870

tannergooding reviewed Feb 3, 2024

tannergooding approved these changes Feb 3, 2024

tannergooding merged commit 7901202 into dotnet:main Feb 3, 2024
108 of 111 checks passed

huoyaoyuan deleted the biginteger-hex-vectorize branch February 4, 2024 02:48

kzrnm mentioned this pull request Feb 5, 2024

Fix TryParseBigIntegerHexOrBinaryNumberStyle #97995

Merged

huoyaoyuan mentioned this pull request Feb 27, 2024

[outerloop] System.Runtime.Numerics BigInteger assert failure #98966

Closed

github-actions bot locked and limited conversation to collaborators Mar 5, 2024
