Improve {u}int/long.ToString/TryFormat throughput by pre-computing the length #17432

stephentoub · 2018-04-05T03:57:54Z

The first commit just moves the Count{Hex}Digits methods from https://github.com/dotnet/corefx/blob/master/src/System.Memory/src/System/Buffers/Text/Utf8Formatter/FormattingHelpers.cs into a partial FormattingHelpers.CountDigits.cs file in the shared partition. Once those changes replicate to corefx, I'll dedup the code there.

The second commit then uses Count{Hex}Digits in the ToString and TryFormat methods of int, uint, long, and ulong, in particular for the default D format (and some G configurations) as well as the X format. Currently we create a temporary buffer on the stack, format into it, and then copy from that stack buffer into either the target span (for TryFormat) or into a new string (for ToString). Following the same approach as Utf8Formatter (and sharing the same code), which first counts the number of digits in the output in order to determine an exact length, this commit changes the implementation to skip the temporary buffer and just format directly into the destination span or string.
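To make the idea concrete, here is a minimal sketch of count-then-format for the unsigned decimal case. It is not the code in this PR: `DirectFormatSketch`, `TryFormatUInt32`, and `UInt32ToString` are hypothetical names, the `CountDigits` here is a simple division loop standing in for the shared `FormattingHelpers.CountDigits`, and `string.Create` is just one public way to write into a string of known length.

```csharp
using System;

static class DirectFormatSketch
{
    // Hypothetical stand-in for FormattingHelpers.CountDigits:
    // returns the number of decimal digits needed for value.
    public static int CountDigits(uint value)
    {
        int digits = 1;
        while (value >= 10)
        {
            value /= 10;
            digits++;
        }
        return digits;
    }

    // Writes value in decimal straight into destination, with no temporary buffer.
    public static bool TryFormatUInt32(uint value, Span<char> destination, out int charsWritten)
    {
        int length = CountDigits(value);
        if (length > destination.Length)
        {
            charsWritten = 0;
            return false;
        }

        // The exact length is known, so fill backwards from the last digit.
        for (int i = length - 1; i >= 0; i--)
        {
            destination[i] = (char)('0' + value % 10);
            value /= 10;
        }

        charsWritten = length;
        return true;
    }

    // Allocates a string of exactly the right length and formats into it.
    public static string UInt32ToString(uint value)
    {
        return string.Create(CountDigits(value), value, (span, v) =>
        {
            for (int i = span.Length - 1; i >= 0; i--)
            {
                span[i] = (char)('0' + v % 10);
                v /= 10;
            }
        });
    }
}
```

Because the length is known up front, the "destination too small" check happens before any digit is written, and nothing has to be copied afterwards.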

Contributes to https://github.com/dotnet/coreclr/issues/15364
cc: @jkotas, @ahsonkhan, @danmosemsft

System.Runtime.Performance.Tests.dll	Before	After	Diff
System.Tests.Perf_Int32.ToString(value: 0)	12.75	10.85	1.18x
System.Tests.Perf_Int32.ToString(value: 1)	12.89	10.87	1.19x
System.Tests.Perf_Int32.ToString(value: -1)	21.77	18.36	1.19x
System.Tests.Perf_Int32.ToString(value: 1283)	13.80	12.62	1.09x
System.Tests.Perf_Int32.ToString(value: -1283)	24.29	19.91	1.22x
System.Tests.Perf_Int32.ToString(value: 12837467)	16.09	15.29	1.05x
System.Tests.Perf_Int32.ToString(value: -12837467)	28.17	23.59	1.19x
System.Tests.Perf_Int32.ToString(value: 2147483647)	17.83	17.17	1.04x
System.Tests.Perf_Int32.ToString(value: -2147483648)	28.93	24.75	1.17x
System.Tests.Perf_Int32.TryFormat(value: 0)	13.46	7.35	1.83x
System.Tests.Perf_Int32.TryFormat(value: 1)	13.57	7.38	1.84x
System.Tests.Perf_Int32.TryFormat(value: -1)	23.64	14.33	1.65x
System.Tests.Perf_Int32.TryFormat(value: 1283)	14.29	9.11	1.57x
System.Tests.Perf_Int32.TryFormat(value: -1283)	25.64	15.95	1.61x
System.Tests.Perf_Int32.TryFormat(value: 12837467)	16.26	11.74	1.39x
System.Tests.Perf_Int32.TryFormat(value: -12837467)	28.19	18.55	1.52x
System.Tests.Perf_Int32.TryFormat(value: 2147483647)	17.55	13.46	1.30x
System.Tests.Perf_Int32.TryFormat(value: -2147483648)	29.63	20.01	1.48x
System.Tests.Perf_Int64.ToString(value: 0)	16.11	11.52	1.40x
System.Tests.Perf_Int64.ToString(value: 2)	15.60	11.72	1.33x
System.Tests.Perf_Int64.ToString(value: -2)	23.14	18.62	1.24x
System.Tests.Perf_Int64.ToString(value: 21)	15.58	12.08	1.29x
System.Tests.Perf_Int64.ToString(value: -21)	23.05	18.74	1.23x
System.Tests.Perf_Int64.ToString(value: 214)	16.41	12.44	1.32x
System.Tests.Perf_Int64.ToString(value: -214)	24.77	19.18	1.29x
System.Tests.Perf_Int64.ToString(value: 2147)	17.20	13.40	1.28x
System.Tests.Perf_Int64.ToString(value: -2147)	24.74	20.36	1.22x
System.Tests.Perf_Int64.ToString(value: 21474)	17.78	13.99	1.27x
System.Tests.Perf_Int64.ToString(value: -21474)	26.44	20.80	1.27x
System.Tests.Perf_Int64.ToString(value: 214748)	19.29	14.63	1.32x
System.Tests.Perf_Int64.ToString(value: -214748)	26.31	21.64	1.22x
System.Tests.Perf_Int64.ToString(value: 2147483)	18.58	15.38	1.21x
System.Tests.Perf_Int64.ToString(value: -2147483)	27.34	22.50	1.22x
System.Tests.Perf_Int64.ToString(value: 21474836)	19.92	15.91	1.25x
System.Tests.Perf_Int64.ToString(value: -21474836)	28.94	23.57	1.23x
System.Tests.Perf_Int64.ToString(value: 214748364)	21.16	16.96	1.25x
System.Tests.Perf_Int64.ToString(value: -214748364)	29.22	24.02	1.22x
System.Tests.Perf_Int64.ToString(value: 2147483647)	20.91	17.52	1.19x
System.Tests.Perf_Int64.ToString(value: -2147483648)	29.79	24.91	1.20x
System.Tests.Perf_Int64.ToString(value: 4294967295000000000)	28.11	25.64	1.10x
System.Tests.Perf_Int64.ToString(value: -4294967295000000000)	38.00	32.82	1.16x
System.Tests.Perf_Int64.ToString(value: 4294967295000000001)	28.29	25.37	1.12x
System.Tests.Perf_Int64.ToString(value: -4294967295000000001)	37.98	32.92	1.15x
System.Tests.Perf_Int64.ToString(value: 92233720368)	23.49	19.77	1.19x
System.Tests.Perf_Int64.ToString(value: -92233720368)	32.46	26.81	1.21x
System.Tests.Perf_Int64.ToString(value: 922337203685)	24.38	20.55	1.19x
System.Tests.Perf_Int64.ToString(value: -922337203685)	33.12	28.09	1.18x
System.Tests.Perf_Int64.ToString(value: 9223372036854)	24.76	21.66	1.14x
System.Tests.Perf_Int64.ToString(value: -9223372036854)	33.59	28.63	1.17x
System.Tests.Perf_Int64.ToString(value: 92233720368547)	25.18	22.44	1.12x
System.Tests.Perf_Int64.ToString(value: -92233720368547)	34.20	28.82	1.19x
System.Tests.Perf_Int64.ToString(value: 922337203685477)	25.86	21.90	1.18x
System.Tests.Perf_Int64.ToString(value: -922337203685477)	35.22	28.95	1.22x
System.Tests.Perf_Int64.ToString(value: 9223372036854775)	26.15	22.77	1.15x
System.Tests.Perf_Int64.ToString(value: -9223372036854775)	35.98	29.56	1.22x
System.Tests.Perf_Int64.ToString(value: 92233720368547758)	26.96	23.60	1.14x
System.Tests.Perf_Int64.ToString(value: -92233720368547758)	37.16	30.31	1.23x
System.Tests.Perf_Int64.ToString(value: 922337203685477580)	27.66	24.16	1.14x
System.Tests.Perf_Int64.ToString(value: -922337203685477580)	37.60	30.96	1.21x
System.Tests.Perf_Int64.ToString(value: 9223372036854775807)	30.28	26.95	1.12x
System.Tests.Perf_Int64.ToString(value: -9223372036854775808)	41.47	33.76	1.23x
System.Tests.Perf_Int64.TryFormat(value: 0)	16.67	8.01	2.08x
System.Tests.Perf_Int64.TryFormat(value: 2)	15.28	8.07	1.89x
System.Tests.Perf_Int64.TryFormat(value: -2)	24.01	14.63	1.64x
System.Tests.Perf_Int64.TryFormat(value: 21)	16.00	9.15	1.75x
System.Tests.Perf_Int64.TryFormat(value: -21)	24.82	14.96	1.66x
System.Tests.Perf_Int64.TryFormat(value: 214)	16.27	9.81	1.66x
System.Tests.Perf_Int64.TryFormat(value: -214)	25.33	15.91	1.59x
System.Tests.Perf_Int64.TryFormat(value: 2147)	16.65	10.52	1.58x
System.Tests.Perf_Int64.TryFormat(value: -2147)	26.83	16.42	1.63x
System.Tests.Perf_Int64.TryFormat(value: 21474)	17.45	11.13	1.57x
System.Tests.Perf_Int64.TryFormat(value: -21474)	26.92	16.59	1.62x
System.Tests.Perf_Int64.TryFormat(value: 214748)	17.71	11.71	1.51x
System.Tests.Perf_Int64.TryFormat(value: -214748)	27.18	17.66	1.54x
System.Tests.Perf_Int64.TryFormat(value: 2147483)	19.02	12.18	1.56x
System.Tests.Perf_Int64.TryFormat(value: -2147483)	28.04	18.34	1.53x
System.Tests.Perf_Int64.TryFormat(value: 21474836)	18.76	13.16	1.43x
System.Tests.Perf_Int64.TryFormat(value: -21474836)	29.07	18.67	1.56x
System.Tests.Perf_Int64.TryFormat(value: 214748364)	19.35	13.65	1.42x
System.Tests.Perf_Int64.TryFormat(value: -214748364)	29.63	19.78	1.50x
System.Tests.Perf_Int64.TryFormat(value: 2147483647)	20.41	14.13	1.44x
System.Tests.Perf_Int64.TryFormat(value: -2147483648)	30.42	20.92	1.45x
System.Tests.Perf_Int64.TryFormat(value: 4294967295000000000)	27.36	21.16	1.29x
System.Tests.Perf_Int64.TryFormat(value: -4294967295000000000)	36.75	27.20	1.35x
System.Tests.Perf_Int64.TryFormat(value: 4294967295000000001)	27.10	21.35	1.27x
System.Tests.Perf_Int64.TryFormat(value: -4294967295000000001)	36.55	27.21	1.34x
System.Tests.Perf_Int64.TryFormat(value: 92233720368)	22.23	17.38	1.28x
System.Tests.Perf_Int64.TryFormat(value: -92233720368)	31.76	22.69	1.40x
System.Tests.Perf_Int64.TryFormat(value: 922337203685)	24.25	16.95	1.43x
System.Tests.Perf_Int64.TryFormat(value: -922337203685)	32.23	23.23	1.39x
System.Tests.Perf_Int64.TryFormat(value: 9223372036854)	23.20	17.66	1.31x
System.Tests.Perf_Int64.TryFormat(value: -9223372036854)	32.78	23.79	1.38x
System.Tests.Perf_Int64.TryFormat(value: 92233720368547)	23.90	18.21	1.31x
System.Tests.Perf_Int64.TryFormat(value: -92233720368547)	32.98	24.20	1.36x
System.Tests.Perf_Int64.TryFormat(value: 922337203685477)	24.49	18.40	1.33x
System.Tests.Perf_Int64.TryFormat(value: -922337203685477)	34.17	24.35	1.40x
System.Tests.Perf_Int64.TryFormat(value: 9223372036854775)	25.04	20.20	1.24x
System.Tests.Perf_Int64.TryFormat(value: -9223372036854775)	35.39	25.29	1.40x
System.Tests.Perf_Int64.TryFormat(value: 92233720368547758)	26.46	19.82	1.33x
System.Tests.Perf_Int64.TryFormat(value: -92233720368547758)	35.27	25.63	1.38x
System.Tests.Perf_Int64.TryFormat(value: 922337203685477580)	26.59	21.72	1.22x
System.Tests.Perf_Int64.TryFormat(value: -922337203685477580)	35.66	26.36	1.35x
System.Tests.Perf_Int64.TryFormat(value: 9223372036854775807)	28.94	22.80	1.27x
System.Tests.Perf_Int64.TryFormat(value: -9223372036854775808)	37.46	29.98	1.25x
System.Tests.Perf_UInt32.ToString(value: 0)	13.21	10.39	1.27x
System.Tests.Perf_UInt32.ToString(value: 1)	12.88	10.61	1.21x
System.Tests.Perf_UInt32.ToString(value: 1283)	13.74	12.67	1.09x
System.Tests.Perf_UInt32.ToString(value: 12837467)	16.11	15.03	1.07x
System.Tests.Perf_UInt32.ToString(value: 4294967295)	17.54	16.14	1.09x
System.Tests.Perf_UInt32.TryFormat(value: 0)	13.38	7.22	1.85x
System.Tests.Perf_UInt32.TryFormat(value: 1)	13.47	7.22	1.87x
System.Tests.Perf_UInt32.TryFormat(value: 1283)	14.88	9.16	1.62x
System.Tests.Perf_UInt32.TryFormat(value: 12837467)	16.36	11.56	1.42x
System.Tests.Perf_UInt32.TryFormat(value: 4294967295)	17.33	13.18	1.31x
System.Tests.Perf_UInt64.ToString(value: 0)	2.99	2.35	1.28x
System.Tests.Perf_UInt64.ToString(value: 1000000000000000000)	5.54	5.16	1.08x
System.Tests.Perf_UInt64.ToString(value: 18446744073709551615)	6.16	6.14	1.00x
System.Tests.Perf_UInt64.ToString(value: 2)	2.97	2.38	1.25x
System.Tests.Perf_UInt64.ToString(value: 21)	3.13	2.46	1.27x
System.Tests.Perf_UInt64.ToString(value: 214)	3.23	2.67	1.21x
System.Tests.Perf_UInt64.ToString(value: 2147)	3.40	2.86	1.19x
System.Tests.Perf_UInt64.ToString(value: 21474)	3.46	2.93	1.18x
System.Tests.Perf_UInt64.ToString(value: 214748)	3.53	3.03	1.17x
System.Tests.Perf_UInt64.ToString(value: 2147483)	3.62	3.17	1.14x
System.Tests.Perf_UInt64.ToString(value: 21474836)	3.80	3.30	1.15x
System.Tests.Perf_UInt64.ToString(value: 214748364)	4.03	3.58	1.13x
System.Tests.Perf_UInt64.ToString(value: 2147483647)	4.13	3.53	1.17x
System.Tests.Perf_UInt64.ToString(value: 4294967295000000000)	5.58	5.48	1.02x
System.Tests.Perf_UInt64.ToString(value: 4294967295000000001)	5.65	5.19	1.09x
System.Tests.Perf_UInt64.ToString(value: 92233720368)	4.62	4.07	1.13x
System.Tests.Perf_UInt64.ToString(value: 922337203685)	4.75	4.17	1.14x
System.Tests.Perf_UInt64.ToString(value: 9223372036854)	4.74	4.47	1.06x
System.Tests.Perf_UInt64.ToString(value: 92233720368547)	4.95	4.77	1.04x
System.Tests.Perf_UInt64.ToString(value: 922337203685477)	5.02	4.49	1.12x
System.Tests.Perf_UInt64.ToString(value: 9223372036854775)	5.17	4.64	1.11x
System.Tests.Perf_UInt64.ToString(value: 92233720368547758)	5.28	4.88	1.08x
System.Tests.Perf_UInt64.ToString(value: 922337203685477580)	5.55	5.44	1.02x
System.Tests.Perf_UInt64.ToString(value: 9223372036854775807)	5.95	5.53	1.08x
System.Tests.Perf_UInt64.TryFormat(value: 0)	3.14	1.67	1.88x
System.Tests.Perf_UInt64.TryFormat(value: 1000000000000000000)	5.49	4.27	1.28x
System.Tests.Perf_UInt64.TryFormat(value: 18446744073709551615)	5.95	4.69	1.27x
System.Tests.Perf_UInt64.TryFormat(value: 2)	3.12	1.64	1.90x
System.Tests.Perf_UInt64.TryFormat(value: 21)	3.19	1.85	1.72x
System.Tests.Perf_UInt64.TryFormat(value: 214)	3.38	1.94	1.74x
System.Tests.Perf_UInt64.TryFormat(value: 2147)	3.40	2.05	1.66x
System.Tests.Perf_UInt64.TryFormat(value: 21474)	3.48	2.19	1.59x
System.Tests.Perf_UInt64.TryFormat(value: 214748)	3.54	2.35	1.50x
System.Tests.Perf_UInt64.TryFormat(value: 2147483)	3.70	2.40	1.54x
System.Tests.Perf_UInt64.TryFormat(value: 21474836)	3.69	2.65	1.39x
System.Tests.Perf_UInt64.TryFormat(value: 214748364)	3.90	2.67	1.46x
System.Tests.Perf_UInt64.TryFormat(value: 2147483647)	4.06	2.77	1.47x
System.Tests.Perf_UInt64.TryFormat(value: 4294967295000000000)	5.71	4.33	1.32x
System.Tests.Perf_UInt64.TryFormat(value: 4294967295000000001)	5.45	4.26	1.28x
System.Tests.Perf_UInt64.TryFormat(value: 92233720368)	4.47	3.35	1.34x
System.Tests.Perf_UInt64.TryFormat(value: 922337203685)	4.57	3.42	1.34x
System.Tests.Perf_UInt64.TryFormat(value: 9223372036854)	5.05	3.55	1.42x
System.Tests.Perf_UInt64.TryFormat(value: 92233720368547)	4.80	3.73	1.29x
System.Tests.Perf_UInt64.TryFormat(value: 922337203685477)	4.92	3.73	1.32x
System.Tests.Perf_UInt64.TryFormat(value: 9223372036854775)	5.10	3.82	1.34x
System.Tests.Perf_UInt64.TryFormat(value: 92233720368547758)	5.14	3.95	1.30x
System.Tests.Perf_UInt64.TryFormat(value: 922337203685477580)	5.29	4.09	1.30x
System.Tests.Perf_UInt64.TryFormat(value: 9223372036854775807)	5.79	4.61	1.26x

Currently we create a temporary buffer on the stack, format into it, and then copy from that stack buffer into either the target span (for TryFormat) or into a new string (for ToString). Following the same approach as Utf8Formatter (and sharing the same code), which first counts the number of digits in the output in order to determine an exact length, this commit changes the implementation to skip the temporary buffer and just format directly into the destination span or string. This results in a very measurable performance boost.

benaadams · 2018-04-05T06:20:48Z

Unrelated, but wondering if some of the pointer arithmetic data dependency could be broken in some of these loops?

Like in Int32ToNumber

int i = (int)(buffer + Int32Precision - p);

number.scale = i;

char* dst = number.digits;
while (--i >= 0)
    *dst++ = *p++;

There's a result dependency on the decrement of i for the loop, which we can't do much about, but it will also likely hit a result dependency on both the increment of dst and the increment of p when computing the addresses of the data.

So could we change it to depend only on the result of i, and not modify dst or p?

int count = (int)(buffer + Int32Precision - p);

number.scale = count;

char* dst = number.digits;
for (int i = 0; i < count; i++)
{
    // *(dst + i) = *(p + i);
    dst[i] = p[i];
}

Not sure if the above is completely correct; easily confused by postfix vs prefix operators, and that uses both!
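As a quick check of the question above, here is a safe, array-based sketch in which integer offsets stand in for the pointers. `CopyLoopSketch`, `CopyAdvancing`, and `CopyIndexed` are hypothetical names, and this only illustrates the semantics of the two loop shapes; it says nothing about what the JIT generates for either.

```csharp
using System;

static class CopyLoopSketch
{
    // Mirrors `while (--i >= 0) *dst++ = *p++;` with offsets in place of pointers.
    // Returns the final dst offset so the caller can see how far it advanced.
    public static int CopyAdvancing(char[] src, int p, char[] dest, int dst, int count)
    {
        int i = count;
        while (--i >= 0)
            dest[dst++] = src[p++];
        return dst;
    }

    // Mirrors the proposed `dst[i] = p[i]` form: only i changes per iteration.
    // Returns dst unchanged.
    public static int CopyIndexed(char[] src, int p, char[] dest, int dst, int count)
    {
        for (int i = 0; i < count; i++)
            dest[dst + i] = src[p + i];
        return dst;
    }
}
```

Both forms copy the same `count` elements. The difference is that the first leaves `dst` and `p` advanced past the copied region and the second does not, so any code after the loop that relies on the advanced `dst` (for example, to write a terminator) would need to use `dst + count` instead.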

stephentoub · 2018-04-05T13:28:17Z

@dotnet-bot test Ubuntu arm Cross Checked Innerloop Build and Test please

jkotas

Nice!

Improve {u}int/long.ToString/TryFormat throughput by pre-computing the length Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>

ahsonkhan · 2018-04-05T18:04:36Z

src/mscorlib/shared/System.Private.CoreLib.Shared.projitems

@@ -52,6 +52,7 @@
    <Compile Include="$(MSBuildThisFileDirectory)System\Buffers\MemoryManager.cs" />
    <Compile Include="$(MSBuildThisFileDirectory)System\Buffers\TlsOverPerCoreLockedStacksArrayPool.cs" />
    <Compile Include="$(MSBuildThisFileDirectory)System\Buffers\Utilities.cs" />
+    <Compile Include="$(MSBuildThisFileDirectory)System\Buffers\Text\FormattingHelpers.CountDigits.cs" />


nit: sort order

stephentoub · 2018-04-05

What's wrong with the sort order? Don't we normally put files in a directory before folders in that directory?

ahsonkhan · 2018-04-05

I didn't know that was what we were doing.

Looking at this file, we seem to be following alphabetical order (only):

<Compile Include="$(MSBuildThisFileDirectory)System\Globalization\UnicodeCategory.cs" />
<Compile Include="$(MSBuildThisFileDirectory)System\Guid.cs" />

danmoseley · 2018-04-05

FWIW, VS inserts items with a simple string sort.

Of course it won't ever insert into an imported file like this.

ahsonkhan · 2018-04-05T18:10:00Z

Awesome! Perf improvement across the board :)

stephentoub · 2018-04-07T20:06:55Z

@Anipik, is the mirror to corefx running? I haven't seen the relevant pieces here mirrored yet. Thanks.

Improve {u}int/long.ToString/TryFormat throughput by pre-computing the length Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>

Anipik · 2018-04-07T22:23:14Z

Done, started the mirror.

stephentoub · 2018-04-07T22:26:42Z

Thanks

Improve {u}int/long.ToString/TryFormat throughput by pre-computing the length Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>

stephentoub added 2 commits April 4, 2018 23:52

Move FormattingHelpers.Count{Hex}Digits from Utf8Formatter into shared

1acd737

jkotas approved these changes Apr 5, 2018

stephentoub merged commit 1e6b28c into dotnet:master Apr 5, 2018

stephentoub deleted the portnumericperf branch April 5, 2018 15:55

ahsonkhan reviewed Apr 5, 2018

lewurm mentioned this pull request Feb 1, 2019

[2018-08] Bump corert mono/mono#12721

Merged
