Light up core ASCII.Utility methods with Vector256/Vector512 code paths. #88532

anthonycanino · 2023-07-07T18:10:40Z

This PR lights up some code paths in ASCII.Utility with Vector256/Vector512 code, namely NarrowUtf16ToAscii, WidenAsciiToUtf16, GetIndexOfFirstNonAsciiChar, and GetIndexOfFirstNonAsciiByte.

For the GetIndexOf methods, we have implemented the simpler, existing "default" code path but with the explicit VectorXX APIs; for the Narrow/Widen methods, we have implemented the more complex SSE/Vector256 path but with the Vector256/Vector512 APIs. Right now, both are a slight tradeoff between code complexity and performance.

We are open to adjusting the implementation style for either path. Perf numbers coming soon.
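To make the "default path with explicit VectorXX APIs" idea concrete, here is a minimal sketch of a first-non-ASCII-byte search. It is not the code in this PR: the class and method names are hypothetical, it works on spans rather than raw pointers, and it assumes .NET 8 so that Vector512 is available.

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics;

static class AsciiIndexSketch
{
    // Hypothetical helper (not the PR's code): returns the index of the first
    // byte >= 0x80, or data.Length when every byte is ASCII.
    public static int IndexOfFirstNonAsciiByte(ReadOnlySpan<byte> data)
    {
        int i = 0;

        // Widest path first: 64 bytes per iteration.
        if (Vector512.IsHardwareAccelerated)
        {
            for (; i <= data.Length - Vector512<byte>.Count; i += Vector512<byte>.Count)
            {
                // One mask bit per byte; a set bit marks a non-ASCII byte.
                ulong mask = Vector512.Create<byte>(data.Slice(i)).ExtractMostSignificantBits();
                if (mask != 0)
                {
                    return i + BitOperations.TrailingZeroCount(mask);
                }
            }
        }

        // 32 bytes per iteration; also picks up a tail left by the 512-bit loop.
        if (Vector256.IsHardwareAccelerated)
        {
            for (; i <= data.Length - Vector256<byte>.Count; i += Vector256<byte>.Count)
            {
                uint mask = Vector256.Create<byte>(data.Slice(i)).ExtractMostSignificantBits();
                if (mask != 0)
                {
                    return i + BitOperations.TrailingZeroCount(mask);
                }
            }
        }

        // Scalar loop for whatever is left.
        for (; i < data.Length; i++)
        {
            if (data[i] >= 0x80)
            {
                return i;
            }
        }

        return data.Length;
    }
}
```

The sketch favors the general cross-platform vector APIs over ISA-specific intrinsics, which is the style described above; a tuned implementation would likely unroll the loops and handle alignment.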

Perf

Please see the next section for the raw data collected from the utf-8 case for System.Text.Tests.Perf_Encoding. I have formatted it a bit here to make it slightly easier to draw conclusions. Essentially, I ran against a base build, once with DOTNET_EnableAVX512F=1 for the Vector512 path and once with DOTNET_EnableAVX512F=0 for the Vector256 path.

The microbenchmark is run with:

runtime\dotnet.cmd run -c Release -f net8.0 --filter System.Text.Tests.Perf_Encoding.* --corerun runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe

I have added some additional data sizes and an additional microbenchmark, GetCharCount, which simply invokes enc.GetCharCount, analogous to GetByteCount. I call out some speedups in green and some slowdowns in orange.

Summary tables (not preserved here): GetBytes, GetChars, GetByteCount, GetCharCount

Raw Results from Two Runs

With DOTNET_EnableAVX512F=1

Method	Job	Toolchain	size	encName	Mean	Error	StdDev	Median	Min	Max	Ratio	RatioSD	Gen0	Gen1	Allocated	Alloc Ratio
GetBytes	Job-GLULPR	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	16	utf-8	16.870 ns	0.3963 ns	0.4069 ns	16.946 ns	16.094 ns	17.412 ns	1.00	0.00	0.0021	-	40 B	1.00
GetBytes	Job-POYNXH	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	16	utf-8	17.409 ns	0.3935 ns	0.3865 ns	17.524 ns	16.928 ns	18.138 ns	1.03	0.03	0.0021	-	40 B	1.00

GetChars	Job-GLULPR	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	16	utf-8	24.459 ns	0.5415 ns	0.5561 ns	24.677 ns	23.679 ns	25.508 ns	1.00	0.00	0.0029	-	56 B	1.00
GetChars	Job-POYNXH	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	16	utf-8	26.097 ns	0.4077 ns	0.3814 ns	26.229 ns	25.364 ns	26.878 ns	1.07	0.03	0.0029	-	56 B	1.00

GetByteCount	Job-GLULPR	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	16	utf-8	9.204 ns	0.4215 ns	0.4853 ns	9.283 ns	7.837 ns	9.642 ns	1.00	0.00	-	-	-	NA
GetByteCount	Job-POYNXH	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	16	utf-8	8.027 ns	0.0503 ns	0.0392 ns	8.017 ns	7.981 ns	8.081 ns	0.88	0.07	-	-	-	NA

GetCharCount	Job-GLULPR	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	16	utf-8	5.928 ns	0.0466 ns	0.0413 ns	5.924 ns	5.882 ns	5.993 ns	1.00	0.00	-	-	-	NA
GetCharCount	Job-POYNXH	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	16	utf-8	7.510 ns	0.1455 ns	0.1361 ns	7.429 ns	7.360 ns	7.777 ns	1.27	0.03	-	-	-	NA

GetBytes	Job-GLULPR	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	32	utf-8	16.694 ns	0.1554 ns	0.1213 ns	16.644 ns	16.571 ns	16.952 ns	1.00	0.00	0.0029	-	56 B	1.00
GetBytes	Job-POYNXH	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	32	utf-8	16.070 ns	0.0626 ns	0.0523 ns	16.075 ns	15.967 ns	16.149 ns	0.96	0.01	0.0029	-	56 B	1.00

GetChars	Job-GLULPR	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	32	utf-8	25.220 ns	0.1805 ns	0.1600 ns	25.204 ns	25.009 ns	25.451 ns	1.00	0.00	0.0046	-	88 B	1.00
GetChars	Job-POYNXH	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	32	utf-8	25.691 ns	0.2295 ns	0.1916 ns	25.649 ns	25.316 ns	25.995 ns	1.02	0.01	0.0046	-	88 B	1.00

GetByteCount	Job-GLULPR	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	32	utf-8	8.251 ns	0.0137 ns	0.0121 ns	8.254 ns	8.223 ns	8.265 ns	1.00	0.00	-	-	-	NA
GetByteCount	Job-POYNXH	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	32	utf-8	7.585 ns	0.0211 ns	0.0198 ns	7.585 ns	7.552 ns	7.623 ns	0.92	0.00	-	-	-	NA

GetCharCount	Job-GLULPR	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	32	utf-8	6.889 ns	0.1059 ns	0.0939 ns	6.913 ns	6.563 ns	6.924 ns	1.00	0.00	-	-	-	NA
GetCharCount	Job-POYNXH	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	32	utf-8	7.681 ns	0.0036 ns	0.0030 ns	7.681 ns	7.675 ns	7.686 ns	1.12	0.02	-	-	-	NA

GetBytes	Job-GLULPR	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	64	utf-8	30.093 ns	0.1604 ns	0.1339 ns	30.134 ns	29.827 ns	30.335 ns	1.00	0.00	0.0046	-	88 B	1.00
GetBytes	Job-POYNXH	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	64	utf-8	29.859 ns	0.2332 ns	0.2068 ns	29.885 ns	29.535 ns	30.140 ns	0.99	0.01	0.0046	-	88 B	1.00

GetChars	Job-GLULPR	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	64	utf-8	29.528 ns	0.2547 ns	0.2258 ns	29.507 ns	29.246 ns	30.069 ns	1.00	0.00	0.0081	-	152 B	1.00
GetChars	Job-POYNXH	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	64	utf-8	31.390 ns	0.2142 ns	0.1899 ns	31.411 ns	30.957 ns	31.668 ns	1.06	0.01	0.0080	-	152 B	1.00

GetByteCount	Job-GLULPR	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	64	utf-8	9.404 ns	0.0206 ns	0.0192 ns	9.404 ns	9.366 ns	9.435 ns	1.00	0.00	-	-	-	NA
GetByteCount	Job-POYNXH	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	64	utf-8	9.203 ns	0.0503 ns	0.0471 ns	9.209 ns	9.094 ns	9.255 ns	0.98	0.01	-	-	-	NA

GetCharCount	Job-GLULPR	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	64	utf-8	7.445 ns	0.0030 ns	0.0027 ns	7.446 ns	7.440 ns	7.448 ns	1.00	0.00	-	-	-	NA
GetCharCount	Job-POYNXH	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	64	utf-8	8.367 ns	0.0072 ns	0.0064 ns	8.367 ns	8.356 ns	8.379 ns	1.12	0.00	-	-	-	NA

GetBytes	Job-GLULPR	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	128	utf-8	36.796 ns	0.2551 ns	0.2261 ns	36.797 ns	36.321 ns	37.218 ns	1.00	0.00	0.0081	-	152 B	1.00
GetBytes	Job-POYNXH	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	128	utf-8	36.670 ns	0.2703 ns	0.2396 ns	36.661 ns	36.354 ns	37.150 ns	1.00	0.01	0.0080	-	152 B	1.00

GetChars	Job-GLULPR	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	128	utf-8	37.293 ns	0.3142 ns	0.2785 ns	37.330 ns	36.756 ns	37.780 ns	1.00	0.00	0.0148	-	280 B	1.00
GetChars	Job-POYNXH	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	128	utf-8	38.340 ns	0.2087 ns	0.1850 ns	38.320 ns	38.038 ns	38.642 ns	1.03	0.01	0.0148	-	280 B	1.00

GetByteCount	Job-GLULPR	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	128	utf-8	11.914 ns	0.0415 ns	0.0388 ns	11.923 ns	11.815 ns	11.957 ns	1.00	0.00	-	-	-	NA
GetByteCount	Job-POYNXH	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	128	utf-8	10.412 ns	0.0483 ns	0.0452 ns	10.415 ns	10.317 ns	10.499 ns	0.87	0.01	-	-	-	NA

GetCharCount	Job-GLULPR	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	128	utf-8	8.595 ns	0.0078 ns	0.0069 ns	8.597 ns	8.580 ns	8.606 ns	1.00	0.00	-	-	-	NA
GetCharCount	Job-POYNXH	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	128	utf-8	11.532 ns	0.0134 ns	0.0119 ns	11.531 ns	11.514 ns	11.553 ns	1.34	0.00	-	-	-	NA

GetBytes	Job-GLULPR	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	256	utf-8	50.754 ns	0.0815 ns	0.0637 ns	50.762 ns	50.649 ns	50.848 ns	1.00	0.00	0.0148	-	280 B	1.00
GetBytes	Job-POYNXH	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	256	utf-8	46.471 ns	0.0475 ns	0.0397 ns	46.468 ns	46.417 ns	46.560 ns	0.92	0.00	0.0149	-	280 B	1.00

GetChars	Job-GLULPR	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	256	utf-8	53.838 ns	0.4639 ns	0.4112 ns	53.651 ns	53.322 ns	54.746 ns	1.00	0.00	0.0284	-	536 B	1.00
GetChars	Job-POYNXH	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	256	utf-8	54.503 ns	1.1033 ns	0.9780 ns	54.805 ns	53.152 ns	56.528 ns	1.01	0.02	0.0283	-	536 B	1.00

GetByteCount	Job-GLULPR	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	256	utf-8	16.165 ns	0.1093 ns	0.1023 ns	16.188 ns	15.907 ns	16.303 ns	1.00	0.00	-	-	-	NA
GetByteCount	Job-POYNXH	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	256	utf-8	14.100 ns	0.2427 ns	0.2270 ns	14.222 ns	13.641 ns	14.296 ns	0.87	0.02	-	-	-	NA

GetCharCount	Job-GLULPR	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	256	utf-8	11.281 ns	0.0065 ns	0.0061 ns	11.284 ns	11.268 ns	11.287 ns	1.00	0.00	-	-	-	NA
GetCharCount	Job-POYNXH	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	256	utf-8	12.991 ns	0.0526 ns	0.0492 ns	12.991 ns	12.927 ns	13.075 ns	1.15	0.00	-	-	-	NA

GetBytes	Job-GLULPR	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	512	utf-8	84.888 ns	0.6871 ns	0.5738 ns	85.025 ns	83.542 ns	85.490 ns	1.00	0.00	0.0283	-	536 B	1.00
GetBytes	Job-POYNXH	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	512	utf-8	71.747 ns	0.9126 ns	0.8536 ns	71.638 ns	70.464 ns	73.243 ns	0.84	0.01	0.0283	-	536 B	1.00

GetChars	Job-GLULPR	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	512	utf-8	90.956 ns	1.3016 ns	1.1538 ns	90.855 ns	89.107 ns	93.114 ns	1.00	0.00	0.0556	-	1048 B	1.00
GetChars	Job-POYNXH	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	512	utf-8	88.008 ns	0.4077 ns	0.3615 ns	87.946 ns	87.422 ns	88.657 ns	0.97	0.01	0.0555	-	1048 B	1.00

GetByteCount	Job-GLULPR	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	512	utf-8	24.785 ns	0.1321 ns	0.1171 ns	24.794 ns	24.454 ns	24.955 ns	1.00	0.00	-	-	-	NA
GetByteCount	Job-POYNXH	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	512	utf-8	15.675 ns	0.0271 ns	0.0254 ns	15.681 ns	15.625 ns	15.708 ns	0.63	0.00	-	-	-	NA

GetCharCount	Job-GLULPR	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	512	utf-8	15.559 ns	0.0052 ns	0.0046 ns	15.559 ns	15.551 ns	15.567 ns	1.00	0.00	-	-	-	NA
GetCharCount	Job-POYNXH	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	512	utf-8	16.235 ns	0.0329 ns	0.0308 ns	16.235 ns	16.191 ns	16.305 ns	1.04	0.00	-	-	-	NA

GetBytes	Job-GLULPR	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	1024	utf-8	174.537 ns	1.2549 ns	1.1124 ns	174.597 ns	172.235 ns	176.644 ns	1.00	0.00	0.0550	-	1048 B	1.00
GetBytes	Job-POYNXH	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	1024	utf-8	129.363 ns	1.2396 ns	1.0988 ns	129.508 ns	127.232 ns	130.718 ns	0.74	0.01	0.0556	-	1048 B	1.00

GetChars	Job-GLULPR	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	1024	utf-8	170.690 ns	1.3975 ns	1.2389 ns	170.792 ns	168.408 ns	172.876 ns	1.00	0.00	0.1100	0.0007	2072 B	1.00
GetChars	Job-POYNXH	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	1024	utf-8	148.640 ns	0.3393 ns	0.2833 ns	148.566 ns	148.082 ns	149.138 ns	0.87	0.01	0.1100	0.0007	2072 B	1.00

GetByteCount	Job-GLULPR	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	1024	utf-8	51.289 ns	0.2287 ns	0.2140 ns	51.320 ns	50.601 ns	51.489 ns	1.00	0.00	-	-	-	NA
GetByteCount	Job-POYNXH	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	1024	utf-8	34.275 ns	1.2877 ns	1.4829 ns	34.238 ns	31.380 ns	36.673 ns	0.66	0.03	-	-	-	NA

GetCharCount	Job-GLULPR	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	1024	utf-8	33.042 ns	0.6956 ns	0.7143 ns	33.531 ns	31.532 ns	33.683 ns	1.00	0.00	-	-	-	NA
GetCharCount	Job-POYNXH	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	1024	utf-8	20.591 ns	0.2576 ns	0.2409 ns	20.498 ns	20.310 ns	20.989 ns	0.62	0.01	-	-	-	NA

With DOTNET_EnableAVX512F=0

Method	Job	Toolchain	size	encName	Mean	Error	StdDev	Median	Min	Max	Ratio	RatioSD	Gen0	Gen1	Allocated	Alloc Ratio
GetBytes	Job-ZAMPDT	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	16	utf-8	15.685 ns	0.2940 ns	0.2750 ns	15.791 ns	15.234 ns	16.202 ns	1.00	0.00	0.0021	-	40 B	1.00
GetBytes	Job-RVECRT	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	16	utf-8	15.482 ns	0.0969 ns	0.0809 ns	15.458 ns	15.378 ns	15.633 ns	0.99	0.02	0.0021	-	40 B	1.00

GetChars	Job-ZAMPDT	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	16	utf-8	22.264 ns	0.4973 ns	0.5528 ns	22.087 ns	21.686 ns	23.379 ns	1.00	0.00	0.0029	-	56 B	1.00
GetChars	Job-RVECRT	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	16	utf-8	23.657 ns	0.2815 ns	0.2495 ns	23.623 ns	23.312 ns	24.073 ns	1.06	0.03	0.0029	-	56 B	1.00

GetByteCount	Job-ZAMPDT	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	16	utf-8	8.275 ns	0.0317 ns	0.0247 ns	8.274 ns	8.235 ns	8.325 ns	1.00	0.00	-	-	-	NA
GetByteCount	Job-RVECRT	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	16	utf-8	8.304 ns	0.0380 ns	0.0297 ns	8.306 ns	8.245 ns	8.345 ns	1.00	0.00	-	-	-	NA

GetCharCount	Job-ZAMPDT	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	16	utf-8	5.953 ns	0.0696 ns	0.0617 ns	5.931 ns	5.891 ns	6.074 ns	1.00	0.00	-	-	-	NA
GetCharCount	Job-RVECRT	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	16	utf-8	7.347 ns	0.0277 ns	0.0246 ns	7.342 ns	7.302 ns	7.390 ns	1.23	0.01	-	-	-	NA

GetBytes	Job-ZAMPDT	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	32	utf-8	16.420 ns	0.3640 ns	0.3226 ns	16.284 ns	16.096 ns	17.034 ns	1.00	0.00	0.0030	-	56 B	1.00
GetBytes	Job-RVECRT	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	32	utf-8	16.727 ns	0.1400 ns	0.1241 ns	16.713 ns	16.554 ns	16.938 ns	1.02	0.02	0.0029	-	56 B	1.00

GetChars	Job-ZAMPDT	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	32	utf-8	24.938 ns	0.4161 ns	0.3689 ns	24.880 ns	24.519 ns	25.590 ns	1.00	0.00	0.0046	-	88 B	1.00
GetChars	Job-RVECRT	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	32	utf-8	24.797 ns	0.1152 ns	0.1021 ns	24.805 ns	24.644 ns	24.927 ns	0.99	0.02	0.0047	-	88 B	1.00

GetByteCount	Job-ZAMPDT	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	32	utf-8	8.893 ns	0.0236 ns	0.0209 ns	8.894 ns	8.853 ns	8.925 ns	1.00	0.00	-	-	-	NA
GetByteCount	Job-RVECRT	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	32	utf-8	7.642 ns	0.0171 ns	0.0152 ns	7.641 ns	7.618 ns	7.673 ns	0.86	0.00	-	-	-	NA

GetCharCount	Job-ZAMPDT	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	32	utf-8	6.666 ns	0.0187 ns	0.0175 ns	6.667 ns	6.639 ns	6.703 ns	1.00	0.00	-	-	-	NA
GetCharCount	Job-RVECRT	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	32	utf-8	7.687 ns	0.0064 ns	0.0060 ns	7.686 ns	7.676 ns	7.698 ns	1.15	0.00	-	-	-	NA

GetBytes	Job-ZAMPDT	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	64	utf-8	28.484 ns	0.2591 ns	0.2297 ns	28.462 ns	28.234 ns	28.920 ns	1.00	0.00	0.0047	-	88 B	1.00
GetBytes	Job-RVECRT	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	64	utf-8	30.190 ns	0.5999 ns	0.5009 ns	30.452 ns	29.091 ns	30.659 ns	1.06	0.02	0.0047	-	88 B	1.00

GetChars	Job-ZAMPDT	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	64	utf-8	29.352 ns	0.2424 ns	0.2268 ns	29.315 ns	28.931 ns	29.652 ns	1.00	0.00	0.0080	-	152 B	1.00
GetChars	Job-RVECRT	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	64	utf-8	28.814 ns	0.2307 ns	0.2045 ns	28.849 ns	28.504 ns	29.228 ns	0.98	0.01	0.0081	-	152 B	1.00

GetByteCount	Job-ZAMPDT	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	64	utf-8	8.681 ns	0.0169 ns	0.0158 ns	8.680 ns	8.653 ns	8.714 ns	1.00	0.00	-	-	-	NA
GetByteCount	Job-RVECRT	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	64	utf-8	9.201 ns	0.0190 ns	0.0159 ns	9.204 ns	9.165 ns	9.221 ns	1.06	0.00	-	-	-	NA

GetCharCount	Job-ZAMPDT	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	64	utf-8	7.376 ns	0.0084 ns	0.0070 ns	7.374 ns	7.368 ns	7.392 ns	1.00	0.00	-	-	-	NA
GetCharCount	Job-RVECRT	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	64	utf-8	8.089 ns	0.0232 ns	0.0217 ns	8.089 ns	8.038 ns	8.123 ns	1.10	0.00	-	-	-	NA

GetBytes	Job-ZAMPDT	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	128	utf-8	36.537 ns	0.1701 ns	0.1421 ns	36.568 ns	36.284 ns	36.775 ns	1.00	0.00	0.0080	-	152 B	1.00
GetBytes	Job-RVECRT	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	128	utf-8	35.399 ns	0.1617 ns	0.1434 ns	35.347 ns	35.154 ns	35.641 ns	0.97	0.00	0.0080	-	152 B	1.00

GetChars	Job-ZAMPDT	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	128	utf-8	36.593 ns	0.4243 ns	0.3968 ns	36.610 ns	35.943 ns	37.308 ns	1.00	0.00	0.0148	-	280 B	1.00
GetChars	Job-RVECRT	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	128	utf-8	36.775 ns	0.3406 ns	0.3020 ns	36.863 ns	36.344 ns	37.137 ns	1.00	0.01	0.0148	-	280 B	1.00

GetByteCount	Job-ZAMPDT	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	128	utf-8	10.957 ns	0.0113 ns	0.0100 ns	10.959 ns	10.932 ns	10.972 ns	1.00	0.00	-	-	-	NA
GetByteCount	Job-RVECRT	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	128	utf-8	10.908 ns	0.0235 ns	0.0220 ns	10.903 ns	10.880 ns	10.953 ns	1.00	0.00	-	-	-	NA

GetCharCount	Job-ZAMPDT	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	128	utf-8	8.644 ns	0.0455 ns	0.0425 ns	8.645 ns	8.588 ns	8.736 ns	1.00	0.00	-	-	-	NA
GetCharCount	Job-RVECRT	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	128	utf-8	8.824 ns	0.1000 ns	0.0935 ns	8.822 ns	8.647 ns	8.961 ns	1.02	0.01	-	-	-	NA

GetBytes	Job-ZAMPDT	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	256	utf-8	52.601 ns	0.3706 ns	0.3285 ns	52.589 ns	52.028 ns	53.098 ns	1.00	0.00	0.0148	-	280 B	1.00
GetBytes	Job-RVECRT	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	256	utf-8	49.847 ns	0.3042 ns	0.2697 ns	49.815 ns	49.474 ns	50.335 ns	0.95	0.01	0.0147	-	280 B	1.00

GetChars	Job-ZAMPDT	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	256	utf-8	54.972 ns	0.5678 ns	0.5033 ns	54.919 ns	54.226 ns	56.047 ns	1.00	0.00	0.0283	-	536 B	1.00
GetChars	Job-RVECRT	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	256	utf-8	54.261 ns	0.9696 ns	0.9070 ns	54.289 ns	52.871 ns	56.101 ns	0.99	0.02	0.0283	-	536 B	1.00

GetByteCount	Job-ZAMPDT	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	256	utf-8	15.922 ns	0.0803 ns	0.0751 ns	15.917 ns	15.735 ns	16.020 ns	1.00	0.00	-	-	-	NA
GetByteCount	Job-RVECRT	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	256	utf-8	14.330 ns	0.0253 ns	0.0237 ns	14.343 ns	14.288 ns	14.363 ns	0.90	0.00	-	-	-	NA

GetCharCount	Job-ZAMPDT	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	256	utf-8	11.216 ns	0.1034 ns	0.0968 ns	11.214 ns	10.986 ns	11.365 ns	1.00	0.00	-	-	-	NA
GetCharCount	Job-RVECRT	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	256	utf-8	10.533 ns	0.0370 ns	0.0309 ns	10.539 ns	10.476 ns	10.586 ns	0.94	0.01	-	-	-	NA

GetBytes	Job-ZAMPDT	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	512	utf-8	99.951 ns	0.7868 ns	0.6974 ns	100.088 ns	98.710 ns	101.068 ns	1.00	0.00	0.0283	-	536 B	1.00
GetBytes	Job-RVECRT	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	512	utf-8	80.361 ns	0.5703 ns	0.5056 ns	80.284 ns	79.564 ns	81.469 ns	0.80	0.01	0.0281	-	536 B	1.00

GetChars	Job-ZAMPDT	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	512	utf-8	93.390 ns	1.0341 ns	0.9673 ns	93.122 ns	92.216 ns	95.289 ns	1.00	0.00	0.0554	-	1048 B	1.00
GetChars	Job-RVECRT	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	512	utf-8	89.364 ns	1.0377 ns	0.9199 ns	89.247 ns	87.715 ns	90.685 ns	0.96	0.01	0.0557	-	1048 B	1.00

GetByteCount	Job-ZAMPDT	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	512	utf-8	24.083 ns	0.0269 ns	0.0252 ns	24.090 ns	24.043 ns	24.120 ns	1.00	0.00	-	-	-	NA
GetByteCount	Job-RVECRT	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	512	utf-8	20.248 ns	0.0847 ns	0.0792 ns	20.251 ns	19.979 ns	20.322 ns	0.84	0.00	-	-	-	NA

GetCharCount	Job-ZAMPDT	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	512	utf-8	15.784 ns	0.0404 ns	0.0378 ns	15.798 ns	15.706 ns	15.836 ns	1.00	0.00	-	-	-	NA
GetCharCount	Job-RVECRT	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	512	utf-8	13.821 ns	0.0650 ns	0.0608 ns	13.831 ns	13.689 ns	13.926 ns	0.88	0.00	-	-	-	NA

GetBytes	Job-ZAMPDT	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	1024	utf-8	171.779 ns	1.3070 ns	1.1586 ns	171.905 ns	169.700 ns	174.232 ns	1.00	0.00	0.0551	-	1048 B	1.00
GetBytes	Job-RVECRT	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	1024	utf-8	156.507 ns	0.9721 ns	0.8617 ns	156.265 ns	155.498 ns	158.305 ns	0.91	0.01	0.0555	-	1048 B	1.00

GetChars	Job-ZAMPDT	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	1024	utf-8	167.936 ns	2.4062 ns	2.1330 ns	167.886 ns	165.697 ns	171.998 ns	1.00	0.00	0.1100	0.0007	2072 B	1.00
GetChars	Job-RVECRT	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	1024	utf-8	156.658 ns	1.2998 ns	1.1523 ns	157.029 ns	154.017 ns	158.111 ns	0.93	0.01	0.1099	0.0006	2072 B	1.00

GetByteCount	Job-ZAMPDT	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	1024	utf-8	49.613 ns	0.1716 ns	0.1521 ns	49.592 ns	49.279 ns	49.947 ns	1.00	0.00	-	-	-	NA
GetByteCount	Job-RVECRT	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	1024	utf-8	49.040 ns	1.1058 ns	1.2734 ns	49.251 ns	44.486 ns	50.348 ns	0.99	0.03	-	-	-	NA

GetCharCount	Job-ZAMPDT	\runtime-base\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	1024	utf-8	26.012 ns	0.0622 ns	0.0581 ns	26.007 ns	25.938 ns	26.135 ns	1.00	0.00	-	-	-	NA
GetCharCount	Job-RVECRT	\runtime\artifacts\bin\testhost\net8.0-windows-Release-x64\shared\Microsoft.NETCore.App\8.0.0\corerun.exe	1024	utf-8	19.002 ns	0.0659 ns	0.0550 ns	19.004 ns	18.890 ns	19.118 ns	0.73	0.00	-	-	-	NA

ghost · 2023-07-07T18:10:52Z

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	anthonycanino
Assignees:	-
Labels:	`area-System.Numerics`, `community-contribution`
Milestone:	-

anthonycanino · 2023-07-07T19:42:27Z

A rough summary of the results:

We see speedups with the Vector512 and Vector256 implementations on large data sizes.
There is a lot of room to tune, particularly since the Vector512 and Vector256 paths do not use many specialized intrinsics, though I think we want to favor this more general approach.

anthonycanino · 2023-07-10T20:08:15Z

@dotnet/avx512-contrib can we get a review on this?

I checked the failures, which look like they are related to a CPUID test. I don't think these changes would affect that?

tannergooding · 2023-07-10T20:59:33Z

The CpuId failures are known, and there is a PR up to resolve them: #88623

tannergooding · 2023-07-11T22:08:42Z

src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector256.cs

+        /// Uses double instead of long to get a single instruction instead of storing temps on general porpose register (or stack)
+        /// </remarks>
+        [MethodImpl(MethodImplOptions.AggressiveInlining)]
+        internal static void StoreLowerUnsafe<T>(this Vector256<T> source, ref T destination, nuint elementOffset = 0)


Why this instead of source.GetLower().StoreUnsafe(ref destination, elementOffset)?
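For reference, a minimal sketch of the GetLower().StoreUnsafe pattern using only public APIs. The wrapper name and the span-based signature are hypothetical and exist only to make the example self-contained; the suggestion in the comment is to call the two methods directly at the use site.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

static class StoreLowerSketch
{
    // Hypothetical wrapper for illustration: writes the lower 128 bits of a
    // 256-bit vector (16 bytes) to the start of destination.
    public static void StoreLowerHalf(Vector256<byte> source, Span<byte> destination)
    {
        if (destination.Length < Vector128<byte>.Count)
        {
            throw new ArgumentException("Destination too small.", nameof(destination));
        }

        // GetLower() yields the low Vector128<byte>; StoreUnsafe writes it
        // through the ref, so no dedicated StoreLowerUnsafe helper is needed.
        source.GetLower().StoreUnsafe(ref MemoryMarshal.GetReference(destination));
    }
}
```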

src/libraries/System.Private.CoreLib/src/System/Text/Ascii.Utility.cs

tannergooding · 2023-07-11T22:11:11Z

src/libraries/System.Private.CoreLib/src/System/Text/Ascii.Utility.cs

+            {
+                uint SizeOfVector512InBytes = (uint)Vector512<byte>.Count; // JIT will make this a const
+
+                if (Unsafe.ReadUnaligned<Vector512<byte>>(pBuffer).ExtractMostSignificantBits() == 0)


Why not this instead?

Suggested change

-                if (Unsafe.ReadUnaligned<Vector512<byte>>(pBuffer).ExtractMostSignificantBits() == 0)
+                if (Vector512.Load(pBuffer).ExtractMostSignificantBits() == 0)
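A small sketch comparing the two load styles. To keep it compilable without unsafe code, it uses the ref-based Vector512.LoadUnsafe in place of the pointer-based Vector512.Load from the suggestion; the class and method names are hypothetical. Both methods read the same 64 bytes and should agree on every input.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

static class LoadSketch
{
    // Reads 64 bytes through the general-purpose Unsafe API.
    public static bool First64AreAsciiViaReadUnaligned(ReadOnlySpan<byte> data)
    {
        if (data.Length < Vector512<byte>.Count)
        {
            throw new ArgumentException("Need at least 64 bytes.", nameof(data));
        }

        ref byte start = ref MemoryMarshal.GetReference(data);
        return Unsafe.ReadUnaligned<Vector512<byte>>(ref start).ExtractMostSignificantBits() == 0;
    }

    // Reads the same 64 bytes through the vector-specific load API
    // (ref-based stand-in for Vector512.Load(pBuffer)).
    public static bool First64AreAsciiViaLoad(ReadOnlySpan<byte> data)
    {
        if (data.Length < Vector512<byte>.Count)
        {
            throw new ArgumentException("Need at least 64 bytes.", nameof(data));
        }

        ref byte start = ref MemoryMarshal.GetReference(data);
        return Vector512.LoadUnsafe(ref start).ExtractMostSignificantBits() == 0;
    }
}
```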

tannergooding · 2023-07-11T22:12:43Z

src/libraries/System.Private.CoreLib/src/System/Text/Ascii.Utility.cs

+
+            if (Vector512.IsHardwareAccelerated && bufferLength >= 2 * (uint)Vector512<byte>.Count)
+            {
+                uint SizeOfVector512InBytes = (uint)Vector512<byte>.Count; // JIT will make this a const


We actually have an internal Vector512.Size that is a const for just these types of cases.

src/libraries/System.Private.CoreLib/src/System/Text/Ascii.Utility.cs

with Vector512 and Vector256 APIs.

tannergooding · 2023-07-12T18:05:39Z

src/libraries/System.Private.CoreLib/src/System/Text/Ascii.Utility.cs

-                uint SizeOfVectorInChars = (uint)Vector<ushort>.Count; // JIT will make this a const
-                uint SizeOfVectorInBytes = (uint)Vector<byte>.Count; // JIT will make this a const
+                uint SizeOfVector512InChars = (uint)Vector512<ushort>.Count; // JIT will make this a const
+                uint SizeOfVector512InBytes = (uint)Vector512.Size; // JIT will make this a const


You should be able to use Vector512.Size directly and have C# keep it a const instead.

It will make things a bit more readable and allow C# to constant fold some of the simple cases (e.g. ~(nuint)(SizeOfVector512InBytes - 1) itself will become constant foldable rather than forcing IL to emit ldloc; sub; conv.i; sub and then requiring the JIT to optimize that down).
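A sketch of the difference, under the assumption that a local const stands in for the internal Vector512.Size (which is not accessible outside CoreLib). The names below are hypothetical. With a const operand, the subtraction is a constant expression the C# compiler can evaluate; with a local variable, the IL has to load the local and compute at run time, leaving the cleanup to the JIT. Both forms return the same value.

```csharp
using System;

static class AlignSketch
{
    // Hypothetical stand-in for the internal Vector512.Size constant (64 bytes).
    private const int Vector512SizeInBytes = 64;

    // Const operand: Vector512SizeInBytes - 1 is a compile-time constant (63).
    public static nuint RoundDownWithConst(nuint value)
        => value & ~(nuint)(Vector512SizeInBytes - 1);

    // Local operand: the same arithmetic, but expressed through a variable,
    // so the IL loads the local and subtracts before masking.
    public static nuint RoundDownWithLocal(nuint value)
    {
        uint sizeOfVector512InBytes = 64;
        return value & ~(nuint)(sizeOfVector512InBytes - 1);
    }
}
```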

tannergooding · 2023-07-12T18:07:23Z

src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector512.cs

+        /// Uses double instead of long to get a single instruction instead of storing temps on general porpose register (or stack)
+        /// </remarks>
+        [MethodImpl(MethodImplOptions.AggressiveInlining)]
+        internal static void StoreLowerUnsafe<T>(this Vector512<T> source, ref T destination, nuint elementOffset = 0)


Same general question here as on the other. Why this helper rather than simply doing source.GetLower().StoreUnsafe(ref destination, elementOffset)?

It looks like the right thing already happens for the GetLower().StoreUnsafe form, and it avoids the JIT needing to inline the helper.

tannergooding · 2023-07-12T18:09:01Z

src/libraries/System.Private.CoreLib/src/System/Text/Ascii.Utility.cs

+
+            if (Vector512.IsHardwareAccelerated && bufferLength >= 2 * (uint)Vector512<byte>.Count)
+            {
+                uint SizeOfVector512InBytes = (uint)Vector512.Size; // JIT will make this a const


Same here and other places. If we use Vector512.Size directly, we get much simpler, constant-foldable IL, allowing the JIT to do less work.

Since Vector512.Size is itself a constant, we shouldn't need to insert extra casts anywhere. But if we did and it was a non-trivial number, it would be better to declare this as const uint SizeOfVector512InBytes instead so the same constant folding could still happen.

tannergooding · 2023-07-12T18:11:10Z

src/libraries/System.Private.CoreLib/src/System/Text/Ascii.Utility.cs

@@ -1407,6 +1867,20 @@ private static bool VectorContainsNonAsciiChar(Vector128<byte> asciiVector)
            }
        }

+        [MethodImpl(MethodImplOptions.AggressiveInlining)]
+        private static bool VectorContainsNonAsciiChar(Vector256<byte> asciiVector)


Do we need wrapper methods for this simple case?

Is there any reason it can't just be asciiVector != Vector256<byte>.Zero instead, which allows ptest; jcc rather than movmsk; cmp; jcc?

I was doing this for code reuse/readability.

Would you prefer to just inline the function?

I think inlining it manually in this case would be better.

The abstractions like this are generally helpful when we have more complex logic differing between platforms or when there is a high likelihood to need to identify and change the pattern for the same pattern repeatedly in the future.

In cases like this, where we're really just doing a trivial comparison check, I don't think the helper buys us much in terms of readability/maintainability and it does have drawbacks in the form of forcing the JIT to do more work to inline/optimize the code.
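To illustrate the kind of inline comparison being proposed, here is a small sketch over UTF-16 code units, assuming the 0xFF80 mask that appears elsewhere in this PR. It is wrapped in a method only so it can be exercised on its own; the class and method names are hypothetical.

```csharp
using System.Runtime.Intrinsics;

public static class NonAsciiCheckSketch
{
    // True if any UTF-16 code unit in the vector is above 0x7F.
    public static bool ContainsNonAsciiChar(Vector256<ushort> utf16Vector)
    {
        // Any bit inside 0xFF80 means the code unit is outside the ASCII range.
        // operator != on Vector256<T> returns true if any element differs.
        return (utf16Vector & Vector256.Create((ushort)0xFF80)) != Vector256<ushort>.Zero;
    }
}
```

In the reviewed code the body would sit directly at the call site instead of behind a helper.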

tannergooding · 2023-07-12T18:13:26Z

src/libraries/System.Private.CoreLib/src/System/Text/Ascii.Utility.cs

+            // Narrows two vectors of words [ w7 w6 w5 w4 w3 w2 w1 w0 ] and [ w7' w6' w5' w4' w3' w2' w1' w0' ]
+            // to a vector of bytes [ b7 ... b0 b7' ... b0'].
+
+            // prefer architecture specific intrinsic as they don't perform additional AND like Vector512.Narrow does


I think this comment is out of sync. We're using the non-architecture-specific Narrow below.

Likewise, given we're just calling the xplat narrow, do we need this helper method or can we just call Narrow directly and avoid the need to inline?
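For reference, a minimal sketch of calling the cross-platform `Vector512.Narrow` directly, assuming .NET 8. The wrapper exists only to make the snippet self-contained and its name is hypothetical.

```csharp
using System.Runtime.Intrinsics;

public static class NarrowSketch
{
    // Narrows two vectors of 32 UTF-16 code units each into one vector of 64 bytes.
    // Elements of 'lower' land in bytes 0..31, elements of 'upper' in bytes 32..63.
    public static Vector512<byte> NarrowToBytes(Vector512<ushort> lower, Vector512<ushort> upper)
    {
        return Vector512.Narrow(lower, upper);
    }
}
```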

tannergooding · 2023-07-12T18:14:33Z

src/libraries/System.Private.CoreLib/src/System/Text/Ascii.Utility.cs

+            uint SizeOfVector256 = (uint)Vector256<byte>.Count;
+            nuint MaskOfAllBitsInVector256 = (nuint)(SizeOfVector256 - 1);


These can both be const, using Vector256.Size directly and declaring const nuint MaskOfAllBitsInVector256 for the latter.

Same for the equivalent case in Intrinsified_512.

1. turn some variables into explicitly specified const. 2. removed some helper functions and inlined them.

GrabYourPitchforks · 2023-07-13T22:20:05Z

src/libraries/System.Private.CoreLib/src/System/Text/Ascii.Utility.cs

+                        Debug.Assert((nuint)pBuffer % Vector512.Size == 0, "Vector read should be aligned.");
+                        if (Vector512.LoadAligned(pBuffer).ExtractMostSignificantBits() != 0)
+                        {
+                            break; // found non-ASCII data


If we're within the vectorized code path and see non-ASCII data, since we already have the return value of ExtractMostSignificantBits in a register somewhere, I wonder if it would make sense to tzcnt the result and return immediately rather than falling down the drain code path.

Not important for this review, just a random musing.
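A sketch of that idea, assuming .NET 8 and a 64-byte block: the mask from `ExtractMostSignificantBits` has one bit per byte, so counting trailing zeros gives the index of the first non-ASCII byte. The class and method names are hypothetical and this is not what the PR implements.

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics;

public static class FirstNonAsciiSketch
{
    // Returns the index of the first byte with its high bit set within a
    // 64-byte block, or -1 if the whole block is ASCII.
    public static int IndexOfFirstNonAsciiByte(ReadOnlySpan<byte> block)
    {
        // Vector512.Create throws if the span holds fewer than 64 bytes.
        Vector512<byte> vector = Vector512.Create(block);
        ulong mask = vector.ExtractMostSignificantBits();

        // Bit i of the mask corresponds to byte i of the block.
        return mask == 0 ? -1 : BitOperations.TrailingZeroCount(mask);
    }
}
```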

GrabYourPitchforks · 2023-07-13T22:48:37Z

src/libraries/System.Private.CoreLib/src/System/Text/Ascii.Utility.cs

+
+                    do
+                    {
+                        Debug.Assert((nuint)pBuffer % SizeOfVector512InChars == 0, "Vector read should be aligned.");


This should read:

Debug.Assert((nuint)pBuffer % SizeOfVector512InBytes == 0, "Vector read should be aligned.");

(Looks like this is a bug in the original GetIndexOfFirstNonAsciiChar_Default method as well.)

GrabYourPitchforks · 2023-07-13T22:51:59Z

src/libraries/System.Private.CoreLib/src/System/Text/Ascii.Utility.cs

+            {
+                const uint SizeOfVector512InChars = Vector512.Size / sizeof(ushort);
+
+                Vector512<ushort> asciiMask = Vector512.Create((ushort) 0xFF80);


Unused local?

GrabYourPitchforks · 2023-07-13T23:21:40Z

src/libraries/System.Private.CoreLib/src/System/Text/Ascii.Utility.cs

+            // We're going to get the best performance when we have aligned writes, so we'll take the
+            // hit of potentially unaligned reads in order to hit this sweet spot.
+
+            // pAsciiBuffer points to the start of the destination buffer, immediately before where we wrote


Para should be updated: 0x10 bit, &pAsciiBuffer[SizeOfVector256 / 2], 16-byte write.

Looking back over the original comment this was copied from, I realize now what I originally wrote was word salad. 🙂 My comment wasn't intended to refer to the value stored at the referenced address, but rather the address itself. Basically, if you're trying to become 16-byte aligned, then one of the following must hold: (a) you can back up 7 or fewer bytes from your current position to achieve 16-byte alignment; or (b) you can write 8 bytes, bump the pointer, then back up 7 or fewer bytes to achieve 16-byte alignment.

For this method, since you're trying to become 32-byte aligned, those clauses become "15 or fewer bytes" and "you can write 16 bytes."
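The 32-byte case can be sketched as plain address arithmetic. This is one reading of the comment above, assuming the first 16 bytes have already been written at the destination; the class, method, and tuple names are hypothetical.

```csharp
public static class AlignedWriteSketch
{
    private const uint SizeOfVector256 = 32;
    private const nuint MaskOfAllBitsInVector256 = (nuint)(SizeOfVector256 - 1);

    // Given the destination address, after 16 bytes have been written there:
    //  - WriteSecondHalf: whether another 16-byte write is needed so that the
    //    next 32-byte boundary falls inside data already written (the 0x10 bit
    //    of the address is clear).
    //  - NextOffset: offset from the destination to that next 32-byte boundary.
    public static (bool WriteSecondHalf, nuint NextOffset) Plan(nuint destinationAddress)
    {
        bool writeSecondHalf = (destinationAddress & (nuint)(SizeOfVector256 / 2)) == 0;
        nuint nextOffset = SizeOfVector256 - (destinationAddress & MaskOfAllBitsInVector256);
        return (writeSecondHalf, nextOffset);
    }
}
```

With an address ending in 0x05 the 0x10 bit is clear, so a second 16-byte write is needed and the aligned loop resumes at offset 27. With an address ending in 0x15 the bit is set, and backing up from offset 16 to offset 11 already lands on a 32-byte boundary.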

GrabYourPitchforks · 2023-07-13T23:23:46Z

src/libraries/System.Private.CoreLib/src/System/Text/Ascii.Utility.cs

+                    goto Finish;
+                }
+
+                // Turn the 32 ASCII chars we just read into 32 ASCII bytes, then copy it to the destination.


16, not 32.

GrabYourPitchforks · 2023-07-13T23:29:08Z

src/libraries/System.Private.CoreLib/src/System/Text/Ascii.Utility.cs

+
+            // First part was all ASCII, narrow and aligned write. Note we're only filling in the low half of the vector.
+
+            Debug.Assert(((nuint)pAsciiBuffer + currentOffsetInElements) % sizeof(ulong) == 0, "Destination should be ulong-aligned.");


Suggested change

-            Debug.Assert(((nuint)pAsciiBuffer + currentOffsetInElements) % sizeof(ulong) == 0, "Destination should be ulong-aligned.");
+            Debug.Assert(((nuint)pAsciiBuffer + currentOffsetInElements) % Vector128.Size == 0, "Destination should be 128-bit-aligned.");

GrabYourPitchforks · 2023-07-13T23:31:31Z

src/libraries/System.Private.CoreLib/src/System/Text/Ascii.Utility.cs

+            // We're going to get the best performance when we have aligned writes, so we'll take the
+            // hit of potentially unaligned reads in order to hit this sweet spot.
+
+            // pAsciiBuffer points to the start of the destination buffer, immediately before where we wrote


Same feedback as earlier: fix up comments.

GrabYourPitchforks · 2023-07-13T23:32:45Z

src/libraries/System.Private.CoreLib/src/System/Text/Ascii.Utility.cs

+
+            // First part was all ASCII, narrow and aligned write. Note we're only filling in the low half of the vector.
+
+            Debug.Assert(((nuint)pAsciiBuffer + currentOffsetInElements) % sizeof(ulong) == 0, "Destination should be ulong-aligned.");


Suggested change

-            Debug.Assert(((nuint)pAsciiBuffer + currentOffsetInElements) % sizeof(ulong) == 0, "Destination should be ulong-aligned.");
+            Debug.Assert(((nuint)pAsciiBuffer + currentOffsetInElements) % Vector256.Size == 0, "Destination should be 256-bit-aligned.");

GrabYourPitchforks · 2023-07-13T23:33:55Z

src/libraries/System.Private.CoreLib/src/System/Text/Ascii.Utility.cs

+                            break;
+                        }
+
+                        (Vector512<ushort> low, Vector512<ushort> upper) = Vector512.Widen(asciiVector);


Nit: keep the same local variable names as the other blocks in this method.

tannergooding · 2023-07-14T15:30:07Z

CC. @GrabYourPitchforks, looks like all your feedback has been addressed.

stephentoub · 2023-07-17T14:35:53Z

src/libraries/System.Private.CoreLib/src/System/Text/Ascii.Utility.cs

+            if (Vector512.IsHardwareAccelerated || Vector256.IsHardwareAccelerated)
+            {
+                return GetIndexOfFirstNonAsciiByte_Vector(pBuffer, bufferLength);
+            }
+            else if (Sse2.IsSupported || (AdvSimd.IsSupported && BitConverter.IsLittleEndian))


GetIndexOfFirstNonAsciiByte_Vector has a Vector128.IsHardwareAccelerated code path. We can't just rely on that and delete GetIndexOfFirstNonAsciiByte_Intrinsified?

The Vector128 fallback path is slower than the (more complex) Intrinsified path (see #88532 (comment)).

ghost added the community-contribution Indicates that the PR has been added by a community member label Jul 7, 2023

dotnet-issue-labeler bot added the area-System.Numerics label Jul 7, 2023

build-analysis bot mentioned this pull request Jul 7, 2023

LibraryImportGenerator.Unit.Tests crashed in CI #87951

Closed

anthonycanino marked this pull request as ready for review July 10, 2023 16:33

anthonycanino force-pushed the LibUpAscii branch from 0e56a90 to d1153c8 Compare July 10, 2023 17:02

build-analysis bot mentioned this pull request Jul 10, 2023

simpleruntimeeventvalidation test failing in CI #88499

Closed

build-analysis bot mentioned this pull request Jul 11, 2023

Test failure readytorun/HardwareIntrinsics/X86/CpuId_R2R_Avx/CpuId_R2R_Avx.sh #88582

Closed

tannergooding reviewed Jul 11, 2023

tannergooding reviewed Jul 11, 2023

Ruihan-Yin and others added 6 commits July 12, 2023 10:28

Lib upgrade for ToUtf16

12bb892

Upgrade NarrowUtf16ToAscii with Vector512

ff185d0

Complete the upgrade in NarrowUtf16ToAscii method

9d45ff8

with Vector512 and Vector256 APIs.

Adding VectorXX paths to GetIndexOfFirstNonAscii functions.

09b8e31

Adding optimization to Vecto256 VectorContainsNonAsciiChar method.

01a43e7

Code path refactoring and cleanup.

ea913db

anthonycanino force-pushed the LibUpAscii branch from 579459b to ea913db Compare July 12, 2023 17:35

tannergooding reviewed Jul 12, 2023

build-analysis bot mentioned this pull request Jul 12, 2023

File check failures in x64 jit tests #88783

Closed

build-analysis bot mentioned this pull request Jul 12, 2023

Timeout in Microsoft.Gen.OptionsValidation.Unit.Test.EmitterTest #88784

Closed

Code changes based on the review:

5cb1efc

1. turn some variables into explicitly specified const. 2. removed some helper functions and inlined them.

This was referenced Jul 13, 2023

[7.0] IJW support in newer version of CMake breaks the build on empty TargetFramework #88806

Closed

Tracking issue for CI build timeouts #76454

Closed

tannergooding approved these changes Jul 13, 2023

GrabYourPitchforks reviewed Jul 13, 2023

Resolve comments

5d06c67

build-analysis bot mentioned this pull request Jul 14, 2023

Failed USB connection via port 54050, error 61, in tvOS arm64 Release AllSubsets_Mono #82637

Open

tannergooding added the avx512 Related to the AVX-512 architecture label Jul 14, 2023

revert the changes at GetIndexOfFirstNonAsciiByte

17d3b28

stephentoub reviewed Jul 17, 2023

GrabYourPitchforks approved these changes Jul 17, 2023

tannergooding merged commit a513676 into dotnet:main Jul 17, 2023

lewing mentioned this pull request Aug 1, 2023

Try Vector128 before Vector #89797

Merged

EgorBo mentioned this pull request Aug 3, 2023

GetIndexOfFirstNonAsciiByte_Vector path in Ascii.Utility.cs is never exercised for AdvSimd/SSE4.1 #89924

Open

stephentoub mentioned this pull request Aug 14, 2023

Fix GetIndexOfFirstNonAsciiByte_Vector not taken on ARM64 #90527

Closed

ghost locked as resolved and limited conversation to collaborators Aug 17, 2023
