Double IndexOf throughput for chars #78861

MihaZupan · 2022-11-26T01:09:09Z

When searching through strings, it's very common for the values being searched for to fit in a single byte (think ASCII).
As long as the value falls within an appropriate range ([1, 254] on X86 or [0, 254] on ARM), we can speed up the search by packing two input vectors into one (narrowing each char to a byte with saturation) before comparing against the value.
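To see where those ranges come from, the scalar sketch below models both flavors of "narrow with saturation" and counts, for every byte, how many of the 65,536 char values map to it. It assumes X86 uses SSE2's PACKUSWB (unsigned saturation of *signed* 16-bit input) and ARM uses an unsigned saturating narrow of *unsigned* 16-bit input (such as ARM64's UQXTN). `SaturationModel` and its methods are illustrative names, not code from this PR.

```csharp
using System;

// Illustrative scalar model of the two "narrow with saturation" flavors.
public static class SaturationModel
{
    // X86 (SSE2 PACKUSWB): each 16-bit lane is read as a *signed* short and
    // clamped to [0, 255]. Chars >= 0x8000 are negative, so they become 0.
    public static byte NarrowX86(char c)
    {
        short s = unchecked((short)c);
        return s < 0 ? (byte)0 : s > 255 ? (byte)255 : (byte)s;
    }

    // ARM (unsigned saturating narrow): each lane is read as *unsigned*,
    // so only the upper clamp to 255 applies.
    public static byte NarrowArm(char c) => c > 255 ? (byte)255 : (byte)c;

    // Marks the byte values that exactly one char maps to. Only those values
    // can be searched for in packed form without false positives.
    public static bool[] Unambiguous(Func<char, byte> narrow)
    {
        var counts = new int[256];
        for (int i = 0; i <= char.MaxValue; i++)
        {
            counts[narrow((char)i)]++;
        }

        var result = new bool[256];
        for (int b = 0; b < 256; b++)
        {
            result[b] = counts[b] == 1;
        }
        return result;
    }
}
```

Running `Unambiguous` for both models yields [1, 254] for X86 and [0, 254] for ARM: on X86 both 0 (U+0000 plus U+8000 to U+FFFF) and 255 (U+00FF plus U+0100 to U+7FFF) have many preimages, while on ARM only 255 does.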

The IndexOfAnyAsciiSearcher implementation I added in #78093 is already using this trick, but it applies to regular IndexOf as well.

In this PR, I added implementations that do such packing for Contains(char), IndexOf(char), IndexOfAny(char, char), IndexOfAny(char, char, char), and IndexOfAnyInRange(char, char), roughly doubling the throughput for long inputs.
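For IndexOfAnyInRange, the same reasoning has to hold for both bounds: if either bound were outside the unambiguous range, saturated non-ASCII chars (which become 0 or 255 on X86) could land inside the searched range. The scalar sketch below models a packed range search under X86 semantics using a subtract-then-unsigned-compare range test. The class name, the helper names, and this exact formulation are illustrative assumptions, not taken from the PR.

```csharp
using System;

public static class PackedRangeSketch
{
    // Scalar model of X86 PACKUSWB for one 16-bit lane (signed input).
    private static byte PackX86(char c)
    {
        short s = unchecked((short)c);
        return s < 0 ? (byte)0 : s > 255 ? (byte)255 : (byte)s;
    }

    // Hypothetical check: both bounds must be in [1, 254] so that no
    // saturated value (0 or 255) can fall inside [low, high].
    public static bool CanUsePackedRange(char low, char high) =>
        low >= 1 && high <= 254 && low <= high;

    public static int IndexOfAnyInRange(ReadOnlySpan<char> span, char low, char high)
    {
        if (CanUsePackedRange(low, high))
        {
            byte lo = (byte)low;
            byte width = (byte)(high - low);
            for (int i = 0; i < span.Length; i++)
            {
                // One subtract plus one unsigned compare per lane: bytes below
                // 'lo' wrap around to large values and fail the comparison.
                if ((byte)(PackX86(span[i]) - lo) <= width) return i;
            }
            return -1;
        }

        // Fallback: plain per-char comparison on the unpacked input.
        for (int i = 0; i < span.Length; i++)
        {
            if (span[i] >= low && span[i] <= high) return i;
        }
        return -1;
    }
}
```

With the range '0' to '9', a leading U+0100 (packs to 255) and U+8000 (packs to 0) are correctly skipped because both bounds lie in [1, 254].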

Do we want to do the same for the Last- variants as well?
I don't think specialized IndexOfAny(4/5 values) implementations would be useful. For ASCII values, IndexOfAnyValues is already very close in throughput (and things like Regex will use that).

Benchmark numbers

Method	Toolchain	Length	Mean	Ratio
IndexOf	main	1	1.973 ns	1.00
IndexOf	pr	1	1.751 ns	0.89

IndexOfAny2Values	main	1	2.757 ns	1.00
IndexOfAny2Values	pr	1	2.593 ns	0.94

IndexOfAnyInRange	main	1	2.092 ns	1.00
IndexOfAnyInRange	pr	1	1.818 ns	0.87

IndexOf	main	7	3.622 ns	1.00
IndexOf	pr	7	3.741 ns	1.03

IndexOfAny2Values	main	7	5.889 ns	1.00
IndexOfAny2Values	pr	7	5.804 ns	0.99

IndexOfAnyInRange	main	7	5.102 ns	1.00
IndexOfAnyInRange	pr	7	3.638 ns	0.71

IndexOf	main	8	2.604 ns	1.00
IndexOf	pr	8	2.295 ns	0.88

IndexOfAny2Values	main	8	2.588 ns	1.00
IndexOfAny2Values	pr	8	2.790 ns	1.08

IndexOfAnyInRange	main	8	2.929 ns	1.00
IndexOfAnyInRange	pr	8	2.369 ns	0.81

IndexOf	main	9	2.855 ns	1.00
IndexOf	pr	9	2.279 ns	0.80

IndexOfAny2Values	main	9	2.849 ns	1.00
IndexOfAny2Values	pr	9	2.774 ns	0.97

IndexOfAnyInRange	main	9	2.914 ns	1.00
IndexOfAnyInRange	pr	9	2.373 ns	0.81

IndexOf	main	15	2.836 ns	1.00
IndexOf	pr	15	2.315 ns	0.82

IndexOfAny2Values	main	15	2.788 ns	1.00
IndexOfAny2Values	pr	15	2.799 ns	1.00

IndexOfAnyInRange	main	15	2.928 ns	1.00
IndexOfAnyInRange	pr	15	2.361 ns	0.81

IndexOf	main	16	2.402 ns	1.00
IndexOf	pr	16	2.273 ns	0.95

IndexOfAny2Values	main	16	2.830 ns	1.00
IndexOfAny2Values	pr	16	2.791 ns	0.99

IndexOfAnyInRange	main	16	2.871 ns	1.00
IndexOfAnyInRange	pr	16	2.378 ns	0.83

IndexOf	main	17	2.740 ns	1.00
IndexOf	pr	17	2.574 ns	0.94

IndexOfAny2Values	main	17	3.376 ns	1.00
IndexOfAny2Values	pr	17	2.967 ns	0.88

IndexOfAnyInRange	main	17	3.040 ns	1.00
IndexOfAnyInRange	pr	17	2.312 ns	0.76

IndexOf	main	32	2.826 ns	1.00
IndexOf	pr	32	2.558 ns	0.91

IndexOfAny2Values	main	32	3.293 ns	1.00
IndexOfAny2Values	pr	32	2.971 ns	0.90

IndexOfAnyInRange	main	32	2.903 ns	1.00
IndexOfAnyInRange	pr	32	2.312 ns	0.80

IndexOf	main	1000	39.015 ns	1.00
IndexOf	pr	1000	23.991 ns	0.61

IndexOfAny2Values	main	1000	45.199 ns	1.00
IndexOfAny2Values	pr	1000	25.563 ns	0.57

IndexOfAnyInRange	main	1000	55.958 ns	1.00
IndexOfAnyInRange	pr	1000	23.020 ns	0.41

IndexOf	main	100000	4,875.359 ns	1.00
IndexOf	pr	100000	2,190.711 ns	0.45

IndexOfAny2Values	main	100000	5,709.625 ns	1.00
IndexOfAny2Values	pr	100000	2,980.139 ns	0.52

IndexOfAnyInRange	main	100000	5,133.861 ns	1.00
IndexOfAnyInRange	pr	100000	2,987.320 ns	0.58

Method	Toolchain	Length	Mean	Error	Ratio
IndexOfIgnoreCase	main	1	5.807 ns	0.0755 ns	1.00
IndexOfIgnoreCase	pr	1	5.766 ns	0.0167 ns	0.99

IndexOfIgnoreCase	main	32	8.565 ns	0.0465 ns	1.00
IndexOfIgnoreCase	pr	32	8.471 ns	0.0267 ns	0.99

IndexOfIgnoreCase	main	1000	51.199 ns	0.1238 ns	1.00
IndexOfIgnoreCase	pr	1000	34.857 ns	0.3394 ns	0.68

IndexOfIgnoreCase	main	100000	5,734.450 ns	22.3808 ns	1.00
IndexOfIgnoreCase	pr	100000	2,979.708 ns	12.2477 ns	0.52

This is generally a slight regression if a match is found at the start of the input.

If the first character matches

Method	Toolchain	Length	Mean	Error	Ratio
IndexOf	main	1	1.745 ns	0.0005 ns	1.00
IndexOf	pr	1	1.985 ns	0.0026 ns	1.14

IndexOfAny2Values	main	1	2.245 ns	0.0015 ns	1.00
IndexOfAny2Values	pr	1	2.182 ns	0.0021 ns	0.97

IndexOfAnyInRange	main	1	2.079 ns	0.0010 ns	1.00
IndexOfAnyInRange	pr	1	1.720 ns	0.0005 ns	0.83

IndexOf	main	7	1.526 ns	0.0005 ns	1.00
IndexOf	pr	7	1.510 ns	0.0006 ns	0.99

IndexOfAny2Values	main	7	1.743 ns	0.0009 ns	1.00
IndexOfAny2Values	pr	7	1.695 ns	0.0023 ns	0.97

IndexOfAnyInRange	main	7	2.100 ns	0.0010 ns	1.00
IndexOfAnyInRange	pr	7	1.751 ns	0.0043 ns	0.83

IndexOf	main	8	2.540 ns	0.0291 ns	1.00
IndexOf	pr	8	2.978 ns	0.0365 ns	1.18

IndexOfAny2Values	main	8	3.021 ns	0.0344 ns	1.00
IndexOfAny2Values	pr	8	3.319 ns	0.0072 ns	1.10

IndexOfAnyInRange	main	8	2.943 ns	0.0375 ns	1.00
IndexOfAnyInRange	pr	8	2.901 ns	0.0335 ns	0.99

IndexOf	main	16	2.934 ns	0.0021 ns	1.00
IndexOf	pr	16	2.791 ns	0.0023 ns	0.95

IndexOfAny2Values	main	16	2.881 ns	0.0028 ns	1.00
IndexOfAny2Values	pr	16	3.246 ns	0.0037 ns	1.13

IndexOfAnyInRange	main	16	2.657 ns	0.0029 ns	1.00
IndexOfAnyInRange	pr	16	2.725 ns	0.0023 ns	1.03

IndexOf	main	17	2.946 ns	0.0016 ns	1.00
IndexOf	pr	17	3.307 ns	0.0018 ns	1.12

IndexOfAny2Values	main	17	2.877 ns	0.0016 ns	1.00
IndexOfAny2Values	pr	17	3.496 ns	0.0064 ns	1.22

IndexOfAnyInRange	main	17	2.654 ns	0.0017 ns	1.00
IndexOfAnyInRange	pr	17	3.057 ns	0.0018 ns	1.15

IndexOf	main	32	2.949 ns	0.0027 ns	1.00
IndexOf	pr	32	3.360 ns	0.0136 ns	1.14

IndexOfAny2Values	main	32	2.914 ns	0.0057 ns	1.00
IndexOfAny2Values	pr	32	3.502 ns	0.0073 ns	1.20

IndexOfAnyInRange	main	32	2.658 ns	0.0021 ns	1.00
IndexOfAnyInRange	pr	32	3.042 ns	0.0012 ns	1.14

IndexOf	main	33	2.968 ns	0.0033 ns	1.00
IndexOf	pr	33	2.745 ns	0.0059 ns	0.92

IndexOfAny2Values	main	33	2.900 ns	0.0042 ns	1.00
IndexOfAny2Values	pr	33	3.260 ns	0.0079 ns	1.12

IndexOfAnyInRange	main	33	2.684 ns	0.0036 ns	1.00
IndexOfAnyInRange	pr	33	2.871 ns	0.0106 ns	1.07

IndexOf	main	1000	3.068 ns	0.0239 ns	1.00
IndexOf	pr	1000	2.903 ns	0.0343 ns	0.95

IndexOfAny2Values	main	1000	3.021 ns	0.0258 ns	1.00
IndexOfAny2Values	pr	1000	3.382 ns	0.0335 ns	1.12

IndexOfAnyInRange	main	1000	2.784 ns	0.0259 ns	1.00
IndexOfAnyInRange	pr	1000	2.981 ns	0.0312 ns	1.07

ghost · 2022-11-26T01:09:28Z

Tagging subscribers to this area: @dotnet/area-system-memory
See info in area-owners.md if you want to be subscribed.

Issue Details

When searching through strings, it's very common to have single-byte values (think ASCII).
As long as the value falls within an appropriate range ([1, 254] on X86 or [0, 254] on ARM), we can speed up the search by packing two input vectors together before comparing the value.

The IndexOfAnyAsciiSearcher implementation I added in #78093 is already using this trick, but it applies to regular IndexOf as well.

In this POC PR, I added implementations that do such packing for IndexOf(char), IndexOfAny(char, char), and IndexOfAnyInRange(char, char), roughly doubling the throughput for long inputs.

If we're happy with the direction, I can add implementations for Contains(char), IndexOfAny(3/5 values), and their Last- counterparts as well if we feel they are useful.

Author:	MihaZupan
Assignees:	MihaZupan
Labels:	`area-System.Memory`, `tenet-performance`
Milestone:	8.0.0

MihaZupan · 2022-11-26T01:12:45Z

cc: @EgorBo @stephentoub

MihaZupan · 2022-12-08T18:30:17Z

Any thoughts on this approach @dotnet/area-system-memory?

dakersnar · 2022-12-09T23:22:15Z

@MihaZupan Sorry for the delay, I'll take a look at this early next week.

dakersnar · 2022-12-13T22:07:02Z

@MihaZupan I'm new to this area and I think I'm missing some context to properly review this.

Can you give me a high-level summary of the intentions behind the changes in each file?

MihaZupan · 2022-12-13T23:20:51Z

Sure. The main idea behind this change is the observation that the char values we commonly search for are ASCII, in which case the upper byte of their UTF-16 representation is always 0. When doing vectorized searches, this means we're mostly ignoring half of the input and half of the result of each comparison.
If we instead pack the input (narrow with saturation) before the comparison, we can process twice as many characters in each loop iteration. Such optimization is only possible for values that aren't ambiguous after saturation ([0, 254]).
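A minimal sketch of such a packed loop, assuming SSE2 and a value already known to be in [1, 254]; `PackedSearchSketch` and `PackedIndexOf` are illustrative names, not the PR's actual code. Two 8-char vectors are narrowed into one 16-byte vector with `Sse2.PackUnsignedSaturate` (PACKUSWB), so a single compare covers 16 chars instead of 8. A 128-bit pack places the first operand's lanes in the low half and the second's in the high half, so the bit position in the match mask is directly the char offset.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class PackedSearchSketch
{
    // Hypothetical packed IndexOf. The caller must ensure 'value' is in
    // [1, 254], the unambiguous range for X86's signed-input pack.
    public static int PackedIndexOf(ReadOnlySpan<char> span, char value)
    {
        int i = 0;

        if (Sse2.IsSupported && span.Length >= 16)
        {
            ReadOnlySpan<short> shorts = MemoryMarshal.Cast<char, short>(span);
            ref short start = ref MemoryMarshal.GetReference(shorts);
            Vector128<byte> target = Vector128.Create((byte)value);

            for (; i <= span.Length - 16; i += 16)
            {
                // Two vectors of 8 chars each...
                Vector128<short> lower = Vector128.LoadUnsafe(ref start, (nuint)i);
                Vector128<short> upper = Vector128.LoadUnsafe(ref start, (nuint)(i + 8));

                // ...become one vector of 16 bytes, with lane order preserved.
                Vector128<byte> packed = Sse2.PackUnsignedSaturate(lower, upper);

                int mask = Sse2.MoveMask(Sse2.CompareEqual(packed, target));
                if (mask != 0)
                {
                    return i + BitOperations.TrailingZeroCount(mask);
                }
            }
        }

        // Scalar tail, and the whole search when SSE2 is unavailable.
        for (; i < span.Length; i++)
        {
            if (span[i] == value) return i;
        }
        return -1;
    }
}
```

For example, `PackedSearchSketch.PackedIndexOf("hello world", 'w')` returns 6. A 256-bit (AVX2) version would need an extra permute, because 256-bit packs operate within each 128-bit half; the sketch sticks to 128 bits to avoid that.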

The core change is the introduction of the new SpanHelpers.Char.Packed.cs file that contains the PackedIndexOf workhorse implementation which mimics the existing IndexOf helpers, but uses the approach of packing the input. The file contains:
- An internal CanUsePackedIndexOf helper method that determines whether the algorithm can be used for a given value.
- The search methods themselves - PackedIndexOf, PackedIndexOfAny, etc.

The changes in SpanHelpers.T.cs hook into the existing IndexOf code paths to delegate to the PackedIndexOf implementation if it's supported for the given value. If not, they fall back to the existing ("NonPacked") implementation.
Changes to Globalization/Ordinal.cs and String.Searching.cs update the callers where we know the value to be ASCII, so that they take advantage of the new packed implementation directly, without incurring the cost of checking whether the value is ASCII again.
The CanUsePackedIndexOf helper I mentioned is intended to be "free" if the value is constant (the common case). If the value isn't constant, the span.IndexOf path now incurs an additional check before calling into the appropriate implementation. Updates to files in /System/IndexOfAnyValues/* avoid this overhead for the IndexOfAnyValues<char> implementations. They're not really the interesting part of the change.
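The dispatch described above can be sketched as follows. This is a hypothetical scalar model using X86 pack semantics, not the PR's code: `CanUsePackedIndexOf` mirrors the helper's purpose, and `UncheckedPackedIndexOf` shows what would go wrong without it. Searching for U+00FF through the packed comparison would report U+0100 as a match, because both saturate to 255.

```csharp
using System;

public static class PackedDispatchSketch
{
    // Hypothetical range check: true only for [1, 254]. It is one unsigned
    // comparison, and it should fold to a constant when 'value' is a constant.
    public static bool CanUsePackedIndexOf(char value) => value - 1u < 254u;

    // Scalar model of X86 PACKUSWB for one 16-bit lane (signed input).
    private static byte Pack(char c)
    {
        short s = unchecked((short)c);
        return s < 0 ? (byte)0 : s > 255 ? (byte)255 : (byte)s;
    }

    // Models the packed comparison WITHOUT the range check, to show the hazard.
    public static int UncheckedPackedIndexOf(ReadOnlySpan<char> span, char value)
    {
        byte target = Pack(value);
        for (int i = 0; i < span.Length; i++)
        {
            if (Pack(span[i]) == target) return i;
        }
        return -1;
    }

    // Use the packed path only for safe values; otherwise fall back to the
    // existing ("NonPacked") search, modeled here by MemoryExtensions.IndexOf.
    public static int IndexOf(ReadOnlySpan<char> span, char value) =>
        CanUsePackedIndexOf(value)
            ? UncheckedPackedIndexOf(span, value)
            : span.IndexOf(value);
}
```

In the PR itself, this value check is also combined with a platform check, since packing ended up being used only on X86.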

dakersnar · 2022-12-14T21:00:32Z

If we instead pack the input (narrow with saturation) before the comparison, we can process twice as many characters in each loop iteration. Such optimization is only possible for values that aren't ambiguous after saturation ([0, 254]).

To confirm, this is because any value that needs the full char to be represented will saturate to 255 when narrowed to a byte, correct?

MihaZupan · 2022-12-14T21:08:42Z

To confirm, this is because any value that needs the full char to be represented will saturate to 255 when narrowed to a byte, correct?

That's right.
In reality, the ranges are [0, 254] for ARM and [1, 254] for X86, because X86 only has signed pack instructions: the input is treated as signed 16-bit values, so chars at or above 0x8000 saturate to 0 and chars in [0x100, 0x7FFF] saturate to 255.

dakersnar

LGTM. Left a few clarifying questions.

For testing, I assume all these paths already had sufficient coverage, right?

dakersnar · 2022-12-13T21:50:58Z

src/libraries/System.Private.CoreLib/src/System/MemoryExtensions.cs

@@ -558,7 +558,7 @@ public static ReadOnlyMemory<char> AsMemory(this string? text, Range range)
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static unsafe int IndexOfAnyExcept<T>(this ReadOnlySpan<T> span, T value) where T : IEquatable<T>?
        {
-            if (SpanHelpers.CanVectorizeAndBenefit<T>(span.Length))
+            if (RuntimeHelpers.IsBitwiseEquatable<T>())


Can you explain this update?

@adamsitnik I see you added this in #73768, can you please clarify what the intent was?
As far as I can tell, this is just adding a redundant length check given that all the SpanHelpers implementations we're calling into also do the length check and have code for handling short inputs. Am I missing something (I didn't see any discussion about this on your PR)?

@MihaZupan, @adamsitnik, was this ever answered?

I don't believe so. Is the changed version causing issues somewhere?

No, but it sounded like there might be extra work happening unnecessarily.

I think that was the case before this change (we would inline a length check and two calls to the worker methods); now it should just be a single call to one worker method.

I see, I misunderstood the comment

src/libraries/System.Private.CoreLib/src/System/SpanHelpers.Char.Packed.cs

MihaZupan · 2022-12-16T00:47:50Z

For testing, I assume all these paths already had sufficient coverage, right?

Yes. I'll double-check that we don't accidentally end up losing substantial coverage of the existing (NonPacked) code paths if we're mostly testing with ASCII values.

In this POC PR, I added implementations that do such packing for IndexOf(char), IndexOfAny(char, char), and IndexOfAnyInRange(char, char)
If we're happy with the direction, I can add implementations for Contains(char), IndexOfAny(char, char, char), and their Last- counterparts as well if we feel they are useful.

I'll add the Contains(char) and IndexOfAny(char, char, char) to this PR (it's just more of the same idea).
Not sure how much we care about the Last- variants.

MihaZupan · 2022-12-16T23:44:11Z

Added Contains(char) and IndexOfAny(char, char, char) now.
I checked and we still have full test coverage of the existing and new methods.

Updated perf numbers

Method	Toolchain	Length	Mean	Error	Ratio
Contains	main	1	2.228 ns	0.0040 ns	1.00
Contains	pr	1	2.216 ns	0.0127 ns	0.99

IndexOf	main	1	2.065 ns	0.0041 ns	1.00
IndexOf	pr	1	1.753 ns	0.0043 ns	0.85

IndexOfAny2Values	main	1	2.902 ns	0.0073 ns	1.00
IndexOfAny2Values	pr	1	2.537 ns	0.0030 ns	0.87

IndexOfAny3Values	main	1	2.975 ns	0.0015 ns	1.00
IndexOfAny3Values	pr	1	3.094 ns	0.0041 ns	1.04

IndexOfAnyInRange	main	1	2.196 ns	0.0027 ns	1.00
IndexOfAnyInRange	pr	1	1.996 ns	0.0039 ns	0.91

Contains	main	7	3.435 ns	0.0033 ns	1.00
Contains	pr	7	3.465 ns	0.0020 ns	1.01

IndexOf	main	7	3.726 ns	0.0034 ns	1.00
IndexOf	pr	7	3.755 ns	0.0017 ns	1.01

IndexOfAny2Values	main	7	5.915 ns	0.0023 ns	1.00
IndexOfAny2Values	pr	7	5.960 ns	0.0092 ns	1.01

IndexOfAny3Values	main	7	8.116 ns	0.0047 ns	1.00
IndexOfAny3Values	pr	7	7.975 ns	0.0036 ns	0.98

IndexOfAnyInRange	main	7	5.145 ns	0.0038 ns	1.00
IndexOfAnyInRange	pr	7	3.678 ns	0.0031 ns	0.71

Contains	main	8	2.089 ns	0.0067 ns	1.00
Contains	pr	8	2.293 ns	0.0019 ns	1.10

IndexOf	main	8	2.693 ns	0.0040 ns	1.00
IndexOf	pr	8	2.312 ns	0.0019 ns	0.86

IndexOfAny2Values	main	8	2.956 ns	0.0012 ns	1.00
IndexOfAny2Values	pr	8	2.886 ns	0.0042 ns	0.98

IndexOfAny3Values	main	8	3.239 ns	0.0022 ns	1.00
IndexOfAny3Values	pr	8	3.436 ns	0.0022 ns	1.06

IndexOfAnyInRange	main	8	2.887 ns	0.0021 ns	1.00
IndexOfAnyInRange	pr	8	2.379 ns	0.0115 ns	0.82

Contains	main	9	2.193 ns	0.0058 ns	1.00
Contains	pr	9	2.296 ns	0.0015 ns	1.05

IndexOf	main	9	2.914 ns	0.0038 ns	1.00
IndexOf	pr	9	2.301 ns	0.0011 ns	0.79

IndexOfAny2Values	main	9	3.438 ns	0.0038 ns	1.00
IndexOfAny2Values	pr	9	2.882 ns	0.0045 ns	0.84

IndexOfAny3Values	main	9	3.727 ns	0.0032 ns	1.00
IndexOfAny3Values	pr	9	3.431 ns	0.0014 ns	0.92

IndexOfAnyInRange	main	9	2.891 ns	0.0019 ns	1.00
IndexOfAnyInRange	pr	9	2.395 ns	0.0112 ns	0.83

Contains	main	15	2.194 ns	0.0055 ns	1.00
Contains	pr	15	2.286 ns	0.0016 ns	1.04

IndexOf	main	15	2.907 ns	0.0042 ns	1.00
IndexOf	pr	15	2.313 ns	0.0037 ns	0.80

IndexOfAny2Values	main	15	3.450 ns	0.0025 ns	1.00
IndexOfAny2Values	pr	15	2.895 ns	0.0065 ns	0.84

IndexOfAny3Values	main	15	3.723 ns	0.0015 ns	1.00
IndexOfAny3Values	pr	15	3.428 ns	0.0021 ns	0.92

IndexOfAnyInRange	main	15	2.886 ns	0.0024 ns	1.00
IndexOfAnyInRange	pr	15	2.334 ns	0.0054 ns	0.81

Contains	main	16	2.194 ns	0.0063 ns	1.00
Contains	pr	16	2.298 ns	0.0020 ns	1.05

IndexOf	main	16	2.497 ns	0.0103 ns	1.00
IndexOf	pr	16	2.336 ns	0.0058 ns	0.94

IndexOfAny2Values	main	16	3.010 ns	0.0021 ns	1.00
IndexOfAny2Values	pr	16	2.862 ns	0.0016 ns	0.95

IndexOfAny3Values	main	16	3.333 ns	0.0024 ns	1.00
IndexOfAny3Values	pr	16	3.428 ns	0.0024 ns	1.03

IndexOfAnyInRange	main	16	2.863 ns	0.0035 ns	1.00
IndexOfAnyInRange	pr	16	2.376 ns	0.0111 ns	0.83

Contains	main	17	2.391 ns	0.0018 ns	1.00
Contains	pr	17	2.422 ns	0.0024 ns	1.01

IndexOf	main	17	2.688 ns	0.0030 ns	1.00
IndexOf	pr	17	2.564 ns	0.0032 ns	0.95

IndexOfAny2Values	main	17	3.439 ns	0.0019 ns	1.00
IndexOfAny2Values	pr	17	2.943 ns	0.0023 ns	0.86

IndexOfAny3Values	main	17	3.840 ns	0.0071 ns	1.00
IndexOfAny3Values	pr	17	3.521 ns	0.0017 ns	0.92

IndexOfAnyInRange	main	17	2.875 ns	0.0037 ns	1.00
IndexOfAnyInRange	pr	17	2.547 ns	0.0028 ns	0.89

Contains	main	32	2.834 ns	0.0238 ns	1.00
Contains	pr	32	2.419 ns	0.0011 ns	0.85

IndexOf	main	32	2.890 ns	0.0035 ns	1.00
IndexOf	pr	32	2.575 ns	0.0040 ns	0.89

IndexOfAny2Values	main	32	3.435 ns	0.0020 ns	1.00
IndexOfAny2Values	pr	32	2.943 ns	0.0023 ns	0.86

IndexOfAny3Values	main	32	4.146 ns	0.0029 ns	1.00
IndexOfAny3Values	pr	32	3.488 ns	0.0021 ns	0.84

IndexOfAnyInRange	main	32	2.865 ns	0.0046 ns	1.00
IndexOfAnyInRange	pr	32	2.548 ns	0.0027 ns	0.89

Contains	main	33	2.903 ns	0.0102 ns	1.00
Contains	pr	33	2.813 ns	0.0022 ns	0.97

IndexOf	main	33	3.103 ns	0.0035 ns	1.00
IndexOf	pr	33	2.747 ns	0.0030 ns	0.89

IndexOfAny2Values	main	33	3.871 ns	0.0022 ns	1.00
IndexOfAny2Values	pr	33	3.283 ns	0.0029 ns	0.85

IndexOfAny3Values	main	33	4.738 ns	0.0033 ns	1.00
IndexOfAny3Values	pr	33	4.349 ns	0.0020 ns	0.92

IndexOfAnyInRange	main	33	3.777 ns	0.0169 ns	1.00
IndexOfAnyInRange	pr	33	3.018 ns	0.0029 ns	0.80

Contains	main	1000	32.797 ns	0.0229 ns	1.00
Contains	pr	1000	17.534 ns	0.0275 ns	0.53

IndexOf	main	1000	39.024 ns	0.0396 ns	1.00
IndexOf	pr	1000	23.966 ns	0.0109 ns	0.61

IndexOfAny2Values	main	1000	44.677 ns	0.0203 ns	1.00
IndexOfAny2Values	pr	1000	23.087 ns	0.2666 ns	0.52

IndexOfAny3Values	main	1000	57.412 ns	0.1221 ns	1.00
IndexOfAny3Values	pr	1000	31.902 ns	0.2688 ns	0.56

IndexOfAnyInRange	main	1000	56.802 ns	0.0273 ns	1.00
IndexOfAnyInRange	pr	1000	26.263 ns	0.1187 ns	0.46

Contains	main	100000	3,686.442 ns	3.9490 ns	1.00
Contains	pr	100000	1,923.226 ns	3.0266 ns	0.52

IndexOf	main	100000	3,827.157 ns	1.4482 ns	1.00
IndexOf	pr	100000	1,918.985 ns	1.6545 ns	0.50

IndexOfAny2Values	main	100000	4,436.271 ns	2.3896 ns	1.00
IndexOfAny2Values	pr	100000	2,559.311 ns	2.1281 ns	0.58

IndexOfAny3Values	main	100000	5,416.439 ns	2.5865 ns	1.00
IndexOfAny3Values	pr	100000	3,144.892 ns	2.2510 ns	0.58

IndexOfAnyInRange	main	100000	5,415.066 ns	2.0259 ns	1.00
IndexOfAnyInRange	pr	100000	2,570.033 ns	2.2194 ns	0.47

dakersnar

LGTM

src/libraries/System.Private.CoreLib/src/System/IndexOfAnyValues/IndexOfAny2CharValues.cs

MihaZupan · 2022-12-29T10:21:36Z

/azp run runtime-libraries-coreclr outerloop

azure-pipelines · 2022-12-29T10:21:47Z

Azure Pipelines successfully started running 1 pipeline(s).

MihaZupan · 2022-12-29T18:10:26Z

It appears that the packing is noticeably more expensive on ARM in comparison.
While the approach can improve throughput somewhat, the regression for cases where matches are close to the start seems unacceptable. I will update the logic to only apply when running on X86.

ARM64 benchmarks (no match)

Method	Length	Mean	Error
IndexOf	8	3.173 ns	0.0074 ns
IndexOfAny2Values	8	4.455 ns	0.0013 ns
IndexOfAny3Values	8	3.814 ns	0.0048 ns
PackedIndexOf	8	3.343 ns	0.0237 ns
PackedIndexOfAny2Values	8	3.808 ns	0.0104 ns
PackedIndexOfAny3Values	8	4.283 ns	0.0029 ns

IndexOf	9	3.741 ns	0.0144 ns
IndexOfAny2Values	9	5.672 ns	0.0098 ns
IndexOfAny3Values	9	5.095 ns	0.0245 ns
PackedIndexOf	9	3.622 ns	0.0072 ns
PackedIndexOfAny2Values	9	4.226 ns	0.0087 ns
PackedIndexOfAny3Values	9	4.871 ns	0.0148 ns

IndexOf	16	4.141 ns	0.0007 ns
IndexOfAny2Values	16	5.607 ns	0.0002 ns
IndexOfAny3Values	16	5.344 ns	0.0071 ns
PackedIndexOf	16	3.366 ns	0.0263 ns
PackedIndexOfAny2Values	16	3.810 ns	0.0142 ns
PackedIndexOfAny3Values	16	4.285 ns	0.0016 ns

IndexOf	32	5.474 ns	0.0002 ns
IndexOfAny2Values	32	6.940 ns	0.0044 ns
IndexOfAny3Values	32	7.912 ns	0.0124 ns
PackedIndexOf	32	5.002 ns	0.0205 ns
PackedIndexOfAny2Values	32	5.233 ns	0.0050 ns
PackedIndexOfAny3Values	32	6.005 ns	0.0073 ns

IndexOf	128	15.234 ns	0.0050 ns
IndexOfAny2Values	128	18.834 ns	0.0183 ns
IndexOfAny3Values	128	26.844 ns	0.1520 ns
PackedIndexOf	128	14.835 ns	0.0172 ns
PackedIndexOfAny2Values	128	17.159 ns	0.2145 ns
PackedIndexOfAny3Values	128	18.359 ns	0.0738 ns

IndexOf	512	55.640 ns	0.0081 ns
IndexOfAny2Values	512	74.750 ns	0.0188 ns
IndexOfAny3Values	512	93.405 ns	0.0337 ns
PackedIndexOf	512	47.795 ns	0.0702 ns
PackedIndexOfAny2Values	512	65.908 ns	0.2881 ns
PackedIndexOfAny3Values	512	79.024 ns	0.0501 ns

ARM64 benchmarks (the first character matches)

Method	Length	Mean	Error
IndexOf	8	3.952 ns	0.0183 ns
IndexOfAny2Values	8	4.110 ns	0.0237 ns
IndexOfAny3Values	8	4.545 ns	0.0016 ns
PackedIndexOf	8	5.972 ns	0.0140 ns
PackedIndexOfAny2Values	8	6.750 ns	0.0031 ns
PackedIndexOfAny3Values	8	7.356 ns	0.0046 ns

IndexOf	9	3.638 ns	0.0167 ns
IndexOfAny2Values	9	4.476 ns	0.0256 ns
IndexOfAny3Values	9	4.549 ns	0.0020 ns
PackedIndexOf	9	6.668 ns	0.0079 ns
PackedIndexOfAny2Values	9	7.457 ns	0.0016 ns
PackedIndexOfAny3Values	9	8.082 ns	0.0031 ns

IndexOf	16	3.613 ns	0.0304 ns
IndexOfAny2Values	16	4.134 ns	0.0233 ns
IndexOfAny3Values	16	4.546 ns	0.0018 ns
PackedIndexOf	16	5.981 ns	0.0103 ns
PackedIndexOfAny2Values	16	6.758 ns	0.0055 ns
PackedIndexOfAny3Values	16	7.364 ns	0.0026 ns

IndexOf	32	3.646 ns	0.0420 ns
IndexOfAny2Values	32	4.110 ns	0.0236 ns
IndexOfAny3Values	32	4.546 ns	0.0018 ns
PackedIndexOf	32	5.233 ns	0.0221 ns
PackedIndexOfAny2Values	32	5.988 ns	0.0044 ns
PackedIndexOfAny3Values	32	6.673 ns	0.0025 ns

IndexOf	128	3.638 ns	0.0275 ns
IndexOfAny2Values	128	4.110 ns	0.0284 ns
IndexOfAny3Values	128	5.053 ns	0.0077 ns
PackedIndexOf	128	5.215 ns	0.0140 ns
PackedIndexOfAny2Values	128	5.981 ns	0.0080 ns
PackedIndexOfAny3Values	128	6.652 ns	0.0018 ns

MihaZupan · 2022-12-29T18:36:32Z

@tannergooding any concerns about using this sort of approach only on X86, given that it's not profitable on ARM?

src/libraries/System.Private.CoreLib/src/System/SpanHelpers.Packed.cs

MihaZupan · 2023-01-03T09:28:18Z

All failures are known according to build-analysis

EgorBo · 2023-01-05T16:38:48Z

Improvements on arm64:

[Perf] Linux/arm64: 3 Improvements on 1/3/2023 9:28:36 AM perf-autofiling-issues#11348
[Perf] Windows/arm64: 4 Improvements on 1/3/2023 9:28:36 AM perf-autofiling-issues#11343

Improvements on x64:

[Perf] Linux/x64: 43 Improvements on 1/3/2023 9:28:36 AM perf-autofiling-issues#11438

This should avoid the size regression on WebAssembly and possibly other platforms without Sse2. The regression is a side effect of dotnet#78861, which uses `PackedSpanHelpers.CanUsePackedIndexOf (!!T)` and TShouldUsePacked.Value to guard the usage of PackedSpanHelpers. Because these involve generics, illinker is unable to link the PackedSpanHelpers type away, and that pulls other parts in, like System.Runtime.Intrinsics.X86.* types. See https://gist.github.com/radekdoulik/c0b52247d472f69bcf983ade78a924ea for a more complete list. This change gets us back 9,216 bytes in the case of the app used to repro the regression.

...
- Type System.PackedSpanHelpers
- Type System.Runtime.Intrinsics.X86.X86Base
- Type System.Runtime.Intrinsics.X86.Sse
- Type System.Runtime.Intrinsics.X86.Sse2

Summary:
- 9,216 File size -0.76% (of 1,215,488)
- 2,744 Metadata size -0.43% (of 636,264)
- 4 Types count


MihaZupan added area-System.Memory tenet-performance Performance related issue labels Nov 26, 2022

MihaZupan added this to the 8.0.0 milestone Nov 26, 2022

MihaZupan self-assigned this Nov 26, 2022

MihaZupan mentioned this pull request Nov 26, 2022

Add AVX2 support to IndexOfAnyValues #78863

Merged

MihaZupan force-pushed the packed-indexof-char branch from 1318d5b to 34ee7e0 November 30, 2022 05:50

MihaZupan mentioned this pull request Dec 8, 2022

Use IndexOfAnyValues in the RegexCompiler and source gen #78927

Merged

dakersnar approved these changes Dec 15, 2022

MihaZupan force-pushed the packed-indexof-char branch from 34ee7e0 to 15afc4c December 16, 2022 23:37

This was referenced Dec 17, 2022

Precondition failure: File has not had execution verified #79439

Closed

[wasm] Library tests failing during linking for AOT - SIGKILL #79569

Closed

MihaZupan mentioned this pull request Dec 19, 2022

Remove mono specific SpanHelpers #79215

Merged

dakersnar approved these changes Dec 28, 2022

src/libraries/System.Private.CoreLib/src/System/IndexOfAnyValues/IndexOfAny2CharValues.cs

MihaZupan added 2 commits December 29, 2022 15:09

Add PackedIndexOf for chars

b2d5f0f

Add Contains and IndexOfValue(3 chars)

178b1f8

Stop using PackedIndexOf on ARM

a59cff5

MihaZupan force-pushed the packed-indexof-char branch from 15afc4c to a59cff5 December 29, 2022 18:34

danmoseley reviewed Dec 29, 2022

src/libraries/System.Private.CoreLib/src/System/SpanHelpers.Packed.cs (outdated)

build-analysis bot mentioned this pull request Dec 30, 2022

emcc received SIGKILL #79874

Closed

MihaZupan commented Jan 1, 2023

src/libraries/System.Private.CoreLib/src/System/SpanHelpers.Packed.cs (outdated)

Improve code comment

72ca079

build-analysis bot mentioned this pull request Jan 1, 2023

Tracking issue for CI build timeouts #76454

Closed

MihaZupan merged commit ac2ffdf into dotnet:main Jan 3, 2023

radekdoulik mentioned this pull request Jan 5, 2023

Use PackedIndexOfIsSupported checks in more places #80254

Merged

This was referenced Jan 10, 2023

Regressions around IndexOf #80441

Closed

[Perf] Alpine/x64: 4 Regressions on 1/3/2023 9:28:36 AM dotnet/perf-autofiling-issues#11415

Closed

[Perf] Windows/x64: 1 Regression on 1/3/2023 5:57:54 PM dotnet/perf-autofiling-issues#11444

Closed

MihaZupan mentioned this pull request Jan 18, 2023

Fix recent IndexOf regressions #80779

Merged

ghost locked as resolved and limited conversation to collaborators Feb 9, 2023
