Use `StoreAligned` not `Store` in WidenAsciiToUtf16 #89892

Ruihan-Yin · 2023-08-02T22:53:14Z

Description

This PR improves the in-loop write logic in WidenAsciiToUtf16. The major change is to replace `Store` with `StoreAligned` inside the loop to reduce the penalty caused by split loads.

We are open to adjusting the implementation style for either path. Perf numbers are attached in the comments.
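As a rough illustration of the idea (not the actual diff), a widening loop that first brings the destination onto a 16-byte boundary and then uses `StoreAligned` could look like the sketch below. The class and method names are hypothetical, ASCII validation is omitted, and the code assumes .NET 7 or later compiled with unsafe blocks enabled.

```csharp
using System;
using System.Runtime.Intrinsics;

public static unsafe class AlignedWidenSketch
{
    // Hypothetical stand-in for the widening loop; ASCII validation is omitted.
    public static void Widen(byte* src, char* dst, nuint length)
    {
        ushort* dst16 = (ushort*)dst;
        nuint i = 0;

        // A 16-byte boundary is only reachable when dst is at least 2-byte aligned.
        if (((nuint)dst & 1u) == 0)
        {
            // Scalar prologue: advance until dst16 + i sits on a 16-byte boundary.
            while (i < length && ((nuint)(dst16 + i) & 15u) != 0)
            {
                dst16[i] = src[i];
                i++;
            }

            // Each iteration reads 16 bytes and writes two aligned 16-byte vectors.
            while (length - i >= 16)
            {
                Vector128<byte> ascii = Vector128.Load(src + i);
                Vector128.WidenLower(ascii).StoreAligned(dst16 + i);
                Vector128.WidenUpper(ascii).StoreAligned(dst16 + i + 8);
                i += 16;
            }
        }

        // Scalar tail (and the whole job when dst cannot be aligned).
        for (; i < length; i++)
        {
            dst16[i] = src[i];
        }
    }

    // Safe wrapper so the sketch can be called without pointers.
    public static string WidenToString(byte[] ascii)
    {
        char[] result = new char[ascii.Length];
        fixed (byte* pSrc = ascii)
        fixed (char* pDst = result)
        {
            Widen(pSrc, pDst, (nuint)ascii.Length);
        }
        return new string(result);
    }
}
```

Calling `AlignedWidenSketch.WidenToString(Encoding.ASCII.GetBytes("hello"))` should round-trip the text. Note that this sketch falls back to scalar code for an odd destination address, which is exactly the case discussed later in the thread.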

ghost · 2023-08-02T22:53:25Z

Tagging subscribers to this area: @dotnet/area-system-text-encoding
See info in area-owners.md if you want to be subscribed.

Author:	Ruihan-Yin
Assignees:	-
Labels:	`area-System.Text.Encoding`, `community-contribution`
Milestone:	-

Ruihan-Yin · 2023-08-02T22:55:10Z

Perf numbers

Base: main (531ad95)
Diff: main + changes on WidenAsciiToUtf16

Avx512

summary:
better: 5, geomean: 1.099
worse: 1, geomean: 1.022
total diff: 6

| Slower | diff/base | Base Median (ns) | Diff Median (ns) | Modality |
| --- | --- | --- | --- | --- |
| System.Text.Tests.Perf_Encoding.GetChars(size: 512, encName: "utf-8") | 1.02 | 85.25 | 87.11 | |

| Faster | base/diff | Base Median (ns) | Diff Median (ns) |
| --- | --- | --- | --- |
| System.Text.Tests.Perf_Encoding.GetChars(size: 1024, encName: "utf-8") | 1.18 | 176.81 | 150.47 |
| System.Text.Tests.Perf_Encoding.GetChars(size: 1024, encName: "ascii") | 1.14 | 152.16 | 133.17 |
| System.Text.Tests.Perf_Encoding.GetChars(size: 16, encName: "ascii") | 1.09 | 17.97 | 16.54 |
| System.Text.Tests.Perf_Encoding.GetChars(size: 512, encName: "ascii") | 1.06 | 79.47 | 74.72 |
| System.Text.Tests.Perf_Encoding.GetChars(size: 16, encName: "utf-8") | 1.03 | 25.79 | 24.94 |
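
For readers who want to sanity-check the summary lines, the sketch below recomputes a geomean from the medians. It assumes the reported value is the geometric mean of the per-benchmark base/diff median ratios, which is an assumption about the comparison tool rather than something stated in this thread. The type and member names are hypothetical.

```csharp
using System;
using System.Linq;

public static class GeoMeanSketch
{
    // Geometric mean of base/diff ratios: exp(mean(ln(base_i / diff_i))).
    public static double SpeedupGeoMean((double Base, double Diff)[] medians) =>
        Math.Exp(medians.Average(m => Math.Log(m.Base / m.Diff)));

    // Medians (ns) copied from the Avx512 "Faster" rows above.
    public static readonly (double Base, double Diff)[] Avx512Faster =
    {
        (176.81, 150.47),
        (152.16, 133.17),
        (17.97, 16.54),
        (79.47, 74.72),
        (25.79, 24.94),
    };
}
```

Under that assumption, `GeoMeanSketch.SpeedupGeoMean(GeoMeanSketch.Avx512Faster)` comes out at roughly 1.099, in line with the "better" summary line for Avx512.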

AVX

summary:
better: 3, geomean: 1.049
total diff: 3

No Slower results for the provided threshold = 1% and noise filter = 0.5 ns.

| Faster | base/diff | Base Median (ns) | Diff Median (ns) |
| --- | --- | --- | --- |
| System.Text.Tests.Perf_Encoding.GetChars(size: 512, encName: "ascii") | 1.07 | 81.68 | 76.09 |
| System.Text.Tests.Perf_Encoding.GetChars(size: 1024, encName: "utf-8") | 1.05 | 174.54 | 165.92 |
| System.Text.Tests.Perf_Encoding.GetChars(size: 1024, encName: "ascii") | 1.02 | 148.66 | 145.48 |

SSE

summary:
better: 5, geomean: 1.065
total diff: 5

No Slower results for the provided threshold = 1% and noise filter = 0.5 ns.

| Faster | base/diff | Base Median (ns) | Diff Median (ns) |
| --- | --- | --- | --- |
| System.Text.Tests.Perf_Encoding.GetChars(size: 512, encName: "ascii") | 1.11 | 108.07 | 97.21 |
| System.Text.Tests.Perf_Encoding.GetChars(size: 16, encName: "ascii") | 1.10 | 17.74 | 16.14 |
| System.Text.Tests.Perf_Encoding.GetChars(size: 1024, encName: "ascii") | 1.05 | 183.11 | 173.62 |
| System.Text.Tests.Perf_Encoding.GetChars(size: 1024, encName: "utf-8") | 1.04 | 219.66 | 211.76 |
| System.Text.Tests.Perf_Encoding.GetChars(size: 512, encName: "utf-8") | 1.02 | 119.36 | 116.59 |

src/libraries/System.Private.CoreLib/src/System/Text/Ascii.Utility.cs

neon-sunset · 2023-08-03T22:29:54Z

How does this change affect *-arm64 targets? The PR introduces unconditional .ExtractMostSignificantBits() which is expected to regress its performance.

UPD: @Ruihan-Yin thank you

Ruihan-Yin · 2023-08-04T00:05:34Z

> How does this change affect *-arm64 targets? The PR introduces unconditional .ExtractMostSignificantBits() which is expected to regress its performance.

Thanks for pointing that out. That was a mistake; I changed it to `VectorContainsNonAsciiChar` so that Arm64 won't be affected by the use of `ExtractMostSignificantBits`.
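
To make the distinction concrete, here is a minimal sketch of two equivalent "does this vector contain a non-ASCII byte" checks. The names are hypothetical and this is not the runtime's `VectorContainsNonAsciiChar` implementation, which may pick a different instruction sequence per platform. The first form gathers the top bit of every lane into an integer, an operation with no single-instruction equivalent on Arm64; the second only masks and compares.

```csharp
using System;
using System.Runtime.Intrinsics;

public static class NonAsciiCheckSketch
{
    // Movemask-style check: collects the most significant bit of each byte.
    public static bool ContainsNonAsciiViaMsb(Vector128<byte> v) =>
        v.ExtractMostSignificantBits() != 0;

    // Mask-and-compare check: true if any byte has its high bit (0x80) set.
    public static bool ContainsNonAsciiViaMask(Vector128<byte> v) =>
        (v & Vector128.Create((byte)0x80)) != Vector128<byte>.Zero;
}
```

Both predicates return the same answer for every input; which one the JIT turns into better code on a given target is something to measure rather than assume.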

Ruihan-Yin · 2023-08-09T16:52:37Z

Hi @xtqqczze @neon-sunset, kindly asking if there is any further change needed?

xtqqczze · 2023-08-09T20:43:18Z

Perhaps we shouldn't assume the argument pUtf16Buffer is naturally aligned (i.e. 2 byte), as System.Text.Encoding.GetChars is a public API.

I'm not sure how to handle this, but System.SpanHelpers.IndexOfNullCharacter(System.Char*) does the following check:

runtime/src/libraries/System.Private.CoreLib/src/System/SpanHelpers.Char.cs

Lines 539 to 542 in 9a5fa74

    
           if (((int)searchSpace & 1) != 0) 
        
           { 
        
               // Input isn't char aligned, we won't be able to align it to a Vector 
        
           }
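
A small sketch of what such a check could look like for a `char*` destination follows. The helper names are hypothetical; the code only shows that an odd address can never reach a 16-byte boundary by stepping in whole chars, while an even address can. It assumes .NET 6 or later (for `NativeMemory.AlignedAlloc`) and unsafe blocks enabled.

```csharp
using System;
using System.Runtime.InteropServices;

public static unsafe class CharAlignmentSketch
{
    // A char* is naturally aligned when its address is a multiple of 2.
    public static bool IsCharAligned(char* p) => ((nuint)p & 1u) == 0;

    // Reports how many chars must be skipped to reach the next 16-byte boundary.
    // Returns false when the pointer is odd, i.e. no whole number of chars gets there.
    public static bool TryGetCharsToAlign(char* p, out nuint chars)
    {
        chars = 0;
        if (!IsCharAligned(p))
        {
            return false;
        }

        nuint misalignedBytes = (nuint)p & 15u;
        chars = ((16u - misalignedBytes) & 15u) / 2;
        return true;
    }

    // Exercises the helper on a 16-byte-aligned native block.
    public static (bool AlignedOk, nuint CharsFromOffset2, bool OddOk) Demo()
    {
        byte* block = (byte*)NativeMemory.AlignedAlloc(64, 16);
        try
        {
            bool alignedOk = TryGetCharsToAlign((char*)block, out _);
            TryGetCharsToAlign((char*)(block + 2), out nuint fromOffset2);
            bool oddOk = TryGetCharsToAlign((char*)(block + 1), out _);
            return (alignedOk, fromOffset2, oddOk);
        }
        finally
        {
            NativeMemory.AlignedFree(block);
        }
    }
}
```

In `Demo`, the pointer at byte offset 2 needs 7 chars (14 bytes) to reach the next boundary, and the pointer at byte offset 1 is reported as unalignable.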

MichalPetryka · 2023-08-10T16:43:27Z

> Perhaps we shouldn't assume the argument pUtf16Buffer is naturally aligned (i.e. 2 byte), as System.Text.Encoding.GetChars is a public API.

AFAIR the guideline is that if the platform the code is running on works with unaligned memory, dotnet APIs should work too.

xtqqczze · 2023-08-10T17:32:03Z

> AFAIR the guideline is that if the platform the code is running on works with unaligned memory, dotnet APIs should work too.

@MichalPetryka We have existing code that does not account for this, e.g. WidenLatin1ToUtf16, see #90319.

anthonycanino · 2023-08-10T22:40:27Z

> Perhaps we shouldn't assume the argument pUtf16Buffer is naturally aligned (i.e. 2 byte), as System.Text.Encoding.GetChars is a public API.
>
> I'm not sure how to handle this, but System.SpanHelpers.IndexOfNullCharacter(System.Char*) does the following check:
>
> runtime/src/libraries/System.Private.CoreLib/src/System/SpanHelpers.Char.cs
>
> Lines 539 to 542 in 9a5fa74
>
>     if (((int)searchSpace & 1) != 0)
>     {
>         // Input isn't char aligned, we won't be able to align it to a Vector
>     }

@tannergooding based on some of the conversation we had, I am not sure how to proceed.

Perhaps we want to implement a fallback method that does not pin, and does not explicitly use StoreAligned and instead uses Store but skips the initial alignment if the char pointer is not naturally aligned?

xtqqczze · 2023-08-13T01:22:13Z

> Perhaps we shouldn't assume the argument pUtf16Buffer is naturally aligned (i.e. 2 byte)

See also comments at #90319.

tannergooding · 2023-08-14T16:35:28Z

> Perhaps we want to implement a fallback method that does not pin, and does not explicitly use StoreAligned and instead uses Store but skips the initial alignment if the char pointer is not naturally aligned?

For right now we should keep the pin and try to align, but if it's unalignable then we should just continue as-is. It's basically the same code, just with a check for "is this alignable at all" and using `Store` rather than `StoreAligned` (this is ultimately the same codegen and perf on modern hardware, since the underlying data will actually be aligned in most cases).
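
One way to read this suggestion is sketched below: pin, check whether the destination is alignable at all, align with a scalar prologue when it is, and then run the same vector loop with `Store` in either case. The names are hypothetical, ASCII validation is omitted, and the code assumes .NET 7 or later with unsafe blocks enabled.

```csharp
using System;
using System.Runtime.Intrinsics;

public static unsafe class PinAndStoreSketch
{
    public static void Widen(ReadOnlySpan<byte> ascii, Span<char> utf16)
    {
        if (utf16.Length < ascii.Length)
        {
            throw new ArgumentException("Destination too short.", nameof(utf16));
        }

        fixed (byte* pSrc = ascii)
        fixed (char* pDst = utf16)
        {
            ushort* dst16 = (ushort*)pDst;
            nuint length = (nuint)ascii.Length;
            nuint i = 0;

            // "Is this alignable at all?" An odd address never reaches a 16-byte boundary.
            bool alignable = ((nuint)pDst & 1u) == 0;
            if (alignable)
            {
                // Scalar prologue up to the next 16-byte boundary.
                while (i < length && ((nuint)(dst16 + i) & 15u) != 0)
                {
                    dst16[i] = pSrc[i];
                    i++;
                }
            }

            // Store works whether or not the address ended up aligned.
            while (length - i >= 16)
            {
                Vector128<byte> v = Vector128.Load(pSrc + i);
                Vector128.WidenLower(v).Store(dst16 + i);
                Vector128.WidenUpper(v).Store(dst16 + i + 8);
                i += 16;
            }

            // Scalar tail.
            for (; i < length; i++)
            {
                dst16[i] = pSrc[i];
            }
        }
    }
}
```

Unlike a `StoreAligned` loop, this version keeps the vector path for a destination that starts at an odd address, for example a `Span<char>` obtained with `MemoryMarshal.Cast<byte, char>` over a byte buffer sliced at offset 1.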

For the future, we probably want to come to an agreement about how to universally handle this. My guess/vote is that's probably going to involve not pinning and using StoreUnsafe with optimistic alignment of the underlying data (basically optimistically presuming it won't be moved by the GC, which will be the common case but still using ref so that if it is moved, everything still works as expected). There's ultimately a balance between writing safe/readable code and performant code, so we need to find the right spot to land. Ideally we'd just be using Vector128.Create(ROSpan<T>) and the JIT would elide the bounds checks, so we don't have any unsafeness; but that's not possible today.
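
A minimal sketch of the ref-based direction mentioned here, with no pinning and `StoreUnsafe` on a managed reference, might look as follows. The names are hypothetical, ASCII validation and the optimistic alignment step are left out for brevity, and the code assumes .NET 7 or later.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

public static class RefBasedWidenSketch
{
    public static void Widen(ReadOnlySpan<byte> ascii, Span<char> utf16)
    {
        if (utf16.Length < ascii.Length)
        {
            throw new ArgumentException("Destination too short.", nameof(utf16));
        }

        // Managed references stay valid even if the GC relocates the buffers.
        ref byte src = ref MemoryMarshal.GetReference(ascii);
        ref ushort dst = ref Unsafe.As<char, ushort>(ref MemoryMarshal.GetReference(utf16));

        nuint length = (nuint)ascii.Length;
        nuint step = (nuint)Vector128<byte>.Count;
        nuint i = 0;

        // Vector loop: 16 bytes in, two vectors of 8 UTF-16 code units out.
        while (length - i >= step)
        {
            Vector128<byte> v = Vector128.LoadUnsafe(ref src, i);
            Vector128.WidenLower(v).StoreUnsafe(ref dst, i);
            Vector128.WidenUpper(v).StoreUnsafe(ref dst, i + (nuint)Vector128<ushort>.Count);
            i += step;
        }

        // Scalar tail.
        for (; i < length; i++)
        {
            Unsafe.Add(ref dst, i) = Unsafe.Add(ref src, i);
        }
    }
}
```

This version needs no `unsafe` context or `fixed` statement, although the `Unsafe` and `StoreUnsafe` calls still bypass bounds checks, so the length check at the top carries the safety of the whole method.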

eiriktsarpalis · 2023-10-27T15:14:27Z

Just checking up on the status of this PR, @anthonycanino have you had the chance to address @tannergooding's feedback?

anthonycanino · 2023-10-31T17:40:11Z

> Just checking up on the status of this PR, @anthonycanino have you had the chance to address @tannergooding's feedback?

@tannergooding how do we feel about this change now that we are moving to ISimdVector? Is it better to address the alignment with that change in one PR?

tannergooding · 2023-11-01T15:23:10Z

I don't think it's changed from my last feedback.

We really need to account for unalignable data and the easiest way to do that is to pin, check the alignment, align if possible, and then continue processing using Store which works regardless of whether the data is aligned or unaligned.

In general, the code pattern for efficiently handling alignment, for idempotent data, looks something like https://source.dot.net/#System.Numerics.Tensors/System/Numerics/Tensors/TensorPrimitives.netcore.cs,2930
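
As a much-simplified illustration of why idempotence matters (this is not the linked TensorPrimitives code), the sketch below lets the final vector overlap the previous one instead of running a scalar tail. Recomputing a few elements is harmless because widening the same input twice writes the same output. The names are hypothetical and the code assumes .NET 7 or later.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

public static class OverlappingTailSketch
{
    public static void Widen(ReadOnlySpan<byte> ascii, Span<char> utf16)
    {
        if (utf16.Length < ascii.Length)
        {
            throw new ArgumentException("Destination too short.", nameof(utf16));
        }

        ref byte src = ref MemoryMarshal.GetReference(ascii);
        ref ushort dst = ref Unsafe.As<char, ushort>(ref MemoryMarshal.GetReference(utf16));
        nuint length = (nuint)ascii.Length;
        nuint step = (nuint)Vector128<byte>.Count;

        // Small inputs: plain scalar loop.
        if (length < step)
        {
            for (nuint j = 0; j < length; j++)
            {
                Unsafe.Add(ref dst, j) = Unsafe.Add(ref src, j);
            }
            return;
        }

        // Full blocks, stopping before the final (possibly overlapping) block.
        nuint lastStart = length - step;
        for (nuint i = 0; i < lastStart; i += step)
        {
            WidenBlock(ref src, ref dst, i);
        }

        // Final block ends exactly at the end of the input and may overlap the
        // previous block; the overlapped elements are simply written again.
        WidenBlock(ref src, ref dst, lastStart);
    }

    private static void WidenBlock(ref byte src, ref ushort dst, nuint offset)
    {
        Vector128<byte> v = Vector128.LoadUnsafe(ref src, offset);
        Vector128.WidenLower(v).StoreUnsafe(ref dst, offset);
        Vector128.WidenUpper(v).StoreUnsafe(ref dst, offset + (nuint)Vector128<ushort>.Count);
    }
}
```

The same trick would not be safe for an in-place operation that is not idempotent, since the overlapped elements would be transformed twice.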

Using ISimdVector then lets you merge the 3 different vectorized code paths down to 1 shared code path. The amount of unrolling and other factors can depend on the exact algorithm, the number of inputs, etc. But this is the general basic shape that works well for both large and small inputs.

ghost · 2023-11-29T15:01:32Z

This pull request has been automatically marked no-recent-activity because it has not had any activity for 14 days. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will remove no-recent-activity.

ghost · 2023-12-13T18:01:49Z

This pull request will now be closed since it had been marked no-recent-activity but received no further activity in the past 14 days. It is still possible to reopen or comment on the pull request, but please note that it will be locked if it remains inactive for another 30 days.

Use StoreAligned not Store in WidenAsciiToUtf16

51a1d2f

ghost added the community-contribution label (indicates that the PR has been added by a community member) Aug 2, 2023

dotnet-issue-labeler bot added the area-System.Text.Encoding label Aug 2, 2023

This was referenced Aug 3, 2023

Tracking issue for CI build timeouts #76454

Closed

Tests.System.TimeProviderTests.TestProviderTimer test failure #87477

Closed

Assert in System.Net.Sockets.SocketAsyncEventArgs.CompleteAcceptOperation #89806

Closed

Ruihan-Yin marked this pull request as ready for review August 3, 2023 16:23

xtqqczze reviewed Aug 3, 2023

src/libraries/System.Private.CoreLib/src/System/Text/Ascii.Utility.cs (outdated)

revert the changes on SizeOfVector*InChars

f17ff0c

Use platform specific method in V128 path to make sure no negative effect on Arm64

b73a5a8

build-analysis bot mentioned this pull request Aug 4, 2023

Build race condition in Microsoft.NET.WebAssembly.Webcil.dll #89987

Closed

eiriktsarpalis added this to the 9.0.0 milestone Aug 14, 2023

Ruihan-Yin mentioned this pull request Oct 12, 2023

Light up Ascii.Utility methods with Vector512 code paths. #89280

Open

7 tasks

tannergooding added the needs-author-action label (an issue or pull request that requires more info or actions from the author) Nov 3, 2023

BruceForstall mentioned this pull request Nov 8, 2023

Intel architecture improvements for .NET 9 #93196

Closed

33 tasks

eiriktsarpalis self-assigned this Nov 15, 2023

ghost added the no-recent-activity label Nov 29, 2023

ghost closed this Dec 13, 2023

github-actions bot locked and limited conversation to collaborators Jan 13, 2024

This pull request was closed.
