
JIT: Use reaching definitions in CSE to update conservative VNs #109959

Merged 4 commits into dotnet:main on Nov 22, 2024

Conversation

@jakobbotsch (Member) commented Nov 19, 2024:

Now that CSE always inserts into SSA, we can update it to make use of the reaching-definition information it has access to. CSE already spent effort tracking some extra information to try to do this, which we can now remove.

  • Remove `optCSECheckedBoundMap`: this was used by CSE to try to update conservative VNs of ancestor bounds checks. This is unnecessary since all descendants of the CSE definitions now get the same conservative VNs automatically.
  • Remove `CSEdsc::defConservNormVN`: this was used to update conservative VNs in the case where all defs agree on the conservative VN, which again is unnecessary now.

Making this change requires a bit of refactoring of the incremental SSA builder. Before this PR the builder took all defs and all uses and then inserted everything into SSA at once. After this change the builder is used in a multi-step process, as follows:

  1. All definitions are added with IncrementalSsaBuilder::InsertDef
  2. The definitions are finalized with IncrementalSsaBuilder::FinalizeDefs
  3. Uses are inserted (one by one) with IncrementalSsaBuilder::InsertUse. No finalization is necessary; each use is directly put into SSA as a result of calling this method.

The refactoring allows CSE to use the incremental SSA builder in a way that gives it access to reaching-definition information for each use as it makes replacements. However, it also requires restructuring CSE so that it performs all def replacements before performing any use replacements.
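Schematically, the new usage pattern looks like this. This is a rough sketch: `IncrementalSsaBuilder`, `InsertDef`, `FinalizeDefs`, and `InsertUse` are the entry points named above, but the constructor signature, the surrounding loops, and the bool result of `FinalizeDefs` are illustrative assumptions rather than the actual JIT code.

```cpp
// Sketch only; types and loop bodies are simplified for illustration.
IncrementalSsaBuilder builder(compiler, cseLclNum); // hypothetical construction

// Step 1: register every CSE def first.
for (GenTree* def : cseDefs)
{
    builder.InsertDef(def);
}

// Step 2: finalize defs (assumed to compute phi placement, and to be able
// to bail out, e.g. if the iterated dominance frontier is too large).
if (builder.FinalizeDefs())
{
    // Step 3: insert uses one by one. Each use is put directly into SSA,
    // so its reaching definition is known at the moment CSE replaces it,
    // and the correct conservative VN can be taken from that def.
    for (GenTree* use : cseUses)
    {
        builder.InsertUse(use);
    }
}
```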

Additionally, this PR fixes several cases of incorrect VN updates made by CSE.

VN and CSE still track the set of VNs that are interesting bounds checks. However, VN was sometimes inserting VNs with exception sets into this set, which is not useful (the consumers always use normal VNs when querying the set). This PR fixes VN to insert the normal VN instead.

Fix #109745

The dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) on Nov 19, 2024.
if (isSharedConst)
// Assign the proper Value Numbers.
ValueNumPair valExc = m_pCompiler->vnStore->VNPExceptionSet(val->gtVNPair);
store->gtVNPair = m_pCompiler->vnStore->VNPWithExc(ValueNumStore::VNPForVoid(), valExc);
@jakobbotsch (Member, Author):
This was assigning ValueNumStore::VNPForVoid before, without any exceptions. That seems wrong.
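For contrast, a sketch of the before/after being described here (the "before" line is reconstructed from this comment rather than quoted from the old source):

```cpp
// Before (per the comment above): the store got a plain void VN,
// dropping the exception set of the value being stored.
store->gtVNPair = ValueNumStore::VNPForVoid();

// After: the value's exception set is propagated onto the void VN.
ValueNumPair valExc = m_pCompiler->vnStore->VNPExceptionSet(val->gtVNPair);
store->gtVNPair     = m_pCompiler->vnStore->VNPWithExc(ValueNumStore::VNPForVoid(), valExc);
```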

Comment on lines +12722 to +12723
ValueNum lengthVN =
    vnStore->VNNormalValue(tree->AsBoundsChk()->GetArrayLength()->gtVNPair.GetConservative());
@jakobbotsch (Member, Author):
All consumers of this info query the normal conservative VN, so inserting one with exceptions is uninteresting. I hit a few diffs related to this.
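As an illustrative sketch of the distinction (`VNNormalValue` strips the exception set off a VN; the `SetVNIsCheckedBound` call and surrounding code are assumptions, not a quote of the change):

```cpp
// A conservative VN may carry an exception set (e.g. "arr.Length + NullRefExc").
// Consumers query the checked-bound set with normal (exception-free) VNs, so
// the set must be populated with normal VNs as well:
ValueNum lenVNWithExc = tree->AsBoundsChk()->GetArrayLength()->gtVNPair.GetConservative();
ValueNum lengthVN     = vnStore->VNNormalValue(lenVNWithExc); // strip exceptions
vnStore->SetVNIsCheckedBound(lengthVN);                       // record the normal VN
```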

@jakobbotsch (Member, Author):
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress

Azure Pipelines successfully started running 2 pipeline(s).

@jakobbotsch (Member, Author):
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress

Azure Pipelines successfully started running 2 pipeline(s).

@jakobbotsch (Member, Author):
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress

Azure Pipelines successfully started running 2 pipeline(s).

jakobbotsch marked this pull request as ready for review on November 20, 2024 at 17:06.
@jakobbotsch (Member, Author) commented Nov 20, 2024:

cc @dotnet/jit-contrib PTAL @AndyAyersMS

Diffs. Some minor CQ (code quality) and TP (throughput) improvements.

The test failure looks like an infra issue -- the failing workitem was dead-lettered.

@AndyAyersMS (Member) left a comment:

Did you ever collect stats on how often we bail out putting defs into SSA because of IDF size?

@jakobbotsch (Member, Author):

> Did you ever collect stats on how often we bail out putting defs into SSA because of IDF size?

Here are some histograms of the IDF sizes we see in the incremental builder for win-x64 collections:

aspnet:
IDF size
     <=          1 ===>     779 count ( 15% of total)
      2 ..       2 ===>     780 count ( 30% of total)
      3 ..       3 ===>     639 count ( 43% of total)
      4 ..       4 ===>     608 count ( 55% of total)
      5 ..       5 ===>     380 count ( 63% of total)
      6 ..      10 ===>    1366 count ( 90% of total)
     11 ..      20 ===>     386 count ( 97% of total)
     21 ..      30 ===>      95 count ( 99% of total)
     31 ..      40 ===>       9 count ( 99% of total)
     41 ..      50 ===>       4 count (100% of total)
     51 ..     100 ===>       0 count (100% of total)
    101 ..     200 ===>       0 count (100% of total)
    201 ..     300 ===>       0 count (100% of total)
    301 ..     400 ===>       0 count (100% of total)
    401 ..     500 ===>       0 count (100% of total)

benchmarks.run_pgo:
IDF size
     <=          1 ===>     298 count ( 12% of total)
      2 ..       2 ===>     531 count ( 35% of total)
      3 ..       3 ===>     200 count ( 43% of total)
      4 ..       4 ===>     243 count ( 54% of total)
      5 ..       5 ===>     170 count ( 61% of total)
      6 ..      10 ===>     770 count ( 94% of total)
     11 ..      20 ===>     114 count ( 99% of total)
     21 ..      30 ===>       8 count ( 99% of total)
     31 ..      40 ===>       0 count ( 99% of total)
     41 ..      50 ===>       7 count ( 99% of total)
     51 ..     100 ===>       1 count (100% of total)
    101 ..     200 ===>       0 count (100% of total)
    201 ..     300 ===>       0 count (100% of total)
    301 ..     400 ===>       0 count (100% of total)
    401 ..     500 ===>       0 count (100% of total)


libraries_tests.run:
IDF size
     <=          1 ===>    4340 count ( 18% of total)
      2 ..       2 ===>    3915 count ( 35% of total)
      3 ..       3 ===>    3542 count ( 50% of total)
      4 ..       4 ===>    3435 count ( 64% of total)
      5 ..       5 ===>    1806 count ( 72% of total)
      6 ..      10 ===>    5126 count ( 94% of total)
     11 ..      20 ===>    1231 count ( 99% of total)
     21 ..      30 ===>     151 count ( 99% of total)
     31 ..      40 ===>      20 count ( 99% of total)
     41 ..      50 ===>       5 count ( 99% of total)
     51 ..     100 ===>       2 count (100% of total)
    101 ..     200 ===>       0 count (100% of total)
    201 ..     300 ===>       0 count (100% of total)
    301 ..     400 ===>       0 count (100% of total)
    401 ..     500 ===>       0 count (100% of total)

realworld:
IDF size
     <=          1 ===>     449 count ( 22% of total)
      2 ..       2 ===>     366 count ( 41% of total)
      3 ..       3 ===>     341 count ( 58% of total)
      4 ..       4 ===>     225 count ( 70% of total)
      5 ..       5 ===>     178 count ( 79% of total)
      6 ..      10 ===>     305 count ( 95% of total)
     11 ..      20 ===>      81 count ( 99% of total)
     21 ..      30 ===>      13 count ( 99% of total)
     31 ..      40 ===>       1 count ( 99% of total)
     41 ..      50 ===>       1 count (100% of total)
     51 ..     100 ===>       0 count (100% of total)
    101 ..     200 ===>       0 count (100% of total)
    201 ..     300 ===>       0 count (100% of total)
    301 ..     400 ===>       0 count (100% of total)
    401 ..     500 ===>       0 count (100% of total)

So it looks like none of these collections have any examples that fail SSA insertion.
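For context, a rough sketch of the kind of bail-out being measured (the threshold constant and helper names here are illustrative assumptions, not the actual JIT code):

```cpp
// Hypothetical sketch of an IDF-size bail-out in FinalizeDefs.
bool IncrementalSsaBuilder::FinalizeDefs()
{
    // Phi insertion points come from the iterated dominance frontier (IDF)
    // of the blocks containing the defs.
    BlkVector idf = ComputeIteratedDominanceFrontier(m_defBlocks);

    const unsigned idfSizeLimit = 100; // illustrative threshold
    if (idf.size() > idfSizeLimit)
    {
        return false; // too expensive; the caller must proceed without SSA info
    }

    InsertPhis(idf);
    return true;
}
```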

@jakobbotsch (Member, Author):

Some examples of functions with large (>= 40) IDF sizes for some CSEs.

https://github.com/PowerShell/PowerShell/blob/7ca7aae1d13d19e38c7c26260758f474cb9bef7f/src/System.Management.Automation/engine/Modules/ModuleCmdletBase.cs#L1513-L3580 (a 2000 line function)

https://github.com/dotnet/performance/blob/4894b54b3e86637b040ba6c7c54c23524cfbeadd/src/benchmarks/micro/runtime/Bytemark/assign_rect.cs#L438-L540 (the OSR version of this)
The CSE here is for `tableau.GetLength(1)`.

// On method return, pInputBufferRemaining and pOutputBufferRemaining will both point to where
// the next char would have been consumed from / the next byte would have been written to.
// inputLength in chars, outputBytesRemaining in bytes.
public static OperationStatus TranscodeToUtf8(char* pInputBuffer, int inputLength, byte* pOutputBuffer, int outputBytesRemaining, out char* pInputBufferRemaining, out byte* pOutputBufferRemaining)
{
    const int CharsPerDWord = sizeof(uint) / sizeof(char);
    Debug.Assert(inputLength >= 0, "Input length must not be negative.");
    Debug.Assert(pInputBuffer != null || inputLength == 0, "Input length must be zero if input buffer pointer is null.");
    Debug.Assert(outputBytesRemaining >= 0, "Destination length must not be negative.");
    Debug.Assert(pOutputBuffer != null || outputBytesRemaining == 0, "Destination length must be zero if destination buffer pointer is null.");
    // First, try vectorized conversion.
    {
        nuint numElementsConverted = Ascii.NarrowUtf16ToAscii(pInputBuffer, pOutputBuffer, (uint)Math.Min(inputLength, outputBytesRemaining));
        pInputBuffer += numElementsConverted;
        pOutputBuffer += numElementsConverted;
        // Quick check - did we just end up consuming the entire input buffer?
        // If so, short-circuit the remainder of the method.
        if ((int)numElementsConverted == inputLength)
        {
            pInputBufferRemaining = pInputBuffer;
            pOutputBufferRemaining = pOutputBuffer;
            return OperationStatus.Done;
        }
        inputLength -= (int)numElementsConverted;
        outputBytesRemaining -= (int)numElementsConverted;
    }
    if (inputLength < CharsPerDWord)
    {
        goto ProcessInputOfLessThanDWordSize;
    }
    char* pFinalPosWhereCanReadDWordFromInputBuffer = pInputBuffer + (uint)inputLength - CharsPerDWord;
    // We have paths for SSE4.1 vectorization inside the inner loop. Since the below
    // vector is only used in those code paths, we leave it uninitialized if SSE4.1
    // is not enabled.
    Vector128<short> nonAsciiUtf16DataMask;
    if (Sse41.X64.IsSupported || (AdvSimd.Arm64.IsSupported && BitConverter.IsLittleEndian))
    {
        nonAsciiUtf16DataMask = Vector128.Create(unchecked((short)0xFF80)); // mask of non-ASCII bits in a UTF-16 char
    }
    // Begin the main loop.
#if DEBUG
    char* pLastBufferPosProcessed = null; // used for invariant checking in debug builds
#endif
    uint thisDWord;
    Debug.Assert(pInputBuffer <= pFinalPosWhereCanReadDWordFromInputBuffer);
    do
    {
        // Read 32 bits at a time. This is enough to hold any possible UTF16-encoded scalar.
        thisDWord = Unsafe.ReadUnaligned<uint>(pInputBuffer);
    AfterReadDWord:
#if DEBUG
        Debug.Assert(pLastBufferPosProcessed < pInputBuffer, "Algorithm should've made forward progress since last read.");
        pLastBufferPosProcessed = pInputBuffer;
#endif
        // First, check for the common case of all-ASCII chars.
        if (Utf16Utility.AllCharsInUInt32AreAscii(thisDWord))
        {
            // We read an all-ASCII sequence (2 chars).
            if (outputBytesRemaining < 2)
            {
                goto ProcessOneCharFromCurrentDWordAndFinish; // running out of space, but may be able to write some data
            }
            // The high WORD of the local declared below might be populated with garbage
            // as a result of our shifts below, but that's ok since we're only going to
            // write the low WORD.
            //
            // [ 00000000 0bbbbbbb | 00000000 0aaaaaaa ] -> [ 00000000 0bbbbbbb | 0bbbbbbb 0aaaaaaa ]
            // (Same logic works regardless of endianness.)
            uint valueToWrite = thisDWord | (thisDWord >> 8);
            Unsafe.WriteUnaligned(pOutputBuffer, (ushort)valueToWrite);
            pInputBuffer += 2;
            pOutputBuffer += 2;
            outputBytesRemaining -= 2;
            // If we saw a sequence of all ASCII, there's a good chance a significant amount of following data is also ASCII.
            // Below is basically unrolled loops with poor man's vectorization.
            uint inputCharsRemaining = (uint)(pFinalPosWhereCanReadDWordFromInputBuffer - pInputBuffer) + 2;
            uint minElementsRemaining = (uint)Math.Min(inputCharsRemaining, outputBytesRemaining);
            if (Sse41.X64.IsSupported || (AdvSimd.Arm64.IsSupported && BitConverter.IsLittleEndian))
            {
                // Try reading and writing 8 elements per iteration.
                uint maxIters = minElementsRemaining / 8;
                ulong possibleNonAsciiQWord;
                int i;
                Vector128<short> utf16Data;
                for (i = 0; (uint)i < maxIters; i++)
                {
                    // The trimmer won't trim out nonAsciiUtf16DataMask unless this is in the loop.
                    // Luckily, this is a nop and will be elided by the JIT
                    Unsafe.SkipInit(out nonAsciiUtf16DataMask);
                    utf16Data = Unsafe.ReadUnaligned<Vector128<short>>(pInputBuffer);
                    if (AdvSimd.Arm64.IsSupported)
                    {
                        Vector128<short> isUtf16DataNonAscii = AdvSimd.CompareTest(utf16Data, nonAsciiUtf16DataMask);
                        bool hasNonAsciiDataInVector = AdvSimd.Arm64.MinPairwise(isUtf16DataNonAscii, isUtf16DataNonAscii).AsUInt64().ToScalar() != 0;
                        if (hasNonAsciiDataInVector)
                        {
                            goto LoopTerminatedDueToNonAsciiDataInVectorLocal;
                        }
                        Vector64<byte> lower = AdvSimd.ExtractNarrowingSaturateUnsignedLower(utf16Data);
                        AdvSimd.Store(pOutputBuffer, lower);
                    }
                    else if (Sse41.IsSupported)
                    {
                        if (!Sse41.TestZ(utf16Data, nonAsciiUtf16DataMask))
                        {
                            goto LoopTerminatedDueToNonAsciiDataInVectorLocal;
                        }
                        // narrow and write
                        Sse2.StoreScalar((ulong*)pOutputBuffer /* unaligned */, Sse2.PackUnsignedSaturate(utf16Data, utf16Data).AsUInt64());
                    }
                    else
                    {
                        // We explicitly recheck each IsSupported query to ensure that the trimmer can see which paths are live/dead
                        ThrowHelper.ThrowUnreachableException();
                    }
                    pInputBuffer += 8;
                    pOutputBuffer += 8;
                }
                outputBytesRemaining -= 8 * i;
                // Can we perform one more iteration, but reading & writing 4 elements instead of 8?
                if ((minElementsRemaining & 4) != 0)
                {
                    possibleNonAsciiQWord = Unsafe.ReadUnaligned<ulong>(pInputBuffer);
                    if (!Utf16Utility.AllCharsInUInt64AreAscii(possibleNonAsciiQWord))
                    {
                        goto LoopTerminatedDueToNonAsciiDataInPossibleNonAsciiQWordLocal;
                    }
                    utf16Data = Vector128.CreateScalarUnsafe(possibleNonAsciiQWord).AsInt16();
                    if (AdvSimd.IsSupported)
                    {
                        Vector64<byte> lower = AdvSimd.ExtractNarrowingSaturateUnsignedLower(utf16Data);
                        AdvSimd.StoreSelectedScalar((uint*)pOutputBuffer, lower.AsUInt32(), 0);
                    }
                    else if (Sse2.IsSupported)
                    {
                        Unsafe.WriteUnaligned(pOutputBuffer, Sse2.ConvertToUInt32(Sse2.PackUnsignedSaturate(utf16Data, utf16Data).AsUInt32()));
                    }
                    else
                    {
                        // We explicitly recheck each IsSupported query to ensure that the trimmer can see which paths are live/dead
                        ThrowHelper.ThrowUnreachableException();
                    }
                    pInputBuffer += 4;
                    pOutputBuffer += 4;
                    outputBytesRemaining -= 4;
                }
                continue; // Go back to beginning of main loop, read data, check for ASCII
            LoopTerminatedDueToNonAsciiDataInVectorLocal:
                outputBytesRemaining -= 8 * i;
                if (Sse2.X64.IsSupported)
                {
                    possibleNonAsciiQWord = Sse2.X64.ConvertToUInt64(utf16Data.AsUInt64());
                }
                else
                {
                    possibleNonAsciiQWord = utf16Data.AsUInt64().ToScalar();
                }
                // Temporarily set 'possibleNonAsciiQWord' to be the low 64 bits of the vector,
                // then check whether it's all-ASCII. If so, narrow and write to the destination
                // buffer. Since we know that either the high 64 bits or the low 64 bits of the
                // vector contains non-ASCII data, by the end of the following block the
                // 'possibleNonAsciiQWord' local is guaranteed to contain the non-ASCII segment.
                if (Utf16Utility.AllCharsInUInt64AreAscii(possibleNonAsciiQWord)) // all chars in first QWORD are ASCII
                {
                    if (AdvSimd.IsSupported)
                    {
                        Vector64<byte> lower = AdvSimd.ExtractNarrowingSaturateUnsignedLower(utf16Data);
                        AdvSimd.StoreSelectedScalar((uint*)pOutputBuffer, lower.AsUInt32(), 0);
                    }
                    else if (Sse2.IsSupported)
                    {
                        Unsafe.WriteUnaligned(pOutputBuffer, Sse2.ConvertToUInt32(Sse2.PackUnsignedSaturate(utf16Data, utf16Data).AsUInt32()));
                    }
                    else
                    {
                        // We explicitly recheck each IsSupported query to ensure that the trimmer can see which paths are live/dead
                        ThrowHelper.ThrowUnreachableException();
                    }
                    pInputBuffer += 4;
                    pOutputBuffer += 4;
                    outputBytesRemaining -= 4;
                    possibleNonAsciiQWord = utf16Data.AsUInt64().GetElement(1);
                }
            LoopTerminatedDueToNonAsciiDataInPossibleNonAsciiQWordLocal:
                Debug.Assert(!Utf16Utility.AllCharsInUInt64AreAscii(possibleNonAsciiQWord)); // this condition should've been checked earlier
                thisDWord = (uint)possibleNonAsciiQWord;
                if (Utf16Utility.AllCharsInUInt32AreAscii(thisDWord))
                {
                    // [ 00000000 0bbbbbbb | 00000000 0aaaaaaa ] -> [ 00000000 0bbbbbbb | 0bbbbbbb 0aaaaaaa ]
                    Unsafe.WriteUnaligned(pOutputBuffer, (ushort)(thisDWord | (thisDWord >> 8)));
                    pInputBuffer += 2;
                    pOutputBuffer += 2;
                    outputBytesRemaining -= 2;
                    thisDWord = (uint)(possibleNonAsciiQWord >> 32);
                }
                goto AfterReadDWordSkipAllCharsAsciiCheck;
            }
            else
            {
                // Can't use SSE41 x64, so we'll only read and write 4 elements per iteration.
                uint maxIters = minElementsRemaining / 4;
                uint secondDWord;
                int i;
                for (i = 0; (uint)i < maxIters; i++)
                {
                    thisDWord = Unsafe.ReadUnaligned<uint>(pInputBuffer);
                    secondDWord = Unsafe.ReadUnaligned<uint>(pInputBuffer + 2);
                    if (!Utf16Utility.AllCharsInUInt32AreAscii(thisDWord | secondDWord))
                    {
                        goto LoopTerminatedDueToNonAsciiData;
                    }
                    // [ 00000000 0bbbbbbb | 00000000 0aaaaaaa ] -> [ 00000000 0bbbbbbb | 0bbbbbbb 0aaaaaaa ]
                    // (Same logic works regardless of endianness.)
                    Unsafe.WriteUnaligned(pOutputBuffer, (ushort)(thisDWord | (thisDWord >> 8)));
                    Unsafe.WriteUnaligned(pOutputBuffer + 2, (ushort)(secondDWord | (secondDWord >> 8)));
                    pInputBuffer += 4;
                    pOutputBuffer += 4;
                }
                outputBytesRemaining -= 4 * i;
                continue; // Go back to beginning of main loop, read data, check for ASCII
            LoopTerminatedDueToNonAsciiData:
                outputBytesRemaining -= 4 * i;
                // First, see if we can drain any ASCII data from the first DWORD.
                if (Utf16Utility.AllCharsInUInt32AreAscii(thisDWord))
                {
                    // [ 00000000 0bbbbbbb | 00000000 0aaaaaaa ] -> [ 00000000 0bbbbbbb | 0bbbbbbb 0aaaaaaa ]
                    // (Same logic works regardless of endianness.)
                    Unsafe.WriteUnaligned(pOutputBuffer, (ushort)(thisDWord | (thisDWord >> 8)));
                    pInputBuffer += 2;
                    pOutputBuffer += 2;
                    outputBytesRemaining -= 2;
                    thisDWord = secondDWord;
                }
                goto AfterReadDWordSkipAllCharsAsciiCheck;
            }
        }
    AfterReadDWordSkipAllCharsAsciiCheck:
        Debug.Assert(!Utf16Utility.AllCharsInUInt32AreAscii(thisDWord)); // this should have been handled earlier
        // Next, try stripping off the first ASCII char if it exists.
        // We don't check for a second ASCII char since that should have been handled above.
        if (IsFirstCharAscii(thisDWord))
        {
            if (outputBytesRemaining == 0)
            {
                goto OutputBufferTooSmall;
            }
            if (BitConverter.IsLittleEndian)
            {
                pOutputBuffer[0] = (byte)thisDWord; // extract [ ## ## 00 AA ]
            }
            else
            {
                pOutputBuffer[0] = (byte)(thisDWord >> 16); // extract [ 00 AA ## ## ]
            }
            pInputBuffer++;
            pOutputBuffer++;
            outputBytesRemaining--;
            if (pInputBuffer > pFinalPosWhereCanReadDWordFromInputBuffer)
            {
                goto ProcessNextCharAndFinish; // input buffer doesn't contain enough data to read a DWORD
            }
            else
            {
                // The input buffer at the current offset contains a non-ASCII char.
                // Read an entire DWORD and fall through to non-ASCII consumption logic.
                thisDWord = Unsafe.ReadUnaligned<uint>(pInputBuffer);
            }
        }
        // At this point, we know the first char in the buffer is non-ASCII, but we haven't yet validated it.
        if (!IsFirstCharAtLeastThreeUtf8Bytes(thisDWord))
        {
        TryConsumeMultipleTwoByteSequences:
            // For certain text (Greek, Cyrillic, ...), 2-byte sequences tend to be clustered. We'll try transcoding them in
            // a tight loop without falling back to the main loop.
            if (IsSecondCharTwoUtf8Bytes(thisDWord))
            {
                // We have two runs of two bytes each.
                if (outputBytesRemaining < 4)
                {
                    goto ProcessOneCharFromCurrentDWordAndFinish; // running out of output buffer
                }
                Unsafe.WriteUnaligned(pOutputBuffer, ExtractTwoUtf8TwoByteSequencesFromTwoPackedUtf16Chars(thisDWord));
                pInputBuffer += 2;
                pOutputBuffer += 4;
                outputBytesRemaining -= 4;
                if (pInputBuffer > pFinalPosWhereCanReadDWordFromInputBuffer)
                {
                    goto ProcessNextCharAndFinish; // Running out of data - go down slow path
                }
                else
                {
                    // Optimization: If we read a long run of two-byte sequences, the next sequence is probably
                    // also two bytes. Check for that first before going back to the beginning of the loop.
                    thisDWord = Unsafe.ReadUnaligned<uint>(pInputBuffer);
                    if (IsFirstCharTwoUtf8Bytes(thisDWord))
                    {
                        // Validated we have a two-byte sequence coming up
                        goto TryConsumeMultipleTwoByteSequences;
                    }
                    // If we reached this point, the next sequence is something other than a valid
                    // two-byte sequence, so go back to the beginning of the loop.
                    goto AfterReadDWord;
                }
            }
            if (outputBytesRemaining < 2)
            {
                goto OutputBufferTooSmall;
            }
            Unsafe.WriteUnaligned(pOutputBuffer, (ushort)ExtractUtf8TwoByteSequenceFromFirstUtf16Char(thisDWord));
            // The buffer contains a 2-byte sequence followed by 2 bytes that aren't a 2-byte sequence.
            // Unlikely that a 3-byte sequence would follow a 2-byte sequence, so perhaps remaining
            // char is ASCII?
            if (IsSecondCharAscii(thisDWord))
            {
                if (outputBytesRemaining >= 3)
                {
                    if (BitConverter.IsLittleEndian)
                    {
                        thisDWord >>= 16;
                    }
                    pOutputBuffer[2] = (byte)thisDWord;
                    pInputBuffer += 2;
                    pOutputBuffer += 3;
                    outputBytesRemaining -= 3;
                    continue; // go back to original bounds check and check for ASCII
                }
                else
                {
                    pInputBuffer++;
                    pOutputBuffer += 2;
                    goto OutputBufferTooSmall;
                }
            }
            else
            {
                pInputBuffer++;
                pOutputBuffer += 2;
                outputBytesRemaining -= 2;
                if (pInputBuffer > pFinalPosWhereCanReadDWordFromInputBuffer)
                {
                    goto ProcessNextCharAndFinish; // Running out of data - go down slow path
                }
                else
                {
                    thisDWord = Unsafe.ReadUnaligned<uint>(pInputBuffer);
                    goto BeforeProcessThreeByteSequence; // we know the next byte isn't ASCII, and it's not the start of a 2-byte sequence (this was checked above)
                }
            }
        }
        // Check the 3-byte case.
    BeforeProcessThreeByteSequence:
        if (!IsFirstCharSurrogate(thisDWord))
        {
            // Optimization: A three-byte character could indicate CJK text, which makes it likely
            // that the character following this one is also CJK. We'll perform the check now
            // rather than jumping to the beginning of the main loop.
            if (IsSecondCharAtLeastThreeUtf8Bytes(thisDWord))
            {
                if (!IsSecondCharSurrogate(thisDWord))
                {
                    if (outputBytesRemaining < 6)
                    {
                        goto ConsumeSingleThreeByteRun; // not enough space - try consuming as much as we can
                    }
                    WriteTwoUtf16CharsAsTwoUtf8ThreeByteSequences(ref *pOutputBuffer, thisDWord);
                    pInputBuffer += 2;
                    pOutputBuffer += 6;
                    outputBytesRemaining -= 6;
                    // Try to remain in the 3-byte processing loop if at all possible.
                    if (pInputBuffer > pFinalPosWhereCanReadDWordFromInputBuffer)
                    {
                        goto ProcessNextCharAndFinish; // Running out of data - go down slow path
                    }
                    else
                    {
                        thisDWord = Unsafe.ReadUnaligned<uint>(pInputBuffer);
                        if (IsFirstCharAtLeastThreeUtf8Bytes(thisDWord))
                        {
                            goto BeforeProcessThreeByteSequence;
                        }
                        else
                        {
                            // Fall back to standard processing loop since we don't know how to optimize this.
                            goto AfterReadDWord;
                        }
                    }
                }
            }
        ConsumeSingleThreeByteRun:
            if (outputBytesRemaining < 3)
            {
                goto OutputBufferTooSmall;
            }
            WriteFirstUtf16CharAsUtf8ThreeByteSequence(ref *pOutputBuffer, thisDWord);
            pInputBuffer++;
            pOutputBuffer += 3;
            outputBytesRemaining -= 3;
            // Occasionally one-off ASCII characters like spaces, periods, or newlines will make their way
            // in to the text. If this happens strip it off now before seeing if the next character
            // consists of three code units.
            if (IsSecondCharAscii(thisDWord))
            {
                if (outputBytesRemaining == 0)
                {
                    goto OutputBufferTooSmall;
                }
                if (BitConverter.IsLittleEndian)
                {
                    *pOutputBuffer = (byte)(thisDWord >> 16);
                }
                else
                {
                    *pOutputBuffer = (byte)(thisDWord);
                }
                pInputBuffer++;
                pOutputBuffer++;
                outputBytesRemaining--;
                if (pInputBuffer > pFinalPosWhereCanReadDWordFromInputBuffer)
                {
                    goto ProcessNextCharAndFinish; // Running out of data - go down slow path
                }
                else
                {
                    thisDWord = Unsafe.ReadUnaligned<uint>(pInputBuffer);
                    if (IsFirstCharAtLeastThreeUtf8Bytes(thisDWord))
                    {
                        goto BeforeProcessThreeByteSequence;
                    }
                    else
                    {
                        // Fall back to standard processing loop since we don't know how to optimize this.
                        goto AfterReadDWord;
                    }
                }
            }
            if (pInputBuffer > pFinalPosWhereCanReadDWordFromInputBuffer)
            {
                goto ProcessNextCharAndFinish; // Running out of data - go down slow path
            }
            else
            {
                thisDWord = Unsafe.ReadUnaligned<uint>(pInputBuffer);
                goto AfterReadDWordSkipAllCharsAsciiCheck; // we just checked above that this value isn't ASCII
            }
        }
        // Four byte sequence processing
        if (IsWellFormedUtf16SurrogatePair(thisDWord))
        {
            if (outputBytesRemaining < 4)
            {
                goto OutputBufferTooSmall;
            }
            Unsafe.WriteUnaligned(pOutputBuffer, ExtractFourUtf8BytesFromSurrogatePair(thisDWord));
            pInputBuffer += 2;
            pOutputBuffer += 4;
            outputBytesRemaining -= 4;
            continue; // go back to beginning of loop for processing
        }
        goto Error; // an ill-formed surrogate sequence: high not followed by low, or low not preceded by high
    } while (pInputBuffer <= pFinalPosWhereCanReadDWordFromInputBuffer);
ProcessNextCharAndFinish:
    inputLength = (int)(pFinalPosWhereCanReadDWordFromInputBuffer - pInputBuffer) + CharsPerDWord;
ProcessInputOfLessThanDWordSize:
    Debug.Assert(inputLength < CharsPerDWord);
    if (inputLength == 0)
    {
        goto InputBufferFullyConsumed;
    }
    uint thisChar = *pInputBuffer;
    goto ProcessFinalChar;
ProcessOneCharFromCurrentDWordAndFinish:
    if (BitConverter.IsLittleEndian)
    {
        thisChar = thisDWord & 0xFFFFu; // preserve only the first char
    }
    else
    {
        thisChar = thisDWord >> 16; // preserve only the first char
    }
ProcessFinalChar:
    {
        if (thisChar <= 0x7Fu)
        {
            if (outputBytesRemaining == 0)
            {
                goto OutputBufferTooSmall; // we have no hope of writing anything to the output
            }
            // 1-byte (ASCII) case
            *pOutputBuffer = (byte)thisChar;
            pInputBuffer++;
            pOutputBuffer++;
        }
        else if (thisChar < 0x0800u)
        {
            if (outputBytesRemaining < 2)
            {
                goto OutputBufferTooSmall; // we have no hope of writing anything to the output
            }
            // 2-byte case
            pOutputBuffer[1] = (byte)((thisChar & 0x3Fu) | unchecked((uint)(sbyte)0x80)); // [ 10xxxxxx ]
            pOutputBuffer[0] = (byte)((thisChar >> 6) | unchecked((uint)(sbyte)0xC0)); // [ 110yyyyy ]
            pInputBuffer++;
            pOutputBuffer += 2;
        }
        else if (!UnicodeUtility.IsSurrogateCodePoint(thisChar))
        {
            if (outputBytesRemaining < 3)
            {
                goto OutputBufferTooSmall; // we have no hope of writing anything to the output
            }
            // 3-byte case
            pOutputBuffer[2] = (byte)((thisChar & 0x3Fu) | unchecked((uint)(sbyte)0x80)); // [ 10xxxxxx ]
            pOutputBuffer[1] = (byte)(((thisChar >> 6) & 0x3Fu) | unchecked((uint)(sbyte)0x80)); // [ 10yyyyyy ]
            pOutputBuffer[0] = (byte)((thisChar >> 12) | unchecked((uint)(sbyte)0xE0)); // [ 1110zzzz ]
            pInputBuffer++;
            pOutputBuffer += 3;
        }
        else if (thisChar <= 0xDBFFu)
        {
            // UTF-16 high surrogate code point with no trailing data, report incomplete input buffer
            goto InputBufferTooSmall;
        }
        else
        {
            // UTF-16 low surrogate code point with no leading data, report error
            goto Error;
        }
    }
    // There are two ways we can end up here. Either we were running low on input data,
    // or we were running low on space in the destination buffer. If we're running low on
    // input data (label targets ProcessInputOfLessThanDWordSize and ProcessNextCharAndFinish),
    // then the inputLength value is guaranteed to be between 0 and 1, and we should return Done.
    // If we're running low on destination buffer space (label target ProcessOneCharFromCurrentDWordAndFinish),
    // then we didn't modify inputLength since entering the main loop, which means it should
    // still have a value of >= 2. So checking the value of inputLength is all we need to do to determine
    // which of the two scenarios we're in.
    if (inputLength > 1)
    {
        goto OutputBufferTooSmall;
    }
InputBufferFullyConsumed:
    OperationStatus retVal = OperationStatus.Done;
    goto ReturnCommon;
InputBufferTooSmall:
    retVal = OperationStatus.NeedMoreData;
    goto ReturnCommon;
OutputBufferTooSmall:
    retVal = OperationStatus.DestinationTooSmall;
    goto ReturnCommon;
Error:
    retVal = OperationStatus.InvalidData;
    goto ReturnCommon;
ReturnCommon:
    pInputBufferRemaining = pInputBuffer;
    pOutputBufferRemaining = pOutputBuffer;
    return retVal;
}
(standard tier 1 version).
This one has a CSE with 27 definitions and no uses in reachable blocks. I wonder if it would be worth it to compute the DFS tree. It seems like the heuristic is not accounting for the reachability of the uses (probably the better solution is to aggressively trim the unreachable blocks earlier).
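A sketch of the reachability-aware accounting this comment suggests (hypothetical names throughout; the real heuristic lives in CSE's profitability code and may differ):

```cpp
// Hypothetical sketch: when weighing a CSE candidate, count only uses whose
// blocks are reachable, e.g. by consulting a DFS tree computed up front.
unsigned reachableUses = 0;
for (const UseInfo& use : candidate->Uses())
{
    if (m_dfsTree->Contains(use.Block())) // block reachable from the entry
    {
        reachableUses++;
    }
}

if (reachableUses == 0)
{
    // All uses sit in unreachable blocks; the CSE cannot pay for itself.
    return false; // reject this candidate
}
```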

jakobbotsch merged commit fe70623 into dotnet:main on Nov 22, 2024.
131 of 133 checks passed
jakobbotsch deleted the fix-109745 branch on November 22, 2024 at 09:14.
Linked issue closed by this PR: JIT: libraries jitstress Assertion failed 'doesVNMatch' during 'Assertion prop' (#109745)