Optimize String.Replace(string, string) (#44088)
We can significantly improve its throughput for a few scenarios:
- If both oldValue and newValue are single-character strings, we can just delegate to the Replace(char, char) overload, which is much faster.
- If oldValue is a single character but newValue isn't, we can use IndexOf to find each occurrence rather than an open-coded loop, making searches much faster for reasonably sized strings.
- If oldValue is multiple characters and its occurrences aren't very frequent (e.g., it doesn't repeat every few characters), we can significantly speed up throughput by using IndexOf to search for the whole string. Replacing "\r\n" with "\n" in the contents of a typical file is one example.
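The three scenarios above can be illustrated with a small sketch written against public .NET APIs. It is not the code from this commit. The real method uses internal helpers (SpanHelpers.IndexOf, ValueListBuilder<int>, ReplaceHelper). This version instead assumes MemoryExtensions.IndexOf on ReadOnlySpan<char>, a List<int> for match positions, and a StringBuilder for assembly. ReplaceSketch is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch of the commit's strategy using public APIs only.
// The real String.Replace uses internal helpers (SpanHelpers, ValueListBuilder, ReplaceHelper).
static string ReplaceSketch(string source, string oldValue, string? newValue)
{
    if (oldValue is null) throw new ArgumentNullException(nameof(oldValue));
    if (oldValue.Length == 0) throw new ArgumentException("oldValue must not be empty.", nameof(oldValue));

    // A null newValue means "remove every occurrence of oldValue".
    newValue ??= string.Empty;

    // Scenario 1: one char replaced by one char, so delegate to Replace(char, char).
    if (oldValue.Length == 1 && newValue.Length == 1)
    {
        return source.Replace(oldValue[0], newValue[0]);
    }

    // Scenarios 2 and 3: let IndexOf find each match and record its start index.
    var indices = new List<int>();
    ReadOnlySpan<char> span = source.AsSpan();
    int i = 0;
    while (true)
    {
        int pos = oldValue.Length == 1
            ? span.Slice(i).IndexOf(oldValue[0])          // search for a single char
            : span.Slice(i).IndexOf(oldValue.AsSpan());   // search for the whole string (ordinal)
        if (pos == -1) break;
        indices.Add(i + pos);
        i += pos + oldValue.Length; // resume after the match, so matches never overlap
    }

    // No match: hand back the original instance unchanged.
    if (indices.Count == 0) return source;

    // Copy the untouched segments and the replacements into the result.
    var sb = new StringBuilder(source.Length + indices.Count * (newValue.Length - oldValue.Length));
    int last = 0;
    foreach (int idx in indices)
    {
        sb.Append(source, last, idx - last).Append(newValue);
        last = idx + oldValue.Length;
    }
    sb.Append(source, last, source.Length - last);
    return sb.ToString();
}

Console.WriteLine(ReplaceSketch("line1\r\nline2\r\n", "\r\n", "\n") == "line1\nline2\n"); // True
```

Returning source itself when nothing matches mirrors the "return this" path in the diff, so callers pay no allocation in that case.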

This does come at a measurable cost when the oldValue is really common and tightly packed, e.g. searching for "aa" in "aaaaaaaaaaaaaaaaaa", so we can decide whether the tradeoff is the right one.
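The commit does not explain where the cost in the tightly packed case comes from. One plausible reading, offered as an interpretation rather than as the author's analysis, is that every match restarts the IndexOf search. Dense input therefore becomes many very short searches, each paying IndexOf's fixed per-call overhead. Sparse input, such as "\r\n" line endings, lets each call scan a long stretch instead. The hypothetical helper CountSearchCalls below counts those calls. It assumes the same restart-after-match loop the new code uses (i += pos + oldValue.Length).

```csharp
using System;
using System.Linq;

// Hypothetical helper: counts how many IndexOf searches a restart-after-match
// loop (the shape used for multi-character oldValues in this commit) performs.
static int CountSearchCalls(string source, string oldValue)
{
    ReadOnlySpan<char> span = source.AsSpan();
    int calls = 0, i = 0;
    while (true)
    {
        calls++;
        int pos = span.Slice(i).IndexOf(oldValue.AsSpan());
        if (pos == -1) return calls;
        i += pos + oldValue.Length;
    }
}

// Tightly packed: "aa" matches at 0, 2, ..., 16, so each search only advances two chars.
string dense = new string('a', 18);
// Sparse: one "\r\n" per 19-char line, so each search scans a whole line.
string sparse = string.Concat(Enumerable.Repeat("some line of text\r\n", 3));

Console.WriteLine(CountSearchCalls(dense, "aa"));    // 10 (9 matches + 1 final miss)
Console.WriteLine(CountSearchCalls(sparse, "\r\n")); // 4 (3 matches + 1 final miss)
```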
stephentoub authored Oct 31, 2020
1 parent 3cc9f2c commit f29a272
Showing 1 changed file with 48 additions and 25 deletions.
@@ -1124,48 +1124,71 @@ public string Replace(char oldChar, char newChar)
 
         public string Replace(string oldValue, string? newValue)
         {
-            if (oldValue == null)
+            if (oldValue is null)
             {
                 throw new ArgumentNullException(nameof(oldValue));
             }
             if (oldValue.Length == 0)
             {
                 throw new ArgumentException(SR.Argument_StringZeroLength, nameof(oldValue));
             }
 
-            // Api behavior: if newValue is null, instances of oldValue are to be removed.
-            newValue ??= string.Empty;
+            // If newValue is null, treat it as an empty string. Callers use this to remove the oldValue.
+            newValue ??= Empty;
 
             // Track the locations of oldValue to be replaced.
             var replacementIndices = new ValueListBuilder<int>(stackalloc int[StackallocIntBufferSizeLimit]);
 
-            unsafe
+            if (oldValue.Length == 1)
             {
-                fixed (char* pThis = &_firstChar)
+                // Special-case oldValues that are a single character. Even though there's an overload that takes
+                // a single character, its newValue is also a single character, so this overload ends up being used
+                // often to remove characters by having an empty newValue.
+
+                if (newValue.Length == 1)
                 {
-                    int matchIdx = 0;
-                    int lastPossibleMatchIdx = this.Length - oldValue.Length;
-                    while (matchIdx <= lastPossibleMatchIdx)
-                    {
-                        char* pMatch = pThis + matchIdx;
-                        for (int probeIdx = 0; probeIdx < oldValue.Length; probeIdx++)
-                        {
-                            if (pMatch[probeIdx] != oldValue[probeIdx])
-                            {
-                                goto Next;
-                            }
-                        }
-                        // Found a match for the string. Record the location of the match and skip over the "oldValue."
-                        replacementIndices.Append(matchIdx);
-                        matchIdx += oldValue.Length;
-                        continue;
+                    // With both the oldValue and newValue being a single character, it's cheaper to just use the other overload.
+                    return Replace(oldValue[0], newValue[0]);
+                }
 
-                    Next:
-                        matchIdx++;
+                // Find all occurrences of the oldValue character.
+                char c = oldValue[0];
+                int i = 0;
+                while (true)
+                {
+                    int pos = SpanHelpers.IndexOf(ref Unsafe.Add(ref _firstChar, i), c, Length - i);
+                    if (pos == -1)
+                    {
+                        break;
                     }
+                    replacementIndices.Append(i + pos);
+                    i += pos + 1;
                 }
             }
+            else
+            {
+                // Find all occurrences of the oldValue string.
+                int i = 0;
+                while (true)
+                {
+                    int pos = SpanHelpers.IndexOf(ref Unsafe.Add(ref _firstChar, i), Length - i, ref oldValue._firstChar, oldValue.Length);
+                    if (pos == -1)
+                    {
+                        break;
+                    }
+                    replacementIndices.Append(i + pos);
+                    i += pos + oldValue.Length;
+                }
+            }
 
+            // If the oldValue wasn't found, just return the original string.
             if (replacementIndices.Length == 0)
             {
                 return this;
             }
 
-            // String allocation and copying is in separate method to make this method faster for the case where
-            // nothing needs replacing.
+            // Perform the replacement. String allocation and copying is in separate method to make this method faster
+            // for the case where nothing needs replacing.
             string dst = ReplaceHelper(oldValue.Length, newValue, replacementIndices.AsSpan());
 
             replacementIndices.Dispose();
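The diff hands the final assembly to ReplaceHelper, whose body is not part of this change. The sketch below is a hypothetical stand-in: ReplaceFromIndices is an invented name, and the real helper may work differently. It shows how the recorded match positions let the result length be computed up front. The new string can then be allocated exactly once with string.Create and filled by copying segments.

```csharp
using System;

// Hypothetical stand-in for ReplaceHelper (its real body is not part of this diff):
// build the result from recorded match positions with a single exact-size allocation.
static string ReplaceFromIndices(string source, int oldValueLength, string newValue, int[] indices)
{
    // Each match changes the length by (newValue.Length - oldValueLength).
    long resultLength = source.Length + (long)(newValue.Length - oldValueLength) * indices.Length;
    if (resultLength > int.MaxValue) throw new OutOfMemoryException();

    return string.Create((int)resultLength, (source, oldValueLength, newValue, indices), static (dest, state) =>
    {
        int srcPos = 0, dstPos = 0;
        foreach (int idx in state.indices)
        {
            // Copy the unchanged text before this match.
            int gap = idx - srcPos;
            state.source.AsSpan(srcPos, gap).CopyTo(dest.Slice(dstPos));
            dstPos += gap;

            // Write newValue where oldValue used to be.
            state.newValue.AsSpan().CopyTo(dest.Slice(dstPos));
            dstPos += state.newValue.Length;

            srcPos = idx + state.oldValueLength;
        }
        // Copy the text after the last match.
        state.source.AsSpan(srcPos).CopyTo(dest.Slice(dstPos));
    });
}

// Matches of "\r\n" at indices 1 and 4 in "a\r\nb\r\n".
Console.WriteLine(ReplaceFromIndices("a\r\nb\r\n", 2, "\n", new[] { 1, 4 }) == "a\nb\n"); // True
```

Keeping this copy step out of the search method matches the rationale in the diff's comment: the common no-match path stays small and never allocates.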