Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent IndexOutOfRangeException in RegexInterpreter #97916

Merged
merged 2 commits into from
Apr 13, 2024

Conversation

rhirano0715
Copy link
Contributor

This update fixes the IndexOutOfRangeException in RegexInterpreter by enhancing the TrackPush and TrackPush2 methods. The adjustment involves checking the runtrack position before decrementing it, ensuring that it doesn't become negative, which was the root cause of the exception. This prevents potential out-of-range errors when handling large numbers of repetitions in regular expressions.

Fix #62094

This update fixes the IndexOutOfRangeException in RegexInterpreter by enhancing the `TrackPush` and `TrackPush2` methods. The adjustment involves checking the runtrack position before decrementing it, ensuring that it doesn't become negative, which was the root cause of the exception. This prevents potential out-of-range errors when handling large numbers of repetitions in regular expressions.

Fix dotnet#62094
@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Feb 3, 2024
@ghost
Copy link

ghost commented Feb 3, 2024

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Issue Details

This update fixes the IndexOutOfRangeException in RegexInterpreter by enhancing the TrackPush and TrackPush2 methods. The adjustment involves checking the runtrack position before decrementing it, ensuring that it doesn't become negative, which was the root cause of the exception. This prevents potential out-of-range errors when handling large numbers of repetitions in regular expressions.

Fix #62094

Author: rhirano0715
Assignees: -
Labels:

area-System.Text.RegularExpressions

Milestone: -

@stephentoub
Copy link
Member

stephentoub commented Feb 3, 2024

Thank you, but I'm not convinced this is the right fix. Why are the numbers becoming negative in the first place? That would seem to suggest a logic error in how the stack is balanced. Seems like this is addressing a symptom rather than the cause.

@rhirano0715
Copy link
Contributor Author

@stephentoub

Thanks for your reply.
I think you are right.
I will look into it some more.

@steveharter steveharter added the needs-author-action An issue or pull request that requires more info or actions from the author. label Feb 14, 2024
@ghost ghost added the no-recent-activity label Feb 29, 2024
@ghost
Copy link

ghost commented Feb 29, 2024

This pull request has been automatically marked no-recent-activity because it has not had any activity for 14 days. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will remove no-recent-activity.

If EnsureStorage() is called unconditionally, the array will be expanded, so the position will never become negative. When the conditions inside EnsureStorage() are true, it might be necessary to expand the array, regardless of the comparison between newpos and codepos.

https://github.com/dotnet/runtime/blob/6ebc8bd86dbc780b2a2a7daf3ab6020f9104f09e/src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.MultipleMatches.Tests.cs#L461-L469

Before the change, in this test case, EnsureStorage() is not called because newpos == codepos == 6 from the first time until an exception occurs.

Fix dotnet#62049
@dotnet-policy-service dotnet-policy-service bot removed needs-author-action An issue or pull request that requires more info or actions from the author. no-recent-activity labels Mar 9, 2024
@rhirano0715
Copy link
Contributor Author

I made changes because it seemed good for RegexInterpreter's Goto() and Backtrack() to call EnsureStorage() unconditionally.

private void Goto(int newpos)
{
// When branching backward, ensure storage.
if (newpos < _codepos)
{
EnsureStorage();
}
_codepos = newpos;
SetOperator((RegexOpcode)_code.Codes[newpos]);
}

If EnsureStorage() is called unconditionally, the array will be extended, so it seems that the array's position will never become negative. I think there might be a need to extend when the condition inside EnsureStorage() is true.

engine, @"(?:){93}", "x", RegexOptions.None, new[]
{
new CaptureData("", 0, 0),
new CaptureData("", 1, 0)
}

In the case of this test case, EnsureStorage() is not called since newpos == codepos == 6 is always true from the first instance until an exception occurs.

@rhirano0715 rhirano0715 marked this pull request as ready for review March 15, 2024 09:23
@stephentoub
Copy link
Member

RegexInterpreter's Goto() and Backtrack() to call EnsureStorage() unconditionally

Thanks. Are you able to measure what impact that has on perf? Hopefully the call would be inlined, and we should end up with effectivlely two comparisons instead of one and it shouldn't be measurable. As long as perf looks good, this general form of fix makes sense.

@rhirano0715
Copy link
Contributor Author

@stephentoub
Thank you for checking.

Are you able to measure what impact that has on perf?

I'm sorry. I don't understand how to do it.
Is it something that's written around here?

And how to measure performance:
* [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)
* [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md)

@steveharter
Copy link
Member

I'm sorry. I don't understand how to do it.
Is it something that's written around here?

The common ways to verify perf:

@rhirano0715
Copy link
Contributor Author

@stephentoub
I ran the existing RegEx benchmarks.
It seems to be fine, but what do you think?

// * Summary *

BenchmarkDotNet v0.13.13-nightly.20240311.145, Windows 11 (10.0.22621.3374/22H2/2022Update/SunValley2)
11th Gen Intel Core i7-1185G7 3.00GHz, 1 CPU, 8 logical and 4 physical cores
.NET SDK 9.0.100-preview.1.24101.2
[Host] : .NET 9.0.0 (9.0.24.8009), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
Job-HPUVJJ : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
Job-NTZNCJ : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250ms MaxIterationCount=20
MinIterationCount=3 WarmupCount=1

Method Job Toolchain Options Mean Error StdDev Median Min Max Ratio Allocated Alloc Ratio
Count Job-HPUVJJ \after\artifacts\bin\testhost\net9.0-windows-Release-x64\shared\Microsoft.NETCore.App\9.0.0\CoreRun.exe None 334.8 ms 3.33 ms 0.52 ms 335.0 ms 334.1 ms 335.2 ms 1.00 1.05 KB 1.00
Count Job-NTZNCJ \before\artifacts\bin\testhost\net9.0-windows-Release-x64\shared\Microsoft.NETCore.App\9.0.0\CoreRun.exe None 333.5 ms 2.18 ms 0.34 ms 333.5 ms 333.0 ms 333.9 ms 1.00 1.05 KB 1.00
Count Job-HPUVJJ \after\artifacts\bin\testhost\net9.0-windows-Release-x64\shared\Microsoft.NETCore.App\9.0.0\CoreRun.exe IgnoreCase 800.0 ms 9.89 ms 1.53 ms 799.8 ms 798.7 ms 801.8 ms 0.98 1.05 KB 1.00
Count Job-NTZNCJ \before\artifacts\bin\testhost\net9.0-windows-Release-x64\shared\Microsoft.NETCore.App\9.0.0\CoreRun.exe IgnoreCase 816.1 ms 13.59 ms 2.10 ms 815.7 ms 814.0 ms 818.9 ms 1.00 1.05 KB 1.00
Count Job-HPUVJJ \after\artifacts\bin\testhost\net9.0-windows-Release-x64\shared\Microsoft.NETCore.App\9.0.0\CoreRun.exe Compiled 305.7 ms 6.07 ms 1.58 ms 305.4 ms 304.0 ms 308.1 ms 1.00 1.05 KB 1.00
Count Job-NTZNCJ \before\artifacts\bin\testhost\net9.0-windows-Release-x64\shared\Microsoft.NETCore.App\9.0.0\CoreRun.exe Compiled 305.1 ms 4.36 ms 1.13 ms 304.6 ms 304.0 ms 306.4 ms 1.00 1.05 KB 1.00
Count Job-HPUVJJ \after\artifacts\bin\testhost\net9.0-windows-Release-x64\shared\Microsoft.NETCore.App\9.0.0\CoreRun.exe IgnoreCase, Compiled 575.6 ms 8.54 ms 3.79 ms 576.5 ms 570.7 ms 580.1 ms 0.99 1.05 KB 1.00
Count Job-NTZNCJ \before\artifacts\bin\testhost\net9.0-windows-Release-x64\shared\Microsoft.NETCore.App\9.0.0\CoreRun.exe IgnoreCase, Compiled 579.1 ms 8.68 ms 3.10 ms 579.0 ms 574.2 ms 583.7 ms 1.00 1.05 KB 1.00
Count Job-HPUVJJ \after\artifacts\bin\testhost\net9.0-windows-Release-x64\shared\Microsoft.NETCore.App\9.0.0\CoreRun.exe NonBacktracking 375.4 ms 6.17 ms 4.08 ms 374.6 ms 371.7 ms 383.2 ms 1.00 1.05 KB 1.00
Count Job-NTZNCJ \before\artifacts\bin\testhost\net9.0-windows-Release-x64\shared\Microsoft.NETCore.App\9.0.0\CoreRun.exe NonBacktracking 374.3 ms 6.47 ms 3.85 ms 374.8 ms 367.2 ms 378.6 ms 1.00 1.05 KB 1.00
Count Job-HPUVJJ \after\artifacts\bin\testhost\net9.0-windows-Release-x64\shared\Microsoft.NETCore.App\9.0.0\CoreRun.exe IgnoreCase, NonBacktracking 820.7 ms 16.30 ms 5.81 ms 821.8 ms 811.2 ms 828.5 ms 0.99 1.05 KB 1.00
Count Job-NTZNCJ \before\artifacts\bin\testhost\net9.0-windows-Release-x64\shared\Microsoft.NETCore.App\9.0.0\CoreRun.exe IgnoreCase, NonBacktracking 827.7 ms 14.01 ms 5.00 ms 826.2 ms 822.9 ms 834.5 ms 1.00 1.05 KB 1.00

// * Hints *
Outliers
Perf_Regex_Industry_SliceSlice.Count: PowerPlanMode=00000000-0000-0000-0000-000000000000, Toolchain=\after\artifacts\bin\testhost\net9.0-windows-Release-x64\shared\Microsoft.NETCore.App\9.0.0\CoreRun.exe, IterationTime=250ms, MaxIterationCount=20, MinIterationCount=3, WarmupCount=1 -> 1 outlier was removed, 2 outliers were detected (334.07 ms, 338.34 ms)
Perf_Regex_Industry_SliceSlice.Count: PowerPlanMode=00000000-0000-0000-0000-000000000000, Toolchain=\before\artifacts\bin\testhost\net9.0-windows-Release-x64\shared\Microsoft.NETCore.App\9.0.0\CoreRun.exe, IterationTime=250ms, MaxIterationCount=20, MinIterationCount=3, WarmupCount=1 -> 1 outlier was removed (340.24 ms)
Perf_Regex_Industry_SliceSlice.Count: PowerPlanMode=00000000-0000-0000-0000-000000000000, Toolchain=\before\artifacts\bin\testhost\net9.0-windows-Release-x64\shared\Microsoft.NETCore.App\9.0.0\CoreRun.exe, IterationTime=250ms, MaxIterationCount=20, MinIterationCount=3, WarmupCount=1 -> 1 outlier was removed (597.71 ms)
Perf_Regex_Industry_SliceSlice.Count: PowerPlanMode=00000000-0000-0000-0000-000000000000, Toolchain=\after\artifacts\bin\testhost\net9.0-windows-Release-x64\shared\Microsoft.NETCore.App\9.0.0\CoreRun.exe, IterationTime=250ms, MaxIterationCount=20, MinIterationCount=3, WarmupCount=1 -> 1 outlier was removed (394.62 ms)
Perf_Regex_Industry_SliceSlice.Count: PowerPlanMode=00000000-0000-0000-0000-000000000000, Toolchain=\after\artifacts\bin\testhost\net9.0-windows-Release-x64\shared\Microsoft.NETCore.App\9.0.0\CoreRun.exe, IterationTime=250ms, MaxIterationCount=20, MinIterationCount=3, WarmupCount=1 -> 1 outlier was detected (811.21 ms)
Perf_Regex_Industry_SliceSlice.Count: PowerPlanMode=00000000-0000-0000-0000-000000000000, Toolchain=\before\artifacts\bin\testhost\net9.0-windows-Release-x64\shared\Microsoft.NETCore.App\9.0.0\CoreRun.exe, IterationTime=250ms, MaxIterationCount=20, MinIterationCount=3, WarmupCount=1 -> 1 outlier was removed (882.08 ms)

// * Legends *
Options : Value of the 'Options' parameter
Mean : Arithmetic mean of all measurements
Error : Half of 99.9% confidence interval
StdDev : Standard deviation of all measurements
Median : Value separating the higher half of all measurements (50th percentile)
Min : Minimum
Max : Maximum
Ratio : Mean of the ratio distribution ([Current]/[Baseline])
Allocated : Allocated memory per single operation (managed only, inclusive, 1KB = 1024B)
Alloc Ratio : Allocated memory ratio distribution ([Current]/[Baseline])
1 ms : 1 Millisecond (0.001 sec)

// * Diagnostic Output - MemoryDiagnoser *

// ***** BenchmarkRunner: End *****
Run time: 00:03:10 (190.47 sec), executed benchmarks: 12

Global total time: 01:05:22 (3922.2 sec), executed benchmarks: 558

Copy link
Member

@danmoseley danmoseley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@stephentoub
Copy link
Member

This looks good. Thanks for the fix!

@stephentoub stephentoub merged commit 7c365ec into dotnet:main Apr 13, 2024
111 checks passed
matouskozak pushed a commit to matouskozak/runtime that referenced this pull request Apr 30, 2024
* Prevent IndexOutOfRangeException in RegexInterpreter

This update fixes the IndexOutOfRangeException in RegexInterpreter by enhancing the `TrackPush` and `TrackPush2` methods. The adjustment involves checking the runtrack position before decrementing it, ensuring that it doesn't become negative, which was the root cause of the exception. This prevents potential out-of-range errors when handling large numbers of repetitions in regular expressions.

Fix dotnet#62094

* Changed to call EnsureStorage() unconditionally.

If EnsureStorage() is called unconditionally, the array will be expanded, so the position will never become negative. When the conditions inside EnsureStorage() are true, it might be necessary to expand the array, regardless of the comparison between newpos and codepos.

https://github.com/dotnet/runtime/blob/6ebc8bd86dbc780b2a2a7daf3ab6020f9104f09e/src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.MultipleMatches.Tests.cs#L461-L469

Before the change, in this test case, EnsureStorage() is not called because newpos == codepos == 6 from the first time until an exception occurs.

Fix dotnet#62049
@github-actions github-actions bot locked and limited conversation to collaborators May 13, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Text.RegularExpressions community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Regex Capture gives IndexOutOfRangeException on (?:){93}
4 participants