Infinite loop for certain text #11
Comments
I am not sure what this is supposed to do, but the problem seems to be where the regex ReferenceRegex is run against the text; the two appear to be a fatal combination.
It seems the combination of a character, '.', and a number is not handled well by the regex ReferenceRegex. When testing the regex at https://regexr.com, if I type c.#### and then keep typing digits, the execution time gets slower and slower until it eventually times out at 250ms. For very long numbers I would therefore expect the execution time to grow exponentially. I am not sure how to test the original Ruby version, but since it uses the exact same regex it likely has the same issue.
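To check this kind of slowdown outside regexr, a small probe like the one below could time a pattern against inputs of growing length, using a match timeout so a bad case cannot hang the process. This is only a sketch: `BacktrackingProbe`, `CompletesWithin`, and the stand-in pattern are hypothetical names for illustration, and the stand-in is not the library's ReferenceRegex, which would need to be substituted in to test the actual issue.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class BacktrackingProbe
{
    // Hypothetical stand-in pattern, NOT the library's ReferenceRegex.
    // Nested quantifiers like this are a common source of heavy backtracking.
    private const string StandInPattern = @"^(\d+)+$";

    // Returns true if the match attempt finished within the timeout,
    // false if the regex engine abandoned it.
    public static bool CompletesWithin(string pattern, string input, TimeSpan timeout)
    {
        var regex = new Regex(pattern, RegexOptions.None, timeout);
        try
        {
            regex.IsMatch(input);
            return true;
        }
        catch (RegexMatchTimeoutException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // Grow the run of digits and report how long each attempt takes.
        for (int digits = 10; digits <= 40; digits += 5)
        {
            string input = new string('1', digits) + "x";
            var stopwatch = Stopwatch.StartNew();
            bool completed = CompletesWithin(StandInPattern, input, TimeSpan.FromMilliseconds(250));
            Console.WriteLine($"{digits} digits: completed={completed}, {stopwatch.ElapsedMilliseconds} ms");
        }
    }
}
```

Passing a match timeout to the `Regex` constructor also works as a safeguard in production code, since a timed-out match throws `RegexMatchTimeoutException` instead of running indefinitely.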
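I am not sure if this is a good solution, or even "correct" in general, but it does solve my problem.

// Requires: using System.Linq; using System.Text.RegularExpressions;

// Matches any character followed by '.' and a digit, e.g. "c.1".
private static readonly Regex _numericSeparator = new Regex(@"(.\.\d)", RegexOptions.Compiled);

private string PreprocessText(string text)
{
    var matches = _numericSeparator.Matches(text);

    // Collect the distinct matched substrings (group 0 and group 1 are identical here).
    var groups = matches
        .AsEnumerable()
        .SelectMany(x => x.Groups.Values)
        .Select(x => x.Value)
        .Distinct();

    foreach (var group in groups)
    {
        // Insert a space after the dot so the regex no longer sees "char.digit".
        var replacement = group
            .Replace(".", ". ");
        text = text.Replace(group, replacement);
    }

    return text;
}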
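The same idea, putting a space after a dot that sits between a character and a digit, can also be sketched as a single `Regex.Replace` call with lookarounds. This is not the poster's code, and `NumericSeparatorSketch` is a hypothetical name. It also behaves differently on some inputs: it rewrites every qualifying dot in a string such as 1.2.3, whereas the loop above only rewrites the first one.

```csharp
using System.Text.RegularExpressions;

public static class NumericSeparatorSketch
{
    // A '.' with some character before it and a digit after it.
    // The lookarounds leave the neighbouring characters out of the match itself.
    private static readonly Regex DotBeforeDigit =
        new Regex(@"(?<=.)\.(?=\d)", RegexOptions.Compiled);

    // Returns the text with ". " substituted for each qualifying '.'.
    public static string PreprocessText(string text)
    {
        return DotBeforeDigit.Replace(text, ". ");
    }
}
```

For example, `NumericSeparatorSketch.PreprocessText("c.1234")` returns `"c. 1234"`. Whether changing the text like this is acceptable for the segmenter's output is a separate question.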
The suggested fix may not work in the general case. For example
Certain text causes the segmenter to enter into an infinite loop.