Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify minterm generation in Regex NonBacktracking engine #61232

Merged
merged 2 commits into from
Nov 5, 2021

Conversation

olsaarik
Copy link
Contributor

@olsaarik olsaarik commented Nov 4, 2021

PartitionTree had a holdover from when it was tracking the predicates that made up each minterm, which caused it to use order n too much memory for some patterns: nodes with the same predicate as the parent are no longer created. This also significantly simplified the algorithm.

The effect is particularly evident in patterns with lots of non-overlapping predicates. For the pattern "qwertyuuiopasdfghjklzxcvbnm" this dropped the number of PartitionTree nodes created from 378 to 53.

PartitionTree had a holdover from when it was tracking the predicates
that made up each minterm, which caused it to use order n too much
memory for some patterns.
Nodes with the same predicate as the parent are no longer created.
This also significantly simplified the algorithm.
@ghost
Copy link

ghost commented Nov 4, 2021

Tagging subscribers to this area: @eerhardt, @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Issue Details

PartitionTree had a holdover from when it was tracking the predicates that made up each minterm, which caused it to use order n too much memory for some patterns: nodes with the same predicate as the parent are no longer created. This also significantly simplified the algorithm.

The effect is particularly evident in patterns with lots of non-overlapping predicates. For the pattern "qwertyuuiopasdfghjklzxcvbnm" this dropped the number of PartitionTree nodes created from 378 to 53.

Author: olsaarik
Assignees: -
Labels:

area-System.Text.RegularExpressions

Milestone: -

@@ -41,25 +40,17 @@ public MintermGenerator(IBooleanAlgebra<TPredicate> algebra)
/// <returns>all minterms of the given predicate sequence</returns>
public List<TPredicate> GenerateMinterms(params TPredicate[] preds)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as you're improving things here, does this need to be params or could that be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, I don't think it has to be. I'll change it to match what this is called with.

Copy link
Member

@stephentoub stephentoub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Less code and cheaper. Yay :)

Also remove the deduplication in there, since its now optional and the
only call site already passes in a unique set of predicates.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants