Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch pattern matching to use TemporaryArray #51035

Merged
merged 23 commits into from
May 7, 2021

Conversation

CyrusNajmabadi
Copy link
Member

@CyrusNajmabadi CyrusNajmabadi commented Feb 5, 2021

This code is a prime candidate for temporary arrays as most often teh number of chunks/matches we have will be <=4, allowing us to process things without any allocs or need to go to pools. This also simplifies our impl a bunch as we don't have to do complex work around avoiding even accessing the Pool for our array builders.

{
private readonly bool _includeMatchedSpans;
private readonly string _candidate;
private readonly ArrayBuilder<TextSpan> _candidateHumps;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was annoying. can't point to a struct by ref as a field. So this had to be a parameter passed to all methods instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with the current approach for this PR. The available alternative is placing the non-copyable value in heap storage instead of stack storage, e.g. StrongBox<TemporaryArray<TextSpan>> or Tuple<>/class, depending on the number of values that need to be stored together. The two main downsides of this approach is it forces an additional allocation (perhaps not bad if the value can be used many times) and it forces all callers to change signatures.

Copy link
Member

@sharwell sharwell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you do a pass to switch from ref to in for the new parameters in cases where they are not modified?

@CyrusNajmabadi
Copy link
Member Author

Can you do a pass to switch from ref to in for the new parameters in cases where they are not modified?

Yup. Brainfart on my part. Thanks!

@CyrusNajmabadi
Copy link
Member Author

@sharwell ready for another view.

@@ -29,7 +30,7 @@ private struct TextChunk : IDisposable
/// capitalized runs and lowercase runs. i.e. if you have AAbb, then there will be two
/// character spans, one for AA and one for BB.
/// </summary>
public readonly ArrayBuilder<TextSpan> PatternHumps;
public TemporaryArray<TextSpan> PatternHumps;
Copy link
Member

@sharwell sharwell Feb 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❗ This is a hole in the non-copyable analysis. Can't have a non-copyable field of a copyable structure. Need to either mark TextChunk as non-copyable, or remove this field.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: is this actually bad here? i get the potential hole in teh analysis, but in this case the usage is all correct. this data is effectively readonly (i would actually make it readonly if i could, but the analyzer gets annoyed in other ways). So as long as the text chunk is kept around, it's safe to use. THen, we end up disposing the TextChunk when no longer needed, which frees the PatternHumps.

Copy link
Member

@sharwell sharwell Feb 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: is this actually bad here?

Yes. Non-copyable analysis is intended to only allow false positives (might be copied but wasn't) as opposed to approximation features like NRT that also allow false negatives. [NonCopyable] provides the reader with an absolute correctness guarantee that can be relied on without any additional knowledge of code outside the current context.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's fair. do you have a sense of how the design shoudl work here? For pattern matching we have a search pattern (that we want to break into a small, finite amount of pre-processed info that we likely can keep on the stack). We then compare against candidates that we similarly break up.

Ideally this coudl almost all be on the stack. The only exceptions being when breaking up that info literally does require more than teh fixed amount of the TempArray and has to spill to the heap.

One interesting thing though is that there are a few arrays-of-arrays going on. FOr example a pattern-segment has an array of text-chunks. And teach text-chunk has an array of camel-humps.

DO you think there's a way to rejigger this to work with your stack-affinitized approach currently? If not, do you see the work coming in the next C# (for example, ref-fields) as helping out here?

Thanks!

Copy link
Member

@sharwell sharwell Feb 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally this coudl almost all be on the stack.

The storage location is generally irrelevant. We wouldn't, for example, want to use ref struct for any of this. [NonCopyable] is the intended abstraction (data can exist anywhere, as long as it's only in one location at any given time).

I see at two paths forward that would offer good support for our goals in this implementation.

  1. I would like to expand [NonCopyable] to support the concept of "allows copies to read-only locations". This would work well for types that are initially mutable, but later transition to an immutable state.
  2. We could break up the mutable non-copyable types to have an associated type that represents a read-only final state. The latter would not hold resources and would be copyable.

@CyrusNajmabadi
Copy link
Member Author

@sharwell is the current approach viable?

Base automatically changed from master to main March 3, 2021 23:53
@CyrusNajmabadi
Copy link
Member Author

@sharwell what are next steps here?

@CyrusNajmabadi
Copy link
Member Author

@sharwell small reminder on this. thanks!

Copy link
Member

@sharwell sharwell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved. Two comments are suggestions for changes either prior to merge or as an immediate follow-up:

  1. ref struct back to struct
  2. See if SimplePatternMatcher._fullPatternSegment can return to readonly

{
private readonly bool _includeMatchedSpans;
private readonly string _candidate;
private readonly ArrayBuilder<TextSpan> _candidateHumps;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with the current approach for this PR. The available alternative is placing the non-copyable value in heap storage instead of stack storage, e.g. StrongBox<TemporaryArray<TextSpan>> or Tuple<>/class, depending on the number of values that need to be stored together. The two main downsides of this approach is it forces an additional allocation (perhaps not bad if the value can be used many times) and it forces all callers to change signatures.

@CyrusNajmabadi CyrusNajmabadi merged commit 5a2d726 into dotnet:main May 7, 2021
@ghost ghost added this to the Next milestone May 7, 2021
@CyrusNajmabadi CyrusNajmabadi deleted the patternTempArray branch May 7, 2021 18:48
@RikkiGibson RikkiGibson modified the milestones: Next, 17.0.P2 Jun 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants