-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch pattern matching to use TemporaryArray #51035
Conversation
{ | ||
private readonly bool _includeMatchedSpans; | ||
private readonly string _candidate; | ||
private readonly ArrayBuilder<TextSpan> _candidateHumps; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was annoying. can't point to a struct by ref as a field. So this had to be a parameter passed to all methods instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with the current approach for this PR. The available alternative is placing the non-copyable value in heap storage instead of stack storage, e.g. StrongBox<TemporaryArray<TextSpan>>
or Tuple<>
/class
, depending on the number of values that need to be stored together. The two main downsides of this approach is it forces an additional allocation (perhaps not bad if the value can be used many times) and it forces all callers to change signatures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you do a pass to switch from ref
to in
for the new parameters in cases where they are not modified?
Yup. Brainfart on my part. Thanks! |
@sharwell ready for another view. |
@@ -29,7 +30,7 @@ private struct TextChunk : IDisposable | |||
/// capitalized runs and lowercase runs. i.e. if you have AAbb, then there will be two | |||
/// character spans, one for AA and one for BB. | |||
/// </summary> | |||
public readonly ArrayBuilder<TextSpan> PatternHumps; | |||
public TemporaryArray<TextSpan> PatternHumps; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❗ This is a hole in the non-copyable analysis. Can't have a non-copyable field of a copyable structure. Need to either mark TextChunk
as non-copyable, or remove this field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: is this actually bad here? i get the potential hole in teh analysis, but in this case the usage is all correct. this data is effectively readonly (i would actually make it readonly if i could, but the analyzer gets annoyed in other ways). So as long as the text chunk is kept around, it's safe to use. THen, we end up disposing the TextChunk when no longer needed, which frees the PatternHumps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: is this actually bad here?
Yes. Non-copyable analysis is intended to only allow false positives (might be copied but wasn't) as opposed to approximation features like NRT that also allow false negatives. [NonCopyable]
provides the reader with an absolute correctness guarantee that can be relied on without any additional knowledge of code outside the current context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's fair. do you have a sense of how the design shoudl work here? For pattern matching we have a search pattern (that we want to break into a small, finite amount of pre-processed info that we likely can keep on the stack). We then compare against candidates that we similarly break up.
Ideally this coudl almost all be on the stack. The only exceptions being when breaking up that info literally does require more than teh fixed amount of the TempArray and has to spill to the heap.
One interesting thing though is that there are a few arrays-of-arrays going on. FOr example a pattern-segment has an array of text-chunks. And teach text-chunk has an array of camel-humps.
DO you think there's a way to rejigger this to work with your stack-affinitized approach currently? If not, do you see the work coming in the next C# (for example, ref-fields) as helping out here?
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally this coudl almost all be on the stack.
The storage location is generally irrelevant. We wouldn't, for example, want to use ref struct
for any of this. [NonCopyable]
is the intended abstraction (data can exist anywhere, as long as it's only in one location at any given time).
I see at two paths forward that would offer good support for our goals in this implementation.
- I would like to expand
[NonCopyable]
to support the concept of "allows copies to read-only locations". This would work well for types that are initially mutable, but later transition to an immutable state. - We could break up the mutable non-copyable types to have an associated type that represents a read-only final state. The latter would not hold resources and would be copyable.
@sharwell is the current approach viable? |
@sharwell what are next steps here? |
@sharwell small reminder on this. thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved. Two comments are suggestions for changes either prior to merge or as an immediate follow-up:
ref struct
back tostruct
- See if
SimplePatternMatcher._fullPatternSegment
can return toreadonly
src/Workspaces/Core/Portable/PatternMatching/AllLowerCamelCaseMatcher.cs
Outdated
Show resolved
Hide resolved
src/Workspaces/Core/Portable/PatternMatching/SimplePatternMatcher.cs
Outdated
Show resolved
Hide resolved
src/Workspaces/Core/Portable/PatternMatching/AllLowerCamelCaseMatcher.cs
Show resolved
Hide resolved
{ | ||
private readonly bool _includeMatchedSpans; | ||
private readonly string _candidate; | ||
private readonly ArrayBuilder<TextSpan> _candidateHumps; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with the current approach for this PR. The available alternative is placing the non-copyable value in heap storage instead of stack storage, e.g. StrongBox<TemporaryArray<TextSpan>>
or Tuple<>
/class
, depending on the number of values that need to be stored together. The two main downsides of this approach is it forces an additional allocation (perhaps not bad if the value can be used many times) and it forces all callers to change signatures.
This code is a prime candidate for temporary arrays as most often teh number of chunks/matches we have will be <=4, allowing us to process things without any allocs or need to go to pools. This also simplifies our impl a bunch as we don't have to do complex work around avoiding even accessing the Pool for our array builders.