Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Merge behave consistently #4042

Closed

Conversation

pascalkuthe
Copy link
Member

@pascalkuthe pascalkuthe commented Sep 29, 2022

Supersedes #3695.

Currently, the HighlightEvent merge function overlays spans Vec<(usize, std::ops::Range<usize>)>`` on an Iterator of HighlightEvent`. The current merge doesn't handle overlapping spans well:

  • diagnostic ranges may be discarded if they overlap
  • ghost text may appear when a language server sends overlapping diagnostics

After a discussion with @the-mikedavis we realized that the solution in #3695 has a problem:
The spans must remain in a certain order to allow (somewhat performant) conversion to HighlightEvents.
However, with #3015 we usually want to show the highest priority diagnostic on top (so that an error is not overlapped by a hint for example). This is not possible with existing approach or the one in #3695.

However, a few key realizations lead to an alternative approach:

  • Implementing an overlay for non-overlappig spans is trivial and always produces the correct result
  • overlapping spans with the same Highlight scope can be merged together so that the resulting spans do not overlap
  • The only case where overlapping spans occur are diagnostics. However, there are only 5 different Highlight scopes for diagnostics: One for each severity. Therefore, the diagnostics can be separated into 5 sets of spans that all share the same highlights. These can then be used as an overlay as explained above.

This PR implements that approach.
The overlay I am referring to here is essentially merge (renamed it to make it more clear).
I implemented it from scratch because the old implementation was more complex than it needed to be and it was easier to integrate the merging directly into the iterator instead of handeling it in a seperate function.
Compared to previous implementation it should also perform a bit better (does not use any allocations for example).
This implementation can also automatically merge overlapping spans under the assumption that they have the same highlight. Therefore, there are two functions that produce such an iterator:

  • monotonic_overlay: accepts Spans (range + Highlight) that must be monotonically ascending (no overlaps and sorted by start)
  • overlapping_overlay: accepts a single Highlight and multiple Ranges. (sorted by start)

The overlapping_overlay is used to layer the diagnostic levels and the monotonic_overlay is used for cursor/selections (and in the future rainbow brackets).

As we are now layering up to 6 iterators on top of each other, I was a bit concerned about performance.
However, the new overlay implementation is very performant so the overhead is very small.
Furthermore, I refactored the calls to the iterators a bit to completely avoid dynamic dispatch.
When dynamically dispatching an iterator a v-table lookup is necessary for each call to next (occurs in the hot loop).
While LTO might theoretically move this out of the loop, the code is pretty complex so its probably not happening.
Benchmarking the syntax highlighting code is kind of tricky but my -not very scientific- glances at flamegraphs show that this is probably neutral or an improvement to performance.

I have not written an extensive test-tsuite as @the-mikedavis did for 3695 #because the algorithm is so simple that one can easily reason about all possible scenarios.
In particular all test-cases used there trivially work here.
The old merge implementation was more complex and didn't have any tests either.
However, a test-case that covers #2248 would probably be good, although I struggle to come up with one.
I am certain that this PR fixes that issue but until I can test it properly I am marking this as a draft.

This definitely fixes #2248

It might be good to rebase #2857 on this as that will probably produce the most overlays when thats merged.
However, the conflicts looked pretty daunting so I didn't get into that yet.

I merge #2857 with this and have not found any problems

@kirawi kirawi added A-core Area: Helix core improvements S-waiting-on-review Status: Awaiting review from a maintainer. labels Sep 29, 2022
@pascalkuthe pascalkuthe marked this pull request as draft September 29, 2022 21:35
@pascalkuthe pascalkuthe marked this pull request as ready for review October 1, 2022 11:41
Comment on lines 1196 to 1210
fn next(&mut self) -> Option<Self::Item> {
let res = match self.0.next()?.unwrap() {
// TODO: use byte slices directly
// convert byte offsets to char offset
HighlightEvent::Source { start, end } => {
let text = self.0.source;
let start = text.byte_to_char(ensure_grapheme_boundary_next_byte(text, start));
let end = text.byte_to_char(ensure_grapheme_boundary_next_byte(text, end));
HighlightEvent::Source { start, end }
}
event => event,
};

Some(res)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old maps on HighlightIter were doing two things:

  1. Unwrapping the result (which currently always returns Ok)
    • I'm concerned that helix_core is not the appropriate place to be unwrapping. HighlightIter returns Result items so that in the future we can implement cancellation of highlighting (timer based for example). There isn't a concrete plan for that AFAIK so it's hard to say how/where the Result should be handled.
  2. Translating bytes to chars in Source events.
    • @archseer does the TODO for using byte slices applies anymore? I think we can just translate bytes to chars in the Iterator implementation for HighlightIter

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned that helix_core is not the appropriate place to be unwrapping. HighlightIter returns Result items so that in the future we can implement cancellation of highlighting (timer based for example). There isn't a concrete plan for that AFAIK so it's hard to say how/where the Result should be handled.

I have moved unwrapping the closure back into helix-term in 25427ce altough it make the code simpler if we simply returned HighlightEvent from the HighlightIterator (panicking directly in the iterator) and then switched to a result once its actually required (because an unwrap is required everywhere the HighlighterIterator is used all call-sites will have to be changed in the future anyway). What do you think is better

@archseer does the TODO for using byte slices applies anymore? I think we can just translate bytes to chars in the Iterator implementation for HighlightIter

In markdown.rs the highlighting directly indexes into a string and therefore uses byte indices. Using char indices and translating back would be inefficient in that case so I am not sure there is a better solution right now.
Rewriting the rendering to use byte indices seems to amount to translating byte indices to chars anyways so there might not be a better approach.

helix-core/src/syntax/overlay.rs Outdated Show resolved Hide resolved
helix-core/src/syntax/overlay.rs Outdated Show resolved Hide resolved
helix-core/src/syntax/overlay.rs Outdated Show resolved Hide resolved
helix-core/src/syntax/overlay.rs Outdated Show resolved Hide resolved
helix-core/src/syntax/overlay/test.rs Outdated Show resolved Hide resolved
helix-term/src/ui/editor.rs Outdated Show resolved Hide resolved
)
})
.collect()
let diagnostic_ranges = doc.diagnostics().iter().filter_map(move |diagnostic| {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We end up filtering doc.diagnostics() for each severity, right? Could we iterate over it once and separate out the diagnostics into a HashMap<Severity, SmallVec<Range<usize>> to save some work?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the extra allocations would outweigh the cost of filtering here. I think SmallVec would be even worse because you essentially get an extra branch on every call to next() so the overhead on the iteration would be similar to just filtering.

helix-term/src/ui/editor.rs Outdated Show resolved Hide resolved
Comment on lines 1502 to 1505
pub enum DocSyntaxHighlight<'d> {
TreeSitter(CharacterHighlightIter<'d>),
Default(iter::Once<HighlightEvent>),
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so that we can use static dispatch for everything, right? As opposed to a Box of a dyn trait implementation?

This enum makes the uses of doc_syntax_highlights awkward (looking at the usages in picker.rs and here in editor.rs) and I think what we lose in elegance of the code might not be worth the performance improvement.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this was a bit ugly, this is one of the rare cases where the limitations of the rust type system are kind of showing (generic assoicated types would help here or generic closures in general).

I managed to implement an alternative solution in 2c6f681 that turned out quite good.
The idea is that we never return the iterator and instead call the rendering code in the same function.
An Overlay (like diagnostics, selections) are passed as parameter that implements the HighlightOverlayso that they are generic over the input Iterator. HighlightOverlay are composable using tuples. For example the diagnostics Layers are implemented by creating a tuple of 5 DiagnosticOverlays (hint_overlay, info_overlay, unkown_overlay, warning_overlay, error_overlay).

@the-mikedavis
Copy link
Member

I hacked together an FPS counter for the render loop: the-mikedavis@02ae7ad and this change looks costly based on those rough numbers. I see master at 868 FPS and this branch at 805 FPS for the same code when there are no diagnostics. It seems like there is some amount of overhead for overlaying an empty vec on the highlight stream that I think can be avoided.

I have some local changes that take care of the merging of overlapping diagnostics within Editor::doc_diagnostics_highlights which seems to be fast; I'll make a PR soon. I think it should be possible to solve this with minimal or no changes to Merge and maybe come out with a performance boost when there are no diagnostics.

@pascalkuthe
Copy link
Member Author

closed in favor of #4113

@pascalkuthe pascalkuthe closed this Oct 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-core Area: Helix core improvements S-waiting-on-review Status: Awaiting review from a maintainer.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Characters rendering multiple times
3 participants