Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add syntax highlighting support with syntect crate #313

Merged
merged 11 commits into from
Feb 4, 2024

Conversation

erratic-pattern
Copy link
Contributor

@erratic-pattern erratic-pattern commented Nov 3, 2023

To do after finalizing design:

  • test cases
  • better docs

@erratic-pattern erratic-pattern force-pushed the feature/syntect branch 4 times, most recently from 468a7a1 to ba65bae Compare November 3, 2023 21:53
@erratic-pattern
Copy link
Contributor Author

A bunch of tests currently fail with syntect feature flag because it inserts a bunch of unexpected ANSI codes into outputs. I'm thinking the best way to deal with that is to add a pub(crate) helper function to disable highlighting that those tests can call.

@olivia-fl
Copy link
Contributor

This is super cool!

For the tests issue, I'm wondering if this should be hooked up with GraphicalTheme somehow. Currently, the ANSI codes generally aren't an issue in tests because the tests use GraphicalTheme::unicode_nocolor. Even outside of tests, I would find it a bit surprising if specifying unicode_nocolor still results in color output by default. Unfortunately, I'm not sure there's a backwards-compatible way to do this, since GraphicalTheme and ThemeStyles both have exhaustive public members.

Similarly, it looks like this PR uses truecolor for the syntax highlighting, while the default theme for the rest of the output uses ANSI escapes (see #176 for the rationale). If it's feasible, it's probably best if this was consistent.

@erratic-pattern
Copy link
Contributor Author

erratic-pattern commented Nov 3, 2023 via email

@olivia-fl
Copy link
Contributor

Is there a way to map truecolor to ANSI? Currently it's just converting the
RGB values from the syntect Style to an RGB in owo-colors Style (and doing
some alpha channel blending if a background is present, but I currently
have background colors off)

Oof, that's gonna be tricky. There is no straightforward way to map RGB to 16-color ANSI because the actual RGB values of the ANSI colors will vary arbitrarily depending on how the terminal emulator is configured. This information is not exposed to a terminal application.

Ideally, you'd have the style information from the highlighter stored in a form that can be converted reasonably to both truecolor and ANSI. A common way to do this is to have your "theme" specify a separate ANSI value and truecolor value for each type of highlight. This is effectively what miette does currently. Another common approach is to specify only ANSI in the theme, and then hardcode a mapping from ANSI to truecolor with a reasonable color scheme. The latter loses a bit of flexibility (for example, if you want to have more distinct colors in your truecolor output than ANSI supports).

Skimming through the syntect docs, it looks like it does neither of these, and instead just uses truecolor directly in the theme. I think getting good ANSI output is either gonna require one of:

  • skipping out on syntect's theme mechanism entirely and implementing it ourselves.
  • writing a invertable mapping from ANSI colors to a "magic values" subset of truecolor, and then using that to smuggle the ANSI values through syntect.
  • doing something in upstream syntect.

I could be missing something, but this looks like a really thorny problem. My opinion is that it's probably not worth blocking the syntax highlighting support in miette on solving it, particularly if syntax highlighting is not enabled by default.


impl<'h> HighlighterState for SyntectHighlighterState<'h> {
fn highlight_line<'s>(&mut self, line: &'s str) -> Vec<Styled<&'s str>> {
let use_bg_color = false;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might want to expose this as an option somewhere?

Comment on lines 708 to 709
.chars()
.zip(self.line_visual_char_width(&styled_text))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't zip the same value in a loop.
Create a single variable before loop, instead

src/handlers/graphical.rs Outdated Show resolved Hide resolved
src/handlers/graphical.rs Outdated Show resolved Hide resolved
@erratic-pattern
Copy link
Contributor Author

Added a comment to the issue with a gist of what the Theme struct looks like. #67 (comment)

Maybe we can come up with a way to map GraphicalTheme onto this, or add some new fields to GraphicalTheme specifically for syntax highlighting options. This would make nocolor detection via GraphicalTheme a lot easier too.

@erratic-pattern
Copy link
Contributor Author

With the help of Arc and a newtype wrapper I was able to sledgehammer the Highlighter trait object into GraphicalReportHandler without losing Debug or Clone. The Default implementation does the right thing with regards to NO_COLOR and TTY settings (except it currently doesn't support ANSI codes over truecolor).

With that change, I can now add with_syntax_highlighting and without_syntax_highlighting as methods to GraphicalReportHandler, which fixes broken tests because the fmt_report helper can now call those methods when emulating a nocolor environment.

I also renamed DefaultHighlighter to BlankHighlighter since it is not always the default, and the new name makes it more clear what it actually does.

@@ -27,11 +27,6 @@ impl NamedSource {
}
}

/// Gets the name of this `NamedSource`.
pub fn name(&self) -> &str {
Copy link
Contributor Author

@erratic-pattern erratic-pattern Nov 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't notice this, but this is a breaking change because SourceCode::name has to return an Option<&str> but NamedSource::name was returning just &str. Should we keep this for compatibility even though it shadows the new SourceCode method?

@olivia-fl
Copy link
Contributor

For testing, I'm thinking it might be a good idea to have most the tests in tests/graphical.rs strip the ANSI escapes in fmt_report, rather than configuring it so that they aren't generated in the first place. This way we're more likely to catch text width calculation bugs and similar.

@erratic-pattern
Copy link
Contributor Author

Looks like rustc 1.56 doesn't like the dep: prefix on features.

https://github.com/kallisti-dev/miette/actions/runs/6756721685/job/18366498929

Possible fixes here:

  1. rename the feature in Cargo.toml as syntect-highlighter = ["fancy-no-backtrace", "syntect"]
  2. delete the explicit syntect feature from Cargo.toml and use the implicit feature only. Requires users to enable both syntect and fancy (or fancy-no-backtrace)
  3. bump minimum rustc to 1.60 where dep: prefix is supported

I'm leaning toward option 1, if no one objects, even though it makes the feature name a bit more verbose than I wanted.

@erratic-pattern erratic-pattern force-pushed the feature/syntect branch 2 times, most recently from fa57777 to 6a80565 Compare November 4, 2023 20:19
@erratic-pattern
Copy link
Contributor Author

erratic-pattern commented Nov 5, 2023

For testing, I'm thinking it might be a good idea to have most the tests in tests/graphical.rs strip the ANSI escapes in fmt_report, rather than configuring it so that they aren't generated in the first place. This way we're more likely to catch text width calculation bugs and similar.

Just tried this and it seems to work for most tests, but breaks the URL tests because a simple ANSI stripping algorithm is too naive. I'd have to do lookahead to properly preserve links while deleting color codes.

Personally I'd rather leave the existing tests as nocolor and write new tests with STYLE=1 that specifically test tab formatting in the presence of color codes.

EDIT: the URL tests actually don't care about the presence of ANSI codes, so we could remove the links as well and avoid lookahead. I'd still rather avoid the additional complexity of stripping out ANSI stuff in the tests that were written to expect nocolor, and instead write new tests that are color-aware.

@erratic-pattern erratic-pattern force-pushed the feature/syntect branch 9 times, most recently from ab408df to 5b118b8 Compare November 11, 2023 17:46
@erratic-pattern erratic-pattern force-pushed the feature/syntect branch 3 times, most recently from c62d121 to 3a2df73 Compare November 11, 2023 19:53
@erratic-pattern erratic-pattern marked this pull request as ready for review November 11, 2023 19:54
@erratic-pattern
Copy link
Contributor Author

erratic-pattern commented Nov 11, 2023

Alright, I've added crate-level docs and test cases, so I'm comfortable marking this PR as ready.

Some lingering questions to resolve are:

  • What to do about SourceCode::name shadowing NamedSource::name.
  • What should the default theme be. See my comment on the issue for example screenshots.

Some items that could probably be new issues to indicate future work should be done:

  • How to make the logic of line_visual_char_width more robust. I think ideally it should be able to detect any possible escape sequence as having 0 visual width. The current logic should work fine for just color codes, but it's very naive and does not actually check that the escape sequence is well-formed. Making it more robust would allow users to pass in a greater variety of input strings and get properly formatted results.
  • How to make a custom syntect theme that plays nicely with the graphical reporter configuration.

Some notes on CI stuff:

  • I tried building with syntect-highlighter on Miri and the tests ran for 45+ minutes and then crashed without an error. Not sure what's going on there but I'm guessing some new dependency is really really slow on Miri?
  • I excluded syntect-highlighter from the MSRV build. syntect's MSRV policy is the last 3 stable releases. Not sure what you want to do there @zkat
  • minimal versions check passes with --features fancy,no-format-args-capture but --all-features is failing with weird gcc stuff that I'm not seeing when I run locally

@erratic-pattern erratic-pattern force-pushed the feature/syntect branch 4 times, most recently from 6c87bcd to fded543 Compare November 11, 2023 20:28
@erratic-pattern
Copy link
Contributor Author

Moving the language detection stuff over to SpanContents as @zkat originally suggested removes the name shadowing issue.

@zkat
Copy link
Owner

zkat commented Nov 15, 2023

looks like we have some conflicts that need resolved, and a minimal version check issue?

@erratic-pattern
Copy link
Contributor Author

looks like we have some conflicts that need resolved, and a minimal version check issue?

just resolved the merge conflict.

1d4967d fixes the CI error, but does so by excluding syntect-highlighter completely from the tested feature set.

the syntect-highlighter feature either requires a bump to the MSRV or requires exclusion from the current MSRV policy. the syntect MSRV policy according to their changelog is the last 3 stable releases.

as for the CI failure, the command cargo +nightly build -Z minimal-versions --all-features succeeds for me locally, but is failing on CI for some reason (seems to be related to gcc)

@erratic-pattern
Copy link
Contributor Author

(non-)update: I have not had time to investigate the CI failures with gcc on this change, but plan to soon.

I think the best path forward with the MSRV issue is keep the current MSRV but exclude the syntect feature from that definition, deferring to syntect's own MSRV policy in that case.

I think for Miri it might be best to just not test syntect feature against Miri for the time being, unless someone can identify the reason for the massive performance regression on CI. I will make an attempt to investigate but I can't promise there will be any solution yet.

Copy link
Owner

@zkat zkat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, let's get this in and see how it goes! :)

@zkat zkat merged commit e65d0a7 into zkat:main Feb 4, 2024
0 of 14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants