Code syntax hightlighting #1484

Closed
Omikhleia opened this issue Jul 20, 2022 · 14 comments
Labels
enhancement Software improvement or feature request

Comments

@Omikhleia
Member

Omikhleia commented Jul 20, 2022

As part of the efforts on native Markdown overhaul #1481, to ensure all features are discussed and addressed, I'd like to raise the question of code/syntax highlighting.

We could possibly have two use cases for supporting nice syntax highlighting for snippets of code:

  • Markdown fenced blocks, e.g. ~~~lua ... (with backticks too, but that's annoying to escape here 🤣 )
  • Our own SILE manual, to make it more colorful and eye-pleasing 💃

Some possible options (non-exhaustive):

  • Ditch the idea
    • It's perhaps cool, but not that mandatory
    • Personally, I have little interest in it, except for the sake of covering Markdown nicely (but my own usage of Markdown doesn't need code highlighting)
  • Consider code highlighting strategies and solutions:
    • Minimal approach with our own implementation = perhaps with Lua only in mind, e.g. deriving something "quick" from Lua lexers such as those mentioned here
    • Larger-scope approach with a 3rd party highlighter
      • But which one?
      • As a PoC in my own Pandoc Lua writer, I used leafo/lua-syntaxhighlight
        • Pros: Lua, MIT-licensed, working with modifications, having a rockspec (I used a "vendored" copy in my PoC, but I guess luarocks could be used to avoid it)
        • Cons: Derived from old Textadept code (AFAIK Textadept has switched to C++ etc.), unknown maintenance status.
          • As I did for lunamark, I could contact the repository owner and check whether changes and/or updates would possibly be accepted. But I will not do it without a "charter" from the SILE organization.

What could be our take on this topic and are other end-users interested too?

@ctrlcctrlv
Member

I have a suggestion.

I think that writing any syntax highlighting code is a waste of time (although I like wastes of time as much as anyone).

I decided to figure out finally how to fool vimpager into working with aha and I got it!

script -qc 'vimpager --force-passthrough ~/Workspace/glifparser.rlib/src/glif.rs' | dos2unix -f | aha > /tmp/aha.html

Here's some output:

(-b)

image

(no -b)

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!-- This file was created with the aha Ansi HTML Adapter. https://github.com/theZiz/aha -->
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="application/xml+xhtml; charset=UTF-8"/>
<title>stdin</title>
</head>
<body>
<pre>
<span style="color:#ffd7d7;">//! [`Glif`] (`&lt;glif&gt;` toplevel), read/write modules, + [`Lib`]</span>

<span style="color:olive;">use</span> <span style="color:#5fd7ff;">std</span><span style="color:#ffd7d7;">::</span>path;

<span style="color:olive;">use</span> <span style="color:#5fd7ff;">crate</span><span style="color:#ffd7d7;">::</span><span style="color:#5fd7ff;">anchor</span><span style="color:#ffd7d7;">::</span>Anchor;
<span style="color:olive;">use</span> <span style="color:#5fd7ff;">crate</span><span style="color:#ffd7d7;">::</span><span style="color:#5fd7ff;">component</span><span style="color:#ffd7d7;">::</span>GlifComponents;
<span style="color:olive;">use</span> <span style="color:#5fd7ff;">crate</span><span style="color:#ffd7d7;">::</span><span style="color:#5fd7ff;">error</span><span style="color:#ffd7d7;">::</span>GlifParserError;
<span style="color:olive;">use</span> <span style="color:#5fd7ff;">crate</span><span style="color:#ffd7d7;">::</span><span style="color:#5fd7ff;">guideline</span><span style="color:#ffd7d7;">::</span>Guideline;
<span style="color:#5fd7ff;">#[cfg(feature = </span><span style="color:purple;">&quot;glifimage&quot;</span><span style="color:#5fd7ff;">)]</span>
<span style="color:olive;">use</span> <span style="color:#5fd7ff;">crate</span><span style="color:#ffd7d7;">::</span><span style="color:#5fd7ff;">image</span><span style="color:#ffd7d7;">::</span>GlifImage;
<span style="color:olive;">use</span> <span style="color:#5fd7ff;">crate</span><span style="color:#ffd7d7;">::</span><span style="color:#5fd7ff;">point</span><span style="color:#ffd7d7;">::</span>PointData;
<span style="color:olive;">use</span> <span style="color:#5fd7ff;">crate</span><span style="color:#ffd7d7;">::</span><span style="color:#5fd7ff;">outline</span><span style="color:#ffd7d7;">::</span>{Outline, OutlineType};

<span style="color:olive;">mod</span> <span style="font-weight:bold;color:teal;">conv</span>;
<span style="color:olive;">mod</span> <span style="font-weight:bold;color:teal;">lib</span>;
<span style="color:olive;">pub</span> <span style="color:olive;">use</span> <span style="color:#5fd7ff;">lib</span><span style="color:#ffd7d7;">::</span>Lib;
<span style="color:olive;">mod</span> <span style="font-weight:bold;color:teal;">read</span>;
<span style="color:olive;">pub</span> <span style="color:olive;">use</span> <span style="color:#5fd7ff;">self</span><span style="color:#ffd7d7;">::</span><span style="color:#5fd7ff;">read</span><span style="color:#ffd7d7;">::</span>read_ufo_glif <span style="color:olive;">as</span> read;
…

It's then a matter of just using SILE's XML mode, right?

Significantly less complicated starting commands exist btw, such as pygmentize:

pygmentize ~/Workspace/glifparser.rlib/src/glif.rs | aha -b > /tmp/aha.html

image

But vim is the gold standard for syntax highlighting, it supports thousands of languages…why use anything else tbh

@Omikhleia
Member Author

@ctrlcctrlv Good points. In the same vein (using external tools):

  • It wouldn't be that hard to implement an inputter that interprets the ANSI escape sequences, thus avoiding the extra requirement for aha after vimcat/vimpager or pygmentize. (The HTML/CSS subset it produces would not be that hard to parse with a dedicated XML package, but it doesn't feel as robust.)
  • I was considering pygmentize too, which has many output formatters besides ANSI colors. (The minimum necessary subset of) BBCode (or even roff), for instance, would also be an easy-to-reach target, with the additional benefit of providing yet another supported input format in SILE.
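To make the ANSI idea concrete, here is a minimal sketch in Python (not SILE code, and not anything proposed in this thread) of what such an interpreter has to do: split the text at SGR ("Select Graphic Rendition") escape sequences and track the current color and bold state. The function name `parse_ansi` and the shape of the returned data are assumptions made for illustration. Only a handful of SGR codes are handled, and non-SGR escape sequences are left in the text untouched.

```python
import re

# A CSI "Select Graphic Rendition" sequence: ESC [ <params> m
SGR = re.compile(r"\x1b\[([0-9;]*)m")

# Names chosen here for the eight standard foreground colors (SGR 30-37).
BASIC_COLORS = ["black", "red", "green", "yellow", "blue", "magenta", "cyan", "white"]


def parse_ansi(text):
    """Return a list of (style, chunk) pairs; style has 'color' and 'bold' keys."""
    spans = []
    style = {"color": None, "bold": False}
    pos = 0
    for match in SGR.finditer(text):
        if match.start() > pos:
            spans.append((dict(style), text[pos:match.start()]))
        pos = match.end()
        # An empty parameter counts as 0 (reset), so ESC[m behaves like ESC[0m.
        codes = [int(c) if c else 0 for c in match.group(1).split(";")]
        i = 0
        while i < len(codes):
            code = codes[i]
            if code == 0:
                style = {"color": None, "bold": False}
            elif code == 1:
                style["bold"] = True
            elif code == 22:
                style["bold"] = False
            elif 30 <= code <= 37:
                style["color"] = BASIC_COLORS[code - 30]
            elif code == 39:
                style["color"] = None
            elif code == 38 and i + 2 < len(codes) and codes[i + 1] == 5:
                style["color"] = codes[i + 2]  # index into the 256-color palette
                i += 2
            elif code == 38 and i + 4 < len(codes) and codes[i + 1] == 2:
                style["color"] = "#%02x%02x%02x" % tuple(codes[i + 2:i + 5])
                i += 4
            # Any other code (background, underline, ...) is ignored in this sketch.
            i += 1
    if pos < len(text):
        spans.append((dict(style), text[pos:]))
    return spans
```

For example, `parse_ansi("\x1b[1;33muse\x1b[0m std")` yields a bold yellow `use` followed by an unstyled ` std`. A real inputter would then map each pair to the typesetter's color and font commands.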

@ctrlcctrlv
Member

ctrlcctrlv commented Aug 6, 2022

Good point about robustness. What about calling window.getComputedStyle(el).color in a headless browser? That means you don't have to worry about the CSS at all. You can figure out whether an element is bold the same way.

@Omikhleia
Member Author

Omikhleia commented Aug 6, 2022

in a headless browser?

I am not sure I understand here. If SILE has to rely on a headless browser as a third-party dependency for such a simple thing... we could as well ditch SILE as a whole and use that headless browser for everything (with those W3C Paged Media things such as paged.js)... It's very delicate, at best, to let headless browsers enter the game ;)

@ctrlcctrlv
Member

Well, that's not really what I meant. Since all we need to 'compute' is whether an element is bold and what its color is, I meant more that we should figure out which libraries do that :)
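For the narrow case of aha's output shown earlier, where every style is an inline `style` attribute on a `<span>`, color and boldness can be read without any CSS engine. The following is a rough Python sketch using only the standard library's `html.parser`. The class and function names (`SpanStyles`, `extract_runs`) are made up for this example, and it assumes the input uses nothing beyond inline `color` and `font-weight` declarations.

```python
from html.parser import HTMLParser


class SpanStyles(HTMLParser):
    """Collect (color, bold, text) runs from <span style="..."> markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []  # parsed inline styles of the currently open spans
        self.runs = []   # collected (color, bold, text) tuples

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            style = dict(attrs).get("style") or ""
            decls = {}
            for decl in style.split(";"):
                if ":" in decl:
                    prop, value = decl.split(":", 1)
                    decls[prop.strip().lower()] = value.strip()
            self.stack.append(decls)

    def handle_endtag(self, tag):
        if tag == "span" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        color, bold = None, False
        for decls in self.stack:  # inner spans override outer ones
            color = decls.get("color", color)
            if "font-weight" in decls:
                bold = decls["font-weight"] == "bold"
        self.runs.append((color, bold, data))


def extract_runs(html):
    parser = SpanStyles()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.runs
```

Fed the line `<span style="color:olive;">mod</span> <span style="font-weight:bold;color:teal;">conv</span>;` from the output above, this returns an olive `mod`, an unstyled space, a bold teal `conv`, and an unstyled `;`. It does not handle stylesheets, classes, or inherited properties beyond nested spans, which is exactly the robustness concern raised above.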

@Omikhleia
Member Author

On the "minimal" approach (= Lua only, for our manual), I just noticed that Penlight includes a (basic) Lua lexer. It might be a good starting point for supporting at least Lua syntax highlighting without any new dependency.
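To show how little a "minimal" highlighter needs from a lexer, here is a small regex-based tokenizer for a subset of Lua, written in Python purely as an illustration. It is not Penlight's lexer and makes no claim about Penlight's API; the function name `lex_lua` and the token kind names are assumptions. Long brackets (`[[...]]` strings and `--[[ ... ]]` comments), hexadecimal numbers, and exponents are deliberately left out.

```python
import re

LUA_KEYWORDS = {
    "and", "break", "do", "else", "elseif", "end", "false", "for", "function",
    "goto", "if", "in", "local", "nil", "not", "or", "repeat", "return", "then",
    "true", "until", "while",
}

# One alternative per token kind; the first alternative that matches wins.
TOKEN = re.compile(
    r"(?P<comment>--[^\n]*)"
    r"|(?P<string>\"(?:\\.|[^\"\\\n])*\"|'(?:\\.|[^'\\\n])*')"
    r"|(?P<number>\d+(?:\.\d+)?)"
    r"|(?P<name>[A-Za-z_]\w*)"
    r"|(?P<space>\s+)"
    r"|(?P<other>.)",
    re.DOTALL,
)


def lex_lua(source):
    """Return a list of (kind, text) tokens covering the whole source."""
    tokens = []
    for match in TOKEN.finditer(source):
        kind = match.lastgroup
        text = match.group()
        if kind == "name" and text in LUA_KEYWORDS:
            kind = "keyword"
        tokens.append((kind, text))
    return tokens
```

Because every character falls into some token, joining the token texts reproduces the input, so a highlighter can simply wrap each token in the color chosen for its kind.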

@ctrlcctrlv
Member

How often are people using SILE to write about Lua outside our own documentation? One of my papers includes syntax highlighted Perl, and I would've liked to do it with SILE. Supporting it in Lua biases future work to also be in Lua.

@ctrlcctrlv
Member

Maybe useful — https://github.com/rdlaitila/LURE

@Omikhleia
Member Author

How often are people using SILE to write about Lua outside our own documentation? One of my papers includes syntax highlighted Perl, and I would've liked to do it with SILE. Supporting it in Lua biases future work to also be in Lua.

Yes, you are on point, and that's why I opened this issue ;) - I mean, there are really two things here:

  • Having a "vivid" SILE manual out of the box, without needing 3rd-party dependencies. I'll give the Penlight Lua lexer a chance here: it comes for free since we already include it. It's KISS and low-hanging fruit that's better than nothing.
  • Having a more general solution, which, as discussed above, would very likely need other 3rd-party tooling that might not be shipped with SILE (it's hard to imagine requiring SILE users to also install vim tooling, Python, etc.).

@ctrlcctrlv
Member

I agree with you there, I just had my best idea yet although you've already had it :)

It wouldn't be that hard to implement an inputter that interprets the ANSI escape sequences, thus avoiding the extra requirement for aha after vimcat/vimpager or pygmentize. (The HTML/CSS subset it produces would not be that hard to parse with a dedicated XML package, but it doesn't feel as robust.)

For some reason I thought ANSI was so arcane that it'd be easier to handle it in HTML/CSS, but that's not so. I think that working on the Kitty terminal emulator has jaded me a little too much, haha. But I looked at aha's source code and it's not so bad!¹

https://github.com/theZiz/aha/blob/5eaec96aae98274d1d86b431b9d426d50a023fbc/aha.c#L735

I think we should refocus the problem around ANSI instead of HTML/CSS. Now that I think about it, I don't think it'd be hard to write an LPeg parser for ANSI.

¹ There's nothing wrong with Kitty, it just supports so many terminal standards that the problem seemed more daunting than it actually is. There's no reason we should need to support Sixels, images, Kitty's fancy box drawing with sprites, etc.

@ctrlcctrlv
Member

This code seems to get us very close.

https://github.com/spc476/LPeg-Parsers/blob/c2f69b0a9d665e1afda45eae5fd362eb6f882458/utf8/control.lua#L27-L42

It'd then be up to users how they make the files, be it with vimpager via script, pygmentize, etc.

@rjmunro
Contributor

rjmunro commented Aug 9, 2022

@ctrlcctrlv Will I have to make the ANSI-coloured files in advance, or could I use a normal Markdown file and tell SILE that, to turn a ``` section into ANSI, it needs to call vimpager or whatever?

@alerque
Member

alerque commented Aug 9, 2022

I haven't dug into this issue, but a few comments:

  • At this point anything that requires 3rd-party tooling should be an external package. We have had the tooling to make that easy since v0.14.0. Even if we eventually want to move it to core, there is a significant cost to testing and distributing extra dependencies. At the very least, such new packages will be expected to start life as external packages. I can add them to this org and distribute them with SILE's luarocks account, but that's the starter path.

  • I will look very skeptically on anything that needs a headless browser to run. There should be an easier path from a language to a syntax highlighted AST.

@rjmunro That question wasn't for me, but given the current architecture it should never be required to prepare some special format in advance: a SILE package can call out to whatever tooling is required to transform input into whatever format it needs. I think the discussion here is just about what tooling to call out to.
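The "call out to tooling" flow described here can be sketched independently of SILE. The Python below is an illustration only, not SILE's package API: it finds fenced blocks in a Markdown string and pipes a block's source through an external command, returning whatever that command prints. The names `fenced_blocks` and `run_highlighter` are hypothetical, and the fence matching is simplified (the closing fence must be identical to the opening one, and indented fences are not recognized).

```python
import re
import subprocess

# Opening fence of three or more backticks or tildes, an optional language tag,
# then the body up to a line consisting of the same fence.
FENCE = re.compile(
    r"^(?P<fence>`{3,}|~{3,})[ \t]*(?P<lang>[\w+-]*)[^\n]*\n"
    r"(?P<body>.*?)"
    r"^(?P=fence)[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def fenced_blocks(markdown):
    """Return (language, source) pairs for each fenced block found."""
    return [(m.group("lang"), m.group("body")) for m in FENCE.finditer(markdown)]


def run_highlighter(code, command):
    """Pipe code to an external command's stdin and return its stdout.

    Raises subprocess.CalledProcessError if the command exits with an error.
    """
    result = subprocess.run(
        command, input=code, capture_output=True, text=True, check=True
    )
    return result.stdout
```

Assuming pygmentize is installed, a call such as `run_highlighter(source, ["pygmentize", "-l", lang, "-f", "terminal"])` would return ANSI-colored text for each block, with no file prepared in advance. Which command to run is left to the caller, which matches the point that the open question is only what tooling to call.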

@Omikhleia
Member Author

Omikhleia commented Nov 17, 2022

Thanks for the fruitful discussion.

When I opened this issue in July 2022, I hadn't yet used the new luarocks-based packaging system. I wholly agree with @alerque's last comment that such features are likely best addressed by 3rd-party packages (which can then choose the implementation and/or external tools they want, without adding more dependencies to SILE's core distribution itself).

I added to #1194 one small idea mentioned here (using Penlight's Lua lexer for e.g. autodoc, to have a naive Lua syntax highlighter for in-documentation snippets of Lua code). Other than that, I consider my issue/question wholly answered.

@alerque added the enhancement label (Software improvement or feature request) on Nov 17, 2022
4 participants