Feature: Highlight non-printable characters #395

Merged: 4 commits, Nov 1, 2018

Conversation

sharkdp
Owner

@sharkdp sharkdp commented Nov 1, 2018

Adds a new `-A`/`--show-all` option (analogous to the option of the same name in GNU `cat`) that
highlights non-printable characters such as space, tab, or newline.

This works in two steps:

  • Preprocessing: replace space by `•`, tab by `├──┤`, newline by `␤`, etc.
  • Highlighting: use a newly written Sublime syntax to highlight
    these special symbols.

without --show-all

image

with --show-all

image

Remarks:

Note: This feature is not technically a drop-in replacement for GNU `cat`'s
`--show-all`, but it serves the same purpose.
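
The preprocessing step described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function name `show_whitespace` and its `tab_width` parameter (standing in for the value of `--tabs`) are hypothetical, the tab is drawn at a fixed width rather than aligned to tab stops, and keeping the real line break after the newline symbol is an assumption.

```rust
/// Replace whitespace with visible symbols (hypothetical sketch).
/// `tab_width` stands in for the value passed via `--tabs`.
fn show_whitespace(input: &str, tab_width: usize) -> String {
    let mut output = String::new();
    for c in input.chars() {
        match c {
            ' ' => output.push('•'),
            '\t' => {
                // Draw the tab as a bar spanning `tab_width` cells,
                // e.g. ├──┤ for a width of 4. Widths below 2 collapse to ├┤.
                output.push('├');
                output.push_str(&"─".repeat(tab_width.saturating_sub(2)));
                output.push('┤');
            }
            // Assumption: keep the real line break after the symbol
            // so the output still wraps line by line.
            '\n' => output.push_str("␤\n"),
            other => output.push(other),
        }
    }
    output
}

fn main() {
    print!("{}", show_whitespace("a b\tc\n", 4));
}
```

The second step, highlighting, would then only need to match the inserted symbols, which is why a dedicated syntax definition is enough.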
@HenrikBengtsson

Thank you - this is great! A few comments:

  • bat --show-all --tabs 1 is rendered identically to bat --show-all --tabs 2, i.e. ├┤ (== ├ followed by ┤). Could, say, ⇥ (U+21E5, RIGHTWARDS ARROW TO BAR, rightward tab), or ↹ (U+21B9, LEFTWARDS ARROW TO BAR OVER RIGHTWARDS ARROW TO BAR, tab with shift tab) be used for --tabs 1? [https://en.wikipedia.org/wiki/Tab_key#Unicode]

  • I think LF (␊) should be used to render the line-feed symbols \n (ASCII 0x0A) rather than NL (␤), which I think should be reserved for \025 (ASCII 0x15) [https://en.wikipedia.org/wiki/Newline]. This way the rendering of \n and \r\n matches LF (Unix) and CRLF (Windows), which is how we typically refer to the different new-line alternatives.

  • This need not be implemented right away, but in relation to the previous comment: plan ahead for adding support for other non-printable ASCII symbols, e.g. newline NL (\025, ASCII 0x15), record separator RS (\036, ASCII 0x1E), vertical tab VT (\v, ASCII 0x0B), and form feed FF (\014, ASCII 0x0C).

@sharkdp
Owner Author

sharkdp commented Nov 1, 2018

@HenrikBengtsson Thank you very much for your detailed feedback!

  • bat --show-all --tabs 1 is rendered identically to bat --show-all --tabs 2, i.e. ├┤ (== ├ followed by ┤). Could, say, ⇥ (U+21E5, RIGHTWARDS ARROW TO BAR, rightward tab), or ↹ (U+21B9, LEFTWARDS ARROW TO BAR OVER RIGHTWARDS ARROW TO BAR, tab with shift tab) be used for --tabs 1? [en.wikipedia.org/wiki/Tab_key#Unicode]

Yes, good idea! I was using initially before I implemented the ├──┤ style. I will use it in the case where --tabs=1.

  • I think LF (␊) should be used to render the line-feed symbols \n (ASCII 0x0A) rather than NL (␤) [...] This way the rendering of \n and \r\n matches LF (Unix) and CRLF (Windows), which is how we typically refer to the different new-line alternatives.

Absolutely. I changed it and it will now output either LF or CRLF at the end of a line - much better!
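
A rough sketch of what the two changes agreed on above might look like. It is an illustration under stated assumptions, not the merged code: `show_whitespace_revised` and `tab_width` are hypothetical names, the choice of `↹` for a tab width of 1 is a guess at which of the two suggested arrows is meant, and rendering LF and CR with the Unicode control pictures `␊` (U+240A) and `␍` (U+240D) is likewise assumed.

```rust
/// Revised whitespace replacement (hypothetical sketch):
/// a single-cell tab gets its own arrow symbol, and line endings
/// are shown as LF or CRLF instead of a generic newline symbol.
fn show_whitespace_revised(input: &str, tab_width: usize) -> String {
    let mut output = String::new();
    for c in input.chars() {
        match c {
            ' ' => output.push('•'),
            // Assumption: one of the suggested arrows is used when
            // there is only a single cell to draw the tab in.
            '\t' if tab_width == 1 => output.push('↹'),
            '\t' => {
                output.push('├');
                output.push_str(&"─".repeat(tab_width.saturating_sub(2)));
                output.push('┤');
            }
            // A Windows line ending "\r\n" becomes ␍␊, a Unix one just ␊.
            '\r' => output.push('␍'),
            '\n' => output.push_str("␊\n"),
            other => output.push(other),
        }
    }
    output
}

fn main() {
    print!("{}", show_whitespace_revised("unix\n", 4));
    print!("{}", show_whitespace_revised("windows\r\n", 4));
}
```

With this shape, `--tabs 1` and `--tabs 2` no longer produce the same output, and the line-ending symbols line up with the usual LF/CRLF terminology.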

Hm... 0x15 seems to be something else: Negative Acknowledgement / NAK (https://en.wikipedia.org/wiki/ASCII). I have always thought of "newline" as a synonym for "line feed", but I might be wrong...

This need not be implemented right away, but in relation to the previous comment: plan ahead for adding support for other non-printable ASCII symbols

More non-printable characters should be very easy to add in the future. Each one needs one entry in preprocessor.rs and another in show-nonprintable.sublime-syntax.
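
To illustrate why adding characters could be a one-entry change on the preprocessing side, here is a table-driven sketch. The table `SYMBOLS` and the functions `symbol_for` and `replace_nonprintable` are hypothetical names, not taken from preprocessor.rs. The replacement glyphs are the Unicode "control pictures", which sit at U+2400 plus the ASCII code of the control character. The matching entry in the syntax definition is not shown.

```rust
/// Hypothetical lookup table: one entry per non-printable character.
/// Supporting another character means adding one line here.
const SYMBOLS: &[(char, char)] = &[
    ('\u{0B}', '\u{240B}'), // vertical tab (VT)      -> ␋
    ('\u{0C}', '\u{240C}'), // form feed (FF)         -> ␌
    ('\u{1E}', '\u{241E}'), // record separator (RS)  -> ␞
];

/// Look up the visible symbol for `c`, if the table has one.
fn symbol_for(c: char) -> Option<char> {
    SYMBOLS
        .iter()
        .find(|(from, _)| *from == c)
        .map(|(_, to)| *to)
}

/// Replace every character that has a table entry; keep the rest unchanged.
fn replace_nonprintable(input: &str) -> String {
    input.chars().map(|c| symbol_for(c).unwrap_or(c)).collect()
}

fn main() {
    println!("{}", replace_nonprintable("a\u{0B}b\u{0C}c\u{1E}d"));
}
```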

@HenrikBengtsson

which I think should be reserved for \025 (ASCII 0x15) [https://en.wikipedia.org/wiki/Newline]

Hm... 0x15 seems to be something else: Negative Acknowledgement / NAK (https://en.wikipedia.org/wiki/ASCII). I have always thought of "newline" as a synonym for "line feed", but I might be wrong...

Yeah, you're correct. I was too quick on "NL" in https://en.wikipedia.org/wiki/Newline:

"IBM mainframe systems, including z/OS (OS/390) and i5/OS(OS/400) | EBCDIC | NL | 15 | 21 | \025"

That's not part of the ASCII standard.

@HenrikBengtsson

Just tested the updated PR; looks great to me.

@sharkdp sharkdp merged commit e81f9b2 into master Nov 1, 2018
@sharkdp sharkdp deleted the show-nonprintable branch November 1, 2018 21:00
@sharkdp sharkdp mentioned this pull request Nov 19, 2018
Development

Successfully merging this pull request may close these issues.

WISH: Visualize/highlight different whitespace symbols