Feature Request: Add the Control Pictures Unicode block #219

Closed
PhMajerus opened this issue Dec 17, 2019 · 23 comments · Fixed by #402

Comments

@PhMajerus
Contributor

PhMajerus commented Dec 17, 2019

We talked about control characters before, and how they are interpreted by the console or the terminal instead of printing characters.
When working in a terminal, it is sometimes helpful to visualize these in-band control sequences. Visual Studio Code even does it when showing files, by displaying small "ESC", "SUB", ... labels when the "Render Control Characters" option is enabled.

Unicode had the same idea and includes a block of Control Pictures (U+2400 to U+2426 as of Unicode 12.0). These are designed to represent the control characters on a terminal screen:
␀␁␂␃␄␅␆␇␈␉␊␋␌␍␎␏␐␑␒␓␔␕␖␗␘␙␚␛␜␝␞␟␠␡␢␣␤␥␦
(https://en.wikipedia.org/wiki/Control_Pictures)
When available, these can be used by CUI apps to provide a visual representation of these special characters, exactly like Visual Studio Code does.

Adding these 39 glyphs would make it possible for utilities such as hexdump to show them in text representations, which is much more helpful than having 34 of the 256 values show up as generic dots. It would even make it possible for a CUI text editor to provide the "Render Control Characters" option.
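
As a rough illustration of the hexdump idea, here is a minimal Python sketch. It is not the function used for the screenshots below, and `to_picture` and `hexdump` are hypothetical names. It assumes the direct mapping of bytes 0x00 to 0x1F onto U+2400 to U+241F and of DEL onto U+2421, and it still falls back to a dot for high bytes, which have no control picture:

```python
def to_picture(byte: int) -> str:
    """Return a printable stand-in for one byte value."""
    if byte < 0x20:      # C0 controls map onto U+2400..U+241F
        return chr(0x2400 + byte)
    if byte == 0x7F:     # DEL maps onto U+2421
        return "\u2421"
    if byte < 0x7F:      # printable ASCII (including space) is shown as-is
        return chr(byte)
    return "."           # high bytes: no control picture exists


def hexdump(data: bytes, width: int = 16) -> str:
    """Format data as offset, hex bytes, and a text column with control pictures."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text_part = "".join(to_picture(b) for b in chunk)
        # Pad the hex column so the text column lines up on a short last row.
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(hexdump(b"MZ\x90\x00\x03\x00\x1b[0m\r\n\x7f"))
```

Whether the text column is readable then depends entirely on the font providing legible glyphs for U+2400 to U+2426, which is the point of this request.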

Windows Terminal currently falls back to another font to render these, but they are tiny and impractical for use in a terminal.

Below is a sample of a hexdump function showing the contents of cmd.exe with high-ascii and control characters (using the font fallback):
hexdump with Control Pictures
And here is the Ubuntu hexdump command showing the same file, with dots for high-ascii and control characters (so only 96 values out of 256 provide character representations). This one is not using Cascadia, but it shows the limitation when these characters are not available.
hexdump without Control Pictures

@aaronbell
Collaborator

aaronbell commented Dec 2, 2020

Been a while! I thought I would take a crack at these, and, well, they're hard to fit into that little box! I used a similar descending set of letters and thought I'd get your opinion. They're still a bit rough :x. Thanks!

Screen Shot 2020-12-01 at 8 59 32 PM

@aaronbell
Collaborator

Forgot these!
Screen Shot 2020-12-01 at 9 16 05 PM

@PhMajerus
Contributor Author

@aaronbell Thanks for looking into the control pictures.

I think you'll have an artistic and practical decision to make with those, balancing your font's visual identity, readability, and familiarity.

The diagonal sets of letters seem to be the most common representation and the one used by the Unicode consortium (https://unicode.org/charts/PDF/U2400.pdf) and by Segoe UI Symbol.

Visual Studio Code, on the other hand, uses horizontal sets of letters:
image

I'm not sure which would be easiest to read in a Terminal app at small sizes.

Alternatively, it seems square graphic symbols were once standardized for all of these, but not commonly used anymore (probably because the C0 control characters were originally designed with punched cards and serial terminals in mind).
See the rightmost column in the C0 table at https://www.aivosto.com/articles/control-characters.html#list_C0
Notice how SUB and DEL graphic symbols seem related to the modern ␦ and ␥. While not immediately familiar for today's users, they are all pretty representative of their intents and would be recognizable at small sizes. Maybe a case of "old enough to be new and fancy again".
This would be similar to the "Show/Hide ¶" symbols in Word, they might not make sense at first, but users working with them would get used to them and having simple and easily recognizable pictures at small sizes might be a benefit in the long run over the more explicit but hard to read sets of letters.

I think if Windows Terminal allowed me to set different fonts for specific Unicode ranges, I would try to get these legacy graphic symbols in a font by themselves and use those. They seem more readable at my typical Terminal font size, and even though I never encountered these graphic representations before, I could get used to them easily enough.

@aaronbell
Collaborator

Thanks for the info! I had actually originally designed the control characters to be full height 1/3 width but found that they were really difficult to read, especially when you have a bunch in a row (like your original sample image). As such, the diagonal form actually makes a lot of sense. I think, though, that I’ll continue to experiment with making them a bit bigger (wider maybe?) to see what kind of results I can get.

The graphical versions are a good idea too! Interestingly, many of those forms are already present in the Geometric Shapes block: http://www.unicode.org/charts/PDF/U25A0.pdf so I think it would be relatively straightforward to bring them in. I think it would make the most sense to add them as a stylistic set of the base versions so folks like you who want to give them a try have that option available :)

@aaronbell
Collaborator

Screen Shot 2020-12-02 at 12 59 20 PM

Did a fun little mockup of the square graphic symbols. While I suspect maybe you and one other person would even use these intentionally, they're certainly more recognizable than the text versions. The only one that didn't have an alternate form is the NL (which appears to be NEL in the standard form / NL in the Unicode chart, and doesn't appear to be used as frequently as LF).

As for the textual variant, I found that increasing the width of the forms allowed for a more open, recognizable shape. Of course, some extra tweaking is necessary, but overall I think it is rendering well at smaller sizes (here at 14pt), even at the heavier end of the range:
Screen Shot 2020-12-02 at 1 21 45 PM
(do note this is unhinted, so it'll be a bit blurrier than an actually hinted version)

@PhMajerus
Contributor Author

PhMajerus commented Dec 2, 2020

Here's the background story on LF, NL and NEL.

U+2400 to U+241F are pictures of all the C0 control characters from ASCII, mapped directly from their corresponding U+0000...U+001F control characters.
Pure ASCII contains only those and the extra SP (space) and DEL (delete).
These are the ones that also have legacy standardized square graphical representations, and could be considered a group if you take artistic liberties with them.
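
To make the "mapped directly" part concrete, here is a small Python sketch of the round trip between the ASCII control characters (plus SP and DEL) and their pictures. The helper names `picture_for` and `control_for` are made up for illustration; the only facts relied on are the fixed 0x2400 offset for U+0000 to U+0020 and the separate slot U+2421 for DEL:

```python
import unicodedata


def picture_for(ch: str) -> str:
    """Map an ASCII control character, SP, or DEL to its Control Picture."""
    code = ord(ch)
    if code <= 0x20:        # C0 controls and SP: fixed offset of 0x2400
        return chr(0x2400 + code)
    if code == 0x7F:        # DEL sits right after SP, at U+2421
        return "\u2421"
    raise ValueError(f"no control picture for U+{code:04X}")


def control_for(pic: str) -> str:
    """Inverse mapping: from a Control Picture back to the character it depicts."""
    code = ord(pic)
    if 0x2400 <= code <= 0x2420:
        return chr(code - 0x2400)
    if code == 0x2421:
        return "\x7f"
    raise ValueError(f"U+{code:04X} does not depict an ASCII control character")


if __name__ == "__main__":
    for code in (0x00, 0x0A, 0x0D, 0x1B, 0x20, 0x7F):
        pic = picture_for(chr(code))
        print(f"U+{code:04X} -> U+{ord(pic):04X} {pic} {unicodedata.name(pic)}")
```

The remaining pictures (U+2422 to U+2426) are deliberately rejected by `control_for`, since they do not correspond to a single ASCII code.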

C1 control characters, which NEL is from, are a whole other set of codes only available in ANSI (extended ASCII / high-ASCII) and Unicode, and as far as I know those have no representation in the control pictures range.

My understanding is that all the remaining pictures ␢ ␣ ␤ ␥ ␦ in U+2422 to U+2426 are general-purpose representations of hidden characters for CUI apps that wish to show simpler symbols, for example for a text editor aimed at less technical people.
The Unicode chart at https://unicode.org/charts/PDF/U2400.pdf seems to confirm that, as these extras are not labelled as "control codes". They probably should therefore be kept similar to their original design as it means an app using them made the explicit choice of using these alternate versions for style and might use them for app-specific things unrelated to C0 control characters.

"NL" is documented by Unicode as being a symbol for New Line, which isn't the same as the NEL (Next Line) from C1 control characters.
The origin of the term "New Line" is that operating systems didn't agree on what constitutes a proper sequence of control characters to start a new line. CR is originally a carriage return, as in returning the typewriter to the left position without changing the line, while LF is a line feed, shifting the page to the next line without moving the horizontal position. CP/M, MS-DOS and Windows got it right and use the combination CR+LF to start a new line. Unix figured it could save a byte by using LF alone, disregarding the standard, while Apple decided to Think Different and therefore settled on using CR alone (in classic Mac OS; they have since moved to the Unix LF in OS X).
So New Line isn't a control character, but a generic term for different combinations of control character(s) that produce a newline on different platforms. This means an app such as a text editor might choose to show the "NL" picture for whichever sequence is appropriate for the underlying platform to provide consistency to the user across different operating systems instead of revealing exact C0 control sequences. If you want to change it into a symbol, something generic like Word's new line symbol ↵ is probably fine.
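
A minimal Python sketch of that idea, assuming a hypothetical helper `show_newlines` in an editor-like app: every platform newline sequence (CR+LF, LF alone, or CR alone) is shown as the single ␤ picture (U+2424), so the user sees the same marker regardless of where the file came from.

```python
import re

# CR+LF must be tried first so it is treated as one newline, not two.
NEWLINE = re.compile(r"\r\n|\r|\n")


def show_newlines(text: str) -> str:
    """Mark each newline sequence with U+2424 and keep a real line break after it."""
    return NEWLINE.sub("\u2424\n", text)


if __name__ == "__main__":
    sample = "windows\r\nunix\nclassic mac\rend"
    print(show_newlines(sample))
```

An app that wants to reveal the exact sequences instead would use the individual ␍ and ␊ pictures rather than ␤.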

NEL (Next Line) on the other hand is a C1 control character for compatibility with IBM mainframe's EBCDIC encoding, which had something similar to LF but not exactly, and therefore needed a separate control character for text exchange.
None of the C1 control characters have graphical representation in the U+24xx range, so I'm pretty confident NL stands for New Line and is unrelated to Next Line.
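
This can be checked against Python's bundled Unicode database. The sketch below only uses the standard `unicodedata` module; the constant names are mine. It shows that NEL (U+0085) is a control character with no character name, that U+2424 is named for "newline", and that none of the pictures in U+2400 to U+2426 is named after "next line":

```python
import unicodedata

NEL = "\u0085"         # C1 control character: Next Line
NL_PICTURE = "\u2424"  # Control Pictures block: the "NL" glyph

# Control characters carry the general category Cc and have no character name.
print(unicodedata.category(NEL))
print(unicodedata.name(NEL, "<no name>"))

# The picture is named after the generic concept, not after the C1 control.
print(unicodedata.name(NL_PICTURE))

# Collect the names of the 39 pictures and look for any mention of "NEXT LINE".
picture_names = [unicodedata.name(chr(cp)) for cp in range(0x2400, 0x2427)]
print(any("NEXT LINE" in name for name in picture_names))

# NEL is still a line boundary as far as str.splitlines is concerned.
print("one\u0085two".splitlines())
```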

@DHowett
Member

DHowett commented Dec 3, 2020

Wow, I'm really liking how these are shaping up here in #219 (comment). You're right, they felt a little cramped in the first pass you made 😄

@aaronbell
Collaborator

Thanks @DHowett! Do we dare put the graphical versions as default? :D

@DHowett
Member

DHowett commented Dec 3, 2020

Aw, alas. I use them for debugging Terminal, so I would need to learn a whole new language if we do that! 😁

I'm not against it, for sure. haha

@aaronbell
Collaborator

I’ll give you a sample version to mull over ;)

@PhMajerus
Contributor Author

@aaronbell @DHowett Hey! no fair! why the microsofties-only version?! 😭

Looking at #219 (comment), it seems 2-letter symbols are going to be more readable than 3-letter ones. (I still like the graphic symbols, but understand many users might not want to have to learn another set of symbols.)

Since the 2-letter abbreviations for C0 codes are less common but nonetheless standardized (https://en.wikipedia.org/wiki/ISO_2047), what about a variant that uses the 2-letter versions for the whole set? This would make their size more consistent than a mix of 3-letter and 2-letter, and probably would help readability at small sizes. Users used to the more common 3-letter versions can probably figure out the shortened ones without too much trouble.

@aaronbell
Collaborator

@PhMajerus Don't worry, I won't exclude you :).

@aaronbell
Collaborator

@PhMajerus @DHowett Alright, sorry for the delay :)

Here's a demo version of the font, named Cascadia CTRL:
CascadiaCNTRL.zip

A couple of notes:

  • Font is autohinted, exported out of Glyphs. So rendering will not be as clear as it would be in the real version. And other things might not work quite as usual. This is a test!
  • Base version has 2 letter / 3 letter variants

Screen Shot 2020-12-15 at 7 34 00 PM

  • ss19 has all 2 letter variants

Screen Shot 2020-12-15 at 7 34 13 PM

  • ss20 has the graphical versions

Screen Shot 2020-12-15 at 7 34 30 PM

IIRC, Windows Terminal (and I think VSCode) let you set stylistic sets. Give it a try and see what you think!

@PhMajerus
Contributor Author

PhMajerus commented Dec 16, 2020

@aaronbell Thanks for doing all 3 variants, they all look great and I feel each has its benefits.
I cannot find how to set the variants in either Code or Terminal in their json settings documentation, but judging from the default 2-3 letter variant I could try and the pictures you posted, I really like all 3 designs.
The more vertically-stacked letters, which overlap more than the common 45° diagonal representations, make them both very readable and more distinct when several control characters follow each other. This really works well.

I'm really curious to try the all 2 letters variant in Terminal as the 3 letter ones are a bit too small for me at my usual font size.
@DHowett did I miss something in the Terminal documentation for font variants or is that something that's not yet implemented?

@mdtauk

mdtauk commented Dec 16, 2020

Implementing them would be a tricky prospect, at least in an easily discoverable way.

Not every font will offer alternate Stylistic Sets - so unless Cascadia Code is treated as a special case, and extra settings show up when it's the chosen font - the only way to implement it would be to allow setting a stylistic set for all fonts that include them.

And then, these sets don't include names, and showing the user what these stylistic sets are used for, would be impossible.

@aaronbell
Copy link
Collaborator

@mdtauk My putting them in stylistic sets is purely for testing purposes—so that they are somewhat accessible for y'all to take a look at. I expect that for the final version, we'll lock to one of the three approaches.

@aaronbell
Copy link
Collaborator

@PhMajerus - Here's the setup for VSCode at least: microsoft/vscode#80577

@mdtauk

mdtauk commented Dec 16, 2020

> @mdtauk My putting them in stylistic sets is purely for testing purposes—so that they are somewhat accessible for y'all to take a look at. I expect that for the final version, we'll lock to one of the three approaches.

That is fair enough, but there is little reason not to include stylistic sets for the sake of a more complete typeface, even if Terminal doesn't provide a user-facing way to change it.

@PhMajerus
Contributor Author

PhMajerus commented Dec 16, 2020

@aaronbell Ah, sorry, I didn't realize stylistic sets were selected by ligatures options. Thanks for the info.
So VS Code supports them in the editor but not in the built-in terminal for performance reasons, but that was enough to see what each looks like by copy/pasting from terminal to editor.
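
For anyone else trying this, a `settings.json` fragment along the following lines should select a stylistic set in the VS Code editor. This is a sketch based on the linked VS Code issue, not something taken from the Cascadia documentation: it assumes `editor.fontLigatures` accepts a font-feature-settings style string, and that the test font is already selected as the editor font.

```json
{
  // Assumption: the string is passed through as font feature settings.
  // 'ss19' is the all-2-letter variant in the demo font; use 'ss20' for the graphical one.
  "editor.fontLigatures": "'ss19'"
}
```

This only affects the editor, so the control pictures still have to be pasted from the terminal into an editor tab to compare the variants.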

After trying all 3: first, they all look great!

I really wanted to try to get used to the graphical symbols (ss20), hoping they would be the most readable and faster to scan through once used to them, but testing them mixed with other characters it quickly becomes apparent that they probably only work well when shown in ASCII-only strings, as then the set of other characters present is very limited and does not include any graphical character that could be confused with them.
Once a string contains a larger set of graphical characters from Unicode, it becomes more difficult to differentiate them from other graphics, and they lose their benefits.
I think they look great and, if it doesn't have any negative impact like file size or performance when not used, they should be kept as an option, but they probably will only be practical in specific scenarios like debugging ASCII-based communications. I could see them used in a hex+ascii file editor or a serial monitor for example.
image

I find the 2-letter (ss19) variant really good for readability, and while it will take some time to feel natural as we're more used to the 2-3 letter mix, I would probably use the 2-letter one for Terminal. This is probably very dependent on font size and DPI, but testing on both a 1920x1080 monitor at 100% and a Surface Book 2 at 200%, the 3-letter ones are slower to read because they end up less well-defined. This could still change with hinting though.
image
image

image
image

I think using the 2 letters variant as the default could provide better readability and discoverability.
By this I mean someone used to the 2-3 letters and confused by the 2 letters-only is more likely to investigate and find information online about the variants and how to select the other one. On the other hand, someone finding the 3 letters hard to read is more likely to just increase their font size or change to another font than to ever learn about the variants that could have improved their use of Cascadia.

@aaronbell
Collaborator

Thanks for the review @PhMajerus!

Your experience aligns pretty well with what I suspected might be the case. The graphical variants are fun, but are difficult to parse in real life scenarios. I would be tempted to leave them there, but I think I have to be honest with myself that the likelihood of anyone using them when they're hidden behind OpenType is quite low—even modern coding / terminal environments don't necessarily support stylistic sets, let alone anything older.

Between the 2-letter and mixed settings, it makes sense that the 2 letter variant would render more clearly. With proper hinting they'll perform markedly better with clearer differentiation between the letters, whereas the mixed setting will likely only perform similarly, or slightly better. The problem is that there just aren't sufficient pixels to create definition in the three-stacked form—as you said, folks are likely to switch fonts or increase point size to make them out. For similar reasons as the graphical variants, I think I'd skip providing the mixed setting (or wasting a stylistic set slot on them), and just provide the 2 letter abbreviations. I think folks will be able to get used to it pretty quick.

@DHowett What do you think? Would you be open to using the 2 letter variants as default?

@DHowett
Member

DHowett commented Dec 17, 2020

Based on @PhMajerus' screenshots above, I would absolutely be open to using the 2-letter variants as a default.

I wish I'd given terminal the ability to choose stylistic sets. I like them all. 😄

I'll kick the tires myself, as well. Thanks for putting this together.

@schuelermine

> Thanks for the review @PhMajerus!
>
> Your experience aligns pretty well with what I suspected might be the case. The graphical variants are fun, but are difficult to parse in real life scenarios. I would be tempted to leave them there, but I think I have to be honest with myself that the likelihood of anyone using them when they're hidden behind OpenType is quite low—even modern coding / terminal environments don't necessarily support stylistic sets, let alone anything older.
>
> Between the 2-letter and mixed settings, it makes sense that the 2 letter variant would render more clearly. With proper hinting they'll perform markedly better with clearer differentiation between the letters, whereas the mixed setting will likely only perform similarly, or slightly better. The problem is that there just aren't sufficient pixels to create definition in the three-stacked form—as you said, folks are likely to switch fonts or increase point size to make them out. For similar reasons as the graphical variants, I think I'd skip providing the mixed setting (or wasting a stylistic set slot on them), and just provide the 2 letter abbreviations. I think folks will be able to get used to it pretty quick.
>
> @DHowett What do you think? Would you be open to using the 2 letter variants as default?

I love the graphical variants

DHowett pushed a commit that referenced this issue Feb 9, 2021
This is a significant update to Cascadia Code including a large number
of bug fixes as well as updating the font to support the Fira Code v5
ligature set.

This update supersedes PR #373.

Closes #262 - ⏎ added
Closes #264 - additional codepoints for control characters added
Closes #281 - `!:` and `!.` added
Closes #290 - `/\` and `\/` added
Closes #301 - `??=` added
Closes #324 - ℞ added
Closes #327 - `<:>` and other variants implemented via the `calt`
  refactoring
Closes #359 - house added
Closes #371 - Added x-height instruction into ttfautohint to control the
  height of the lowercase.  
Closes #375 - Completely redesigned quote marks for better recognition
Closes #377 - updated hinting to achieve more consistent results
Closes #381 - increased height of thetamod
Closes #382 - reduced the width of the hooklefts
Closes #383 - updated heights on esh, glottalstop, glottalstopreversed
Closes #384 - tweaked hinting a little bit. Maybe it'll help :)
Closes #386 - added remaining soft-dotting
Closes #392 - changed designs of angled quotes (they are now round)
Closes #394 - changed former `~=` symbol to a simpler component-based
  version. Should be less confusing now for Lua / Matlab users. 
Closes #395 - makes the underline thicker based on font weight
Closes #400 - increased size of degree

Closes #219 
The full control pictures block has been added (U+2400 to U+2426). For
purposes of rendering, the two letter abbreviations have been used
instead of the standard three letter abbreviations.

Additionally, ss20 includes the oft-unused graphical representations of
these codepoints (for fun!).

Closes #276 (infinite arrows)
Full support for Fira Code's current ligature set (with a few
exceptions). Now featuring infinite arrows!!! 

This involved a full refactoring of the `calt` feature—for those
interested, it now uses forward-looking substitutions instead of
backward-looking substitutions and progressive substitution to reduce
code. This also required some redesigning of the greater / lesser
related ligatures. Please note, I have also removed all the obsolete
ligatures now covered by the arrows code.

Closes #329 
There was a mismatch in the font's postscript naming conventions that
was corrected. Should now render all weights in Word. **Note** there is
apparently an additional bug in Mac Word's implementation of variable
fonts which should be available in an update mid-Feb. 

* Not listed – Reworked the hints for the mod and superscript glyphs so
  that they're bottom-up rather than top-down. This allows for better
  bottom alignments. 

Aside from the above changes, this version also includes many other
small updates including spacing, outline quality improvements, and
fixing hinting.
@PhMajerus
Contributor Author

PhMajerus commented Jan 3, 2024

Hey @aaronbell and @DHowett, sorry for posting in a closed issue, but I thought you might enjoy this, and it provides even more validation for the choice made if anyone happens to read through this thread.

Looking at some documentation on the HP 264x terminal series (from the 1970s), I found out they also used the 2-character representation for control pictures:
Roman Uppercase Roman Lowercase
(more details at https://www.curiousmarc.com/computing/hp-264x-terminals)

BTW, after two years with the 2-character variant, I'm really happy with the readability, and I keep seeing legacy systems where they made the same choice back in the 1970s and 1980s.
Thanks again for making this happen!
