With your tool it is possible to look up Unicode characters by various criteria, as you state in your README, including "unicode name" and "also known as".
In HTML, named character escape sequences are available for things like the less than and the greater than signs, but also for quite a few other characters.
Back in the day, before UTF-8 encoding support was widespread, we'd use the ISO-8859-1 encoding for our HTML and we'd use named character escape sequences for characters like æ, ø, å for example.
Some of those names stuck with me, and I sometimes search for those characters by those names on Google if I am on a machine where inputting said characters directly is not possible or just too cumbersome.
Even on my MacBook Air, where I can generally long-press certain keys to access other characters, some applications implement text input that does not support the long-press functionality. In that case I go to some other window on-screen and either long-press there or search for the character on Google, whichever is most convenient at the time (which depends on the other windows I happen to have on screen at that moment).
I pretty much always have at least one terminal window open at any time, and if I don't then opening the terminal is fast and simple.
Prior to purchasing my MacBook Air, when I was running Linux on a ThinkPad, I made a few simple shell scripts named after the HTML character entity references for the characters I most commonly needed (æ, ø, å, Æ, Ø, Å): aelig, oslash, aring, AElig, Oslash, Aring. When executed, they would print the corresponding UTF-8 encoded byte sequence for the character in question.
A full list of all HTML character entity references can be found at https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Character_entity_references_in_HTML
Most notable for me personally, aside from the six mentioned above, are laquo, raquo, ndash, mdash, eacute and Eacute, but they are all useful IMO, and if you agree to include the HTML character entity reference names then I think it would make the most sense to include them all.
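To illustrate what those small scripts did, here is a rough Python sketch. It assumes the `html5` table from Python's standard `html.entities` module as the data source (not the Wikipedia table, and not anything `chars` ships), and the function name `entity_to_utf8` is made up for this example.

```python
from html.entities import html5


def entity_to_utf8(name: str) -> bytes:
    """Return the UTF-8 byte sequence for an HTML entity name such as 'aelig'.

    Hypothetical helper for illustration only. Every entry in the html5
    table is available under a key with a trailing semicolon ('aelig;'),
    so the semicolon is appended before the lookup. The lookup is
    case-sensitive and raises KeyError for unknown names.
    """
    return html5[name + ";"].encode("utf-8")
```

Calling `sys.stdout.buffer.write(entity_to_utf8("oslash"))` would then write the bytes for ø to the terminal, much like running the `oslash` script did.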
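So to get to the point, my suggestion is that, based upon the table at https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Character_entity_references_in_HTML, an additional field be added for applicable characters in the output of chars.
Some examples of what the output of chars would look like: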
Example 1
chars U+002A
ASCII 2/a, 42, 0x2a, 0052, bits 00101010
Width: 1, prints as *
Unicode name: ASTERISK
Also known as: Star, Splat, Aster, Times, Gear, Dingle, Bug, Twinkle, Glob
HTML entity names: ast, midast
Example 2
chars U+00AE
LATIN1 ae, 174, 0xae, 0256, bits 10101110
Width: 1 (2 in CJK context), prints as ®
Quotes as \u{ae}
Unicode name: REGISTERED SIGN
HTML entity names: reg, circledR, REG
Example 3
chars U+00C6
LATIN1 c6, 198, 0xc6, 0306, bits 11000110
Width: 1 (2 in CJK context), prints as Æ
Upper case. Downcases to æ
Quotes as \u{c6}
Unicode name: LATIN CAPITAL LETTER AE
HTML entity name: AElig
In the examples above, a field named "HTML entity names" (where multiple names exist) or "HTML entity name" (where only one name exists) has been added.
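As a sketch of how such a field could be derived, the following Python snippet collects all entity names for a character and picks the singular or plural label. It again assumes the standard library's `html.entities.html5` table as a stand-in data source; `entity_names` and `entity_field` are hypothetical names, and the ordering of names is simply alphabetical rather than the order used in my examples.

```python
from html.entities import html5
from typing import List, Optional


def entity_names(char: str) -> List[str]:
    """Return all HTML entity names that map to exactly this character."""
    # Some names appear twice in html5, with and without a trailing ';'.
    # Stripping the semicolon and using a set removes those duplicates.
    names = {key.rstrip(";") for key, value in html5.items() if value == char}
    return sorted(names)


def entity_field(char: str) -> Optional[str]:
    """Build the proposed output line, or None if the character has no name."""
    names = entity_names(char)
    if not names:
        return None
    label = "HTML entity name" if len(names) == 1 else "HTML entity names"
    return f"{label}: {', '.join(names)}"
```

Characters without any entity name yield `None`, so the field would simply be omitted for them.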
Furthermore, I request that case-sensitive search be performed on this field where present, so that one can search for entity names and get results like those shown in the following examples:
Example 1
chars Oslash
LATIN1 d8, 216, 0xd8, 0330, bits 11011000
Width: 1 (2 in CJK context), prints as Ø
Upper case. Downcases to ø
Quotes as \u{d8}
Unicode name: LATIN CAPITAL LETTER O WITH STROKE
HTML entity name: Oslash
Example 2
chars oslash
LATIN1 f8, 248, 0xf8, 0370, bits 11111000
Width: 1 (2 in CJK context), prints as ø
Lower case. Upcases to Ø
Quotes as \u{f8}
Unicode name: LATIN SMALL LETTER O WITH STROKE
HTML entity name: oslash
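The reason for asking for a case-sensitive match is that some entity names differ only by case and refer to different characters. A small Python sketch of the idea follows, assuming the standard library's `html.entities.html5` table as the data source; `describe` and `names_ignoring_case` are hypothetical helpers, not part of `chars`.

```python
import unicodedata
from html.entities import html5
from typing import List, Optional


def describe(query: str) -> Optional[str]:
    """Case-sensitive lookup of an entity name; None if there is no match."""
    char = html5.get(query + ";")
    # Skip unknown names and the few entities that expand to two code points.
    if char is None or len(char) != 1:
        return None
    return f"U+{ord(char):04X} {unicodedata.name(char, '<unnamed>')}"


def names_ignoring_case(query: str) -> List[str]:
    """All entity names equal to the query when case is ignored."""
    names = {key.rstrip(";") for key in html5}
    return sorted(n for n in names if n.lower() == query.lower())
```

With this, `describe("Oslash")` and `describe("oslash")` give two different characters, while `names_ignoring_case("oslash")` returns both names, which is exactly the ambiguity a case-insensitive search would run into.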
23: Add data file and retrieval script for character reference names supported by HTML r=antifuchs a=ctsrc
This PR relates to issue #22 and is a first step towards the request I made in that issue.
Co-authored-by: Erik Nordstrøm <erik@nordstroem.no>