Skip to content

The Unix tr command reimplemented in Perl, augmented with (minimal) Unicode support

License

Notifications You must be signed in to change notification settings

dlazesz/tr-unicode

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 

Repository files navigation

tr-unicode

The Unix tr command reimplemented in Perl, augmented with (minimal) Unicode support

GNU tr has no support for Unicode. It works on bytes and its 8-bit clean.

Therefore on UTF-8 text strange errors occur randomly as a feature. For example:

$ echo "°" | tr "Ű" "ű"
±

This implementation is trying to provide a fix for this issue with minimal Unicode support.

Bug reports and patch requests are welcome on the project site!

Note: The tr implementation of uutils coreutils written in Rust has Unicode support

Invocation

One can make an alias to tr or place in ~/bin to have the basic functionality of tr.

Excerpt form man tr with some addition:

Usage: tr [OPTION]... SET1 [SET2]

(OPTION must (if there are any) present in the first argument, only one at a time (TODO: bundling)!)

This is a tr-like utility in Perl with minimal Unicode support. (Aid for tr's 'feature': echo "°" | tr "Ű" "ű" -> ±)

Translate, squeeze, and/or delete characters from standard input, writing to standard output.

-c, -C, --complement    use the complement of SET1
-d, --delete            delete characters in SET1, do not translate
-s, --squeeze-repeats   replace each input sequence of a repeated character
                          that is listed in SET1 with a single occurrence
                          of that character
-t, --truncate-set1     (NOT IMPLEMENTED) first truncate SET1 to length of
                          SET2
    --help     display this help and exit (This text)
    --version  output version information and exit (NOT IMPLEMENTED)

SETs are specified as strings of characters. Most represent themselves. Interpreted sequences are:

\NNN            character with octal value NNN (1 to 3 octal digits)
\\              backslash
\a              audible BEL
\b              backspace
\f              form feed
\n              new line
\r              return
\t              horizontal tab
\v              vertical tab
CHAR1-CHAR2     all characters from CHAR1 to CHAR2 in ascending order
                (as in ASCII, TODO: MAKE LOCALE DEPENDENT)

NOT IMPLEMENTED, TODO:
[CHAR*]         in SET2, copies of CHAR until length of SET1
[CHAR*REPEAT]   REPEAT copies of CHAR, REPEAT octal if starting with 0
[:alnum:]       all letters and digits
[:alpha:]       all letters
[:blank:]       all horizontal whitespace
[:cntrl:]       all control characters
[:digit:]       all digits
[:graph:]       all printable characters, not including space
[:lower:]       all lower case letters
[:print:]       all printable characters, including space
[:punct:]       all punctuation characters
[:space:]       all horizontal or vertical whitespace
[:upper:]       all upper case letters
[:xdigit:]      all hexadecimal digits
[=CHAR=]        all characters which are equivalent to CHAR

Bug reports and patch requests are welcome on the project site!

Excerpt from man tr:

Translation occurs if -d is not given and both SET1 and SET2 appear. -t may be used only when translating. SET2 is extended to length of SET1 by repeating its last character as necessary. Excess characters of SET2 are ignored. Only [:lower:] and [:upper:] are guaranteed to expand in ascending order; used in SET2 while translating, they may only be used in pairs to specify case conversion. -s uses SET1 if not translating nor deleting; else squeezing uses SET2 and occurs after translation or deletion.

GNU coreutils home page: http://www.gnu.org/software/coreutils/ General help using GNU software: http://www.gnu.org/gethelp/ For complete documentation, run: info coreutils 'tr invocation'

About

The Unix tr command reimplemented in Perl, augmented with (minimal) Unicode support

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages