A low-level string character analysis library for PHP.
See compart unicode documentation for useful information about unicode characters that PHP does not handle yet.
Features:
- Per-character (multi-byte aware) analysis;
- Script detection, differentiating look-a-like characters (see IDN homograph attack);
- Character encoding detection & conversion;
- Various serialization formats;
Instructions on how to set this repository up for use in your own project, or as a develper contributing to this one.
There are not many requirements for this library. All of them are host machine related.
- PHP
>= 7.1
; - PHP Intl extension;
- libicu;
Use composer and your autoloader.
composer require jordanbrauer/runes
For Contributors:
Clone the repository and install the development tools to begin running tests for your features & bug fixes.
git clone https://github.com/jordanbrauer/runes.git \
&& cd ./runes \
&& composer install;
Using the library is super simple. For a quick example, let's analyze the ancient, yet strangely familiar, ᛒ
from the Elder Futhark writing system!
use Rune\Rune;
$rune = new Rune('ᛒ');
dump($rune->toJson());
Would output the following data about the glyph.
{
"bidirectionalClass": "L",
"binary": "111000011001101110010010",
"blockCode": 35,
"bytes": 1,
"category": "Lo",
"codepoint": "U+16D2",
"combiningClass": 0,
"decimal": 14785426,
"encoding": "UTF-8",
"glyph": "ᛒ",
"hex": "e19b92",
"isMirrored": false,
"name": "RUNIC LETTER BERKANAN BEORC BJARKAN B",
"script": "Runic",
"utf16": "0x16D2",
"utf32": "0x000016D2",
"utf8": "0xE1 0x9B 0x92",
"version": "3.0.0.0"
}
Unicode is awesome. However, it can be the source of much pain for programmers. This tool aims to help alleviate said pain by providing a low-level/generic API that allows you to focus on the problem without getting bogged down with UTF-8 and it's cousins.
Originally, the name of this project was UTFH8; a tongue-in-cheek word-play on the UTF-8 encoding algorithm and the (very strong) English word, "hate", by using the number eight in place of the letters "a-t-e", insinutating that at some point in every developer's career, they will say – "I hate unicode".
Now, the project has been renamed to Runes, inspired by Go's rune type, and the Elder Futhark – something less harsh & sinister sounding.