This library is an implementation of the skeleton function described in the Confusable Detection section of the Unicode Security Mechanisms technical standard (UTS #39).
Because Unicode contains such a large number of characters and incorporates the varied writing systems of the world, incorrect usage can expose programs or systems to possible security attacks.
The skeleton function deconstructs complex Unicode graphemes into a string that can be used to detect whether other strings are visually similar (that is, confusable).
The Confusable class file is generated by bin/build-confusables. That build script is called automatically by composer on install/update.
The class file is built dynamically for two reasons:
- The confusables.txt file is quite large (~120k). Caching it locally is an improvement, but it still requires a disk read and parsing.
- With the confusables rules injected into the PHP file, they can be stored in PHP byte-code caches.
Should the Unicode confusables.txt file be updated, developers can rerun the build script at any time, even via a cronjob.
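To make the build step more concrete, here is a minimal sketch of how lines in the `source ; target ; type # comment` layout used by confusables.txt could be turned into a PHP array and rendered as PHP source. It is not the actual bin/build-confusables script: the function names are hypothetical, the output is a plain `return [...]` file rather than the generated Confusable class, and the mbstring extension is assumed for `mb_chr`.

```php
<?php
// Sketch only: hypothetical helpers, not the library's real build script.
// Assumes the mbstring extension is loaded (for mb_chr).

// Convert a space-separated run of hex code points ("0072 006E") to UTF-8.
function hex_sequence_to_utf8(string $hexSeq): string
{
    $out = '';
    foreach (preg_split('/\s+/', $hexSeq, -1, PREG_SPLIT_NO_EMPTY) as $hex) {
        $out .= mb_chr((int) hexdec($hex), 'UTF-8');
    }
    return $out;
}

// Parse "source ; target ; type # comment" lines into [source => target].
function parse_confusables(string $text): array
{
    $map = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $data = trim(explode('#', $line, 2)[0]); // strip trailing comment
        $fields = array_map('trim', explode(';', $data));
        if (count($fields) < 2 || $fields[0] === '') {
            continue; // blank, comment-only, or malformed line
        }
        $map[hex_sequence_to_utf8($fields[0])] = hex_sequence_to_utf8($fields[1]);
    }
    return $map;
}

// Render the table as PHP source so a byte-code cache can hold it.
function render_php_table(array $map): string
{
    return "<?php\nreturn " . var_export($map, true) . ";\n";
}

$sample = "# sample data\n0441 ;\t0063 ;\tMA\t# CYRILLIC SMALL LETTER ES\n";
echo render_php_table(parse_confusables($sample));
```

Because the rendered file is ordinary PHP, loading it after the first request costs no parsing of confusables.txt, which is the point of generating it at install time.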
Create the skeleton of a string.
Storing this value in the database will give developers a way of doing a visual uniqueness check against existing identifiers.
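The uniqueness check described above might look like the following sketch. Everything here is an assumption for illustration: `register_identifier` is a hypothetical helper, the `$taken` array stands in for a database table with a unique index on the stored skeleton, and the skeleton function is passed in as a callable so that the library's real function can be substituted for the one-rule placeholder used in the demo (PHP 7.4+ for the arrow function).

```php
<?php
// Hypothetical helper: $taken stands in for a database table with a unique
// index on the skeleton column, mapping skeleton => original identifier.
function register_identifier(array &$taken, string $identifier, callable $skeleton): bool
{
    $key = $skeleton($identifier);
    if (isset($taken[$key])) {
        return false; // visually collides with the identifier in $taken[$key]
    }
    $taken[$key] = $identifier;
    return true;
}

// Placeholder skeleton covering a single mapping (Cyrillic "а" to Latin "a"),
// for demonstration only.
$skeleton = static fn (string $s): string => strtr($s, ["\u{0430}" => 'a']);

$taken = [];
var_dump(register_identifier($taken, 'admin', $skeleton));        // bool(true)
var_dump(register_identifier($taken, "\u{0430}dmin", $skeleton)); // bool(false)
```

Storing the skeleton next to the original identifier keeps the check to a single indexed lookup instead of recomputing skeletons for every existing row.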
Check whether two strings are confusable with each other.
Under the hood, this is implemented as skeleton(A) == skeleton(B).
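The comparison can be illustrated with a toy version. This is a sketch under stated assumptions, not the library's code: `toy_skeleton` and `toy_is_confusable` are hypothetical names, the mapping table is a tiny hand-picked subset of confusables.txt, and the normalization (NFD) steps that UTS #39 applies around the character mapping are left out.

```php
<?php
// Hypothetical, hand-picked subset of confusable mappings. The real table
// is generated from confusables.txt and is far larger.
const TOY_PROTOTYPES = [
    "\u{0430}" => 'a', // CYRILLIC SMALL LETTER A
    "\u{0435}" => 'e', // CYRILLIC SMALL LETTER IE
    "\u{043E}" => 'o', // CYRILLIC SMALL LETTER O
    "\u{0440}" => 'p', // CYRILLIC SMALL LETTER ER
    "\u{0441}" => 'c', // CYRILLIC SMALL LETTER ES
];

// Toy skeleton: map each listed character to its prototype.
// Normalization is omitted here for brevity.
function toy_skeleton(string $s): string
{
    return strtr($s, TOY_PROTOTYPES);
}

// Two strings are treated as confusable when their skeletons are identical.
function toy_is_confusable(string $a, string $b): bool
{
    return toy_skeleton($a) === toy_skeleton($b);
}

var_dump(toy_is_confusable("p\u{0430}yp\u{0430}l", 'paypal')); // bool(true)
var_dump(toy_is_confusable('paypal', 'paypol'));               // bool(false)
```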
Casefolding is not part of the skeleton algorithm. If the requirements of your application include casefolding identifiers, it is your responsibility to supply the strings in the correct case to the skeleton function.
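One way to handle this is to fold case before computing the skeleton. The sketch below assumes PHP 7.3+ with the mbstring extension, which provides `MB_CASE_FOLD`. `fold_then_skeleton` is a hypothetical wrapper, and the identity closure is only a placeholder for the library's real skeleton function.

```php
<?php
// Assumes PHP 7.3+ with mbstring (MB_CASE_FOLD). Hypothetical wrapper:
// fold case first, then hand the result to whatever skeleton function is used.
function fold_then_skeleton(string $identifier, callable $skeleton): string
{
    $folded = mb_convert_case($identifier, MB_CASE_FOLD, 'UTF-8');
    return $skeleton($folded);
}

// Placeholder that returns its input unchanged; substitute the real
// skeleton function here.
$skeleton = static function (string $s): string {
    return $s;
};

var_dump(
    fold_then_skeleton('PayPal', $skeleton) === fold_then_skeleton('paypal', $skeleton)
); // bool(true)
```

Whichever folding is chosen, it needs to be applied consistently to both the stored identifiers and the strings being checked against them.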