kynx/code-utils

Utilities for generating PHP code.

Normalizers

The normalizers generate readable PHP labels (class names, namespaces, property names, etc.) from valid UTF-8 strings, transliterating them to ASCII and spelling out any invalid characters.

Usage

The following code (forgive the Japanese - a certain translation tool tells me it means "Pet Store"):

<?php

use Kynx\Code\Normalizer\ClassNameNormalizer;

// 'Controller' is the suffix appended to reserved words (see "How?")
$normalizer = new ClassNameNormalizer('Controller');
$namespace = $normalizer->normalize('ペット \ ショップ');
echo $namespace;

outputs:

Petto\Shoppu

and:

<?php

use Kynx\Code\Normalizer\PropertyNameNormalizer;

$normalizer = new PropertyNameNormalizer();
$property = $normalizer->normalize('2 $ bill');
echo $property;

outputs:

twoDollarBill

See the tests for more examples.

Why?

You must never run code generated from untrusted user input. But there are a few cases where you do want to output code generated from (mostly) trusted input.

In my case, I need to generate classes and properties from an OpenAPI specification. There are no hard-and-fast rules on the characters present, just a vague "it is RECOMMENDED to follow common programming naming conventions". Whatever they are.

How?

Each normalizer uses ext-intl's Transliterator to turn the UTF-8 string into Latin-ASCII. Where a character has no equivalent in ASCII (the "€" symbol is a good example), it uses the Unicode name of the character to spell it out (to Euro, after some minor clean-up). For ASCII characters that are not valid in a PHP label, it provides its own spell-outs. For instance, a backtick "`" becomes Backtick.
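To make the idea concrete, here is a rough sketch of the two steps using only ext-intl. The function name toAsciiWords is made up for illustration and is not part of this library, and the clean-up that trims "Euro Sign" down to "Euro" is not reproduced. The library's real implementation may well differ.

```php
<?php

declare(strict_types=1);

// Hypothetical helper, NOT part of kynx/code-utils: illustrates the approach only.
function toAsciiWords(string $utf8): string
{
    // "Any-Latin; Latin-ASCII" is a standard ICU compound transform ID
    $transliterator = Transliterator::create('Any-Latin; Latin-ASCII');
    if ($transliterator === null) {
        throw new RuntimeException('Could not create transliterator');
    }

    $latin = $transliterator->transliterate($utf8);
    if ($latin === false) {
        throw new RuntimeException('Transliteration failed');
    }

    // Anything still outside ASCII gets replaced by its Unicode character name
    $spelled = preg_replace_callback(
        '/[^\x00-\x7F]/u',
        static function (array $match): string {
            // charName() returns null on failure; fall back to an empty string
            $name = IntlChar::charName($match[0]) ?? '';
            return ' ' . ucwords(strtolower($name)) . ' ';
        },
        $latin
    );

    return $spelled ?? $latin;
}
```

Whatever comes out of this contains only ASCII, but it is not yet a valid label: whitespace, punctuation and casing still need handling.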

Initial digits are also spelt out: "123foo" becomes OneTwoThreeFoo. Finally, reserved words are suffixed with a user-supplied string so they don't mess things up. In the first usage example above, if we normalized "class" it would become ClassController.
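A minimal sketch of those last two rules, assuming a digit-by-digit spell-out and a deliberately tiny reserved word list. The function and constant names here are hypothetical and the library's own code may look quite different.

```php
<?php

declare(strict_types=1);

// Hypothetical names for illustration; not the library's API.
const DIGIT_WORDS = ['Zero', 'One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven', 'Eight', 'Nine'];

// Tiny sample only: PHP reserves many more words than this
const SAMPLE_RESERVED = ['class', 'function', 'interface', 'return'];

function spellOutLeadingDigits(string $label): string
{
    $result = preg_replace_callback(
        '/^(\d+)(.*)$/s',
        static function (array $match): string {
            $words = '';
            // Spell out each leading digit individually
            foreach (str_split($match[1]) as $digit) {
                $words .= DIGIT_WORDS[(int) $digit];
            }
            // Capitalise whatever follows so the words stay readable
            return $words . ucfirst($match[2]);
        },
        $label
    );

    return $result ?? $label;
}

function suffixReserved(string $label, string $suffix): string
{
    // Reserved words are case-insensitive in PHP, so compare in lower case
    return in_array(strtolower($label), SAMPLE_RESERVED, true)
        ? $label . $suffix
        : $label;
}

echo spellOutLeadingDigits('123foo'), "\n";        // OneTwoThreeFoo
echo suffixReserved('Class', 'Controller'), "\n";  // ClassController
```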

The results may not be pretty. If for some mad reason your input contains ͖ - put your glasses on! - the label will contain CombiningRightArrowheadAndUpArrowheadBelow. But it is valid PHP, and stands a chance of being as unique as the original. Which brings me to...

Unique labelers

The normalization process reduces around a million Unicode code points down to the 63 ASCII characters valid in a PHP label (letters, digits and underscore). Then it mangles the label further by stripping separators, reducing whitespace and turning it into camelCase, snake_case or whatever your programming preference is. It's gonna be lossy - nothing we can do about that.

The unique labelers' job is to add back lost uniqueness, using a UniqueStrategyInterface to decorate any non-unique class names in the list it is given.

To guarantee uniqueness within a set of class name labels, use the UniqueClassLabeler:

<?php

use Kynx\Code\Normalizer\ClassNameNormalizer;
use Kynx\Code\Normalizer\UniqueClassLabeler;
use Kynx\Code\Normalizer\UniqueStrategy\NumberSuffix;

$labeler = new UniqueClassLabeler(new ClassNameNormalizer('Handler'), new NumberSuffix());

$labels = ['Déjà vu', 'foo', 'deja vu'];
$unique = $labeler->getUnique($labels);
var_dump($unique);

outputs:

array(3) {
  'Déjà vu' =>
  string(7) "DejaVu1"
  'foo' =>
  string(3) "Foo"
  'deja vu' =>
  string(7) "DejaVu2"
}
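The number-suffix behaviour shown above can be approximated in a few lines. This is only a sketch under my own assumptions: numberDuplicates is a hypothetical function, not the library's NumberSuffix class, and it takes labels that have already been normalized.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for a number-suffix strategy; not the library's code.
 *
 * @param array<string, string> $normalized original label => normalized label
 * @return array<string, string> original label => unique label
 */
function numberDuplicates(array $normalized): array
{
    // How many originals collapsed to each normalized label
    $totals = array_count_values($normalized);
    $seen   = [];
    $unique = [];

    foreach ($normalized as $original => $label) {
        if ($totals[$label] === 1) {
            // Already unique: leave it alone
            $unique[$original] = $label;
            continue;
        }

        // Duplicates get a running counter appended
        $seen[$label]      = ($seen[$label] ?? 0) + 1;
        $unique[$original] = $label . $seen[$label];
    }

    return $unique;
}

$unique = numberDuplicates([
    'Déjà vu' => 'DejaVu',
    'foo'     => 'Foo',
    'deja vu' => 'DejaVu',
]);
// $unique is ['Déjà vu' => 'DejaVu1', 'foo' => 'Foo', 'deja vu' => 'DejaVu2']
```

Note the sketch does not check whether a suffixed label such as DejaVu1 collides with another label already in the list, which a real strategy would have to deal with.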

There are labelers for each of the normalizers: UniqueClassLabeler, UniqueConstantLabeler, UniquePropertyLabeler and UniqueVariableLabeler. Along with the NumberSuffix implementation of UniqueStrategyInterface, we provide a SpellOutOrdinalPrefix strategy. Using that instead of NumberSuffix above would output:

array(3) {
  'Déjà vu' =>
  string(11) "FirstDejaVu"
  'foo' =>
  string(3) "Foo"
  'deja vu' =>
  string(12) "SecondDejaVu"
}

Kinda cute, but a bit verbose for my taste.