Get rid of ctype dependency #173
Comments
I see that the function is used with single-character input. For this use case the best I could find is:
function is_alpha($input)
{
    $code = ord($input);
    return ($code >= 97 && $code <= 122) || ($code >= 65 && $code <= 90);
}
It is 2x slower than ctype_alpha(); however, for 1 million calls that is 0.08 sec. vs 0.04 sec. So, I'd go for it.
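A rough way to reproduce a timing comparison like the one above is sketched here. The is_alpha() body is copied from the comment; the time_calls() helper and the iteration count are assumptions made for illustration, not the script the author used. Calling through a callable adds overhead to both candidates, so only the relative difference is meaningful.

```php
<?php
// Function body as posted in the comment above.
function is_alpha($input)
{
    $code = ord($input);
    return ($code >= 97 && $code <= 122) || ($code >= 65 && $code <= 90);
}

// Hypothetical helper: time $iterations calls of $fn with a one-character argument.
function time_calls(callable $fn, $iterations)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn('a');
    }
    return microtime(true) - $start;
}

$n = 1000000;
printf("is_alpha:    %.3f sec\n", time_calls('is_alpha', $n));

// ctype_alpha() is only available when the ctype extension is loaded.
if (function_exists('ctype_alpha')) {
    printf("ctype_alpha: %.3f sec\n", time_calls('ctype_alpha', $n));
}
```

Absolute numbers will vary with PHP version and hardware.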
Will be happy to get rid of the dependency. Can we make this an optional dependency, using something as |
A polyfill exists: https://github.com/symfony/polyfill-ctype/blob/master/Ctype.php. But it is slower than my version, which is optimized for single-byte strings. As for
I have two solutions; both have a 2% impact on performance:
--- a/src/HTML5/Parser/Tokenizer.php
+++ b/src/HTML5/Parser/Tokenizer.php
@@ -137,7 +137,7 @@ class Tokenizer
$this->endTag();
} elseif ('?' === $tok) {
$this->processingInstruction();
- } elseif (ctype_alpha($tok)) {
+ } elseif ($this->is_alpha($tok)) {
$this->tagName();
} else {
$this->parseError('Illegal tag opening');
@@ -347,7 +347,7 @@ class Tokenizer
// > -> parse error
// EOF -> parse error
// -> parse error
- if (!ctype_alpha($tok)) {
+ if (!$this->is_alpha($tok)) {
$this->parseError("Expected tag name, got '%s'", $tok);
if ("\0" == $tok || false === $tok) {
return false;
@@ -1186,4 +1186,10 @@ class Tokenizer
return '&' . $entity;
}
+
+ protected function is_alpha($input)
+ {
+ $code = ord($input);
+ return ($code >= 97 && $code <= 122) || ($code >= 65 && $code <= 90);
+ }
}
--- a/src/HTML5/Parser/Tokenizer.php
+++ b/src/HTML5/Parser/Tokenizer.php
@@ -137,9 +137,7 @@ class Tokenizer
$this->endTag();
} elseif ('?' === $tok) {
$this->processingInstruction();
- } elseif (ctype_alpha($tok)) {
- $this->tagName();
- } else {
+ } elseif (false === $this->tagName()) {
$this->parseError('Illegal tag opening');
// TODO is this necessary ?
$this->characterData();
@@ -343,20 +341,24 @@ class Tokenizer
}
$tok = $this->scanner->next();
- // a-zA-Z -> tagname
- // > -> parse error
// EOF -> parse error
// -> parse error
- if (!ctype_alpha($tok)) {
+ if ("\0" == $tok || false === $tok) {
+ $this->parseError("Expected tag name, got '%s'", $tok);
+
+ return false;
+ }
+
+ // Get tag name (or just its start)
+ $name = $this->scanner->getAsciiAlpha();
+
+ if (false === $name || '' === $name) {
$this->parseError("Expected tag name, got '%s'", $tok);
- if ("\0" == $tok || false === $tok) {
- return false;
- }
return $this->bogusComment('</');
}
- $name = $this->scanner->charsUntil("\n\f \t>");
+ $name .= $this->scanner->charsUntil("\n\f \t>");
$name = self::CONFORMANT_XML === $this->mode ? $name : strtolower($name);
// Trash whitespace.
$this->scanner->whitespace();
@@ -379,8 +381,15 @@ class Tokenizer
*/
protected function tagName()
{
- // We know this is at least one char.
- $name = $this->scanner->charsWhile(':_-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz');
+ // Get tag name (or just its start)
+ $name = $this->scanner->getAsciiAlpha();
+
+ if (false === $name || '' === $name) {
+ return false;
+ }
+
+ // Get the rest of tag name
+ $name .= $this->scanner->charsWhile(':_-' . Scanner::CHARS_ALNUM);
$name = self::CONFORMANT_XML === $this->mode ? $name : strtolower($name);
$attributes = array();
$selfClose = false;
The second patch might require some more love. For example, I see that Scanner methods return false on EOF. I think this is not really needed (they could return an empty string) and it is not properly documented.
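To illustrate the "empty string on EOF" idea, here is a minimal standalone sketch. The chars_while() function below is hypothetical and is not the library's Scanner API; it only shows that a strspn()-based reader can return '' both at end of input and when nothing matches, so callers need a single check.

```php
<?php
// Hypothetical helper, not part of the library: read the run of characters
// from $mask starting at $pos, advance $pos, and return the run.
function chars_while($data, &$pos, $mask)
{
    if ($pos >= strlen($data)) {
        return ''; // EOF is reported as an empty string rather than false
    }
    $len = strspn($data, $mask, $pos);   // length of the matching run at $pos
    $chunk = substr($data, $pos, $len);
    $pos += $len;

    return $chunk;
}

$pos = 0;
$name = chars_while('div>', $pos, 'abcdefghijklmnopqrstuvwxyz');
```

With this shape a caller such as tagName() would only need to test for '' instead of both false and ''.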
Extending the first patch with static Performance test done using run.php script on PHP 7.3. |
To have any performance benefits, you'd probably have to do that in the constructor and assign a callable to a property to avoid additional function calls. Also, the polyfill would have to support |
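The constructor idea can be sketched as follows. The AlphaChecker class and its check() method are hypothetical names used only for illustration; in the library this logic would presumably live in the Tokenizer. The fallback closure reuses the ord()-based check posted earlier in the thread.

```php
<?php
// Hypothetical class for illustration; not the library's actual code.
class AlphaChecker
{
    /** @var callable */
    private $isAlpha;

    public function __construct()
    {
        // Decide once which implementation to use, instead of on every call.
        $this->isAlpha = function_exists('ctype_alpha')
            ? 'ctype_alpha'
            : function ($input) {
                $code = ord($input);

                return ($code >= 97 && $code <= 122) || ($code >= 65 && $code <= 90);
            };
    }

    public function check($char)
    {
        // Invoking a callable stored in a property needs the extra parentheses (PHP 7+).
        return ($this->isAlpha)($char);
    }
}
```

Note that the two branches agree only for single ASCII characters passed as strings; ctype_alpha() is locale-aware and treats non-string arguments differently.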
We can also use https://packagist.org/packages/symfony/polyfill-ctype instead.
Hm, that seems nice
@alecpl
This library uses the ctype_alpha() function in two places.
I think it should be easy to replace this function with something based on ord() or str[c]spn().
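A strspn()-based replacement might look like the sketch below. The function name is_alpha_strspn() and the ASCII_ALPHA constant are assumptions made for illustration. Unlike the ord() approach, this variant also handles multi-character strings, and it returns false for empty or non-string input.

```php
<?php
// Hypothetical constant: the ASCII letters accepted as "alpha".
const ASCII_ALPHA = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

function is_alpha_strspn($input)
{
    // strspn() returns the length of the leading run consisting only of
    // characters from the mask; the input is alphabetic if that run covers it all.
    return is_string($input)
        && '' !== $input
        && strspn($input, ASCII_ALPHA) === strlen($input);
}
```

Whether this is faster than the ord() variant for single characters would need to be measured.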