-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimized Tokenizer.php #130
Conversation
b78d7bb
to
7493665
Compare
7493665
to
2414194
Compare
Allow entity references containing numbers
…lotz/html5-php into MichaelHeerklotz-optimized_tokenizer
Hi, thanks for your PR. Looks good.. give me some time to check it properly |
Any timeline on this? I know things like this can be quite intricate, but I'm about to release a new major version of wp-Typography and any speed increase in the tokenizer would be very, very welcome. |
Most probably will have a look at it on Monday
…On 25 Aug 2017 11:35, "Der Mundschenk & Compagnie" ***@***.***> wrote:
Any timeline on this? I know things like this can be quite intricate, but
I'm about to release a new major version of wp-Typography and any speed
increase in the tokenizer would be very, very welcome.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#130 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAvaJ1a8MBrmgIM8KZ7vugcFTR4iw2CAks5sbpV1gaJpZM4Ogewd>
.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the whole I can see how this would be a speedup and the changes appear to make sense.
@goetas Being that it's changing a protected API (that impacts classes that would extent this) is this API change considered breaking (from a semantic version perspective)? This might be better on the 3.x branch and that be released with some more changes.
do { | ||
$p = $this->scanner->position(); | ||
//$p = $this->scanner->position(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why we get the position and don't use it? In any case, these can likely be removed rather than commented out.
dist: precise | ||
fast_finish: true | ||
|
||
dist: trusty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm likely missing something but, why are precise and trusty being specified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trusty
does not support php 5.3, so precise
is necessary.
trusty
was not the default build platform on travis ci at the time of the commit
closed by #135 |
I am working on a project where we parse several thousand HTML5 pages with this library.
When profiling the code with xdebug, I noticed extreme amounts of calls to the scanner's .current() and .next() functions. Here is my attempt to optimize the tokenizer code.
Run time comparison:
Original Code: 281s
Only with the optimized quotedAttributeValue() function: 242s
With all optimizations: 226s
-> Saves about ~60s / 20%