Optimized Tokenizer.php #130

Closed

MichaelRoosz

I am working on a project where we parse several thousand HTML5 pages with this library.
When profiling the code with Xdebug, I noticed an extremely large number of calls to the scanner's current() and next() methods. Here is my attempt to optimize the tokenizer code.

Run time comparison:
Original Code: 281s
Only with the optimized quotedAttributeValue() function: 242s
With all optimizations: 226s

-> Saves about 55s (roughly 20%)
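
To illustrate the kind of change described above, here is a minimal sketch that compares a character-by-character loop with a single bulk read. It is not the code from this PR: `SketchScanner`, `readQuotedSlow()` and `readQuotedFast()` are hypothetical names invented for this example, the `calls` counter exists only to make the difference visible, and the sketch ignores details such as character references inside attribute values. It assumes PHP 7 or later.

```php
<?php
// Hypothetical scanner for illustration only; not the library's real Scanner class.
final class SketchScanner
{
    private $data;
    private $pos = 0;
    public $calls = 0; // counts scanner method calls so both strategies can be compared

    public function __construct(string $data)
    {
        $this->data = $data;
    }

    // Return the character at the current position, or false at end of input.
    public function current()
    {
        $this->calls++;
        return $this->pos < strlen($this->data) ? $this->data[$this->pos] : false;
    }

    // Advance by one character.
    public function next()
    {
        $this->calls++;
        $this->pos++;
    }

    // Bulk read: consume everything up to (but not including) any character in $stop.
    public function charsUntil(string $stop): string
    {
        $this->calls++;
        $len = strcspn($this->data, $stop, $this->pos);
        $chunk = substr($this->data, $this->pos, $len);
        $this->pos += $len;
        return $chunk;
    }
}

// Slow path: one current() and one next() call per character.
function readQuotedSlow(SketchScanner $s, string $quote): string
{
    $val = '';
    for ($c = $s->current(); $c !== false && $c !== $quote; $c = $s->current()) {
        $val .= $c;
        $s->next();
    }
    return $val;
}

// Fast path: a single scanner call backed by the native strcspn().
function readQuotedFast(SketchScanner $s, string $quote): string
{
    return $s->charsUntil($quote);
}

$slow = new SketchScanner('hello world" rest');
$fast = new SketchScanner('hello world" rest');
$a = readQuotedSlow($slow, '"');
$b = readQuotedFast($fast, '"');
```

Both functions return the same value, but the slow path makes a number of method calls proportional to the length of the value, while the fast path makes one. Userland method calls are comparatively expensive in PHP, so this is a plausible source of the savings reported above.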

@MichaelRoosz MichaelRoosz force-pushed the optimized_tokenizer branch 2 times, most recently from b78d7bb to 7493665 July 23, 2017 15:27
@goetas
Member

goetas commented Jul 26, 2017

Hi, thanks for your PR.

Looks good. Give me some time to check it properly.

@mundschenk-at
Contributor

Any timeline on this? I know things like this can be quite intricate, but I'm about to release a new major version of wp-Typography and any speed increase in the tokenizer would be very, very welcome.

@goetas
Member

goetas commented Aug 25, 2017 via email

Member

@mattfarina mattfarina left a comment

On the whole I can see how this would be a speedup and the changes appear to make sense.

@goetas Since this changes a protected API (which affects classes that extend this one), is the change considered breaking from a semantic versioning perspective? It might be better to land this on the 3.x branch and release it together with some other changes.
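
As a rough illustration of why a change to protected members can count as breaking, the sketch below shows a subclass hook that silently stops running once the base class inlines the logic. All class and method names are hypothetical and are not taken from this library or from this PR; what exactly the PR changes in the protected API is not shown in this thread.

```php
<?php
// Hypothetical "before" base class: run() goes through a protected hook.
class TokenizerBefore
{
    public function run(string $s): string
    {
        return $this->text($s);
    }

    protected function text(string $s): string
    {
        return $s;
    }
}

// Hypothetical "after" base class: the hook was inlined for speed and removed.
class TokenizerAfter
{
    public function run(string $s): string
    {
        return $s; // no call to text() anymore
    }
}

// The same downstream subclass body, written against each base class.
class UpperBefore extends TokenizerBefore
{
    protected function text(string $s): string
    {
        return strtoupper($s);
    }
}

class UpperAfter extends TokenizerAfter
{
    // Still defined, but the base class never calls it.
    protected function text(string $s): string
    {
        return strtoupper($s);
    }
}

$before = (new UpperBefore())->run('abc');
$after = (new UpperAfter())->run('abc');
```

Here `$before` is the upper-cased string while `$after` is the unchanged input, with no error raised. A silent behavior change of this kind for existing subclasses is the usual argument for treating protected API changes as breaking under semantic versioning.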

do {
$p = $this->scanner->position();
//$p = $this->scanner->position();

I wonder why we get the position and don't use it? In any case, these can likely be removed rather than commented out.

dist: precise
fast_finish: true

dist: trusty

I'm likely missing something, but why are both precise and trusty being specified?

trusty does not support PHP 5.3, so precise is necessary.

trusty was not the default build platform on Travis CI at the time of the commit.

@goetas
Member

goetas commented Sep 1, 2017

closed by #135

@goetas goetas closed this Sep 1, 2017