Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize the processing of text between nodes #162

Merged
merged 1 commit into from
Nov 27, 2018

Conversation

stof
Copy link
Contributor

@stof stof commented Nov 27, 2018

Instead of processing the text token one by one in the main loop, it is now processed in batch until the next special token (< and & which have special handling in the main loop and NUL characters which need to report a parse error).

https://blackfire.io/profiles/compare/8d7277d0-e2ed-40cf-b9b6-bffa6a523ae6/graph

There is a 51% improvement there

Instead of processing the text token one by one in the main loop, it is
now processed in batch until the next special token (< and & which have
special handling in the main loop and NUL characters which need to report
a parse error).
@goetas
Copy link
Member

goetas commented Nov 27, 2018

This looks promising :)

@stof
Copy link
Contributor Author

stof commented Nov 27, 2018

php test/benchmark/run.php (current master, i.e. 182f34d)
Loading: 101.72620534897
Writing: 37.083342075348

php test/benchmark/run.php (this PR)
Loading: 69.69865322113
Writing: 37.433831691742

php test/benchmark/run_native.php (same benchmark using DOMDocument::loadHTML and DOMDocument::saveHTML instead)
Loading: 10.595810413361
Writing: 3.5749840736389

And for reference, here is the benchmark running on 2.4.0:
Loading: 127.82767772675
Writing: 37.827260494232

This is indeed quite promising (note that all my optimizations since 2.4.0 are focusing on the loading part only, that's why there is not much improvements on the writing side).

2.4.0 was 12x slower than the native parser
This PR reaches the level of 6.5x slower than native parser.

@goetas
Copy link
Member

goetas commented Nov 27, 2018

give the time to test this by my self tomorrow, but looks great!
Thanks a lot!

@stof
Copy link
Contributor Author

stof commented Nov 27, 2018

And for the first time since I started this optimization work, the DOMTreeBuilder appears in the hot path defined by blackfire, instead of being entirely dominated by the Tokenizer 😄

@stof
Copy link
Contributor Author

stof commented Nov 27, 2018

note that I still have a few ideas to keep going after that one (but not as big as that one)

@goetas
Copy link
Member

goetas commented Nov 27, 2018

Now that I see it, this looks so obvious :)

give the time to test this by my self tomorrow, but looks great!

impatience :)

@goetas goetas merged commit f24a609 into Masterminds:master Nov 27, 2018
@stof stof deleted the optimize_text branch November 27, 2018 13:55
@goetas
Copy link
Member

goetas commented Nov 27, 2018

Nice to see this benchmark:

v2.3.1

$ php test/benchmark/run.php 10
Loading: 230.20720481873

master

$ php test/benchmark/run.php 10
Loading: 66.839385032654

(php 7.2)

That is almost 4 time faster! (and my guess is that there is still room for improvements as example in the Tokenizer::attribute() function or moving the readUntilSequence into the scanner class)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants