Optimize the processing of text between nodes #162

stof · 2018-11-27T13:21:05Z

Instead of processing the text token one by one in the main loop, it is now processed in batch until the next special token (< and & which have special handling in the main loop and NUL characters which need to report a parse error).

https://blackfire.io/profiles/compare/8d7277d0-e2ed-40cf-b9b6-bffa6a523ae6/graph

There is a 51% improvement there

Instead of processing the text token one by one in the main loop, it is now processed in batch until the next special token (< and & which have special handling in the main loop and NUL characters which need to report a parse error).

goetas · 2018-11-27T13:27:20Z

This looks promising :)

stof · 2018-11-27T13:36:38Z

php test/benchmark/run.php (current master, i.e. 182f34d)
Loading: 101.72620534897
Writing: 37.083342075348

php test/benchmark/run.php (this PR)
Loading: 69.69865322113
Writing: 37.433831691742

php test/benchmark/run_native.php (same benchmark using DOMDocument::loadHTML and DOMDocument::saveHTML instead)
Loading: 10.595810413361
Writing: 3.5749840736389

And for reference, here is the benchmark running on 2.4.0:
Loading: 127.82767772675
Writing: 37.827260494232

This is indeed quite promising (note that all my optimizations since 2.4.0 are focusing on the loading part only, that's why there is not much improvements on the writing side).

2.4.0 was 12x slower than the native parser
This PR reaches the level of 6.5x slower than native parser.

goetas · 2018-11-27T13:40:03Z

give the time to test this by my self tomorrow, but looks great!
Thanks a lot!

stof · 2018-11-27T13:42:47Z

And for the first time since I started this optimization work, the DOMTreeBuilder appears in the hot path defined by blackfire, instead of being entirely dominated by the Tokenizer 😄

stof · 2018-11-27T13:43:16Z

note that I still have a few ideas to keep going after that one (but not as big as that one)

goetas · 2018-11-27T13:48:46Z

Now that I see it, this looks so obvious :)

give the time to test this by my self tomorrow, but looks great!

impatience :)

goetas · 2018-11-27T19:10:13Z

Nice to see this benchmark:

v2.3.1

$ php test/benchmark/run.php 10
Loading: 230.20720481873

master

$ php test/benchmark/run.php 10
Loading: 66.839385032654

(php 7.2)

That is almost 4 time faster! (and my guess is that there is still room for improvements as example in the Tokenizer::attribute() function or moving the readUntilSequence into the scanner class)

Optimize the processing of text between nodes

ced5b84

Instead of processing the text token one by one in the main loop, it is now processed in batch until the next special token (< and & which have special handling in the main loop and NUL characters which need to report a parse error).

goetas merged commit f24a609 into Masterminds:master Nov 27, 2018

stof deleted the optimize_text branch November 27, 2018 13:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize the processing of text between nodes #162

Optimize the processing of text between nodes #162

stof commented Nov 27, 2018

goetas commented Nov 27, 2018

stof commented Nov 27, 2018 •

edited

Loading

goetas commented Nov 27, 2018

stof commented Nov 27, 2018

stof commented Nov 27, 2018

goetas commented Nov 27, 2018

goetas commented Nov 27, 2018 •

edited

Loading

Optimize the processing of text between nodes #162

Optimize the processing of text between nodes #162

Conversation

stof commented Nov 27, 2018

goetas commented Nov 27, 2018

stof commented Nov 27, 2018 • edited Loading

goetas commented Nov 27, 2018

stof commented Nov 27, 2018

stof commented Nov 27, 2018

goetas commented Nov 27, 2018

goetas commented Nov 27, 2018 • edited Loading

stof commented Nov 27, 2018 •

edited

Loading

goetas commented Nov 27, 2018 •

edited

Loading