Explore HTML parsing and Adoption Agency Algorithm #1
…king a tag closer

This commit marks the start of a bookmark one byte before the tag name start for tag openers, and two bytes before the tag name for tag closers.

Setting a bookmark on a tag should set its "start" position before the opening "<", e.g.:

```
<div> Testing a <b>Bookmark</b>
----------------^
```

The current calculation assumes this is always one byte to the left of $tag_name_starts_at. However, in tag closers that index points to a solidus symbol "/":

```
<div> Testing a <b>Bookmark</b>
----------------------------^
```

The bookmark should therefore start two bytes before the tag name:

```
<div> Testing a <b>Bookmark</b>
---------------------------^
```
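The offset rule from this commit can be sketched as a standalone function. This is a minimal illustration, not the actual WP_HTML_Tag_Processor code: `bookmark_start()` and its `$is_closing_tag` flag are hypothetical names, and the sketch assumes `$tag_name_starts_at` is the byte offset of the first character of the tag name.

```php
<?php
/**
 * Hypothetical helper: byte offset where a bookmark on a tag should start.
 *
 * For an opener like "<b>", the tag name is preceded by "<" (one byte back).
 * For a closer like "</b>", it is preceded by "</" (two bytes back).
 */
function bookmark_start( int $tag_name_starts_at, bool $is_closing_tag ): int {
	return $tag_name_starts_at - ( $is_closing_tag ? 2 : 1 );
}

$html = '<div> Testing a <b>Bookmark</b>';

// Opener: the "b" of "<b>" is at byte 17, so the bookmark starts at byte 16.
echo bookmark_start( 17, false ), "\n"; // 16
// Closer: the "b" of "</b>" is at byte 29, so the bookmark starts at byte 27.
echo bookmark_start( 29, true ), "\n";  // 27
```

Subtracting a fixed one byte for closers would land on byte 28, the "/" shown in the second diagram.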
…closers' into wp_html_processor
This implementation works! I benchmarked it by parsing the HTML spec itself, a 12MB HTML document:

That's pretty terrible! It's also not surprising. This PR builds an actual document tree and uses inefficient operations such as …

A text-based version similar to WP_HTML_Tag_Processor should be much faster and more memory-efficient. Let's explore one!
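To make the text-based idea concrete, here is a minimal sketch of a linear tag scanner that walks the markup once and never builds a tree. It is hypothetical (`scan_tags()` is not a WordPress function) and assumes simplified markup: no comments, CDATA, raw-text elements such as `<script>`, or `>` inside attribute values. A production scanner has to handle those cases.

```php
<?php
/**
 * Hypothetical linear tag scanner: yields each tag as
 * [ lowercase tag name, is_closer, byte offset of "<" ] in document order.
 * Beyond the input string itself, it only keeps the current match and offset.
 */
function scan_tags( string $html ): Generator {
	$offset = 0;
	while ( 1 === preg_match( '~<(/?)([a-zA-Z][a-zA-Z0-9-]*)[^>]*>~', $html, $m, PREG_OFFSET_CAPTURE, $offset ) ) {
		yield array( strtolower( $m[2][0] ), '/' === $m[1][0], $m[0][1] );
		// Resume scanning right after the tag that was just matched.
		$offset = $m[0][1] + strlen( $m[0][0] );
	}
}

foreach ( scan_tags( '<div><b>x</b></div>' ) as list( $name, $is_closer, $at ) ) {
	printf( "%s%s at byte %d\n", $is_closer ? '/' : '', $name, $at );
}
// Prints, one per line: div at byte 0, b at byte 5, /b at byte 9, /div at byte 13
```

Because `scan_tags()` is a generator, callers can stop as soon as they find the tag they need instead of paying for the whole document up front.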
Adoption Agency Algorithm requires a full pass through the HTML document

In the worst-case scenario, the entire document must be parsed before we know where even the second node belongs. Consider this markup:

```
<b>
<div>
<div><!-- 100k tags amounting to 2 MB of normative HTML --></div>
</b> <!-- suddenly, a rogue </b> -->
</div>
</b>
```

The correct DOM would be:
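A single forward pass cannot settle the tree early because the restructuring is only triggered when the misnested end tag is reached. The sketch below finds that trigger point. It is a deliberately reduced model, not a spec-complete implementation: `find_adoption_trigger()` is hypothetical, only `b`, `i`, `em`, and `strong` count as formatting elements, only `div` and `p` count as block ("special") elements, and attributes, void elements, and implied end tags are ignored.

```php
<?php
/**
 * Hypothetical, simplified detector: returns the byte offset of the first end
 * tag at which the adoption agency algorithm would have to move nodes, i.e. an
 * end tag for an open formatting element with a block element opened after it.
 * Returns null when the markup never triggers that restructuring.
 */
function find_adoption_trigger( string $html ): ?int {
	$formatting = array( 'b', 'i', 'em', 'strong' );
	$blocks     = array( 'div', 'p' );
	$stack      = array(); // Names of currently open elements, innermost last.
	$offset     = 0;
	$pattern    = '~<!--.*?-->|<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>~s';

	while ( 1 === preg_match( $pattern, $html, $m, PREG_OFFSET_CAPTURE, $offset ) ) {
		$at     = $m[0][1];
		$offset = $at + strlen( $m[0][0] );
		if ( ! isset( $m[2] ) ) {
			continue; // A comment: a "</b>" inside it is not a tag.
		}
		$name = strtolower( $m[2][0] );
		if ( '/' !== $m[1][0] ) {
			$stack[] = $name; // Opener: push onto the stack of open elements.
			continue;
		}
		// Closer: find the innermost open element with the same name.
		$index = null;
		for ( $i = count( $stack ) - 1; $i >= 0; $i-- ) {
			if ( $stack[ $i ] === $name ) {
				$index = $i;
				break;
			}
		}
		if ( null === $index ) {
			continue; // Stray end tag: ignored in this model.
		}
		if ( in_array( $name, $formatting, true ) ) {
			foreach ( array_slice( $stack, $index + 1 ) as $above ) {
				if ( in_array( $above, $blocks, true ) ) {
					return $at; // Misnested: nodes must be reparented from here.
				}
			}
		}
		array_splice( $stack, $index ); // Pop the element and everything above it.
	}
	return null;
}

$html = "<b>\n<div>\n<div><!-- 100k tags amounting to 2 MB of normative HTML --></div>\n</b> <!-- suddenly, a rogue </b> -->\n</div>\n</b>";
printf( "Restructuring starts at byte %d of %d\n", find_adoption_trigger( $html ), strlen( $html ) );
```

If the comment really held 2 MB of nested tags, every one of them would be scanned before the detector returns, which is the point of the example above.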
The adoption agency algorithm makes the …

What if we built an HTML normalizer instead?

Since the entire markup must be processed upfront, this could work just as well:

```php
class WP_HTML_Processor {
	private $html;

	public function __construct( $html, $options = array() ) {
		// Apply HTML parsing rules first, unless explicitly asked not to.
		if ( true !== ( $options['is_normative'] ?? false ) ) {
			$html = WP_HTML_Normalizer::normalize( $html );
		}
		// From now on, we assume normative markup.
		$this->html = $html;
	}

	public function next_by_css( $selector ) {
		// To be implemented against the normalized markup.
	}

	public function set_inner_html( $html ) {
		// To be implemented against the normalized markup.
	}

	// ...
}
```
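To illustrate what "normalize first, then assume normative markup" could mean, here is a toy normalizer. It is hypothetical and far simpler than the `WP_HTML_Normalizer::normalize()` proposed above: it only drops end tags that match no open element and emits explicit end tags so every element is closed in reverse order of opening. It does not implement the adoption agency algorithm, void elements, comments, or implied end tags.

```php
<?php
/**
 * Toy normalizer (hypothetical): rewrites loosely nested markup so that every
 * element is explicitly closed in reverse order of opening.
 * - End tags with no matching open element are dropped.
 * - An end tag first closes any elements opened after its match.
 * - Elements still open at the end of input are closed there.
 * Not handled: adoption agency, void elements, comments, implied end tags.
 */
function toy_normalize( string $html ): string {
	$out    = '';
	$stack  = array(); // Open element names, innermost last.
	$offset = 0;

	while ( 1 === preg_match( '~<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>~', $html, $m, PREG_OFFSET_CAPTURE, $offset ) ) {
		$at     = $m[0][1];
		$out   .= substr( $html, $offset, $at - $offset ); // Text before the tag.
		$offset = $at + strlen( $m[0][0] );
		$name   = strtolower( $m[2][0] );

		if ( '/' !== $m[1][0] ) {
			$stack[] = $name;
			$out    .= $m[0][0]; // Keep the opener (and its attributes) verbatim.
			continue;
		}

		$open_at = array_keys( $stack, $name, true );
		if ( ! $open_at ) {
			continue; // Stray end tag: drop it.
		}
		$index = end( $open_at ); // Innermost open element with this name.
		while ( count( $stack ) > $index ) {
			$out .= '</' . array_pop( $stack ) . '>';
		}
	}

	$out .= substr( $html, $offset ); // Trailing text after the last tag.
	while ( $stack ) {
		$out .= '</' . array_pop( $stack ) . '>';
	}
	return $out;
}

echo toy_normalize( '<p>Hi</span> <b>there' ), "\n"; // <p>Hi <b>there</b></p>
echo toy_normalize( '<div><b>x</div>' ), "\n";        // <div><b>x</b></div>
```

After a pass like this, a downstream processor can rely on strictly balanced tags, which is the property the constructor above assumes once `is_normative` is set.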
…air screen. The table is no longer created by core as of WordPress 3.0, and support for global terms was removed in WordPress 6.1, so `$wpdb->sitecategories` is unset by default. This commit resolves a "passing null to non-nullable" deprecation notice on PHP 8.1: {{{ Deprecated: addcslashes(): Passing null to parameter #1 ($string) of type string is deprecated in wp-includes/class-wpdb.php on line 1804 }}} The `tables_to_repair` filter is available for plugins to re-add the table or include any additional tables to repair. Follow-up to [14854], [14880], [54240]. Props ipajen, chiragrathod103, SergeyBiryukov. Fixes #57762. git-svn-id: https://develop.svn.wordpress.org/trunk@55421 602fd350-edb4-49c9-b593-d223f7449a82
…om next_posts(). The `esc_url()` function expects a string for the `$url` parameter. There is no input validation within that function. The function contains an `ltrim()` call, which also expects a string. Passing `null` to this parameter results in a `Deprecated: ltrim(): Passing null to parameter #1 ($string) of type string is deprecated` notice on PHP 8.1+. Tracing the stack back, a `null` is being passed to it within `next_posts()` when `get_next_posts_page_link()` returns `null` (it can return a string or `null`). On PHP 7.0 to PHP 8.x, an empty string is returned from `esc_url()` when `null` is passed to it. The change in this changeset avoids the deprecation notice by not invoking `esc_url()` when `get_next_posts_page_link()` returns `null` and instead sets `$output` to an empty string, thus maintaining the same behavior as before (minus the deprecation notice). Adds a test to validate that an empty string is returned and that the deprecation notice is absent (when running on PHP 8.1+). Follow-up to [11383], [9632]. Props codersantosh, nihar007, hellofromTonya, mukesh27, oglekler, rajinsharwar. Fixes #59154. git-svn-id: https://develop.svn.wordpress.org/trunk@56740 602fd350-edb4-49c9-b593-d223f7449a82
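The guard described in this changeset can be sketched in isolation. This is a simplified illustration, not the actual `next_posts()` code: `next_posts_output()` is a hypothetical name, and `htmlspecialchars()` stands in for `esc_url()` so the example runs without WordPress.

```php
<?php
/**
 * Hypothetical sketch of the null guard: escape the link only when there is one.
 * A null link yields '' directly, matching what esc_url( null ) returned before,
 * without passing null into string functions (deprecated since PHP 8.1).
 */
function next_posts_output( ?string $link ): string {
	if ( null === $link ) {
		return '';
	}
	// Stand-in for esc_url(); the real function does much more than this.
	return htmlspecialchars( $link, ENT_QUOTES );
}

var_dump( next_posts_output( null ) );                            // string(0) ""
echo next_posts_output( 'https://example.com/?a=1&b=2' ), "\n"; // https://example.com/?a=1&amp;b=2
```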
…Info screen. This resolves a fatal error if the `strict_types` directive is enabled: {{{ Argument #1 ($num) must be of type float, string given }}} Since the goal of the Site Health Info screen is to display raw values where possible, the `number_format()` call here does not seem to provide any benefit. Props krishneup, sabernhardt, audrasjb, SergeyBiryukov. Fixes #60364. git-svn-id: https://develop.svn.wordpress.org/trunk@58847 602fd350-edb4-49c9-b593-d223f7449a82
Closing in favor of more visible WordPress#4125