HTML API: Fix tag processor token length #6625

Closed

Conversation

sirreal
Member

@sirreal sirreal commented May 24, 2024

It seems like opening tags have an incorrect length in the tag processor. I'd expect substr( $html, $token_start, $token_length ) to produce the entire tag. This patch fixes that; previously the length excluded the final > on tags.

This is mostly internal, and several places appear to carry adjustments that compensate for it, but I'd like to have a consistent internal representation. Other types of tokens behave as I'd expect (with substr( $html, $start, $length )).

@dmsnell thoughts?

I'd expect something like this to hold:
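<?php

// '<div class="target">' is 20 bytes long, including the final '>'.
$t = '<div class="target">text</div>trailing text';
$start = 0;
$token_length = 20;
// The text node begins right where the opening tag ends.
$text_start = $start + $token_length;
$text_length = 4;

// The closing tag begins right where the text node ends.
$end_start = $text_start + $text_length;
$end_length = 6;

var_dump(
    substr( $t, $start, $token_length ),
    substr( $t, $text_start, $text_length ),
    substr( $t, $end_start, $end_length ),
);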

Trac ticket: https://core.trac.wordpress.org/ticket/61301


This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.


$length_property = new ReflectionProperty( $processor, 'token_length' );
$length_property->setAccessible( true );
$tag_length = $length_property->getValue( $processor );

I'd rather we avoid reflection in tests; it gives tests privileges that normal code shouldn't have.

I'm not sure where, but I once had a test set that verified the bookmark spans rather than the lengths.

// Anonymous subclass that exposes the raw text of the current token via its bookmark span.
$processor = new class ( $html ) extends WP_HTML_Tag_Processor {
	public function get_raw_token() {
		$this->set_bookmark( 'here' );
		$mark = $this->bookmarks['here'];

		return substr( $this->html, $mark->start, $mark->length );
	}
};

// Advance to the token under test.
for ( $i = 0; $i < $nth_token_in_html; $i++ ) {
	$processor->next_token();
}

$this->assertSame(
	$raw_token,
	$processor->get_raw_token(),
	'Failed to identify full extent of token.'
);

Member

@dmsnell dmsnell left a comment


This has been a longstanding annoyance to me, as it's clearly wrong, but I've had trouble fixing it without breaking other tokens. Let's see if we can test it from the bookmarks, and then add a comprehensive test dataset with all kinds of tokens and various HTML constructs (e.g. the HTML contains more than just the token by itself so we can see edge cases).
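
A dataset along these lines might be sketched as follows. This is only an illustration of the shape: `sketch_token_spans()` is a hypothetical regex-based stand-in, not the Tag Processor, and it ignores comments, quoted `>` characters, and other real HTML syntax. Each row embeds the token in surrounding HTML and checks that the span's start and length recover exactly the raw token.

```php
<?php
/**
 * Hypothetical stand-in tokenizer: splits HTML into tags and text runs.
 * NOT the HTML API; it exists only so the dataset below can run standalone.
 *
 * @return array<int, array{start: int, length: int}>
 */
function sketch_token_spans( string $html ): array {
	// PREG_OFFSET_CAPTURE yields each match as [ matched text, byte offset ].
	preg_match_all( '/<[^>]*>|[^<]+/', $html, $matches, PREG_OFFSET_CAPTURE );

	$spans = array();
	foreach ( $matches[0] as list( $text, $offset ) ) {
		$spans[] = array(
			'start'  => $offset,
			'length' => strlen( $text ),
		);
	}
	return $spans;
}

// Each row: surrounding HTML, zero-based token index, expected raw token.
$document = '<div class="target">text</div>trailing text';
$dataset  = array(
	'opening tag'   => array( $document, 0, '<div class="target">' ),
	'text node'     => array( $document, 1, 'text' ),
	'closing tag'   => array( $document, 2, '</div>' ),
	'trailing text' => array( $document, 3, 'trailing text' ),
	'void tag'      => array( 'before<br>after', 1, '<br>' ),
);

// Collect the names of rows whose span does not cover the full token.
$failures = array();
foreach ( $dataset as $name => list( $html, $index, $expected ) ) {
	$span = sketch_token_spans( $html )[ $index ];
	$raw  = substr( $html, $span['start'], $span['length'] );
	if ( $raw !== $expected ) {
		$failures[] = $name;
	}
}
```

In a real test the stand-in would be replaced by the bookmark-based subclass, and the rows would become a PHPUnit data provider.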

@sirreal sirreal force-pushed the fix/tag-processor-token-length-error branch from dabb279 to 85abac9 Compare May 27, 2024 19:21
@sirreal sirreal marked this pull request as ready for review May 27, 2024 19:35
@sirreal sirreal requested a review from dmsnell May 27, 2024 19:35

github-actions bot commented May 27, 2024

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props jonsurrell, dmsnell, westonruter.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

@dmsnell
Member

dmsnell commented May 27, 2024

The test failure in #6651 makes me nervous.

[Image: Screenshot 2024-05-27 at 4:37:45 PM]

@dmsnell
Member

dmsnell commented May 28, 2024

@sirreal can you look into the failure in the linked PR? I think it's related to seeking with updates and getting the wrong $this->bytes_already_processed

@sirreal
Member Author

sirreal commented May 28, 2024

> The test failure in #6651 makes me nervous.
> Can you look into the failure in the linked PR? I think it's related to seeking with updates and getting the wrong $this->bytes_already_processed

68a3251 was missing from that branch. There was a test that used the broken offset. The offset adjustment was removed in this branch; I pushed that commit to #6651.

@sirreal
Member Author

sirreal commented May 28, 2024

@westonruter This is a bugfix for incorrect token lengths in tag bookmarks in the Tag Processor. Subclasses might see and be affected by the bug (and the fix) if they ever work directly with tag bookmark lengths. I believe you have some plugins that may be doing advanced things with the Tag Processor, so I wanted to make sure you were aware.

Below is an example fix from this PR. Plugins would be impacted in the same way and would need to remove the off-by-one correction on bookmark lengths:

diff --git a/src/wp-includes/interactivity-api/class-wp-interactivity-api-directives-processor.php b/src/wp-includes/interactivity-api/class-wp-interactivity-api-directives-processor.php
index 3b2dcb123797..b12dcb4b3b15 100644
--- a/src/wp-includes/interactivity-api/class-wp-interactivity-api-directives-processor.php
+++ b/src/wp-includes/interactivity-api/class-wp-interactivity-api-directives-processor.php
@@ -107,7 +107,7 @@ public function append_content_after_template_tag_closer( string $new_content ):
 
 		$bookmark = 'append_content_after_template_tag_closer';
 		$this->set_bookmark( $bookmark );
-		$after_closing_tag = $this->bookmarks[ $bookmark ]->start + $this->bookmarks[ $bookmark ]->length + 1;
+		$after_closing_tag = $this->bookmarks[ $bookmark ]->start + $this->bookmarks[ $bookmark ]->length;
 		$this->release_bookmark( $bookmark );
 
 		// Appends the new content.
@@ -140,7 +140,7 @@ private function get_after_opener_tag_and_before_closer_tag_positions( bool $rew
 		}
 		list( $opener_tag, $closer_tag ) = $bookmarks;
 
-		$after_opener_tag  = $this->bookmarks[ $opener_tag ]->start + $this->bookmarks[ $opener_tag ]->length + 1;
+		$after_opener_tag  = $this->bookmarks[ $opener_tag ]->start + $this->bookmarks[ $opener_tag ]->length;
 		$before_closer_tag = $this->bookmarks[ $closer_tag ]->start;
 
 		if ( $rewind ) {
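
To see why the `+ 1` has to go, here is a minimal standalone sketch. It uses plain arrays as stand-ins for bookmark spans (the real ones are objects with `start` and `length` properties), and the "old" span is an assumption modelled on the bug described in this PR, where the length excluded the final `>`.

```php
<?php
$html = '<template>inner</template>after';

$closer_start = strpos( $html, '</template>' ); // Byte offset of the closing tag.
$full_length  = strlen( '</template>' );        // Full tag length, including ">".

// Assumed pre-fix span: one byte short, missing the final ">".
$old = array( 'start' => $closer_start, 'length' => $full_length - 1 );
// Span after the fix: covers the whole tag.
$new = array( 'start' => $closer_start, 'length' => $full_length );

// Old code compensated with + 1; new code must not.
$old_after_tag = $old['start'] + $old['length'] + 1;
$new_after_tag = $new['start'] + $new['length'];

// Keeping the stale + 1 against a fixed span lands one byte too far.
$stale_after_tag = $new['start'] + $new['length'] + 1;

// Zero-length substr_replace() inserts without overwriting anything.
$correct = substr_replace( $html, '<p>new</p>', $new_after_tag, 0 );
$shifted = substr_replace( $html, '<p>new</p>', $stale_after_tag, 0 );
```

Both the compensated old span and the fixed span point just past the closing tag, so `$correct` places the new content between `</template>` and `after`. A subclass that keeps its own `+ 1` after upgrading would behave like `$shifted` and split the following text.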

@westonruter
Member

@sirreal See the following method in this subclass of WP_HTML_Tag_Processor:

	/**
	 * Appends HTML to the provided bookmark.
	 *
	 * @param string $bookmark Bookmark.
	 * @param string $html     HTML to inject.
	 * @return bool Whether the HTML was appended.
	 */
	public function append_html( string $bookmark, string $html ): bool {
		if ( ! $this->has_bookmark( $bookmark ) ) {
			return false;
		}

		$start = $this->bookmarks[ $bookmark ]->start;

		$this->lexical_updates[] = new WP_HTML_Text_Replacement(
			$start,
			$this->old_text_replacement_signature_needed ? $start : 0,
			$html
		);
		return true;
	}

So since it's not dealing with length then is a patch needed?
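
For illustration, a standalone sketch of why a start-only insertion is independent of the bookmark's length. The two helper functions are hypothetical; they assume that `old_text_replacement_signature_needed` distinguishes an older (start, end) replacement signature from a newer (start, length) one, which the method above suggests but does not state.

```php
<?php
// Hypothetical helper mimicking a replacement described by start and end offsets.
function sketch_replace_by_end( string $html, int $start, int $end, string $text ): string {
	return substr_replace( $html, $text, $start, $end - $start );
}

// Hypothetical helper mimicking a replacement described by start and length.
function sketch_replace_by_length( string $html, int $start, int $length, string $text ): string {
	return substr_replace( $html, $text, $start, $length );
}

$html = '<head><title>t</title></head>';

// Stand-in for the start of a bookmark placed on the closing </head> tag.
$start = strpos( $html, '</head>' );

// A zero-width replacement: end equals start, or length equals zero.
$via_end    = sketch_replace_by_end( $html, $start, $start, '<meta charset="utf-8">' );
$via_length = sketch_replace_by_length( $html, $start, 0, '<meta charset="utf-8">' );
```

Either way the new markup lands immediately before the bookmarked token, and the bookmark's length never enters the calculation.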

@dmsnell
Member

dmsnell commented May 29, 2024

> So since it's not dealing with length then is a patch needed?

this should be correct, @westonruter

Member

@dmsnell dmsnell left a comment


Great fix ahead of exposing it in any public interface!

pento pushed a commit that referenced this pull request May 29, 2024
The Tag Processor stores the byte-offsets into its HTML document where
the current token starts and ends, and also for every bookmark. In some
cases for tags, the end offset has been off by one.

In this patch the offset is fixed so that a bookmark always properly
refers to the full span of the token it's bookmarking. Also the current
token byte offsets are properly recorded.

While this is a defect in the Tag Processor, it hasn't been exposed 
through the public interface and has not affected any of the working
of the processor. Only subclasses which rely on the length of a bookmark
have been potentially affected, and these are not supported environments
in the ongoing work.

This fix is important for future work and for ensuring that subclasses
performing custom behaviors remain as reliable as the public interface.

Developed in #6625
Discussed in https://core.trac.wordpress.org/ticket/61301

Props dmsnell, gziolo, jonsurrell, westonruter.
Fixes #61301.


git-svn-id: https://develop.svn.wordpress.org/trunk@58233 602fd350-edb4-49c9-b593-d223f7449a82
@dmsnell
Member

dmsnell commented May 29, 2024

Merged in [58233]
6ca5bdc

@dmsnell dmsnell closed this May 29, 2024
@dmsnell dmsnell deleted the fix/tag-processor-token-length-error branch May 29, 2024 11:42
markjaquith pushed a commit to markjaquith/WordPress that referenced this pull request May 29, 2024
Built from https://develop.svn.wordpress.org/trunk@58233


git-svn-id: http://core.svn.wordpress.org/trunk@57696 1a063a9b-81f0-0310-95a4-ce76da25c4cd
github-actions bot pushed a commit to gilzow/wordpress-performance that referenced this pull request May 29, 2024
Built from https://develop.svn.wordpress.org/trunk@58233


git-svn-id: https://core.svn.wordpress.org/trunk@57696 1a063a9b-81f0-0310-95a4-ce76da25c4cd