HTML API: Complete missing tags in the IN BODY insertion mode. #6972

dmsnell · 2024-07-05T04:40:47Z

Summary

There are some failing tests that need to pass.
This needs a full and careful audit.
Full docblock/example review (probably nothing more is needed since everything should be internal or already-documented)

Description

As part of work to add more spec support to the HTML API, this patch adds support for the remaining missing tags in the IN BODY insertion mode. Not all of the added tags are supported, because in some cases they reset the insertion mode and are reprocessed where they will be rejected.

Included in these changes are some shuffling of the tag rules in IN BODY. This is to make it easier to compare the final code against the HTML specification, so that they appear in the same order as they exist within the spec.

Testing

This change unlocks 54 new test assertions that have previously been skipped.

html5lib tests

- Tests: 614, Assertions: 191, Skipped: 423.
+ Tests: 610, Assertions: 275, Skipped: 335.

github-actions · 2024-07-05T04:53:58Z

Test using WordPress Playground

The changes in this pull request can previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

The Plugin and Theme Directories cannot be accessed within Playground.
All changes will be lost when closing a tab with a Playground instance.
All changes will be lost when refreshing the page.
A fresh instance is created each time the link below is clicked.
Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance,
it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

dmsnell · 2024-07-05T05:17:00Z

src/wp-includes/html-api/class-wp-html-processor.php

 			? ( WP_HTML_Stack_Event::POP === $this->current_element->operation && '#tag' === $this->get_token_type() )
 			: parent::is_tag_closer();
+
+		// A closing BR tag is treated as if it were a opening tag.
+		return $is_closer_token && 'BR' !== $this->get_tag();


Does this belong in the Tag Processor? I don't know. It in some ways is more accurate there, but then it's not possible to distinguish an opening from closing BR tag. Probably it should move into the Tag Processor since it's never possible in HTML to have a closing BR tag.

sirreal

Leaving notes from a partial review.

sirreal · 2024-07-05T10:43:05Z

src/wp-includes/html-api/class-wp-html-processor-state.php

+	 *
+	 * @var string
+	 */
+	const QUIRKS_MODE = 'quirks-mode';


This is interesting. There are 3 document modes:

"no-quirks"

"quirks"

"limited-quirks"

We're missing "limited-quirks".

However, the compatMode of a document is "BackCompat" if the document is in quirks mode or "CSS1Compat" if the document is in "no-quirks" or "limited-quirks" modes.

Do you know whether we ignore the difference between no- and limited- quirks modes? Is the plan to add the additional mode when and if it becomes necessary?

So far I haven't seen any implications for limited-quirks mode. Most language reads like "if document is in quirks mode".

I'll double-check, but I think limited quirks comes specifically in the fragment case.

after another scan through the spec, it does seem that limited-quirks mode behaves just like no-quirks mode, and the only impact this has is on case-sensitivity in matching IDs and class names.

I don't see a reason to differentiate them, especially since browsers don't. the name CSS1Compat makes more sense to me now too, since it's really about how CSS selectors behave

I'm satisfied 👌

sirreal · 2024-07-05T11:05:26Z

src/wp-includes/html-api/class-wp-html-processor.php

+					 *
+					 * This parser does not currently support this behavior.
+					 *
+					 * @todo The spec is unclear on whether this should be ignored. Should it?


It seems clear that the only thing to do with this is to add the attributes that aren't already present. Is your question more about whether we should pause here or skip it?

yeah it's about pausing or skipping. I think the only acceptable answer is to skip, but I don't want to assume that without questioning it. if we don't skip it, we'll reach a new BODY opening without a closer, I think.

sirreal · 2024-07-05T11:24:54Z

src/wp-includes/html-api/class-wp-html-open-elements.php

+	/**
+	 * Returns the name of the node at the nth position on the stack
+	 * of open elements, or `null` if no such position exists.
+	 *
+	 * Note that this uses a 1-based index, which represents the
+	 * "nth item" on the stack, counting from the top, where the
+	 * top-most element is the 1st, the second is the 2nd, etc...
+	 *
+	 * @todo Skip over "marker" entries on the stack, if appropriate.
+	 *
+	 * @param int $nth Retrieve the nth item on the stack, with 1 being
+	 *                 the top element, 2 being the second, etc...
+	 * @return string|null Name of the node on the stack at the given location,
+	 *                     or `null` if the location isn't on the stack.
+	 */
+	public function at( int $nth ) {
+		foreach ( $this->walk_down() as $item ) {
+			if ( 0 === --$nth ) {
+				return $item->node_name;
+			}
+		}
+
+		return null;
+	}


This is a nice method, I'd like to be able to use it in both directions by providing a negative index to walk_up and a positive index to walk_down.

The negative index would make this case much easier:

wordpress-develop/src/wp-includes/html-api/class-wp-html-processor.php

Lines 1618 to 1626 in aa11864

$current_node = $this->state->stack_of_open_elements->current_node();

if ( $current_node && 'OPTION' === $current_node->node_name ) {

foreach ( $this->state->stack_of_open_elements->walk_up( $current_node ) as $parent ) {

break;

}

if ( $parent && 'OPTGROUP' === $parent->node_name ) {

$this->state->stack_of_open_elements->pop();

}

}

if ( $this->state->stack_of_open_elements->current_node_is( 'OPTION' ) && 'OPTGROUP' === $this->state->stack_of_open_elements->at( -2 ) ) { // pop }

This is how Array.prototype.at works in JS.

Suggested change

/**

* Returns the name of the node at the nth position on the stack

* of open elements, or `null` if no such position exists.

*

* Note that this uses a 1-based index, which represents the

* "nth item" on the stack, counting from the top, where the

* top-most element is the 1st, the second is the 2nd, etc...

*

* @todo Skip over "marker" entries on the stack, if appropriate.

*

* @param int $nth Retrieve the nth item on the stack, with 1 being

* the top element, 2 being the second, etc...

* @return string|null Name of the node on the stack at the given location,

* or `null` if the location isn't on the stack.

*/

public function at( int $nth ) {

foreach ( $this->walk_down() as $item ) {

if ( 0 === --$nth ) {

return $item->node_name;

}

}

return null;

}

/**

* Returns the name of the node at the nth position on the stack

* of open elements, or `null` if no such position exists.

*

* Note that this uses a 1-based index, which represents the

* "nth item" on the stack, counting from the top, where the

* top-most element is the 1st, the second is the 2nd, etc...

*

* A negative index can be used to access the element beginning

* at the bottom of the stack, where the bottom-most element is

* `at( -1 )`.

*

* @todo Skip over "marker" entries on the stack, if appropriate.

*

* @since 6.7.0

*

* @param int $nth Retrieve the nth item on the stack, with 1 being

* the top element, 2 being the second, etc...

* If a negative index is provided, the lookup will

* start from the bottom of the stack.

* @return string|null Name of the node on the stack at the given location,

* or `null` if the location isn't on the stack.

*/

public function at( int $nth ) {

if ( 0 === $nth ) {

return null; // _doing_it_wrong here?

}

if ( 0 > $nth ) {

foreach ( $this->walk_up() as $item ) {

if ( 0 === ++$nth ) {

return $item->node_name;

}

}

return null;

}

foreach ( $this->walk_down() as $item ) {

if ( 0 === --$nth ) {

return $item->node_name;

}

}

return null;

}

I also added handling for $nth === 0 which seems good. There shouldn't be risk of an infinite loop, but there's no reason to do any iteration with 0.

Aside observation

I spent a bit of time reviewing some language which sounded off to me. It seems like the HTML standard inverts conventional stack terminology. The stack of open elements is a LIFO stack, but it starts at the top and pushes to and pops from the bottom, so the terminology used in the HTML API aligns with that:

The html node, however it is created, is the topmost node of the stack. It only gets popped off the stack when the parser finishes.

The current node is the bottommost node in this stack of open elements.

I do want to be careful about conflating the language from other programming languages and their array data structures and the stack in the HTML spec. maybe at() is the wrong name. I originally added this before when I needed to implement another behavior.

perhaps we call it stack_position( $nth )

I like your idea for negative numbers, though maybe we could make it even more explicit.

'OPTGROUP' === $this->state->stack_of_open_elements->at_position( 'from-bottom', 2 );

It's true this could be confusing to use a single method and negative or positive indexes to move from either end, it may not be clear which end is which.

Instead of a single method with some better-named flag, let's match the walk_up (start at bottom and go towards top) and walk_down (start at top and go towards bottom) which already exist and match the specification stack:

'DIV' === $this->state->stack_of_open_elements->nth_from_bottom( 1 ); // analogous to current node on the stack 'HTML' === $this->state->stack_of_open_elements->nth_from_top( 1 ); // always true

this is great, but I think it would be best on its own, so maybe we can follow-up with a change that modifies things in a unified commit. with the other major open PRs, perhaps we can address this once we are basically at full support, or at least until the open PRs are closed.

scratch that. I was playing with it and I'm going to incorporate it here. let's see how we go. there's a part of me that doesn't like adding more index-based methods (as I write it out), but given the spec uses this language, that's fine I guess.

It still feels odd, so I'm going to leave this out for now and ask you your thoughts. For most cases of walk_up() and walk_down() we're referencing another node and not an index, which seems far less likely to be broken by updates to the code. For example, setting $parent = $->nth_from_bottom(2) depends on the current node being at position nth = 1. With ->walk_up( $node ) as $parent we don't have that same fragility by exporting the internals of the stack to the HTML Processor.

I'm open to this, but the gains didn't feel as big as I expected when I went through what you proposed and did my own search to find all the cases.

diff --git a/src/wp-includes/html-api/class-wp-html-open-elements.php b/src/wp-includes/html-api/class-wp-html-open-elements.php index 0cd223657a..6f76265b09 100644 --- a/src/wp-includes/html-api/class-wp-html-open-elements.php +++ b/src/wp-includes/html-api/class-wp-html-open-elements.php @@ -109,8 +109,6 @@ class WP_HTML_Open_Elements { * "nth item" on the stack, counting from the top, where the * top-most element is the 1st, the second is the 2nd, etc... * - * @todo Skip over "marker" entries on the stack, if appropriate. - * * @since 6.7.0 * * @param int $nth Retrieve the nth item on the stack, with 1 being @@ -128,6 +126,53 @@ class WP_HTML_Open_Elements { return null; } + /** + * Returns the name of the node at the nth position from the bottom on + * the stack of open elements, or `null` if no such position exists. + * + * Note that this uses a 1-based index, which represents the + * "nth item" on the stack, counting from the bottom, where the + * bottom-most (or deepest) element is the 1st, the second is + * the 2nd, etc... + * + * @since 6.7.0 + * + * @param int $nth Retrieve the nth item on the stack, with 1 being + * the bottom (depeest) element, 2 being the second, etc... + * @return string|null Name of the node on the stack at the given location, + * or `null` if the location isn't on the stack. + */ + public function nth_from_bottom( int $nth ): ?string { + if ( $nth <= 0 || $nth > count( $this->stack ) ) { + return null; + } + + return $this->stack[ count( $this->stack ) - $nth ]->node_name; + } + + /** + * Returns the name of the node at the nth position from the top on + * the stack of open elements, or `null` if no such position exists. + * + * Note that this uses a 1-based index, which represents the + * "nth item" on the stack, counting from the top, where the + * top-most element is the 1st, the second is the 2nd, etc... + * + * @since 6.7.0 + * + * @param int $nth Retrieve the nth item on the stack, with 1 being + * the top element, 2 being the second, etc... + * @return string|null Name of the node on the stack at the given location, + * or `null` if the location isn't on the stack. + */ + public function nth_from_top( int $nth ): ?string { + if ( $nth <= 0 || $nth > count( $this->stack ) ) { + return null; + } + + return $this->stack[ $nth - 1 ]->node_name; + } + /** * Reports if a node of a given name is in the stack of open elements. * diff --git a/src/wp-includes/html-api/class-wp-html-processor.php b/src/wp-includes/html-api/class-wp-html-processor.php index f66498b19f..0f2135387e 100644 --- a/src/wp-includes/html-api/class-wp-html-processor.php +++ b/src/wp-includes/html-api/class-wp-html-processor.php @@ -1211,7 +1211,7 @@ class WP_HTML_Processor extends WP_HTML_Tag_Processor { case '+BODY': if ( 1 === $this->state->stack_of_open_elements->count() || - 'BODY' !== $this->state->stack_of_open_elements->at( 2 ) || + 'BODY' !== $this->state->stack_of_open_elements->nth_from_top( 2 ) || $this->state->stack_of_open_elements->contains( 'TEMPLATE' ) ) { // Ignore the token. @@ -1237,7 +1237,7 @@ class WP_HTML_Processor extends WP_HTML_Tag_Processor { case '+FRAMESET': if ( 1 === $this->state->stack_of_open_elements->count() || - 'BODY' !== $this->state->stack_of_open_elements->at( 2 ) || + 'BODY' !== $this->state->stack_of_open_elements->nth_from_top( 2 ) || false === $this->state->frameset_ok ) { // Ignore the token. @@ -2358,10 +2358,8 @@ class WP_HTML_Processor extends WP_HTML_Tag_Processor { case '-OPTGROUP': $current_node = $this->state->stack_of_open_elements->current_node(); if ( $current_node && 'OPTION' === $current_node->node_name ) { - foreach ( $this->state->stack_of_open_elements->walk_up( $current_node ) as $parent ) { - break; - } - if ( $parent && 'OPTGROUP' === $parent->node_name ) { + $parent = $this->state->stack_of_open_elements->nth_from_bottom( 2 ); + if ( 'OPTGROUP' === $parent ) { $this->state->stack_of_open_elements->pop(); } } @@ -3338,10 +3336,7 @@ class WP_HTML_Processor extends WP_HTML_Tag_Processor { */ public function reset_insertion_mode(): void { // Set the first node. - $first_node = null; - foreach ( $this->state->stack_of_open_elements->walk_down() as $first_node ) { - break; - } + $first_node = $this->state->stack_of_open_elements->nth_from_top( 1 ); /* * > 1. Let _last_ be false.

Strangely, with that patch applied I get one more skipped test/one fewer assertions. I'm not sure why, but I'm thinking it must be a logic error, which only adds to my skepticism of replacing semantic relations with array indices.

Let's leave it out for now and revisit after this lands, that's fine.

sirreal · 2024-07-05T11:42:51Z

src/wp-includes/html-api/class-wp-html-processor.php

+				 *
+				 * This parser does not currently support this behavior.
+				 *
+				 * @todo The spec is unclear on whether this should be ignored. Should it?


Whatever we do we should be consistent with how we handle HTML above.

My inclination is that we should not stop. It would be good to handle the attributes, but in the interest of providing developers an interface similar to what they'd expect in a DOM tree, it doesn't make sense for these nodes to appear where they're written in HTML because they'll never actually exist there. There's a possible side-effect of them being present, but side-effects are not a reason to stop at next_tag or next_token.

yeah I think this is the only viable option.

originally some of this was held up by the need to communicate these changes. someone could have been looking for a BODY element with data-is-dark on it, and skipped the BODY because at the time it didn't have that attribute, but only once parsing the full document do we realize it had it.

I'd still like to handle this by communicating the retroactive document change (a need for adoption and fostering as well), but for now it's probably good enough simply to list it as a limit of this parser.

sirreal · 2024-07-05T11:58:23Z

src/wp-includes/html-api/class-wp-html-processor.php

+				 * @todo This may need to be handled in the Tag Processor and turn into
+				 *       a single self-contained tag like TEXTAREA, whose modifiable text
+				 *       is the rest of the input document as plaintext.


That seems like a good idea, I was wondering how to handle this. We can figure that out in a follow-up.

sirreal · 2024-07-05T12:06:12Z

src/wp-includes/html-api/class-wp-html-processor.php

+				if ( ! $this->state->stack_of_open_elements->current_node_is( $token_name ) ) {
+					// @todo Record parse error: this error doesn't impact parsing.
+				}


We could turn this into a comment like we've done elsewhere, there's no reason to check if we're not reporting errors.

src/wp-includes/html-api/class-wp-html-processor.php

sirreal

Partial review, I'll continue.

src/wp-includes/html-api/class-wp-html-processor.php

sirreal · 2024-07-05T15:58:10Z

src/wp-includes/html-api/class-wp-html-processor.php

+					if ( ! $this->state->stack_of_open_elements->current_node_is( 'FORM' ) ) {
+						// @todo Indicate a parse error once it's possible. This error does not impact the logic here.
+					}


No-op parse error, this can be a comment.

sirreal · 2024-07-05T16:01:55Z

src/wp-includes/html-api/class-wp-html-processor.php

+				if ( ! $this->state->stack_of_open_elements->current_node_is( $token_name ) ) {
+					// @todo Record parse error: this error doesn't impact parsing.
+				}


Noop parse error, comment instead.

src/wp-includes/html-api/class-wp-html-processor.php

tests/phpunit/tests/html-api/wpHtmlProcessorHtml5lib.php

src/wp-includes/html-api/class-wp-html-processor.php

sirreal · 2024-07-05T17:42:08Z

src/wp-includes/html-api/class-wp-html-processor.php

+			return $text;
+		}
+
+		if ( '#comment' === $this->get_token_name() ) {


Funky comments probably need this handling too?

maybe? I want to audit this because the rules are all different for various contexts and insertion modes, mainly whether null bytes transform into U+FFFD or whether it's stripped away. I'll follow-up with this, but I don't feel like it has to be in this patch

Updated to handle all situations, which I think is right now.

I also moved the logic into the Tag Processor, because it seems like it's capable enough to know whether to skip the newline. Turns out that if it's anything other than a text node immediately following a PRE or LISTING (since TEXTAREA is self-contained, the logic there is clear), then there's no skipping. A simple flag WP_HTML_Tag_Processor::skip_next_line_feed suffices.

sirreal

I've left a lot of comments, the majority of this is ready to go once you've considered them. I won't be able to review again soon, so I'm leaving a preemptive approval.

src/wp-includes/html-api/class-wp-html-open-elements.php

src/wp-includes/html-api/class-wp-html-tag-processor.php

westonruter · 2024-07-08T21:33:57Z

src/wp-includes/html-api/class-wp-html-token.php

@@ -63,7 +63,7 @@ class WP_HTML_Token {
 	/**
 	 * Called when token is garbage-collected or otherwise destroyed.
 	 *
-	 * @var callable|null
+	 * @var ?callable


Apparently PHPDoc doesn't actually support this nullable syntax, although PHP itself does.

https://stackoverflow.com/a/50704179/93579

Same goes for below.

Thanks for flagging, [58742] was landed to fix PHPDoc annotations in many places.

src/wp-includes/html-api/class-wp-html-active-formatting-elements.php

src/wp-includes/html-api/class-wp-html-open-elements.php

From WordPress#6972

This includes template support in body

github-actions · 2024-07-19T17:35:38Z

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props dmsnell, jonsurrell, westonruter.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

As part of work to add more spec support to the HTML API, this patch adds support for the remaining missing tags in the IN BODY insertion mode. Not all of the added tags are supported, because in some cases they reset the insertion mode and are reprocessed where they will be rejected. See Core-61576.

Co-authored-by: Weston Ruter <westonruter@google.com>

Co-authored-by: Jon Surrell <sirreal@users.noreply.github.com>

dmsnell · 2024-07-21T05:09:23Z

@westonruter @sirreal I think all of the feedback has been addressed. additionally:

get_modifiable_text() in the Tag Processor has been updated to properly handle leading newlines and NULL bytes.
an additional test asserts this behavior for get_modifiable_text().
I've pulled in clear_up_to_last_marker() from HTML API: Add table support #6040 (thanks @sirreal).
Fixed some mistakes/oversights after additional review.

I might merge this tomorrow, but I'd appreciate any reviews in post. This is so exciting: we're there - at the end of IN BODY.

sirreal

I have some concerns about the get_modifiable_text fixes for null characters and omitted newlines.

If we pull that out, I think everything else is ready in this PR and that change can be done in a follow-up.

src/wp-includes/html-api/class-wp-html-open-elements.php

sirreal · 2024-07-22T08:37:07Z

src/wp-includes/html-api/class-wp-html-processor-state.php

+	 *
+	 * @var string
+	 */
+	const QUIRKS_MODE = 'quirks-mode';


I'm satisfied 👌

sirreal · 2024-07-22T08:40:01Z

src/wp-includes/html-api/class-wp-html-processor-state.php

+	 *
+	 * @var string
+	 */
+	public $compat_mode = self::NO_QUIRKS_MODE;


The mode has potential to be confusing. I suspect the processor will almost always be used in "no-quirks" mode. I do think there's value in making sure these things are clear. I'm a bit concerned we're mixing mode/compatMode and their respective values.

The HTML standard refers to a mode associated with every document:

Each document has an associated… mode ("no-quirks", "quirks", or "limited-quirks")…

Then it follows with this section "for web developers" (which is how you detect the mode, roughly, via DOM APIs):

document . compatMode
Returns the string "BackCompat" if document’s mode is "quirks"; otherwise "CSS1Compat".

The most correct thing may be to rename this property to mode and continue using the quirks values because this is from an HTML processing perspective.

my first inclination was to use the term "mode" from the spec, but that term is so generic and conflates with "insertion mode" and I was nervous using it.

this isn't currently exposed to calling code, and the terms come from the DOM spec, as you linked, and not from the HTML spec.

I can change it to document_mode, but I'm still nervous that adopting mode is going to be confusing, since it plays such a small role overall and stands against insertion_mode

I have similar concerns about mode being a very generic term. I do like document_mode, that would be my preference if you agree 👍

sirreal · 2024-07-22T09:58:58Z

src/wp-includes/html-api/class-wp-html-tag-processor.php

 		}
+		$this->skip_next_linefeed = false;


This worries me. Is this line necessary or can we just make sure it's reset when advancing to another token?

It's important that this method be idempotent. We call this from the processor and it can't change the state so that when consuming code calls it it's broken. And consuming code should not need to know that it can't call this more than once, this should be fine but wouldn't work correctly:

if ( '' !== $this->get_modifiable_text() ) { return "<h1>" . $this->get_modifiable_text() . "</h1>"; }

I noticed this both in the tests when debugging and in the html-api-debugger plugin. I always see the incorrect behavior (with this line) but I'm not sure where the method is being called twice in the plugin.

When this line is commented, things seem to work as expected and tests pass.

it's hard to change when advancing to the next token, because that wipes out the flag when moving from LISTING and PRE to #text.

I'll explore setting last_node_name or something like this. it's a good catch, thank you.

in 1323448 I've changed this to $this->skip_newline_at as a way of declaratively detecting where to ignore the newline. this is more resilient, but I still have concerns about seeking and updating the document.

I'll try to figure out seeking, but as it stands we don't expose a way to change a PRE or LISTING tag into something else, so I will not worry too much about fixing those cases until we allow getting into them.

for now I've simply listed the behavior after seeking as a limitation and warned to seek to the PRE or LISTING, not to the #text node. I don't know how to fix this without introducing unseen overhead in the Tag Processor.

do you think this is reasonable enough?

sirreal · 2024-07-22T10:12:07Z

src/wp-includes/html-api/class-wp-html-open-elements.php

+	/**
+	 * Returns the name of the node at the nth position on the stack
+	 * of open elements, or `null` if no such position exists.
+	 *
+	 * Note that this uses a 1-based index, which represents the
+	 * "nth item" on the stack, counting from the top, where the
+	 * top-most element is the 1st, the second is the 2nd, etc...
+	 *
+	 * @todo Skip over "marker" entries on the stack, if appropriate.
+	 *
+	 * @param int $nth Retrieve the nth item on the stack, with 1 being
+	 *                 the top element, 2 being the second, etc...
+	 * @return string|null Name of the node on the stack at the given location,
+	 *                     or `null` if the location isn't on the stack.
+	 */
+	public function at( int $nth ) {
+		foreach ( $this->walk_down() as $item ) {
+			if ( 0 === --$nth ) {
+				return $item->node_name;
+			}
+		}
+
+		return null;
+	}


Let's leave it out for now and revisit after this lands, that's fine.

Co-authored-by: Jon Surrell <sirreal@users.noreply.github.com>

…ive one.

sirreal

Thanks for the work on this.

This is a remaining issue where a newline is removed from a text node that is only a newline, resulting in a text node with no text. The text node should not be visited at all in this case:

<pre>&#x0A;</pre>

This PR is big enough and in a good place, I don't mind landing now and handling that known issue in a follow-up.

dmsnell · 2024-07-22T20:02:12Z

Thanks for the work on this.

This is a remaining issue where a newline is removed from a text node that is only a newline, resulting in a text node with no text. The text node should not be visited at all in this case:
<pre>&#x0A;</pre>
This PR is big enough and in a good place, I don't mind landing now and handling that known issue in a follow-up.

Thanks @sirreal. I'm puzzled on this one. While there's no child node in the DOM, I find it strange that anyone would consider there to be no text node there. Perhaps I would feel differently about the actual newline byte 0x0A, but intentionally encoding it makes it seem that from a syntactic perspective, someone wanted it to be there.

Let's continue to explore this in the follow-up.

As part of work to add more spec support to the HTML API, this patch adds support for the remaining missing tags in the IN BODY insertion mode. Not all of the added tags are supported, because in some cases they reset the insertion mode and are reprocessed where they will be rejected. This patch also improves the support of `get_modifiable_text()`, removing a leading newline inside a LISTING, PRE, or TEXTAREA element. Developed in #6972 Discussed in https://core.trac.wordpress.org/ticket/61576 Props dmsnell, jonsurrell, westonruter. See #61576. git-svn-id: https://develop.svn.wordpress.org/trunk@58779 602fd350-edb4-49c9-b593-d223f7449a82

As part of work to add more spec support to the HTML API, this patch adds support for the remaining missing tags in the IN BODY insertion mode. Not all of the added tags are supported, because in some cases they reset the insertion mode and are reprocessed where they will be rejected. This patch also improves the support of `get_modifiable_text()`, removing a leading newline inside a LISTING, PRE, or TEXTAREA element. Developed in WordPress/wordpress-develop#6972 Discussed in https://core.trac.wordpress.org/ticket/61576 Props dmsnell, jonsurrell, westonruter. See #61576. Built from https://develop.svn.wordpress.org/trunk@58779 git-svn-id: http://core.svn.wordpress.org/trunk@58181 1a063a9b-81f0-0310-95a4-ce76da25c4cd

As part of work to add more spec support to the HTML API, this patch adds support for the remaining missing tags in the IN BODY insertion mode. Not all of the added tags are supported, because in some cases they reset the insertion mode and are reprocessed where they will be rejected. This patch also improves the support of `get_modifiable_text()`, removing a leading newline inside a LISTING, PRE, or TEXTAREA element. Developed in WordPress/wordpress-develop#6972 Discussed in https://core.trac.wordpress.org/ticket/61576 Props dmsnell, jonsurrell, westonruter. See #61576. Built from https://develop.svn.wordpress.org/trunk@58779 git-svn-id: https://core.svn.wordpress.org/trunk@58181 1a063a9b-81f0-0310-95a4-ce76da25c4cd

dmsnell · 2024-07-22T22:51:25Z

Merged in [58779]
db30ce9

dmsnell requested review from ockham, westonruter and adamziel July 5, 2024 04:40

dmsnell mentioned this pull request Jul 5, 2024

HTML API: Plans for 6.7 WordPress/gutenberg#60396

Open

19 tasks

dmsnell commented Jul 5, 2024

View reviewed changes

sirreal reviewed Jul 5, 2024

View reviewed changes

src/wp-includes/html-api/class-wp-html-processor.php Show resolved Hide resolved

sirreal reviewed Jul 5, 2024

View reviewed changes

sirreal approved these changes Jul 5, 2024

View reviewed changes

dmsnell force-pushed the html-api/finish-in-body-insertion-mode branch 2 times, most recently from 1e13e07 to 8c591bc Compare July 6, 2024 23:45

westonruter reviewed Jul 8, 2024

View reviewed changes

sirreal reviewed Jul 17, 2024

View reviewed changes

src/wp-includes/html-api/class-wp-html-active-formatting-elements.php Outdated Show resolved Hide resolved

sirreal reviewed Jul 17, 2024

View reviewed changes

src/wp-includes/html-api/class-wp-html-open-elements.php Outdated Show resolved Hide resolved

sirreal added a commit to sirreal/wordpress-develop that referenced this pull request Jul 17, 2024

Add contains method to open elements

d28d9ec

From WordPress#6972

sirreal added a commit to sirreal/wordpress-develop that referenced this pull request Jul 17, 2024

Add at method to open elements

2eb1e01

From WordPress#6972

sirreal added a commit to sirreal/wordpress-develop that referenced this pull request Jul 17, 2024

Add some tag handling from WordPress#6972

724dde1

This includes template support in body

sirreal mentioned this pull request Jul 17, 2024

HTML API: Implement TEMPLATE and related support #7046

Closed

dmsnell marked this pull request as ready for review July 19, 2024 17:35

dmsnell force-pushed the html-api/finish-in-body-insertion-mode branch 3 times, most recently from 85dd560 to 90abdee Compare July 20, 2024 00:07

dmsnell and others added 5 commits July 19, 2024 17:09

Add return type annotations to new methods

b509d04

Co-authored-by: Weston Ruter <westonruter@google.com>

Add argument type annotation to contains method

8eca544

Note behavioral change in BR tag.

0c8de62

Remove handling for -BR which cannot appear.

2c68eb8

dmsnell and others added 5 commits July 20, 2024 10:34

Add clear-active-formatting-elements method from Jon's TABLE PR.

5ad5cbc

Co-authored-by: Jon Surrell <sirreal@users.noreply.github.com>

Add missing rules in IN BODY

d5976ad

Refine get_modifiable_text() to handle null bytes.

8d3b7b5

Fix modifiable text.

2c0ec74

Add new tests for modifiable text transforms.

6aff057

Merge branch 'trunk' into html-api/finish-in-body-insertion-mode

d7da469

sirreal suggested changes Jul 22, 2024

View reviewed changes

dmsnell and others added 2 commits July 22, 2024 11:17

Clarify @SInCE tag for open elements.

0c9c40f

Co-authored-by: Jon Surrell <sirreal@users.noreply.github.com>

Test idempotency of get_modifiable_text().

8c38159

dmsnell force-pushed the html-api/finish-in-body-insertion-mode branch from d4bb5fe to 8c38159 Compare July 22, 2024 18:44

Use declarative method of tracking ignored newline instead of imperat…

1323448

…ive one.

sirreal approved these changes Jul 22, 2024

View reviewed changes

Note limitation of newline behavior and add test which currently skips.

39febe3

dmsnell added 6 commits July 22, 2024 13:03

Rename compat_mode to document_mode

47f5303

Merge branch 'trunk' into html-api/finish-in-body-insertion-mode

5141d4c

Linter

be1c65c

Update comments

081fc5e

Update comments.

3bcab2a

Update comments.

72d672c

dmsnell closed this Jul 22, 2024

dmsnell deleted the html-api/finish-in-body-insertion-mode branch July 22, 2024 22:51

sirreal mentioned this pull request Jul 23, 2024

HTML API: Add table support #6040

Closed

dmsnell mentioned this pull request Jul 29, 2024

HTML API: Remove null bytes from get_modifiable_text #7047

Closed

sirreal mentioned this pull request Aug 26, 2024

HTML API: CSS class name methods should behave according to quirks mode #7169

Closed

	$current_node = $this->state->stack_of_open_elements->current_node();
	if ( $current_node && 'OPTION' === $current_node->node_name ) {
	foreach ( $this->state->stack_of_open_elements->walk_up( $current_node ) as $parent ) {
	break;
	}
	if ( $parent && 'OPTGROUP' === $parent->node_name ) {
	$this->state->stack_of_open_elements->pop();
	}
	}

HTML API: Complete missing tags in the IN BODY insertion mode. #6972

HTML API: Complete missing tags in the IN BODY insertion mode. #6972

Conversation

dmsnell commented Jul 5, 2024 • edited Loading

Summary

Description

Testing

github-actions bot commented Jul 5, 2024

Test using WordPress Playground

Some things to be aware of

Choose a reason for hiding this comment

sirreal left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sirreal Jul 5, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sirreal left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sirreal left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

github-actions bot commented Jul 19, 2024 • edited Loading

dmsnell commented Jul 21, 2024

sirreal left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sirreal left a comment

Choose a reason for hiding this comment

dmsnell commented Jul 22, 2024

dmsnell commented Jul 22, 2024

dmsnell commented Jul 5, 2024 •

edited

Loading

sirreal Jul 5, 2024 •

edited

Loading

github-actions bot commented Jul 19, 2024 •

edited

Loading