HTML API: Add support for missing FRAMESET and "after" insertion modes. #7165
As part of work to add more spec support to the HTML API, this patch adds support for the FRAMESET-related insertion modes, as well as the set of missing _after_ insertion modes. These modes run at the end of parsing a document, closing it and taking care of any lingering tags. Developed in WordPress#7165 Discussed in https://core.trac.wordpress.org/ticket/61576 See #61576.
@sirreal there are failures in the tests that look like BODY isn't closing. If you have insight, I'd appreciate it.
I've been reviewing the specification and this may be very tricky! The body tag never seems to close, but things may be inserted outside of it 😵 When the BODY and HTML tags close, nothing changes the stack of open elements. This is different from most other tags and insertion mode transitions. There's no pop here!
In "after body" (BODY closed) and "after after body" (HTML closed), comments are inserted in-place (outside BODY or HTML elements) but other things are handled using rules for "in body" insertion mode. The things that are inserted in place have special handling like "Insert a comment as the last child of the first element in the stack of open elements (the html element)" or "Insert a comment as the last child of the Document object."

This is fascinating. We basically operate just like we're in body with a few invisible exceptions that are inserted out of place. The stack of open elements even continues to be modified!

<i><s>
</body>
<!-- "after body" -->
body » i » s » #text
</s><em>
</html>
<!-- "after after body" -->
body » i » em » #text
</i></em>
body » #text

This is similar to TABLE elements, where some elements may be bumped before the table (foster parenting). Some things, mostly comments, are bumped to after BODY or HTML nodes. It also has some similarity to FORM element closers that may be removed from the stack of open elements out-of-place.

I'm not sure how best to handle this 😕

Reviewing the spec, I believe that only comments are inserted outside of the BODY and HTML nodes. I wonder if we could collect these in-place comments in a couple of lists and then iterate through them when the document is finished.

Update: I had overlooked that "after body" and "after after body" actually switch the insertion mode back to "in body" and reprocess the token, so we effectively step out of the body or html tags, have an opportunity to insert some comments, then switch back if anything else is found. We do not process the tokens using the rules for "in body".
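The behavior described in the update can be modeled in a few lines. The following is a minimal Python sketch, not the PHP in this PR: the token tuples, the `simulate` function, and the mode strings are all hypothetical, and it assumes HTML, HEAD, and BODY are already open. It ignores start tags and always uses `body` as the parent for in-body content rather than the current node.

```python
def simulate(tokens):
    """Model the "in body", "after body" and "after after body" modes.

    tokens is a list of (kind, value) tuples, where kind is "text",
    "comment" or "end". Returns the nodes in creation order plus the
    final insertion mode.
    """
    mode = "in body"
    inserted = []  # (parent, node) pairs, in the order they are created

    for kind, value in tokens:
        if mode in ("after body", "after after body"):
            if kind == "comment":
                # Comments are the only nodes placed outside BODY or HTML.
                parent = "html" if mode == "after body" else "document"
                inserted.append((parent, f"#comment: {value}"))
                continue
            if kind == "end" and value == "html":
                # Net effect of </html> from either mode.
                mode = "after after body"
                continue
            if not (kind == "text" and value.isspace()):
                # Anything else: switch back to "in body" and reprocess.
                mode = "in body"
            # Whitespace uses the "in body" rules without a mode switch.

        # Rules for "in body" (heavily simplified).
        if kind == "text":
            inserted.append(("body", f"#text: {value}"))
        elif kind == "comment":
            inserted.append(("body", f"#comment: {value}"))
        elif kind == "end" and value == "body":
            mode = "after body"
        elif kind == "end" and value == "html":
            # </html> acts like </body> and is then reprocessed.
            mode = "after after body"
        else:
            raise ValueError(f"token not modeled: {kind} {value}")

    return inserted, mode


# Tokens for the input: </body><!--a-->x</html><!--b-->
nodes, final_mode = simulate([
    ("end", "body"),
    ("comment", "a"),
    ("text", "x"),
    ("end", "html"),
    ("comment", "b"),
])
```

In this run the comment `a` is created as a child of `html` before the text `x` is created inside `body`, which is the out-of-order construction that makes these modes awkward for a streaming processor.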
I explored some ideas for dealing with those comments in dmsnell#18. For now, I'm going to push a change to this PR to switch the bail conditions in after body. It seems better to bail on comments outside of |
Ideally, we could support all of this and only bail if the processor prints a comment and then re-enters.
This reverts commit a1304b5.
Some content can exist outside of BODY and HTML tags. See the after body and after after body insertion modes. These modes are difficult because the tree may be constructed out-of-order.

Disallow this through the use of a few flags that are used to bail if possible out-of-order behavior is detected. This allows some HTML to be processed as long as elements are encountered in-order. Although the body tag may close, elements can still be found in-order.

For example, `</html>x</body>y</body></!></html></!>` is not problematic because text nodes are found and return to "in body" insertion mode before any content is produced outside of the body or html nodes:

    Document
    ├── HTML
    │   ├── HEAD
    │   ├── /HEAD
    │   ├── BODY
    │   │   ├── #text: "x"
    │   │   └── #text: "y"
    │   ├── /BODY
    │   └── #funky-comment: "!"
    ├── /HTML
    └── #funky-comment: "!"

However, as soon as content is produced outside of body, the processor will bail if it attempts to return to in body or to produce content inside body tags.

For example, `</html></!></body>x` bails. It would produce this tree:

    Document
    ├── HTML
    │   ├── HEAD
    │   ├── /HEAD
    │   └── BODY
    │       └── #text: "x"
    ├── /HTML
    └── #funky-comment: "!"

In this case, the `#funky-comment` was found first in the document root, then a text node "x" was added to the body. These out-of-order nodes are disallowed and bail.
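The flag-based bail can be illustrated with a small model run against the two inputs from the commit message above. This is a hedged Python sketch, not the PHP in this PR: `Bail`, `process`, and the single `content_outside_body` flag are hypothetical names, and the real processor may track more state. It assumes HTML, HEAD, and BODY are already open, and it treats whitespace like any other text.

```python
class Bail(Exception):
    """Raised where this model would have to build the tree out of order."""


def process(tokens):
    """Return (parent, node) pairs in document order, or raise Bail."""
    mode = "in body"
    content_outside_body = False  # set once a node lands outside BODY
    nodes = []

    for kind, value in tokens:
        if mode != "in body":
            if kind == "comment":
                parent = "html" if mode == "after body" else "document"
                nodes.append((parent, f"#comment: {value}"))
                content_outside_body = True
                continue
            if kind == "end" and value == "html":
                mode = "after after body"
                continue
            # Any other token returns to "in body". That is only safe
            # while nothing has been produced outside of BODY.
            if content_outside_body:
                raise Bail(f"cannot re-enter body for {kind} {value!r}")
            mode = "in body"

        # Simplified "in body" handling.
        if kind in ("text", "comment"):
            nodes.append(("body", f"#{kind}: {value}"))
        elif kind == "end" and value == "body":
            mode = "after body"
        elif kind == "end" and value == "html":
            mode = "after after body"
        else:
            raise ValueError(f"token not modeled: {kind} {value}")

    return nodes


# </html>x</body>y</body></!></html></!>
in_order = [
    ("end", "html"), ("text", "x"), ("end", "body"), ("text", "y"),
    ("end", "body"), ("comment", "!"), ("end", "html"), ("comment", "!"),
]

# </html></!></body>x
out_of_order = [
    ("end", "html"), ("comment", "!"), ("end", "body"), ("text", "x"),
]

result = process(in_order)

try:
    process(out_of_order)
    bailed = False
except Bail:
    bailed = True
```

The first input completes because every node is created in the same order it appears in the final tree. The second raises `Bail` at `</body>`, the first token that tries to return to "in body" after a comment was already placed in the document root.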
These modes effectively ignore non-whitespace (not supported) and insert whitespace text nodes under HTML node. Bail if out-of-order behavior is detected.
/*
 * > A comment token
 */
case '#cdata-section':
note to self: I'm not sure how these got in here, but they aren't comments, and I don't see a line in the spec for them. I'm thinking it could have been a copy/paste error I missed.
This seems like a plausible idea, @sirreal. Presumably we could reuse the event queue, such that we track these comments in those lists, and From a performance standpoint, I don't see this being that troublesome, especially if all we stored in those lists were the |
I implemented that in this PR in a1304b5. I reverted the change. I had issues getting the text for the comments outside of body because |
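For reference, the deferred-comment idea discussed in this thread (collect the out-of-body comments in lists and emit them once the document is finished) might look roughly like the following. This is an illustrative Python model only; it is not the code from a1304b5, and `defer_outside_comments` and its list names are hypothetical. It makes the same simplifying assumptions as the spec walk-through: HTML, HEAD, and BODY are already open, and only text, comment, and `</body>` / `</html>` tokens are modeled.

```python
def defer_outside_comments(tokens):
    """Hold back comments found outside BODY and emit them at the end."""
    mode = "in body"
    body_nodes = []
    after_body_comments = []  # would become children of HTML
    after_html_comments = []  # would become children of the Document

    for kind, value in tokens:
        if mode != "in body":
            if kind == "comment":
                if mode == "after body":
                    after_body_comments.append(value)
                else:
                    after_html_comments.append(value)
                continue
            if kind == "end" and value == "html":
                mode = "after after body"
                continue
            mode = "in body"  # anything else is reprocessed in body

        if kind in ("text", "comment"):
            body_nodes.append(f"#{kind}: {value}")
        elif kind == "end" and value == "body":
            mode = "after body"
        elif kind == "end" and value == "html":
            mode = "after after body"
        else:
            raise ValueError(f"token not modeled: {kind} {value}")

    # End of document: flush the deferred comments in tree order.
    return (
        [("body", node) for node in body_nodes]
        + [("html", f"#comment: {text}") for text in after_body_comments]
        + [("document", f"#comment: {text}") for text in after_html_comments]
    )


# </html></!></body>x
ordered = defer_outside_comments([
    ("end", "html"), ("comment", "!"), ("end", "body"), ("text", "x"),
])
```

With deferral, the text node is reported before the document-level comment, matching tree order. The cost is that the comment text has to be retained or re-read after the processor has moved past it.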
Simplify this PR to handle common scenarios and avoid deviation from specification.
This seems to be in a good place 👍 We get a good increase in HTML5lib-tests which also helps my confidence:
OK, but incomplete, skipped, or risky tests!
-Tests: 1498, Assertions: 930, Skipped: 568.
+Tests: 1495, Assertions: 1026, Skipped: 469.
These should not have appeared as CDATA cannot appear in HTML.
…fter-insertion-modes
Trac ticket: Core-61576.
As part of work to add more spec support to the HTML API, this patch
adds support for the FRAMESET-related insertion modes, as well as the
set of missing after insertion modes. These modes run at the end of
parsing a document, closing it and taking care of any lingering tags.
Developed in https://github.com/wordpress/wordpress-develop
Discussed in https://core.trac.wordpress.org/ticket/61576
See #61576.