Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML API: Add explicit handling or failure for all tags #5762

Conversation

sirreal
Copy link
Member

@sirreal sirreal commented Dec 13, 2023

Trac ticket: https://core.trac.wordpress.org/ticket/60092

This implements part of the work described here: WordPress/gutenberg#54579

Abort with "unsupported" for all called-out tag names in the IN BODY section so that we can implement the "Any other start tag" and "Any other end tag" rules, including support for custom elements.


This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

Copy link

Test using WordPress Playground

The changes in this pull request can previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

  • The Plugin and Theme Directories cannot be accessed within Playground.
  • All changes will be lost when closing a tab with a Playground instance.
  • All changes will be lost when refreshing the page.
  • A fresh instance is created each time the link below is clicked.
  • Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance,
    it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

@sirreal sirreal force-pushed the html-api/throw-on-spcial-handling-in-body branch from 851fb08 to 1e9be91 Compare December 18, 2023 09:48
@sirreal sirreal changed the title HTML API: Fail parsing on unsupported tags that require special "in body" parsing HTML API: Fail parsing unsupported tags that require special "in body" parsing Dec 18, 2023
@sirreal sirreal marked this pull request as ready for review December 18, 2023 10:17
@dmsnell
Copy link
Member

dmsnell commented Dec 18, 2023

@sirreal I've manually reviewed each tag to ensure that it's listed where it should be and missing where it shouldn't. although it doesn't require special handling, I have added in SARCASM so that we can add support for that by itself, perhaps as the last tag for which we add support - I'm sentimental or something.

note that I uncovered failing tests only by implementing the any other tag sections here. we can remove that handling, but given your changes I think it's a nice boost to add them in, and it was hard to review the changes without ensuring that those tags are or aren't properly processed.

I welcome your thoughts on this. thanks for the hard work!

@sirreal
Copy link
Member Author

sirreal commented Dec 19, 2023

I uncovered failing tests only by implementing the any other tag sections here

I was worried about that myself, I think it makes sense to add the any other tag handling here as well. I'll review your changes.

@sirreal sirreal force-pushed the html-api/throw-on-spcial-handling-in-body branch from 409ff38 to f707135 Compare December 19, 2023 19:24
@sirreal
Copy link
Member Author

sirreal commented Dec 19, 2023

I ran the test suite from #5794 against this and I see we handle ~30 more cases (previously skipped as unsupported). No new errors.

@dmsnell
Copy link
Member

dmsnell commented Dec 19, 2023

@sirreal it looks like something may have gone wrong in the last two commits as this is now changing the .github directory

@sirreal sirreal force-pushed the html-api/throw-on-spcial-handling-in-body branch from f707135 to 8f3999a Compare December 19, 2023 20:41
@sirreal
Copy link
Member Author

sirreal commented Dec 19, 2023

Thanks @dmsnell, I removed the offending commits.

@dmsnell
Copy link
Member

dmsnell commented Dec 19, 2023

@sirreal I added the WPCS exclusion for goto statements in here to silence the warning. that came from #5539

@sirreal sirreal changed the title HTML API: Fail parsing unsupported tags that require special "in body" parsing HTML API: Add explicit handling for all tags Dec 20, 2023
@sirreal
Copy link
Member Author

sirreal commented Dec 20, 2023

I've updated the title to reflect that we explicitly handle all tags now. Tags that require special handling will fail when unsupported and the "all other tags" rules have been implemented (by @dmsnell).

I reviewed the implementation of the all other tags handling. The opener rule is simple. The closer rule is much more difficult to follow, but I've carefully compared the implementation with the specification and I believe it's correct. I'm also confident in these changes thanks to the tests here and from #5794 which I've also run.

@sirreal sirreal changed the title HTML API: Add explicit handling for all tags HTML API: Add explicit handling or failure for all tags Dec 20, 2023
Comment on lines 37 to 38
public function data_single_tag_of_supported_elements() {
$supported_elements = array(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming of the function and the variable is slightly misleading now, since IIUC, the newly added elements are still _un_supported, no? 🤔 (They do throw WP_HTML_Unsupported_Exception.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be the supported tags. If you compare it with data_unsupported_special_in_body_tags from the wpHtmlProcessor suite, there should be no overlap between elements in this array and that array.

Are you sure these tags throw?

A lot of tags move from unsupported to supported in this change because the catch-all "any other tag" rules are implemented here, and only the tags that require (currently unimplemented) special handling remain unsupported.

@sirreal sirreal force-pushed the html-api/throw-on-spcial-handling-in-body branch from dc9399a to 7e8eb4c Compare December 22, 2023 15:47
'CODE',
'DATA',
'DATALIST',
'DEFN',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: should be DFN

@dmsnell
Copy link
Member

dmsnell commented Dec 22, 2023

@sirreal I've made several changes to this PR

  • refactored the closing tag logic and eliminated the goto. I confirmed that it's safe to remove elements from the stack of open elements while walking up.
  • added the list of supported tags to the docblock at the start of the class.
  • updated comments that will generate linting nags.
  • fixed a typo in one of the tests: DEFN was listed but it should have been DFN.

additionally I ran the html5lib test suite against this branch and saw no regressions. the 10 failures relate to the tests not starting in IN BODY context, I believe, and are present in trunk

Tests: 1814, Assertions: 2685, Errors: 1, Failures: 10, Skipped: 802.

I'm a bit concerned that we haven't fully evaluated every branch in the "any other tag" logic; in fact I may try and run coverage on our test suite if I can get that working and make sure.

@dmsnell
Copy link
Member

dmsnell commented Dec 22, 2023

try and run coverage on our test suite if I can get that working and make sure.

good news everybody! I ran the coverage tests and it looks like we have 100% coverage for these changes. we are missing coverage for other parts of IN BODY, so maybe we should target 100% of step_in_body() for 6.5

Copy link
Member

@dmsnell dmsnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll try and get this in next, and then we can target the list elements. I think we can also add BR, HR, and PRE for 6.5 and cover most of all we need.

I'd like your final review of my change to the closing tag logic, but this checks out on all the tests I'm throwing at it.

@sirreal
Copy link
Member Author

sirreal commented Dec 26, 2023

I've reviewed the additional changes and they look good to me. Consider this my approval 👍

@ockham
Copy link
Contributor

ockham commented Jan 8, 2024

Committed to Core in https://core.trac.wordpress.org/changeset/57248.

@ockham ockham closed this Jan 8, 2024
@sirreal sirreal deleted the html-api/throw-on-spcial-handling-in-body branch January 8, 2024 16:10
dmsnell added a commit to WordPress/gutenberg that referenced this pull request Jan 13, 2024
Updates from WordPress/wordpress-develop at 54a09a7ec99c59b8e640bd5aacebfdbf03bb02cc

 - WordPress/wordpress-develop#5535
   Adds support for H1 - H6 elements in the HTML Processor.

 - WordPress/wordpress-develop#5725
   Pause the Tag Processor when reaching the end of the document
   inside an incomplete syntax element.

 - Incorporates linting changes from "TODO:" to "todo".

 - WordPress/wordpress-develop#5762
   Handles the "all other tags" rules in IN BODY insertion mode.

 - WordPress/wordpress-develop#5539
   Adds support for list elements in the HTML Processor.

This patch adds the blanket exclusion from the HTML API compatability
layer after they were removed and started blocking updates from Core.

The PHP files in the compatability layer are merged and maintained in
the Core repo and all changes or updates need to happen first in Core
and then be brought over to Gutenberg as built files. From Gutenberg's
perspective they are no different than NPM packages.

Co-authored-by: Anton Vlasenko <43744263+anton-vlasenko@users.noreply.github.com>
Co-authored-by: Bernie Reiter <96308+ockham@users.noreply.github.com>
dmsnell added a commit to WordPress/gutenberg that referenced this pull request Jan 17, 2024
Updates from WordPress/wordpress-develop at 54a09a7ec99c59b8e640bd5aacebfdbf03bb02cc

 - WordPress/wordpress-develop#5535
   Adds support for H1 - H6 elements in the HTML Processor.

 - WordPress/wordpress-develop#5725
   Pause the Tag Processor when reaching the end of the document
   inside an incomplete syntax element.

 - Incorporates linting changes from "TODO:" to "todo".

 - WordPress/wordpress-develop#5762
   Handles the "all other tags" rules in IN BODY insertion mode.

 - WordPress/wordpress-develop#5539
   Adds support for list elements in the HTML Processor.

This patch adds the blanket exclusion from the HTML API compatability
layer after they were removed and started blocking updates from Core.

The PHP files in the compatability layer are merged and maintained in
the Core repo and all changes or updates need to happen first in Core
and then be brought over to Gutenberg as built files. From Gutenberg's
perspective they are no different than NPM packages.

Co-authored-by: Anton Vlasenko <43744263+anton-vlasenko@users.noreply.github.com>
Co-authored-by: Bernie Reiter <96308+ockham@users.noreply.github.com>
dmsnell added a commit to WordPress/gutenberg that referenced this pull request Jan 23, 2024
Updates from WordPress/wordpress-develop at f4dda54df785d0a6957dedda3648f7fab58b829f

 - Coding style changes.

 - WordPress/wordpress-develop#5762
   Adds support for the "any other tag" sections in the HTML Processor.

 - WordPress/wordpress-develop#5539
   Adds support for list elements in the HTML Processor.

 - WordPress/wordpress-develop#5897
   Adds support for HR elements in the HTML Processor.

 - WordPress/wordpress-develop#5895
   Adds support for the AREA, BR, EMBED, KEYGEN, and WBR elements
   in the HTML Processor.

 - WordPress/wordpress-develop#5903
   Adds support for the PRE and LISTING elements in the HTML Processor.

 - WordPress/wordpress-develop#5913
   Updates "all other tags" support in HTML Processor and updates list
   of void elements.

 - WordPress/wordpress-develop#5906
   Adds support for the PARAM, SOURCE, and TRACK elements in the HTML Processor.

 - WordPress/wordpress-develop#5683
   Provides mechanism to scan all tokens in an HTML document in the Tag Processor.

The PHP files in the compatability layer are merged and maintained in
the Core repo and all changes or updates need to happen first in Core
and then be brought over to Gutenberg as built files.

Co-authored-by: Sergey Biryukov <sergeybiryukov.ru@gmail.com>
Co-authored-by: Jon Surrell <sirreal@users.noreply.github.com>
dmsnell added a commit to WordPress/gutenberg that referenced this pull request Jan 25, 2024
Updates from WordPress/wordpress-develop at f4dda54df785d0a6957dedda3648f7fab58b829f

 - Coding style changes.

 - WordPress/wordpress-develop#5762
   Adds support for the "any other tag" sections in the HTML Processor.

 - WordPress/wordpress-develop#5539
   Adds support for list elements in the HTML Processor.

 - WordPress/wordpress-develop#5897
   Adds support for HR elements in the HTML Processor.

 - WordPress/wordpress-develop#5895
   Adds support for the AREA, BR, EMBED, KEYGEN, and WBR elements
   in the HTML Processor.

 - WordPress/wordpress-develop#5903
   Adds support for the PRE and LISTING elements in the HTML Processor.

 - WordPress/wordpress-develop#5913
   Updates "all other tags" support in HTML Processor and updates list
   of void elements.

 - WordPress/wordpress-develop#5906
   Adds support for the PARAM, SOURCE, and TRACK elements in the HTML Processor.

 - WordPress/wordpress-develop#5683
   Provides mechanism to scan all tokens in an HTML document in the Tag Processor.

The PHP files in the compatability layer are merged and maintained in
the Core repo and all changes or updates need to happen first in Core
and then be brought over to Gutenberg as built files.

Co-authored-by: Sergey Biryukov <sergeybiryukov.ru@gmail.com>
Co-authored-by: Jon Surrell <sirreal@users.noreply.github.com>
dmsnell added a commit to WordPress/gutenberg that referenced this pull request Jan 29, 2024
Updates from WordPress/wordpress-develop at f4dda54df785d0a6957dedda3648f7fab58b829f

 - Coding style changes.

 - WordPress/wordpress-develop#5762
   Adds support for the "any other tag" sections in the HTML Processor.

 - WordPress/wordpress-develop#5539
   Adds support for list elements in the HTML Processor.

 - WordPress/wordpress-develop#5897
   Adds support for HR elements in the HTML Processor.

 - WordPress/wordpress-develop#5895
   Adds support for the AREA, BR, EMBED, KEYGEN, and WBR elements
   in the HTML Processor.

 - WordPress/wordpress-develop#5903
   Adds support for the PRE and LISTING elements in the HTML Processor.

 - WordPress/wordpress-develop#5913
   Updates "all other tags" support in HTML Processor and updates list
   of void elements.

 - WordPress/wordpress-develop#5906
   Adds support for the PARAM, SOURCE, and TRACK elements in the HTML Processor.

 - WordPress/wordpress-develop#5683
   Provides mechanism to scan all tokens in an HTML document in the Tag Processor.

 - WordPress/wordpress-develop#5907
   Adds support for the INPUT element in the HTML Processor

The PHP files in the compatability layer are merged and maintained in
the Core repo and all changes or updates need to happen first in Core
and then be brought over to Gutenberg as built files.

Co-authored-by: Sergey Biryukov <sergeybiryukov.ru@gmail.com>
Co-authored-by: Jon Surrell <sirreal@users.noreply.github.com>
dmsnell added a commit to WordPress/gutenberg that referenced this pull request Feb 6, 2024
Updates from WordPress/wordpress-develop:
 - From: WordPress/wordpress-develop@54a09a7
 - To: WordPress/wordpress-develop@7a71339

 - Coding style changes.

 - WordPress/wordpress-develop#5762
   Adds support for the "any other tag" sections in the HTML Processor.

 - WordPress/wordpress-develop#5539
   Adds support for list elements in the HTML Processor.

 - WordPress/wordpress-develop#5897
   Adds support for HR elements in the HTML Processor.

 - WordPress/wordpress-develop#5895
   Adds support for the AREA, BR, EMBED, KEYGEN, and WBR elements
   in the HTML Processor.

 - WordPress/wordpress-develop#5903
   Adds support for the PRE and LISTING elements in the HTML Processor.

 - WordPress/wordpress-develop#5913
   Updates "all other tags" support in HTML Processor and updates list
   of void elements.

 - WordPress/wordpress-develop#5906
   Adds support for the PARAM, SOURCE, and TRACK elements in the HTML Processor.

 - WordPress/wordpress-develop#5907
   Adds support for the INPUT element in the HTML Processor

 - WordPress/wordpress-develop#5683
   Provides mechanism to scan all tokens in an HTML document in the Tag Processor.

 - WordPress/wordpress-develop#5976
   Avoids splitting text nodes on "<" character.

 - WordPress/wordpress-develop#5992
   Only recognize true CDATA-lookalike nodes.

 - WordPress/wordpress-develop#5975
   Prevent void tag nesting when calling `next_token()`

 - WordPress/wordpress-develop#6021
   Reset parser state after seeking.

 - https://core.trac.wordpress.org/changeset/57528
   Fix typo in setting token flag.

 - WordPress/wordpress-develop#6041
   Ensure consecutive text is all joined into one text node.

The PHP files in the compatability layer are merged and maintained in
the Core repo and all changes or updates need to happen first in Core
and then be brought over to Gutenberg as built files.

Co-authored-by: sergeybiryukov <sergeybiryukov@git.wordpress.org>
Co-authored-by: sirreal <jonsurrell@git.wordpress.org>
Co-authored-by: dmsnell <dmsnell@git.wordpress.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants