Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML API: Introduce WP_HTML::tag() for safely creating HTML. #5884

Draft
wants to merge 2 commits into
base: trunk
Choose a base branch
from

Conversation

dmsnell
Copy link
Member

@dmsnell dmsnell commented Jan 17, 2024

Trac ticket: Core-50867

Alternative to #4065

In this patch the WP_HTML::tag() method is introduced for safely creating HTML content for a given tag and a number of attributes or inner text. No support is proposed yet for inner tags.

In lieu of pushing HTML templating into WordPress 6.5 this limited API provides some of the benefit of templating without raising broader questions about performance, API design, and the like. Templating will cover the needs that this fulfills, but there's a reasonably place for both to exist, so this would not be entirely obviated from the introduction of true templating.

Copy link

Test using WordPress Playground

The changes in this pull request can previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

  • The Plugin and Theme Directories cannot be accessed within Playground.
  • All changes will be lost when closing a tab with a Playground instance.
  • All changes will be lost when refreshing the page.
  • A fresh instance is created each time the link below is clicked.
  • Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance,
    it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

@WordPress WordPress deleted a comment from aijaz227 Jan 22, 2024
@WordPress WordPress deleted a comment from leor347 Jan 22, 2024
@WordPress WordPress deleted a comment from careway22 Jan 22, 2024
Copy link
Member

@sirreal sirreal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are just some notes. I'll update the tag list to include any missing tags I found.

@@ -2286,15 +2286,16 @@ public function is_tag_closer() {
*
* For boolean attributes special handling is provided:
* - When `true` is passed as the value, then only the attribute name is added to the tag.
* - When `false` is passed, the attribute gets removed if it existed before.
* - When `false` or `null` is passed, the attribute gets removed if it existed before.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes seems worth landing on its own.

/**
* Returns whether a given element is an HTML tag name.
*
* @todo Verify this list.
Copy link
Member

@sirreal sirreal Feb 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I scraped this following list from MDN and w3.org. It looks like there may be a few more elements in this list, but I notice there are some missing as well. I can push a change to merge the lists.

HTML elements

A
ABBR
ACRONYM
ADDRESS
AREA
ARTICLE
ASIDE
AUDIO
B
BASE
BDI
BDO
BIG
BLOCKQUOTE
BODY
BR
BUTTON
CANVAS
CAPTION
CENTER
CITE
CODE
COL
COLGROUP
COMMAND
CONTENT
DATA
DATALIST
DD
DEL
DETAILS
DFN
DIALOG
DIR
DIV
DL
DT
EM
EMBED
FIELDSET
FIGCAPTION
FIGURE
FONT
FOOTER
FORM
FRAME
FRAMESET
H1
H2
H3
H4
H5
H6
HEAD
HEADER
HGROUP
HR
HTML
I
IFRAME
IMAGE
IMG
INPUT
INS
KBD
KEYGEN
LABEL
LEGEND
LI
LINK
MAIN
MAP
MARK
MARQUEE
MATH
MENU
MENUITEM
META
METER
NAV
NOBR
NOEMBED
NOFRAMES
NOSCRIPT
OBJECT
OL
OPTGROUP
OPTION
OUTPUT
P
PARAM
PICTURE
PLAINTEXT
PORTAL
PRE
PROGRESS
Q
RB
RP
RT
RTC
RUBY
S
SAMP
SCRIPT
SEARCH
SECTION
SELECT
SHADOW
SLOT
SMALL
SOURCE
SPAN
STRIKE
STRONG
STYLE
SUB
SUMMARY
SUP
SVG
TABLE
TBODY
TD
TEMPLATE
TEXTAREA
TFOOT
TH
THEAD
TIME
TITLE
TR
TRACK
TT
U
UL
VAR
VIDEO
WBR
XMP

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In deeper inspection here I feel like this is the wrong approach. We should be able to instead determine if we're in foreign content and apply the rules there accordingly. That way we don't have to keep a list of HTML elements updated and we don't have to worry about conflating elements with the same name of HTML elements with foreign elements, e.g. TITLE inside an SVG.

So I think more important now is getting foreign content detection in place. I've started working on this in #6006.

It may be the case that this relies on the HTML Processor because the rules get complicated with MathML and foriegn content integration points.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

obviously after exploring #6006 there's no easy way to infer the outside context when creating a tag in isolation. this leads me to think that we need a good way to communicate this to developers.

maybe instead of passing 'self-closing' we could have people pass 'xml-empty-tag' or 'empty-tag-in-foreign-content'. I'd like to communicate that this isn't for IMG or any HTML tag.

the trickiest part I've come to realize is that we could have an SVG IMG tag as well, which could adopt the self-closing identity by nature of being inside foreign content, and that would be meaningful.

I don't see this being often used, so I feel comfortable making it a bit awkward. it seems incredibly unlikely to be common that someone is intentionally creating an empty XML element inside foreign content.

Comment on lines +2 to +10
/**
* HTML API: WP_HTML class
*
* Provides a public interface for HTML-related functionality in WordPress.
*
* @package WordPress
* @subpackage HTML-API
* @since 6.5.0
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've used WP_HTML_Processor::is_void_tag a few times and it feels like it should live on a utility class like this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it started in a utility class like this! but then we didn't include the utility class for pragmatic reasons.

@dlh01
Copy link

dlh01 commented Feb 6, 2024

If I could suggest one thing to put on your list, if it isn't there already: This new method and any future templating methods should be added to the list of auto-escaped functions in the core coding standards. Until then, developers will need to // phpcs:ignore WordPress.Security.EscapeOutput.OutputNotEscaped each use of the new methods, or they'll need to amend their PHPCS configurations to exempt the methods themselves. Or, perhaps worse, they'll run wp_kses_post() over the output just to get rid of the error. The need for a workaround might unnecessarily slow adoption of the new APIs by making them appear to be "not worth the hassle."

If I have availability to take things up with WPCS when the time comes, I'll be happy to, but I'm not sure that I will, so I thought I'd mention it separately.

@gziolo
Copy link
Member

gziolo commented Feb 7, 2024

This new method and any future templating methods should be added to the list of auto-escaped functions in the core coding standards.

Should also other methods be on that list? Example, set_attribute from the WP_HTML_Tag_Processor. Or did I misunderstood the comment left by @dmsnell in #5888 (comment)?

@dmsnell
Copy link
Member Author

dmsnell commented Feb 7, 2024

I'd be grateful for help adding this to where it belongs in the WPCS ruleset.

@gziolo that's a good point, but I don't think set_attribute() and friends are where it will matter. My guess, which is uninformed, is that we only need one or two exemptions for get_updated_html() and __toString(). It's another question if we need these both on the Tag Processor and also on the HTML Processor; I'm not sure if the rules assume that subclasses inherit this property of their base class.

On this note it would be rather helpful if we could indicate not to escape things to methods like set_attribute(). Do we even have any rules that say "be sure not to escape this!"?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants