HTML API: Introduce WP_HTML::tag() for safely creating HTML. #5884
base: trunk
Changes from 1 commit
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2286,15 +2286,16 @@ public function is_tag_closer() { | |
* | ||
* For boolean attributes special handling is provided: | ||
* - When `true` is passed as the value, then only the attribute name is added to the tag. | ||
* - When `false` is passed, the attribute gets removed if it existed before. | ||
* - When `false` or `null` is passed, the attribute gets removed if it existed before. | ||
This change seems worth landing on its own.
||
* | ||
* For string attributes, the value is escaped using the `esc_attr` function. | ||
* | ||
* @since 6.2.0 | ||
* @since 6.2.1 Fix: Only create a single update for multiple calls with case-variant attribute names. | ||
* @since 6.5.0 Allows passing `null` to remove attribute. | ||
* | ||
* @param string $name The attribute name to target. | ||
* @param string|bool $value The new attribute value. | ||
* @param string $name The attribute name to target. | ||
* @param string|bool|null $value The new attribute value. | ||
* @return bool Whether an attribute value was set. | ||
*/ | ||
public function set_attribute( $name, $value ) { | ||
|
@@ -2354,7 +2355,7 @@ public function set_attribute( $name, $value ) { | |
* > To represent a false value, the attribute has to be omitted altogether. | ||
* - HTML5 spec, https://html.spec.whatwg.org/#boolean-attributes | ||
*/ | ||
if ( false === $value ) { | ||
if ( null === $value || false === $value ) { | ||
return $this->remove_attribute( $name ); | ||
} | ||
|
||
|
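The attribute rules in the docblock above can be illustrated with a standalone sketch. It does not use `WP_HTML_Tag_Processor`: `sketch_set_attribute()` and `sketch_render_attributes()` are hypothetical helpers that operate on a plain array, and `htmlspecialchars()` stands in for `esc_attr()`, so treat it as an approximation of the behavior rather than the actual implementation.

```php
<?php
/**
 * Hypothetical stand-in for the attribute rules described in the docblock.
 * Works on a plain array instead of parsed HTML.
 */
function sketch_set_attribute( array $attributes, string $name, $value ): array {
	// Assumption: names are normalized so case variants hit the same entry.
	$name = strtolower( $name );

	// `false` and `null` both mean "omit the attribute altogether".
	if ( null === $value || false === $value ) {
		unset( $attributes[ $name ] );
		return $attributes;
	}

	// `true` keeps only the attribute name; strings are escaped.
	$attributes[ $name ] = true === $value
		? true
		: htmlspecialchars( (string) $value, ENT_QUOTES, 'UTF-8' );

	return $attributes;
}

/** Hypothetical renderer for the array built above. */
function sketch_render_attributes( array $attributes ): string {
	$html = '';
	foreach ( $attributes as $name => $value ) {
		$html .= true === $value ? " {$name}" : " {$name}=\"{$value}\"";
	}
	return $html;
}

$attrs = array( 'disabled' => true, 'class' => 'is-safe' );
$attrs = sketch_set_attribute( $attrs, 'disabled', null );  // removed
$attrs = sketch_set_attribute( $attrs, 'hidden', true );    // name only
$attrs = sketch_set_attribute( $attrs, 'title', 'a > b' );  // escaped
echo '<input' . sketch_render_attributes( $attrs ) . ">\n";
// <input class="is-safe" hidden title="a &gt; b">
```

Treating `null` like `false` lets callers pass through optional values without first checking whether they are set.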
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,154 @@ | ||
<?php | ||
/** | ||
* HTML API: WP_HTML class | ||
* | ||
* Provides a public interface for HTML-related functionality in WordPress. | ||
* | ||
* @package WordPress | ||
* @subpackage HTML-API | ||
* @since 6.5.0 | ||
*/ | ||
Comment on lines +2 to +10
I've used
it started in a utility class like this! but then we didn't include the utility class for pragmatic reasons.
||
|
||
/** | ||
* WP_HTML class. | ||
* | ||
* @since 6.5.0 | ||
*/ | ||
class WP_HTML { | ||
/** | ||
* Generates HTML for a given tag and attribute set. | ||
* | ||
* Although this doesn't currently support nesting HTML tags inside | ||
* the generated tag, it may do so in the future. When that happens | ||
* the `$inner_text` parameter will transform into `$inner_content` | ||
* and allow passing an array of strings and other tags to nest. | ||
* | ||
* Example: | ||
* | ||
* echo WP_HTML::tag( 'div', array( 'class' => 'is-safe' ), 'Hello, world!' ); | ||
* // <div class="is-safe">Hello, world!</div> | ||
* | ||
* echo WP_HTML::tag( 'input', array( 'type' => '"></script>', 'disabled' => true ), 'Is this > that?' ); | ||
* // <input type="&quot;&gt;&lt;/script&gt;" disabled> | ||
* | ||
* echo WP_HTML::tag( 'p', null, 'Is this > that?' ); | ||
* // <p>Is this &gt; that?</p> | ||
* | ||
* echo WP_HTML::tag( 'wp-emoji', array( 'name' => ':smile:' ), null, 'self-closing' ); | ||
* // <wp-emoji name=":smile:" /> | ||
* | ||
* @since 6.5.0 | ||
* | ||
* @param string $tag_name Name of tag to create. | ||
* @param ?array $attributes Key/value pairs of attribute names and their values. | ||
* Values may be boolean, null, or a string. | ||
* @param ?string $inner_text Will always be escaped to preserve the given string in the rendered page. | ||
* @param ?string $element_type 'self-closing' to self-close the generated HTML for a custom-element. | ||
* This only generates the self-closing flag for non-HTML tags, as HTML | ||
* itself contains no self-closing tags. | ||
* @return string|null Generated HTML for the tag if provided valid inputs, otherwise null. | ||
*/ | ||
public static function tag( $tag_name, $attributes = null, $inner_text = null, $element_type = 'html' ) { | ||
if ( | ||
! is_string( $tag_name ) || | ||
( null !== $attributes && ! is_array( $attributes ) ) || | ||
( null !== $inner_text && ! is_string( $inner_text ) ) | ||
) { | ||
return null; | ||
} | ||
|
||
// Validate tag name. | ||
if ( 0 === strlen( $tag_name ) ) { | ||
return null; | ||
} | ||
|
||
// Compare the first byte against [a-zA-Z]. | ||
$tag_initial = ord( $tag_name[0] ); | ||
if ( | ||
// Before A or after Z. | ||
( $tag_initial < 65 || $tag_initial > 90 ) && | ||
|
||
// Before a or after z. | ||
( $tag_initial < 97 || $tag_initial > 122 ) | ||
) { | ||
return null; | ||
} | ||
if ( strlen( $tag_name ) !== strcspn( $tag_name, " \t\f\r\n/>" ) ) { | ||
return null; | ||
} | ||
|
||
$is_void = WP_HTML_Processor::is_void( $tag_name ); | ||
$self_closes = ( | ||
! $is_void && | ||
'self-closing' === $element_type && | ||
! WP_HTML_Processor::is_html_tag( $tag_name ) | ||
); | ||
|
||
/* | ||
* This is unexpected with the closing tag, but it's required | ||
* for special tags with modifiable text, such as TEXTAREA. | ||
*/ | ||
$source_html = $self_closes ? "<{$tag_name}/></{$tag_name}>" : "<{$tag_name}></{$tag_name}>"; | ||
|
||
$processor = new WP_HTML_Tag_Processor( $source_html ); | ||
$processor->next_tag(); | ||
|
||
if ( null !== $attributes ) { | ||
foreach ( $attributes as $name => $value ) { | ||
$processor->set_attribute( $name, $value ); | ||
} | ||
} | ||
|
||
/* | ||
* Strip off expected closing tag; it will be appropriately | ||
* re-added if necessary after appending the inner text. | ||
*/ | ||
$html = substr( $processor->get_updated_html(), 0, -strlen( "</{$tag_name}>" ) ); | ||
|
||
if ( $is_void || $self_closes ) { | ||
return $html; | ||
} | ||
|
||
if ( $inner_text ) { | ||
$big_tag_name = strtoupper( $tag_name ); | ||
|
||
/* | ||
* Since HTML PRE and TEXTAREA elements strip a leading newline, if | ||
* their inner content contains a leading newline, then they _need_ | ||
* to begin with a leading newline before the inner text so that it | ||
* doesn't confuse the syntax for the content. | ||
*/ | ||
if ( | ||
( 'PRE' === $big_tag_name || 'TEXTAREA' === $big_tag_name ) && | ||
"\n" === $inner_text[0] | ||
) { | ||
$html .= "\n"; | ||
} | ||
|
||
switch ( $big_tag_name ) { | ||
case 'SCRIPT': | ||
case 'STYLE': | ||
/* | ||
* Over-zealously prevent escaping from SCRIPT and STYLE tags. | ||
* It would be more complete to run the Tag Processor and look | ||
* for the appropriate closers, but that requires parsing the | ||
* contents which could add unexpected cost. This simplification | ||
* will reject some rare and valid SCRIPT and STYLE text contents, | ||
* but will never allow invalid ones. | ||
*/ | ||
if ( false !== stripos( $inner_text, "</{$big_tag_name}" ) ) { | ||
return null; | ||
} | ||
$html .= $inner_text; | ||
break; | ||
|
||
default: | ||
$html .= esc_html( $inner_text ); | ||
} | ||
} | ||
|
||
$html .= "</{$tag_name}>"; | ||
|
||
return $html; | ||
} | ||
} |
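The tag-name validation and the SCRIPT/STYLE guard in `WP_HTML::tag()` can be tried outside WordPress with a minimal sketch. Both function names below are hypothetical and exist only for illustration; the logic mirrors the byte checks in the diff and is not the class itself.

```php
<?php
/**
 * Hypothetical helper mirroring the tag-name checks in the diff:
 * the first byte must be in [a-zA-Z], and the name must not contain
 * whitespace, "/" or ">".
 */
function sketch_is_valid_tag_name( $tag_name ): bool {
	if ( ! is_string( $tag_name ) || '' === $tag_name ) {
		return false;
	}

	$initial  = ord( $tag_name[0] );
	$is_upper = $initial >= 65 && $initial <= 90;  // A-Z
	$is_lower = $initial >= 97 && $initial <= 122; // a-z
	if ( ! $is_upper && ! $is_lower ) {
		return false;
	}

	// strcspn() returns the length of the leading run free of the listed bytes.
	return strlen( $tag_name ) === strcspn( $tag_name, " \t\f\r\n/>" );
}

/**
 * Hypothetical helper mirroring the over-zealous SCRIPT/STYLE guard:
 * raw text is refused if it contains anything resembling a closing tag.
 */
function sketch_can_embed_raw_text( string $tag_name, string $inner_text ): bool {
	return false === stripos( $inner_text, '</' . $tag_name );
}

var_dump( sketch_is_valid_tag_name( 'wp-emoji' ) );                        // bool(true)
var_dump( sketch_is_valid_tag_name( 'div>' ) );                            // bool(false)
var_dump( sketch_is_valid_tag_name( '1up' ) );                             // bool(false)
var_dump( sketch_can_embed_raw_text( 'script', 'if ( a < b ) {}' ) );      // bool(true)
var_dump( sketch_can_embed_raw_text( 'script', 'x = "</SCRIPT>";' ) );     // bool(false)
```

The guard is deliberately conservative: the last call rejects a string literal that a full parse might accept, which matches the trade-off described in the code comment.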
I scraped the following list from MDN and w3.org. It looks like there may be a few more elements in this list, but I notice there are some missing as well. I can push a change to merge the lists.
HTML elements
A
ABBR
ACRONYM
ADDRESS
AREA
ARTICLE
ASIDE
AUDIO
B
BASE
BDI
BDO
BIG
BLOCKQUOTE
BODY
BR
BUTTON
CANVAS
CAPTION
CENTER
CITE
CODE
COL
COLGROUP
COMMAND
CONTENT
DATA
DATALIST
DD
DEL
DETAILS
DFN
DIALOG
DIR
DIV
DL
DT
EM
EMBED
FIELDSET
FIGCAPTION
FIGURE
FONT
FOOTER
FORM
FRAME
FRAMESET
H1
H2
H3
H4
H5
H6
HEAD
HEADER
HGROUP
HR
HTML
I
IFRAME
IMAGE
IMG
INPUT
INS
KBD
KEYGEN
LABEL
LEGEND
LI
LINK
MAIN
MAP
MARK
MARQUEE
MATH
MENU
MENUITEM
META
METER
NAV
NOBR
NOEMBED
NOFRAMES
NOSCRIPT
OBJECT
OL
OPTGROUP
OPTION
OUTPUT
P
PARAM
PICTURE
PLAINTEXT
PORTAL
PRE
PROGRESS
Q
RB
RP
RT
RTC
RUBY
S
SAMP
SCRIPT
SEARCH
SECTION
SELECT
SHADOW
SLOT
SMALL
SOURCE
SPAN
STRIKE
STRONG
STYLE
SUB
SUMMARY
SUP
SVG
TABLE
TBODY
TD
TEMPLATE
TEXTAREA
TFOOT
TH
THEAD
TIME
TITLE
TR
TRACK
TT
U
UL
VAR
VIDEO
WBR
XMP
On deeper inspection here I feel like this is the wrong approach. We should instead be able to determine whether we're in foreign content and apply the rules there accordingly. That way we don't have to keep a list of HTML elements updated, and we don't have to worry about conflating HTML elements with foreign elements of the same name, e.g. `TITLE` inside an SVG.
So I think what's more important now is getting foreign content detection in place. I've started working on this in #6006.
It may be the case that this relies on the HTML Processor, because the rules get complicated with MathML and foreign content integration points.
obviously after exploring #6006 there's no easy way to infer the outside context when creating a tag in isolation. this leads me to think that we need a good way to communicate this to developers.
maybe instead of passing `'self-closing'` we could have people pass `'xml-empty-tag'` or `'empty-tag-in-foreign-content'`. I'd like to communicate that this isn't for `IMG` or any HTML tag.
the trickiest part I've come to realize is that we could have an `SVG IMG` tag as well, which could adopt the self-closing identity by nature of being inside foreign content, and that would be meaningful.
I don't see this being often used, so I feel comfortable making it a bit awkward. it seems incredibly unlikely to be common that someone is intentionally creating an empty XML element inside foreign content.