-
Notifications
You must be signed in to change notification settings - Fork 11
Should we follow the HTML specification to the letter? #132
Comments
Thanks for opening an issue about this topic.
If I recall correctly, we briefly discussed it with Riad and Miguel (Gutenberg contributors). It's true In terms of DX, I would personally prefer to keep it as We also thought about replacing all the So right now, my personal opinion would be to present them using
Yes, I (a bit sadly) agree. And I think the chances of having to resort to Anyway, thanks again for opening this issue. Let's use it to start collecting feedback!! 🙂 |
@dmsnell shared on this issue that he would go with @WordPress/frontend-dx what's your take? Go with |
If we go with the On the other hand, regarding implementation, attributes with the |
Thanks for the updates. I personally also would like the |
I'll repeat here for completeness, what I mentioned over there.
There's no way that adding What would be a much bigger problem for validity is choosing to break HTML syntax, as if we were writing IMO the only real risk I've seen in these discussions is using |
wp-
or data-wp-
?
@SantosGuillamot and I have just realized that the So I guess that if we want to ship valid HTML, we'd better change all our <div data-wp-on.click="actions.someAction" data-wp-class.red="selectors.isRed">
<img data-wp-bind.src="state.someImageURL" />
</div> |
@luisherranz according to the spec both The only real guidance is this sentence:
Obviously this is broken all the time and it's a single mention in a huge document and the WWW works fine. My interpretation is that the spirit of the rule is to avoid reinterpreting well-defined behaviors and to avoid with due diligence anything that could become part of the spec in the future. To me this rules out the use of |
Sorry, I meant colons ( The spec says the names should be XML-compatible and XML names can't have colons:
|
I guess the question we have to answer here is whether we want to propose an API that follows HTML specifications to the letter or not (even if we know that not following it will be fine and won't cause any issues). In this case, it's just about using I'm going to change the title of the issue to reflect that. |
wp-
or data-wp-
?
Well I'd encourage us not to frame it this way, as I don't think "following it to the letter" is something that's meaningful, possible, or practical. The spec covers a number of use cases and provides a spectrum of conformance. Rules are spread out and suggest internal contradiction. For instance, in the attribute name parsing rules, there's nothing wrong with the So I prefer to ask about what the specification allows and what implementations will handle. Keep in mind that everything being discussed with directives breaks the HTML specification, because we aren't allowed to add these attributes if they aren't What I prefer to follow is how our choices will impact the web and how the web impacts the results of our choices. According to the HTML5 specification we cannot use Again, I don't think it's possible to follow the spec to the letter, and if we even could theoretically do that, I'm not sure it would be desirable. |
Trying to resume this conversation, it seems to me we are talking about three different things:
If we end up deciding not to follow the HTML specification to the letter, we may want to open separate discussion for the other questions. |
I still think this is a weird question, or at least a weird phrasing. My vote is 100% against Also my vote is for the |
My vote between For
So even though the chances that someone is validating their HTML output are practically zero, I think my vote goes for |
+1 for I prefer
@luisherranz does it have any practical consequences for parsing, SEO, or anything else other than setting ourselves up for hearing this feedback over and over again? I can't think of anything. |
I also vote for Regarding |
Let's go for
For some reason, my JS brain makes it easier to detect Edit: I add some examples so you can compare. <button
data-wp-text="state.myBlock.buttonText"
data-wp-on:click="actions.myBlock.clickHandler"
data-wp-class:active="state.myBlock.isActive"
data-wp-bind:disabled="state.myBlock.isDisabled"
/>
<button
data-wp-text="state.myBlock.buttonText"
data-wp-on.click="actions.myBlock.clickHandler"
data-wp-class.active="state.myBlock.isActive"
data-wp-bind.disabled="state.myBlock.isDisabled"
/> |
I don't think there are any practical consequences, but neither for |
So yeah, being super honest, we would be basically optimizing for not having to argue that there are no practical consequences over and over 😆 |
if HTML validation mattered then planes would fall out of the sky, mars would dive into the sun, zebras would stop painting striped camouflage on their bodies, and the web as we know it would crash. Gopher and Gemini would of course still be fine, as would Xanadu. I jest of course. Zebras will never stop painting stripes on their bodies. |
|
Regarding HTML validity (from an accessibility perspective) and working for (semi-)government agencies... Although the EN 301 549 en WCAG do not specify anything regarding "valid" HTML, there are still government agencies that require HTML output to pass validation. In this respect, it might be ill-advised to use |
The |
Thanks all for the conversation, and special thanks to @Potherca for chiming in and confirming that validation errors can cause problems for some sites. It's decided then, let's go with |
@michalczaplinski the point is that all this processing is likely going to add a significant overhead and a quick bail-out could be incredibly important if we can know quickly if a post or block cannot contain a directive. One way we can do that is to use a prefix for attributes whose literal value is unlikely to occur as part of legitimate expected existing HTML. In the case of function html_possibly_contains_directives( $html ) {
return false !== strpos( $html, 'wp-' );
} On the other hand if we choose a prefix like The goal is to find a way to quickly assess if we can know beyond a doubt that a given snippet of HTML cannot have directives and then avoid doing all the more costly processing in those cases. If our prefix doesn't fit the bill we can find one that does (as appears to be with what @luisherranz chose) |
tl;dr: +1 Just wanted to share some related experience with AMP here. In AMP, there is something similar to <amp-state id="myVisibility">
<script type="application/json">
false
</script>
</amp-state>
<button type="button" on="tap:AMP.setState({myVisibility: !myVisibility})">
Toggle Visibility
</button>
<p id="visible-via-state" hidden [hidden]="!myVisibility">
This is now shown!
</p> This ended up being a big challenge when implementing support for amp-bind in the AMP plugin for WordPress because PHP's DOM absolutely does not like these attributes (even when parsing as HTML). In part because of such cases, amp-bind has an alternate XML/React-compatible syntax which uses an I know that PHP DOM is not involved here since |
Reopening this issue because it seems the dot notation we are using for directives like This means that, to support serializing directives to the database using JSX, we need to change the syntax again. At this moment, we can't use the following code in static blocks: export const Save = () => {
const blockProps = useBlockProps.save();
return (
<div data-wp-context='{ "hidden": true }' {...blockProps}>
<button data-wp-on.click="actions.toggle">Show Text</button>
<div data-wp-bind.hidden="context.hidden">
<span>Some Text!</span>
</div>
</div>
);
}; We could go back to the With those considerations, these are the alternatives that come to our mind:
All of them are valid HTML and valid JSX, and the first two possibilities are inspired by BEM (Block, Element, Modifier) methodology, which is also used for CSS WordPress coding guidelines. Any opinions about which one is better? Any other option that has not been considered? |
there's still the option of abstracting this, which might help down the line if we have to adjust again: create a helper function which lets us focus on the behaviors we want to add and not the html. return (
<div {...wpBehaviors({context: {hidden: true}})} {...blockProps}>
<button {...wpBehaviors({on: {click: 'actions.toggle'}})}>Show Text</button>
<div {...wpBehaviors({bind: {hidden: 'context.hidden'}})}>
<span>Some Text!</span>
</div>
</div>
); this lets us keep the syntax we prefer, being it const wpBehaviors = behaviors => {
const attributes = {};
foreach ( const [ key, value ] of Object.entries( behaviors ) ) {
// extract relevant info from values
// e.g. {on: {click: 'actions.toggle'}} into "data-wp-on.click='actions.toggle'"
}
return attributes;
} |
@dmsnell I like your suggestion of abstracting the syntax in JSX. This keeps the syntax centralized in a single utility function for JSX, while sticking with the syntax originally decided on which is already supported by the HTML standard. We could take this idea even further and also implement a PHP utility function of the same kind, encouraged over direct output of e.g. return '<div' . wp_behaviors( array( 'bind' => array( 'hidden' => 'context.hidden' ) ) ) . '><span>Some Text!</span></div>'; And the function like: // $behaviors would be an associative array - could also be an object, but those aren't great to use in PHP.
function wp_behaviors( array $behaviors ) {
$attributes = '';
foreach ( $behaviors as $key => $value ) {
foreach ( $value as $k => $v ) {
$attributes .= ' data-wp-' . $key . '.' . $k . '="' . $v . '"';
}
}
return $attributes;
} |
@felixarntz Nit: in PHP we now have $p = new WP_HTML_Tag_Processor('<div><span>Some Text!</span></div>');
$p->next_tag();
foreach ( $behaviors as $key => $value ) {
foreach ( $value as $k => $v ) {
$p->set_attribute( "data-wp-$key-$k", $v );
}
}
$html = $p.''; |
What about a dollar character JSX attributes consist of IdentifierPart which consists of IdentifierPartChar which itself can contain the I checked these using the following python script: import unicodedata
for code_point in range(0x110000):
char = chr(code_point)
if unicodedata.category(char)[0] == "L" or unicodedata.category(char) == "Nd" or unicodedata.category(char) == "Pc":
print(char, end="") None of these unicode symbol stood out to me as international and easy to type enough, but the dollar sign could work. |
IMO, the |
I also prefer Regarding the proposed syntax abstraction, one of the main disadvantages of the Interactivity API is that it introduces a third templating language in WordPress. I wouldn't want that to be split into yet another three slightly different ways of writing it (JSX, PHP and HTML). In my opinion, the closer we get to being able to copy/paste the code between JSX, PHP and HTML, the better and simpler the DX will be. So if this is something that can be solved by simply switching to |
I also feel the I would go with |
did any of us consider joking aside, thanks for the open attitude all; finding a solution that works may be "ugly" to some but likely whatever we choose is actually quite fine both technically and socially, as long as we don't change it afterwards and break existing code. one piece of value in having the JS and PHP abstractions is that (while it remains close to the language in the HTML) it won't suffer from encouraging un-encoded and un-serialized attribute use. if we use the Tag Processor to add them this won't be a problem, but in most of the code examples and discussions I've followed we're mostly demonstrating a normative way of serializing that encourages skipping the safety step. once we get the "funky comment" support in the Tag Processor it's my intention to quickly build up the templating engine, which I think can alleviate some of this though. I'm sharing some personal worry here, but it makes me nervous demonstrating unescaped JSON data in an attribute, regardless of how the attribute is named. </tangent> |
Upon considering this a bit further, I'd like to exclude This is not a problem, however, for double hyphens ( Use of a single underscore ( All this being said, couldn't we solve this problem and just go with a single hyphen? The hyphen could just be the separator. Considering these cases:
There doesn't seem to be an ambiguity problem to just go with:
or
If an attribute begins with |
I like the easiness of a single hyphen. If we finally promote the use of namespaces in custom directives to avoid naming collisions between plugins, maybe we can allow single underscores in the directive names part: |
It probably would also work to have |
I've always wondered about that 😄 |
There's no point in delaying this more, so I think we should make a decision. My vote goes for @westonruter's proposal of a single hyphen. I think it's the most elegant solution. The only core directive that may have more than one word and need an underscore is if we add dangerous to Also, if we go with that and we want to avoid underscores on custom directives, we could use a prefix to indicate that it's custom and includes a namespace. maybe
But that's not a decision we need to make now. |
Yeah, I'd prefer |
I agree using a single hyphen seems the cleanest solution. I'm a bit afraid of the syntax of custom directives, and, from a newbie, it could feel a bit more confusing to differentiate between the directive and the attributes: (In Anyway, it seems none of them is perfect so I'm happy to go with a single hyphen. |
I don't really like any of the options 😞, but if I had to choose one, I would go for hyphens instead of underscores. A single hyphen would force devs to use a single word, at least for the namespace and directive name (not needed for the directive attribute). It would be a good idea if you have in mind that the whole directive identifier would be shorter. However, you don't have any visual clue to differentiate between any of the identifier parts, so it would be harder to read. <div
data-wp-bind-aria-label="..."
data-wp-x-plugin-directive-attribute="..."
></div> On the other hand, a double hyphen would allow devs to use multiple words for each identifier part. That would make them easier to read and differentiate, although it would lead to longer identifiers. This option could also be easier to explain. <div
data-wp--bind--aria-label="..."
data-wp-x-my-plugin--my-directive-name--some-attribute="..."
></div> I think I prefer the double hyphen, as it could have more advantages (just my point of view). I don't really like it anyway, though. 😅 |
After you said so, I started using the single hyphen in my code, and I think you're right. Locating the class name here seems hard: At least in my current mental subjective experience 🤷 😓 If I step away from the computer screen and look at it, the third one is the easiest to spot to me. <button
data-wp-on-click="actions.toggle"
data-wp-bind-aria-expanded="context.isOpen"
data-wp-class-is-open="context.isOpen"
> <button
data-wp-on_click="actions.toggle"
data-wp-bind_aria-expanded="context.isOpen"
data-wp-class_is-open="context.isOpen"
> <button
data-wp-on--click="actions.toggle"
data-wp-bind--aria-expanded="context.isOpen"
data-wp-class--is-open="context.isOpen"
> I didn't consider |
From my subjective perspective, they single hyphen as well 😄 As mentioned in the comment where we started this part of the discussion, using something like |
As an outside observer just reading up on the Interactivity API, if I had a vote I think the double hyphen |
Thanks for sharing that. Everybody's opinion is as valid as anyone else's here 🙂 Should we go with |
|
|
Ok, decided then. Let's see if this is finally the final syntax 😄 Even though we are already migrating things to Gutenberg, I think it makes sense to make the change here as well so people coming to this repository see the new syntax. And if someone thinks there may be any potential issue with the double hyphen |
Hi everyone,
I am wondering if you have considered the fact, that the directives introduced here are not valid html when they are used as an attribute?
On regular, built in html elements you are not allowed to define arbitrary attributes. In these cases, you have to use a
data-
attribute to keep the markup valid.For custom elements you are basically free to use whatever you want.
Although most JS frameworks use custom attributes when coding, will compile them away / remove these attributes when rendering to the DOM or on the server.
If you are aware of this issue, was it a conscious choice to still use those directives as attributes? If so, what was the reasoning? While there is an ongoing discussion about how important valid markup is, I think this will be a sensible topic in the WordPress ecosystem.
If you do not intend to use the directives as attributes, I apologise for raising this issue. It is still not so easy to get all of the details right for this project from the outside.
The text was updated successfully, but these errors were encountered: