Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Issue #843: Validate Gutenberg blocks for AMP compliance #902

Closed
wants to merge 2 commits into from

Conversation

kienstra
Copy link
Contributor

@kienstra kienstra commented Jan 25, 2018

WIP Pull Request

Hi @westonruter and @ThierryA,
Here's the beginning of the script to validate Gutenberg blocks. The script is similar to the one pasted in Issue 843. But it applies AMP validation to every block.

As @westonruter suggested, it gets the content by calling the block's save() function. As this is what goes to the server.

We might consider not displaying notices for all blocks. Some are valid AMP once the back-end sanitizer alters them. Like the 'Audio' block.

This still works mainly like in the earlier screencast, just for all of the blocks. Though I need to test this on more of them.

I still need to handle Travis issues.

For full-page validation on saving, especially in the "classic" editor, we might do something like this snippet.

See #843.

In wp-admin/post.php, load a script.
This overwrites the edit() function of registered blocks.
The new function validates the block's content.
And conditionally displays a notice.
@todo: consider not using this for all blocks.
Some blocks are valid once the back-end sanitizer alters them.
Like the 'Audio' block.
This simply uses the save() function of the block to get the markup.
On saving a post in post.php, make request to the permalink for the post.
And validate that for AMP compatibility.
This applies to the 'classic' editor, not AMP.
But refactoring validatePage() would allow it to be used on Gutenberg.
@westonruter
Copy link
Member

As we talked about on Slack…

The AMP validator running client-side on the Gutenberg blocks is running on the pre-sanitized output. So if you add an image then it's going to marked as invalid because it is not an amp-img. Now, one possibility would be to try modifying all of the blocks to use AMP components instead of HTML, but that seems not scalable or maintainable, at all. So… perhaps what is needed is an endpoint where we can POST some HTML to then have it run through the sanitization logic and then get back the result. This would mean that the block validation wouldn't be live, but it would mean that it wouldn't be reporting errors for things that are going to be sanitized on the frontend.

This will depend on #912.

@pbakaus
Copy link

pbakaus commented Jan 27, 2018

I actually have different opinion about this. Within the AMP team, we are fairly confident in our belief that the sanitizers/converters of this world will never be a perfect solution. The exception would be if they happen at a very granular level, but even then I'm curious how you'd 1:1 translate something like the layout attribute in <amp-img>.

So while we see the need for the sanitizers to exist, we'd like to strive for a world where they are not needed, and a CMS like WP can generate native AMP. As example, I'd much rather patch the_thumbnail to produce valid AMP, than run the final result through a sanitizer.

For Gutenberg, it is a little more tricky due to third party Gutenblocks. But I'd love to investigate if we can have at least the native, bundled blocks generate valid AMP from the get go. Please feel free to tell me if I'm delusional :)

@westonruter
Copy link
Member

@pbakaus Thanks for the input. I agree that the ideal is that the CMS itself would be outputting AMP components from the start. And it's true that sanitizers won't be a perfect solution, and users would be able to have more tailored controls provided for them if AMP components were the target. But this really would require that the CMS only ever be supporting AMP period.

For the featured image (the_post_thumbnail()) this is something that could be made to output an amp-img tag if current_theme_supports( 'amp' ). That would be straightforward, as this is dynamically-generated. It is more complicated with content that is stored in the DB. For content created with the classic editor and Gutenberg alike, the images get written into post_content as <img> tags. Now, we could wrap the Image block's save function to replace using img with amp-img:

- const image = <img src={ url } alt={ alt } { ...extraImageProps } />;
+ const image = <amp-img src={ url } alt={ alt } { ...extraImageProps } />;

However, this would then mean the AMP tag would be persistently stored in the DB. If the user deactivates the AMP plugin then all of their images will be broken. The problem is worse if the site operates in a paired mode, with AMP and non-AMP versions of content: the post rendered in non-AMP would be broken. The problem is somewhat mitigated if the site is AMP Canonical, which is what we're working on now, but this requires opt-in theme support. If the user switches to a theme that doesn't support amp then it would be the same as if the AMP plugin were disabled, and the images would be broken.

The only way I can think of to address this issue long term is for Gutenberg blocks to rely more heavily on server-side rendering. There are two kinds of blocks in Gutenbeg: static and dynamic. The vast majority of the blocks are static, in that the block's attributes are largely serialized to HTML which then allows them to be portable and resilient to not being lost when a plugin that introduced the block is deactivated: it's essentially built-in to have a fallback rendering. Dynamic blocks, on the other hand, have no serialized HTML that is saved: everything is generated by PHP on the frontend. The Latest Posts block is the best example of this. A concern with such blocks is that the editing interface still renders using JS via React whereas there is a separate PHP-based rendering function on the frontend. This results in a duplication of rendering systems. Perhaps this is acceptable for the core blocks where the user experience warrants having duplicated rendering logic, but for most developers I can't imaging they'd want to implement a shortcode-like PHP renderer in addition to a React-based renderer. They'd want to just do the PHP one alone.

We could implement a PHP renderer for the Image block. This could be the granular-level conversion you are describing. Additional attributes for managing layout could be added as block attributes and then used in the dynamic PHP rendering of the block, which would parse out the <img> and convert it into <amp-img> with the additional layout block attributes added to it at runtime. I think this would work, but it would involve having to start maintaining PHP-based renderers for all of the relevant core blocks that need AMP-specific versions. This would involve a maintenance overhead, but if limited to the core blocks then it should be ok.

For 3rd-party blocks that don't have dynamic renderers, however, we'll need the sanitizer. So I think that we should take these two separate routes in parallel, but always falling back to sanitization to enforce compliance.

@westonruter
Copy link
Member

westonruter commented Jan 28, 2018

Here is a proof of concept for what such a dynamic granular handling of the Image block could look like from the PHP side of things. There'd probably need to be some layout options added to the block's edit function as well.

add_action( 'init', function() {
	if ( ! function_exists( 'register_block_type' ) ) {
		return;
	}

	register_block_type( 'core/image', array(
		'render_callback' => function( $attributes, $raw_content ) {

			// Return raw content if not serving AMP.
			if ( ! function_exists( 'is_amp_endpoint' ) || ! is_amp_endpoint() ) {
				return $raw_content;
			}

			// Defer to sanitizer if image is external.
			if ( empty( $attributes['id'] ) || 0 !== strpos( get_post_mime_type( $attributes['id'] ), 'image/' ) ) {
				return $raw_content;
			}

			$html_attr = '';

			list( $src, $width, $height ) = wp_get_attachment_image_src ( $attributes['id'], 'large' );
			if ( false === strpos( 'width="', $raw_content ) ) {
				$html_attr .= sprintf( ' width="%d"', $width );
			}
			if ( false === strpos( 'height="', $raw_content ) ) {
				$html_attr .= sprintf( ' height="%d"', $height );
			}

			return preg_replace(
				'#<img\s([^>]*?)/?>#',
				'<amp-img $1 ' . $html_attr . '></amp-img>',
				$raw_content
			);
		},
	) );
} );

@westonruter
Copy link
Member

I should also mention that Gutenberg may eventually include built-in PHP implementation of content selector used in JS to parse out the img attributes. This would avoid having to use regular expressions as seen here.

@westonruter
Copy link
Member

westonruter commented Jan 29, 2018

As an alternative to implementing custom block renderers like above in #902 (comment) perhaps a better approach would be add AMP processing instructions to the HTML which the pre-processor (sanitizer) can use to convert the HTML into AMP as required. For example, if the core/image block in Gutenberg added a layout control with options for fixed, responsive, fill, etc. Then the layout property could get serialized into the HTML as a data attribute, e.g. <img data-amp-layout="responsive".... This would allow for the markup to be portable between AMP and non-AMP. A non-AMP stylesheet could then also target such HTML for its own stylesheet, e.g. img[data-amp-layout="responsive"] { width: … }.

We can shift from thinking of the whitelist sanitizer just about sanitization but rather now more about pre-processing/transformation. With additional attributes placed on elements this can then inform the pre-processor to output exactly what should be output, rather than having to make a best guess for the lowest-common denominator.

@pbakaus
Copy link

pbakaus commented Feb 1, 2018

Thanks for the thoughtful response @westonruter, and apologies for the delay.

I was initially under the impression that most of Gutenberg would help separate concerns in WordPress, so to eventually not have any HTML in the database anymore, but sad to hear that's not the case. The fallback reasoning is a fair one though.

Is there a way to store to alternate copies of a block in the database, where if you've enabled AMP, it saves the generated <amp-img> but also stores the <img> as fallback, for if you disable AMP, for instance?

If not, then I agree that the best path might be to map existing and new classes that change the layout of an image in WordPress to their AMP counterparts when parsed in the sanitizer. That also means that in the image block, we need to introduce the layouting capabilities of AMP for the non-AMP world, so that the user experience is consistent. It would be weird if we did the data attribute thing to say, for instance, that an image when rendered in AMP should fill to the width and height of its parent, but the rule would be ignored for non-AMP.

@westonruter
Copy link
Member

@pbakaus:

Is there a way to store to alternate copies of a block in the database, where if you've enabled AMP, it saves the generated but also stores the as fallback, for if you disable AMP, for instance?

There is not really an existing mechanism to store different variations of a block. It could perhaps be shimmed in to store different versions in meta, but then there's perhaps the worse problem of duplicate content.

If not, then I agree that the best path might be to map existing and new classes that change the layout of an image in WordPress to their AMP counterparts when parsed in the sanitizer. That also means that in the image block, we need to introduce the layouting capabilities of AMP for the non-AMP world, so that the user experience is consistent. It would be weird if we did the data attribute thing to say, for instance, that an image when rendered in AMP should fill to the width and height of its parent, but the rule would be ignored for non-AMP.

Actually, it's pretty standard for some things in WordPress to only work in the context of a given theme being active. For example, Gutenberg is adding the ability for images to be full-bleed as one of the layout options in the block UI. In order for this option to be shown, however, a theme has to explicitly add_theme_support( 'align-wide' ). Similarly, Gutenberg allows for for blocks to provide different color choices for a block, but a theme can restrict which colors are available via add_theme_support( 'editor-color-palette', …).

If a user switches to a theme that doesn't have support for these then the options are not available. I think this actually would work perfectly for layout in AMP. We can add the layout option when the theme (or a plugin) has done add_theme_support('amp'). The layout would be ignored if switching themes, in the same way that WordPress displays fallback content for blocks when the block definition is no longer registered.

@pbakaus
Copy link

pbakaus commented Feb 10, 2018

@westonruter ok got it, thanks for the clarification. I mostly talked about the case where, say, you're publishing both non-AMP and AMP pages (e.g. through paired mode) on your Wordpress instance with one theme. In this case, I think it would be majorly confusing if there would be a lot of options in Gutenberg's image block, some of which would only apply to the AMP-rendered version. As long as the experience is consistent within a theme, I'm a happy camper :)

@westonruter
Copy link
Member

@pbakaus Yeah, that's a good point. This really presumes a native AMP site, without a separate non-AMP version, at least for the singular template. Otherwise we'd either have to mark the AMP-only settings, or re-implement the features available in AMP for non-AMP responses. That being said, what if we just embraced “dirty AMP” for serving out AMP components in non-AMP documents?

@pbakaus
Copy link

pbakaus commented Feb 21, 2018

@westonruter not a bad thought in general, but risky, as the AMP v0 library assumes it's the only DOM manipulator on the page, and thus has a pretty simplistic world view where it doesn't observe mutations, scroll etc. So dirty AMP only works if there's no JS on the page that manipulates the DOM.

@ThierryA
Copy link
Collaborator

ThierryA commented Feb 21, 2018

AMP offers a lot of great components which we can leverage not only in the context of image layout but also to build custom blocks such as carousel for example. That would obviously not work for paired mode unless we build a non AMP fallback or embrasse the risk of dirty mode (which isn't bulletproof at the moment).
I would personally not be in favor of adding fallbacks as it is unecessarly re-inventing the wheel and will be a nightmare to maintain. So I would vote to either make AMP component work in "dirty AMP" in a bullet proof way if possible (to investigate further) or make native AMP a requirement in order to benefit from all the "advanced" features.

@westonruter
Copy link
Member

The discovery that has come out of this PR has been really insightful and important going forward. However, I think that the specific approach taken in this PR for validation (loading the validator.js from the AMP project for client-side validation) should give way to the validation of server-side rendering which is now enabled with #971 powered by the plugin itself. In short, with that PR after each save we run the the post through the preprocessor to validate the content for errors. It currently has support for detecting validation errors introduced by shortcodes, and the same approach can then also be applied to blocks. Then it is a matter of taking the validation results and displaying them meaningfully with their corresponding blocks.

As I see them, the next steps for Gutenberg support are:

  1. Complete discovery of Gutenberg blocks and their base compatibility (whether anything gets removed through validation) with the AMP plugin out of the box. See Conduct a Discovery to determine support for Gutenberg blocks. #845.
  2. Integrate block validation results with the UI to reveal when a validation error does occur. Normally such errors will not happen for core blocks, but for the Custom HTML block in particular it is a distinct possibility. It is also key for plugins which introduce blocks so that their AMP validity can also be reported.
  3. Extend core blocks which have validation errors so that the appropriate AMP markup is output to faithfully render the block when the page is served as AMP. This would be done via some combination of the render_callback that integrates with a new block sanitizer/preprocessor.
  4. Further extend core blocks to leverage AMP functionality, such as layout.
  5. Introduce blocks that feature AMP-specific components (e.g. amp-ad and even amp-story in Implement AMP Stories #968) which are served in both AMP and non-AMP documents as Dirty AMP (if possible).

That's what I see as the next steps on the roadmap for Gutenberg/AMP integration.

kienstra pushed a commit that referenced this pull request Mar 18, 2018
The POST to the REST API in the JS file doesn't work yet.
And this doesn't apply a requirement in PR #1019:
'Pass validation errors back in REST API response when Gutenberg saves a post.'
But this is a prototype of a way to display notices for each block, async.
The UX would be like that in the previous PR #902.
As Weston mentioned earlier, edit() is synchronous.
And REST requests to validate content could delay it.
So add a component for 'edit.'
Display a notice, based on the state of isValidAMP.
The REST API request will update the state.
@schlessera schlessera deleted the add/843-gutenberg-amp-validation branch November 6, 2020 15:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants