Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorporate AMP validator into post authorship flow, to alert if content within the post editor is not compatible with AMP #843

Closed
MackenzieHartung opened this issue Jan 4, 2018 · 17 comments
Milestone

Comments

@MackenzieHartung
Copy link

Acceptance Criteria

AC1: On saving a post on /wp-admin/post.php, if there’s post content that’s not compatible with AMP, display an error at the top of the page.
AC2: The error will not display specifically what is incompatible.
AC3: The error will display the following text: “Notice: your post fails AMP validation”.

@ThierryA ThierryA added this to the v0.7 milestone Jan 8, 2018
@kienstra
Copy link
Contributor

kienstra commented Jan 15, 2018

Request To Work On This

Hi @ThierryA and @westonruter,
Could I please work on this? Though this is in Sprint 3, It looks like we're doing well on Sprint 2. But maybe you have other priorities in mind.

If so, I'll write a simple solution design here so we're on the same page.

Update: I'm now working on a sub-issue of #839.

Thanks!

@kienstra kienstra assigned ThierryA and unassigned ThierryA Jan 15, 2018
@westonruter
Copy link
Member

@kienstra Please do. I see this highly related to #842, but probably a different mechanism. For content, I suppose it would involve hooking into the whitelist sanitizer to find elements that are dropped due to being illegal? If this could identify the specific paragraph in the content that has invalid content in it, that would be ideal. Naturally this would straightforward to accomplish with Gutenberg, and perhaps Gutenberg should be the focus for this here then.

For #842 my thought was do do something at a higher level, when hooks are run, to run the validator on the entire page output rather than on just the content.

I don't see these two as being mutually exclusive. One is focused on the content authorship—telling users what they can't author—and the other is focused on the site administration—telling users what plugins are doing illegally.

@kienstra kienstra self-assigned this Jan 18, 2018
@kienstra
Copy link
Contributor

In Development

Hi @westonruter,
Thanks for your details here. If it's alright, I'll post an update soon.

@westonruter
Copy link
Member

Let's chat about the approach to take here.

@kienstra
Copy link
Contributor

kienstra commented Jan 24, 2018

Possible Approach

Hi @westonruter,
Thanks for waiting here. One approach that I've prototyped is front-end validation in Gutenberg (screencast).

Unfortunately, this only applies on editing a block. Not on saving it, as you mentioned.

Here's a very rough prototype that you could paste in the console on a Gutenberg post.php page:

var isValidAMP,
	previousWp = wp,
	el = wp.element.createElement,
	Notice = wp.components.Notice,
	blockType = 'core/html',
	htmlBlock = wp.blocks.unregisterBlockType( blockType ),
	OriginalBlockEdit = htmlBlock.edit;

/**
 * Whether markup is valid AMP.
 *
 * Use the AMP validator from the AMP CDN.
 * And place the passed markup inside the <body> tag of a basic valid AMP page.
 *
 * @param string markup
 * @returns {boolean} $valid Whether the passed markup is valid AMP.
 */
isValidAMP = function( markup ) {
	var ampDocument = `<!doctype html><html ⚡><head><meta charset="utf-8"><link rel="canonical" href="./regular-html-version.html"><meta name="viewport" content="width=device-width,minimum-scale=1"><style amp-boilerplate>body{-webkit-animation:-amp-start 8s steps(1,end) 0s 1 normal both;-moz-animation:-amp-start 8s steps(1,end) 0s 1 normal both;-ms-animation:-amp-start 8s steps(1,end) 0s 1 normal both;animation:-amp-start 8s steps(1,end) 0s 1 normal both}@-webkit-keyframes -amp-start{from{visibility:hidden}to{visibility:visible}}@-moz-keyframes -amp-start{from{visibility:hidden}to{visibility:visible}}@-ms-keyframes -amp-start{from{visibility:hidden}to{visibility:visible}}@-o-keyframes -amp-start{from{visibility:hidden}to{visibility:visible}}@keyframes -amp-start{from{visibility:hidden}to{visibility:visible}}</style><noscript><style amp-boilerplate>body{-webkit-animation:none;-moz-animation:none;-ms-animation:none;animation:none}</style></noscript><script async src="https://cdn.ampproject.org/v0.js"></script></head><body>${markup}</body></html>`,
		validated = amp.validator.validateString( ampDocument );
	return ( validated.hasOwnProperty( 'status' ) && 'PASS' === validated.status );
};

jQuery.getScript( 'https://cdn.ampproject.org/v0/validator.js', function() {
	/*
	 * Workaround, not recommended in actual plugin.
	 * Replaces the wp value that the script above overwrote.
	 */
	wp = previousWp;
} );

/**
 * Overwrites the 'Custom HTML' block's edit() function.
 *
 * Outputs the original component, in OriginalBlockEdit.
 * If it's not valid AMP, it also outputs a Notice.
 */
htmlBlock.edit = function( props ) {
	var content = props.attributes.hasOwnProperty( 'content' ) ? props.attributes.content : '';
	        result = [ el( OriginalBlockEdit, Object.assign( props, { key: 'original' } ) ) ];
	if ( ! isValidAMP( content ) ) {
		result.unshift( el(
			Notice,
			{
				key: 'notice',
				title: 'Notice',
				status: 'warning',
				content: 'This is not valid AMP',
			}
		) );
	}
	return result;
};
wp.blocks.registerBlockType( blockType, htmlBlock );
  • Overwrites a block's edit() function. In this case, the Custom HTML block. But it could be others, like the paragraph.
  • Validates the content in edit() via a function available by loading this AMP CDN validator.js script.

Issues:

  • Like you mentioned, it'd be better to only run validation on saving the post. Here's the callback for saving a post. Maybe there's a way to use that to trigger messages to appear.
  • Duplicate sanitization. There's already back-end sanitization, as you mentioned.
  • The AMP CDN validator.js script overwrites the wp object with a function.

This is based on the 'Adding a caption' section in this post.

@westonruter
Copy link
Member

@kienstra that looks amazing! Having that client-side JS-based validation is perfect as it is realtime and it is block-based. You should absolutely keep going with this. I don't see this in any way conflicting with the validation of the frontend as a whole. When saving, we can send out a request to get the frontend HTML and then run it through the validator as you're doing at the block level. This will need a bit more

The AMP CDN validator.js script overwrites the wp object with a function.

Why would it do this? That seems like a big flaw in the validator script. Apparently it's not encapsulating the minified function names which is causing wp to be defined be accident. There is also a wo and many other variables that are polluting the global namespace. This should get fixed in the AMP project as a prereq.

@amedina
Copy link
Member

amedina commented Jan 24, 2018

@pbakaus @Gregable take a look at this issue with the validator...

@Gregable
Copy link
Member

Would you mind filing an issue on the AMP github tracker:
https://github.com/ampproject/amphtml/issues/new

kienstra pushed a commit that referenced this issue Jan 25, 2018
In wp-admin/post.php, load a script.
This overwrites the edit() function of registered blocks.
The new function validates the block's content.
And conditionally displays a notice.
@todo: consider not using this for all blocks.
Some blocks are valid once the back-end sanitizer alters them.
Like the 'Audio' block.
This simply uses the save() function of the block to get the markup.
@kienstra
Copy link
Contributor

kienstra commented Jan 26, 2018

AMP Validation In 'Classic' Editor

Here's a screencast of validating a post in the 'classic' editor. On saving the post, the JavaScript file makes an AJAX request to the permalink. Then, it validates that entire HTML document for AMP compatibility, using amp.validator.validateString().

But the back-end sanitizers work really well, and it's hard to enter content that won't be AMP-compatible by the time the page is rendered.

class-editor-validation

Next steps include applying similar front-page validation to Gutenberg.

kienstra pushed a commit that referenced this issue Jan 26, 2018
On saving a post in post.php, make request to the permalink for the post.
And validate that for AMP compatibility.
This applies to the 'classic' editor, not AMP.
But refactoring validatePage() would allow it to be used on Gutenberg.
kienstra pushed a commit that referenced this issue Jan 28, 2018
Building upon Weston's work and solution design,
Add a class to track whenever a node or attribute is removed.
And a method to get whether a node was removed.
The format of the stored nodes and attributes might change.
This will probably depend on the error reporting needed
in the REST API and GET request response.
kienstra pushed a commit that referenced this issue Jan 28, 2018
There was an error:
Class file names should be based on the class name with 'class-'
But the format of the other test files is different.
So use that format, and exclude this rule for test files.
kienstra pushed a commit that referenced this issue Jan 28, 2018
The 'mutation_callback' will then track removed nodes and attributes.
Also, change the way in which we pass the 'mutation_callback.'
Before, it was part of the constructor of:
AMP_Tag_And_Attribute_Sanitizer.
Instead, move it to the $args of:
AMP_Content_Sanitizer::sanitize().
This will pass it to all of the sanitizer/* files when they're instantiated.
@todo: look at whether to call the callback for all node removals.
kienstra pushed a commit that referenced this issue Jan 28, 2018
Before, there were 3 places in the file that called removeChild().
This was fine, but they now need to call the mutation callback.
So abstract these into remove_child().
Also, call the mutation callback in AMP_Video_Sanitizer.
kienstra pushed a commit that referenced this issue Jan 29, 2018
Per Weston's description in PR #912,
It allows sending a POST with markup for validation.
The headers should have 'Content-Type' of 'application/json.'
And it should pass the markup in the param 'markup.'
The current response only has 'is_error.'
@todo: look at returning more in the response,
like the stripped tags and attributes.
Also, add nonce verification.
kienstra pushed a commit that referenced this issue Jan 30, 2018
This is only one approach.
But for now, the response has counts for:
'removed_nodes' and 'removed_attributes'.
If a <script> is removed, 'removed_nodes' will be:
{"script":1}.
The count will increment every time the same node type is removed.
There is a similar histogram for 'removed_attributes'.
kienstra pushed a commit that referenced this issue Jan 30, 2018
In response to Travis errors.
@todo: apply next requirement in PR #912.
kienstra pushed a commit that referenced this issue Jan 30, 2018
Abstract the logic for the response into get_response().
This enables using it for the existing REST API logic,
And the new use-case of full-page GET requests.
kienstra pushed a commit that referenced this issue Feb 13, 2018
Before, this was on the 'save_post' action.
But as Weston mentioned,
this could be in one place,
and this removes the need for a nonce.
Also, remove the function is_authorized().
This checked for a nonce.
Replace this with the existing has_cap().
kienstra pushed a commit that referenced this issue Feb 13, 2018
Because 'wpautop' runs on 'the_content',
process_markup() showed errors from removing <p> tags.
For example, it created this markup:
<p><script async src=https://example.com/script></script></p>
It seemed to have removed the <p> tag,
As it contained a disallowed element.
But it's not needed to report that the <p> is removed.
So remove 'wpautop' as a callback for 'the_content.'
Gutenberg also does this, unless the post has no block.
@see gutenberg_wpautop().
kienstra pushed a commit that referenced this issue Feb 13, 2018
Instead of wrapping the 'invalid_callback' in this,
simply exist process_markup().
If that callback isn't added,
there's no need for the rest of the function.
@kienstra
Copy link
Contributor

kienstra commented Feb 20, 2018

Moving Into QA

Hi @westonruter,
If it's alright, I'm moving this into QA.

Steps To Test

  1. Create a new post on the test site
  2. Paste the following content in the editor:
<button onclick="alert('this')">This</button>
<button onclick="alert('this')">This</button>

<script></script>
  1. Click "Save Draft"
  2. Expected: a warning with
  • A message that the post isn't valid
  • The invalid elements
  • The invalid attributes

validation-failure-message

  1. Remove the content, and paste:
https://www.youtube.com/watch?v=GGS-tKTXw4Y
Example content
  1. Click "Save Draft"
  2. Expected: there's no warning

Known Issue
The sanitizer seems to also report removal of <p>, when also removing another element.

@kienstra kienstra assigned csossi and unassigned kienstra Feb 20, 2018
@kienstra kienstra reopened this Feb 20, 2018
@westonruter
Copy link
Member

The sanitizer seems to also report removal of <p>, when also removing another element.

This is due to the whitelist stanitizer removing empty parent elements when an invalid element is removed. I believe it is this logic here:

https://github.com/Automattic/amp-wp/blob/388f79f369997938224aa489fa86cf003a3eb9cd/includes/sanitizers/class-amp-tag-and-attribute-sanitizer.php#L1479-L1485

At the very least this can be changed from remove_invalid_child to just removeChild since it is not invalid. But I'm not 100% sure this routine is needed.

@ThierryA
Copy link
Collaborator

@westonruter @kienstra what is the reason for the preprocessor remove the parent if the parent is valid, is the intention not leave empty HTML Tag?

@kienstra
Copy link
Contributor

kienstra commented Feb 21, 2018

Removed Parent Element

Hi @ThierryA,
It looks like the intent is for the preprocessor to remove the parent if its only child was the removed element.

In that case, there's probably no reason to report its removal, as we don't know if it's invalid. @westonruter's suggestion works locally:

$parent = $parent->parentNode;
if ( $parent ) {
-         $this->remove_invalid_child( $node );
+         $parent->removeChild( $node );
}

There's also a slightly-relate edge case where this doesn't report the removal of a parent if the child is removed. For example:

<nonexistent><notatag></notatag></nonexistent>

This only reports:

Invalid elements: notatag

Making this change to replace_node_with_children() seems to address this:

		// Replace node with fragment.
		$node->parentNode->replaceChild( $fragment, $node );
+		if ( isset( $this->args['remove_invalid_callback'] ) ) {
+			call_user_func( $this->args['remove_invalid_callback'], $node );
+		}

@csossi
Copy link

csossi commented Feb 21, 2018

@kienstra , not seeing any warning here
image

@csossi csossi assigned kienstra and unassigned csossi Feb 21, 2018
@kienstra
Copy link
Contributor

kienstra commented Feb 22, 2018

Dev Site

Hi @csossi,
Thanks for testing this. Could you please use the dev site? It has more up-to-date code. It looks to display the errors, as expected.

@kienstra kienstra assigned csossi and unassigned kienstra Feb 22, 2018
@csossi
Copy link

csossi commented Feb 22, 2018

verified in QA

@kienstra
Copy link
Contributor

kienstra commented Sep 5, 2018

Request For Regression Testing

Hi @csossi,
Thanks for testing this. Could you please retest, to ensure there wasn't a regression?

Please create a post using the classic editor. Here are the test steps.

@kienstra kienstra assigned csossi and unassigned westonruter Sep 5, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants