Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sanitization reporting #912

Merged
merged 43 commits into from
Feb 9, 2018
Merged

Add sanitization reporting #912

merged 43 commits into from
Feb 9, 2018

Conversation

westonruter
Copy link
Member

@westonruter westonruter commented Jan 27, 2018

  • Add callback mechanism for being able to be informed of when an element or attribute is removed. (mainly applied in e9f39, pending review)
  • Add REST API endpoint allowing for HTML to be POST'ed and get back results for what changes were made. An element removal could be considered a sanitization error. The request would require an edit_post cap and require a nonce (see this screencast)
  • For normal frontend GET requests, consider being able to return validation errors in response header when requested via query param. The errors can be gathered during sanitization and then added before the output buffer is sent. (applied, pending review)

Relates to content validation (#843) and also determining when a theme or plugin does something invalid (#842).

Fixes #843.

@kienstra
Copy link
Contributor

kienstra commented Jan 27, 2018

In Development

Hi @westonruter,
Thanks for starting this pull request, and for the outline of how to proceed. I'm working on this now if that's alright.

Ryan Kienstra added 11 commits January 27, 2018 19:50
Building upon Weston's work and solution design,
Add a class to track whenever a node or attribute is removed.
And a method to get whether a node was removed.
The format of the stored nodes and attributes might change.
This will probably depend on the error reporting needed
in the REST API and GET request response.
There was an error:
Class file names should be based on the class name with 'class-'
But the format of the other test files is different.
So use that format, and exclude this rule for test files.
The 'mutation_callback' will then track removed nodes and attributes.
Also, change the way in which we pass the 'mutation_callback.'
Before, it was part of the constructor of:
AMP_Tag_And_Attribute_Sanitizer.
Instead, move it to the $args of:
AMP_Content_Sanitizer::sanitize().
This will pass it to all of the sanitizer/* files when they're instantiated.
@todo: look at whether to call the callback for all node removals.
Before, there were 3 places in the file that called removeChild().
This was fine, but they now need to call the mutation callback.
So abstract these into remove_child().
Also, call the mutation callback in AMP_Video_Sanitizer.
Per Weston's description in PR #912,
It allows sending a POST with markup for validation.
The headers should have 'Content-Type' of 'application/json.'
And it should pass the markup in the param 'markup.'
The current response only has 'is_error.'
@todo: look at returning more in the response,
like the stripped tags and attributes.
Also, add nonce verification.
There's an existing handler to create 'amp-carousel' elements:
class AMP_Gallery_Embed_Handler.
So override the 'Gallery' widget class.
And use that in render_media().
Otherwise, that function is copied from the parent.
It calls gallery_shortcode() at the end.
Which doesn't have a filter for the markup.
This is only one approach.
But for now, the response has counts for:
'removed_nodes' and 'removed_attributes'.
If a <script> is removed, 'removed_nodes' will be:
{"script":1}.
The count will increment every time the same node type is removed.
There is a similar histogram for 'removed_attributes'.
In response to Travis errors.
@todo: apply next requirement in PR #912.
Abstract the logic for the response into get_response().
This enables using it for the existing REST API logic,
And the new use-case of full-page GET requests.
@kienstra
Copy link
Contributor

Screencast Of REST API Responses

Here's a screencast of the current validation via the REST API. The schema of the response will probably change as I look at outputting it in a frontend GET request.

In a frontend GET request, add a header:
'AMP-Validation-Error'.
This outputs whether the sanitizers stripped nodes or tags.
A possible output is:
'{"has_error":true,"removed_nodes":{"script":1},"removed_attributes":{"async":1}}'
*/
public static function finish_buffer_add_header( $output ) {
$markup = self::finish_output_buffering( $output );
AMP_Mutation_Utils::add_header();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll want to limit this to only be added when a user specifically requests this additional information. Like there should be a nonce that must be present to authorize the reporting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @westonruter. That's a good idea to add the header only for users with a nonce.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied with a nonce, details to come shortly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMP_Mutation_Utils::add_header() now has a check for authorization via self::is_authorized(). That verifies the nonce and capability.

@@ -523,6 +535,7 @@ public static function finish_output_buffering( $output ) {
$dom = AMP_DOM_Utils::get_dom( $output );
$args = array(
'content_max_width' => ! empty( $content_width ) ? $content_width : AMP_Post_Template::CONTENT_MAX_WIDTH, // Back-compat.
'mutation_callback' => 'AMP_Mutation_Utils::track_removed',
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per above, this should be conditional based on whether an authorized nonce is present.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. This is now only added if AMP_Mutation_Utils::is_authorized(). That checks for the nonce and capability.

@@ -506,13 +506,25 @@ public static function get_amp_component_scripts() {
* Start output buffering.
*/
public static function start_output_buffering() {
ob_start( array( __CLASS__, 'finish_output_buffering' ) );
ob_start( array( __CLASS__, 'finish_buffer_add_header' ) );
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can leave this as finish_output_buffering I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this commit changes the name back to finish_output_buffering().

@kienstra
Copy link
Contributor

kienstra commented Feb 1, 2018

Example Of GET Response Headers
Question About Schema

Hi @westonruter,
Here's a screencast of the frontend response headers showing AMP errors. What do you think about the schema of the response?

amp-validation-issue

@westonruter
Copy link
Member Author

@kienstra thanks for that great video. Really cool to see.

Aside: you can still use Postman with authenticated requests if you just install the basic auth plugin.

* @since 0.7
*
* @param object $child The node to remove.
* @param object $parent The parent node for which to remove the child (optional).
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of object these should probably be DOMNode.

Also, is the $parent even needed since it will always be available via $child->nodeParent?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @westonruter. This commit changes the @param type to DOMNode, and removes the second parameter. Thanks for pointing out that the second isn't needed.

$parent->removeChild( $child );
}

if ( isset( $this->args['mutation_callback'] ) ) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible that remove_child could be called without any node being removed, per the above if/elseif conditions. Should this check to see if something was actually removed? I'm curious as to the scenarios when the else condition would be met.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that's a good point. This commit from above has the if conditional nested, to ensure there was a removal before calling the 'mutation_callback'.

As Weston mentioned, the child could get the parentNode.
So there's no reason for the elseif.
Also, this makes it possible to nest the 'mutation_callback.'
So it's only called if there's a removal.
$response = array(
'has_error' => self::was_node_removed(),
'removed_nodes' => self::$removed_nodes,
'removed_attributes' => self::$removed_attributes,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be useful if the processed markup were also returned here. It could then be used for previewing, for example.

Copy link
Contributor

@kienstra kienstra Feb 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @westonruter, that would help. Do you think we should return the processed markup only if we're validating a limited amount of markup, like a single Gutenberg block? We could detect this by whether get_response() is called with a $markup argument.

if ( isset( $markup ) ) {
        $response['processed_markup'] = $markup;
};

If get_response() is called without a $markup argument, it's validating the entire document.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if markup is supplied to validate, then it should get returned.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @westonruter.

These commits output the 'processed_markup', when $markup is passed to the function.
request-postman-validate

Ryan Kienstra added 3 commits January 31, 2018 20:08
This function has the same logic as the current get_buffer().
But the name is more descriptive.
The return value is simply void.
So there's no need for any more information.
Respond with the markup that is submitted in the request,
In the value 'processed_markup'.
Full-page requests won't have the markup in the response.
esc_html() might not be the best way to escape the markup.
But it doesn't display properly without escaping.
Ryan Kienstra added 2 commits February 7, 2018 14:10
The function prepare_response() doesn't exist.
Instead, use get_buffer().
Before, the error message always appeared.
This is because it only checked that the response
had a value for 'has_error'.
But this needs to be true in order for there to be a reported error.
@kienstra
Copy link
Contributor

kienstra commented Feb 8, 2018

Fixing My Mistakes In Resolving Merge Conflicts

Hi @ThierryA,
I also see an issue with this PR. I think they're related to how I resolved the merge conflicts. I'm working on them now.

I had renamed some methods in this branch: add/sanitization-reporting.
Also, remove the parameter from finish_output_buffering().
That function in the 'develop' branch no longer has as parameter.
*
* @return void.
*/
public static function add_header() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't actually used yet. I think it was a foundation for validating plugins on activation (#842)

@@ -11,6 +11,9 @@
</properties>
</rule>

<rule ref="WordPress.Files.FileName.InvalidClassFileName">
<exclude-pattern>tests/*</exclude-pattern>
</rule>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost all of the test files begin with test-, while this rule states that they should begin with class-.

There were different characters in prepare_response(),
Mabye from copying from GitHub.
Also, adjust documentation, and add a @codingStandardsIgnoreEnd.
*
* @return void.
*/
public static function amp_rest_validation() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't currently used, though it would be helpful for Gutenberg block validation.

@kienstra
Copy link
Contributor

kienstra commented Feb 8, 2018

Ready For Review

Hi @ThierryA,
This pull request is ready for review, with the error addressed.

Copy link
Collaborator

@ThierryA ThierryA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank @kienstra, nice an clean coding here. I left my CR below.

@@ -704,6 +704,9 @@ private function sanitize_disallowed_attributes_in_node( $node, $attr_spec_list

foreach ( $attrs_to_remove as $attr ) {
$node->removeAttributeNode( $attr );
if ( isset( $this->args['mutation_callback'], $attr->name ) ) {
call_user_func( $this->args['mutation_callback'], $node, 'removed_attr', $attr->name );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe AMP_Mutation_Util::ATTRIBUTE_REMOVED could be used here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this commit from above substitutes the constant.

if ( method_exists( $node, 'removeAttribute' ) ) {
$node->removeAttribute( $attribute );
if ( isset( $this->args['mutation_callback'] ) ) {
call_user_func( $this->args['mutation_callback'], $node, 'removed_attr', $attribute );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe AMP_Mutation_Util::ATTRIBUTE_REMOVED could be used here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this commit from above substitutes the constant.

if ( method_exists( $child->parentNode, 'removeChild' ) ) { // phpcs:ignore WordPress.NamingConventions.ValidVariableName.NotSnakeCaseMemberVar.
$child->parentNode->removeChild( $child ); // phpcs:ignore WordPress.NamingConventions.ValidVariableName.NotSnakeCaseMemberVar.
if ( isset( $this->args['mutation_callback'] ) ) {
call_user_func( $this->args['mutation_callback'], $child, 'removed' );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe AMP_Mutation_Util::NODE_REMOVED could be used here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this commit uses that constant instead of the string.

/**
* The argument if an attribute was removed.
*
* @const array.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For constants the @var tag must be used for the PHPDoc to be valid. This comment applies in multiple constants in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this commit changes the tags to @var.

* Tracks when a sanitizer removes an attribute or node.
*
* @param array $histogram The count of attributes or nodes removed.
* @param string $key The attribute or node name removed.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a wonder Travis didn't pickup alignment comments miss alignment, it might only sniff the variables alignment actually.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this also align the comments, like:

@param array  $histogram The count of attributes or nodes removed.
@param string $key       The attribute or node name removed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commit aligns the spacing of the comments. The pre-commit hook blocked aligning this in two places, possibly because there was only one @param and a @return.

* @return void.
*/
public static function display_error() {
$error = isset( $_GET[ self::ERROR_QUERY_KEY ] ) ? sanitize_text_field( wp_unslash( $_GET[ self::ERROR_QUERY_KEY ] ) ) : ''; // WPCS: CSRF ok.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For compliance sake, it would be good to check nonce here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, @ThierryA. This commit adds nonce verification, via check_admin_referer().

*
* @since 0.7
*/
class AMP_Mutation_Utils {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does Mutation stands for here, it could because I am native French but AMP_Validation_Utils would make more sense to me.

Copy link
Contributor

@kienstra kienstra Feb 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right that AMP_Validation_Utils would be better. Mutation refers to the mutation_callback, which tracks when the sanitizer removes an attribute or node.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commit changes the name of the classes.

*
* @since 0.7
*/
class AMP_Mutation_Utils {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From an architectural perspective, it is interesting that all methods are public static. Are they really all meant to but used in that purposes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be much better to not have all of these methods static. This might mean bootstrapping the class somewhere else.

It's currently calling AMP_Validation_Utils::init(); here in amp.php. It might help if we could instantiate the class somewhere, like:

$validation_utils = new AMP_Validation_Utils();
validation_utils->init();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

amp_post_meta_box() might be a good model for this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, this is not really a pressing need or a blocker though, it doesn't have to be addressed in this PR.

Ryan Kienstra added 5 commits February 8, 2018 15:02
As Thierry mentioned,
this is required for a valid PHPDoc.
This was previously 'AMP_Mutation_Utils'
The new name describes better what this does.
On Thierry's suggestion,
As these were already stored in constants.
Use check_admin_referer(),
as this will display the 'are you sure' message.
Also , update the test.
In PHPDoc blocks, most of the comments weren't aligned.
The types aren't aligned.
@kienstra
Copy link
Contributor

kienstra commented Feb 8, 2018

Applied Code Review Suggestions

Hi @ThierryA,
Thanks for waiting for this. All of your suggestions above are applied, exception for the (good) question about whether all of the methods should be static.

* Skip reporting iframe removal when merely being moved
* Skip reporting removal of form[action] attribute when transformed to action-xhr.
* Rename sanitizer base methods to make explicit they are for removal of invalid nodes.
*
* @var array.
*/
public static $removed_nodes;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's rename this to removed_elements since that is what it contains.

*/
public static function init() {
add_action( 'rest_api_init', array( __CLASS__, 'amp_rest_validation' ) );
add_action( 'save_post', array( __CLASS__, 'validate_content' ), 10, 2 );
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing this in the save_post action, let's do this right inside the edit_form_top so that when the post is loaded they will see the notification. In this way they will be able to activate the AMP plugin and then be informed before they even start editing a post that it has invalid elements.

This would then eliminate the need for the nonce in the request.

*/
public static function validate_content( $post_id, $post ) {
unset( $post_id );
if ( ! self::is_authorized() ) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also needs to check if post_supports_amp( $post ). Because if not, the user wouldn't want a warning.

@@ -752,7 +752,9 @@ public static function start_output_buffering() {
* @see AMP_Theme_Support::start_output_buffering()
*/
public static function finish_output_buffering() {
echo self::prepare_response( ob_get_clean() ); // WPCS: xss ok.
$output = self::prepare_response( ob_get_clean() );
AMP_Validation_Utils::add_header();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to remove thus until we have it being utilized in the plugin.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that's fine.

Copy link
Contributor

@kienstra kienstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved

Hi @westonruter,
This pull request is approved. Thanks a lot for improving this so much. The notices of which tags and attributes were removed really helps.

@westonruter westonruter merged commit 7847f0e into develop Feb 9, 2018
@westonruter westonruter deleted the add/sanitization-reporting branch February 9, 2018 02:49
@westonruter westonruter mentioned this pull request Feb 10, 2018
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants