Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sanitization reporting #912

Merged
merged 43 commits into from
Feb 9, 2018
Merged
Show file tree
Hide file tree
Changes from 14 commits
Commits
Show all changes
43 commits
Select commit Hold shift + click to select a range
0a44284
Add initial callback for listening for element/attribute removal in w…
westonruter Jan 27, 2018
e9f394a
Issue #843: Tracking for removed nodes and attributes.
Jan 28, 2018
d7104b8
Issue #843: Correct a failed Travis build by excluding a PHPCS rule.
Jan 28, 2018
616262f
Issue #843: Add a method to process markup for AMP validtity.
Jan 28, 2018
4714062
Issue #843: Track removed iframes in a helper method.
Jan 28, 2018
e209005
Issue #843: Initial registration of the REST endpoint for validation.
Jan 29, 2018
aed76c6
Issue #864: Support <amp-carousel> in 'Gallery' widget.
Jan 30, 2018
30c666f
Issue #843: Report removed attributes and nodes in a histogram.
Jan 30, 2018
76b0f17
Revert "Issue #864: Support <amp-carousel> in 'Gallery' widget."
Jan 30, 2018
0b7c3fc
Issue #843: Align equals signs vertically.
Jan 30, 2018
71744e5
Issue #843: Prepare to add headers to frontend GET requests.
Jan 30, 2018
da408bf
Issue #843: Align an equals sign to correct the failed Travis build.
Jan 30, 2018
ab3909a
Issue #864: Validation data in the response header.
Feb 1, 2018
90090b0
Issue #864: Remove an extra conditional, nest the 'mutation_callback.'
Feb 1, 2018
83dac87
Issue #864: Rename function to finish_output_buffering().
Feb 1, 2018
70521ef
Issue #843: Remove the extra variabl in the @return tag.
Feb 1, 2018
cc4fe85
Issue #843: Add processed markup to REST API response.
Feb 1, 2018
0b6a4ee
Issue #843: Output the 'processed_markup' at the bottom of the response.
Feb 1, 2018
db49a33
Issue #843: Merge in 'develop' branch and resolve conflicts.
Feb 1, 2018
c92bee4
Issue #843: Apply validation to post update on wp-admin/post.php.
Feb 1, 2018
6b5593b
Issue #843: Verify the nonce before validating on 'save_post'.
Feb 1, 2018
a168a5c
Issue #843: Sanitize $_GET value in addition to the 'ignore' comment.
Feb 1, 2018
a52ae05
Issue #843: Correct Travis error by changing error text.
Feb 1, 2018
0472a33
Issue #843: Report node removal in the rest of the sanitizers.
Feb 4, 2018
6d2350f
Merge branch 'develop' of https://github.com/Automattic/amp-wp into a…
westonruter Feb 5, 2018
3de38b8
Issue #843: Call the 'mutation_callback' for more attribute removals.
Feb 5, 2018
46e7084
Issue #843: Merge in develop, resolve conflicts.
Feb 5, 2018
972ac2c
Issue #843: Merge in develop again, resolve conflicts.
Feb 7, 2018
d2ab9ee
Issue #843: Rename test to 'get_buffer' instead of 'prepare_response'.
Feb 7, 2018
cb0cfc9
Issue #843: Fix an issue in the error message.
Feb 7, 2018
084acaa
Issue #843: Revert renaming of methods, adjust unit tests.
Feb 8, 2018
13ab280
Issue #843: Remove special characters, update documentation.
Feb 8, 2018
f996673
Issue #843: Change @const to @var for constants.
Feb 8, 2018
f8aeca8
Issue #843: Rename class to 'AMP_Validation_Utils'
Feb 8, 2018
69317b8
Issue #843; Use constants instead of string literals.
Feb 8, 2018
583a6bc
Issue #843: Add nonce verification for the editor message.
Feb 8, 2018
cd910ff
Issue #843: Align comments in addition to variable names.
Feb 8, 2018
a6d071a
List the AMP-invalid elements and attributes that have been saved
westonruter Feb 8, 2018
b40729b
Only report mutations when node/attribute is removed due to invalidity
westonruter Feb 9, 2018
9954c11
Improve naming of callback and removal methods
westonruter Feb 9, 2018
15d9186
Remove add_header until use case is confirmed
westonruter Feb 9, 2018
bb7a175
Skip reporting removal of iframe's former parent node
westonruter Feb 9, 2018
29caa8a
Skip sanitization reporting when saving post that doesn't support AMP
westonruter Feb 9, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions amp.php
Original file line number Diff line number Diff line change
Expand Up @@ -88,6 +88,7 @@ function amp_after_setup_theme() {
}

add_action( 'init', 'amp_init' );
add_action( 'rest_api_init', 'AMP_Mutation_Utils::amp_rest_validation' );
add_action( 'widgets_init', 'AMP_Theme_Support::register_widgets' );
add_action( 'admin_init', 'AMP_Options_Manager::register_settings' );
add_filter( 'amp_post_template_analytics', 'amp_add_custom_analytics' );
Expand Down
1 change: 1 addition & 0 deletions includes/class-amp-autoloader.php
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,7 @@ class AMP_Autoloader {
'AMP_DOM_Utils' => 'includes/utils/class-amp-dom-utils',
'AMP_HTML_Utils' => 'includes/utils/class-amp-html-utils',
'AMP_Image_Dimension_Extractor' => 'includes/utils/class-amp-image-dimension-extractor',
'AMP_Mutation_Utils' => 'includes/utils/class-amp-mutation-utils',
'AMP_String_Utils' => 'includes/utils/class-amp-string-utils',
'AMP_WP_Utils' => 'includes/utils/class-amp-wp-utils',
'AMP_Widget_Archives' => 'includes/widgets/class-amp-widget-archives',
Expand Down
17 changes: 15 additions & 2 deletions includes/class-amp-theme-support.php
Original file line number Diff line number Diff line change
Expand Up @@ -506,13 +506,25 @@ public static function get_amp_component_scripts() {
* Start output buffering.
*/
public static function start_output_buffering() {
ob_start( array( __CLASS__, 'finish_output_buffering' ) );
ob_start( array( __CLASS__, 'finish_buffer_add_header' ) );
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can leave this as finish_output_buffering I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this commit changes the name back to finish_output_buffering().

}

/**
* Finish output buffering.
* Get the result of the output buffering, and add a header.
*
* @todo Do this in shutdown instead of output buffering callback?
* @param string $output Buffered output.
* @return string Finalized output.
*/
public static function finish_buffer_add_header( $output ) {
$markup = self::finish_output_buffering( $output );
AMP_Mutation_Utils::add_header();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll want to limit this to only be added when a user specifically requests this additional information. Like there should be a nonce that must be present to authorize the reporting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @westonruter. That's a good idea to add the header only for users with a nonce.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied with a nonce, details to come shortly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMP_Mutation_Utils::add_header() now has a check for authorization via self::is_authorized(). That verifies the nonce and capability.

return $markup;
}

/**
* Finish output buffering.
*
* @global int $content_width
* @param string $output Buffered output.
* @return string Finalized output.
Expand All @@ -523,6 +535,7 @@ public static function finish_output_buffering( $output ) {
$dom = AMP_DOM_Utils::get_dom( $output );
$args = array(
'content_max_width' => ! empty( $content_width ) ? $content_width : AMP_Post_Template::CONTENT_MAX_WIDTH, // Back-compat.
'mutation_callback' => 'AMP_Mutation_Utils::track_removed',
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per above, this should be conditional based on whether an authorized nonce is present.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. This is now only added if AMP_Mutation_Utils::is_authorized(). That checks for the nonce and capability.

);

$assets = AMP_Content_Sanitizer::sanitize_document( $dom, self::$sanitizer_classes, $args );
Expand Down
3 changes: 3 additions & 0 deletions includes/sanitizers/class-amp-form-sanitizer.php
Original file line number Diff line number Diff line change
Expand Up @@ -86,6 +86,9 @@ public function sanitize() {
}
} elseif ( 'post' === $method ) {
$node->removeAttribute( 'action' );
if ( isset( $this->args['mutation_callback'] ) ) {
call_user_func( $this->args['mutation_callback'], $node, 'removed_attr', 'action' );
}
if ( ! $xhr_action ) {
$node->setAttribute( 'action-xhr', $action_url );
} elseif ( 'http://' === substr( $xhr_action, 0, 7 ) ) {
Expand Down
27 changes: 24 additions & 3 deletions includes/sanitizers/class-amp-iframe-sanitizer.php
Original file line number Diff line number Diff line change
Expand Up @@ -74,7 +74,7 @@ public function sanitize() {
* @see: https://github.com/ampproject/amphtml/issues/2261
*/
if ( empty( $new_attributes['src'] ) ) {
$node->parentNode->removeChild( $node );
$this->remove_child( $node );
continue;
}

Expand All @@ -95,11 +95,11 @@ public function sanitize() {
$parent_node->replaceChild( $new_node, $node );
} else {
// AMP does not like iframes in <p> tags.
$parent_node->removeChild( $node );
$this->remove_child( $node );
$parent_node->parentNode->insertBefore( $new_node, $parent_node->nextSibling );

if ( AMP_DOM_Utils::is_node_empty( $parent_node ) ) {
$parent_node->parentNode->removeChild( $parent_node );
$this->remove_child( $parent_node ); // phpcs:ignore WordPress.NamingConventions.ValidVariableName.NotSnakeCaseMemberVar.
}
}
}
Expand Down Expand Up @@ -192,4 +192,25 @@ private function build_placeholder( $parent_attributes ) {

return $placeholder_node;
}

/**
* Removes a child of a node.
*
* Also, calls the mutation callback for it.
* This tracks all the nodes that were removed.
*
* @since 0.7
*
* @param DOMNode $child The node to remove.
* @return void.
*/
public function remove_child( $child ) {
if ( method_exists( $child->parentNode, 'removeChild' ) ) { // phpcs:ignore WordPress.NamingConventions.ValidVariableName.NotSnakeCaseMemberVar.
$child->parentNode->removeChild( $child ); // phpcs:ignore WordPress.NamingConventions.ValidVariableName.NotSnakeCaseMemberVar.
if ( isset( $this->args['mutation_callback'] ) ) {
call_user_func( $this->args['mutation_callback'], $child, 'removed' );
}
}
}

}
3 changes: 3 additions & 0 deletions includes/sanitizers/class-amp-style-sanitizer.php
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,9 @@ private function collect_styles_recursive( $node ) {
$this->styles[ '.' . $class_name ] = $style;
}
$node->removeAttribute( 'style' );
if ( isset( $this->args['mutation_callback'] ) ) {
call_user_func( $this->args['mutation_callback'], $node, 'removed_attr', 'style' );
}
}
}

Expand Down
12 changes: 12 additions & 0 deletions includes/sanitizers/class-amp-tag-and-attribute-sanitizer.php
Original file line number Diff line number Diff line change
Expand Up @@ -595,6 +595,9 @@ private function sanitize_disallowed_attributes_in_node( $node, $attr_spec_list

foreach ( $attrs_to_remove as $attr_name ) {
$node->removeAttribute( $attr_name );
if ( isset( $this->args['mutation_callback'] ) ) {
call_user_func( $this->args['mutation_callback'], $node, 'removed_attr', $attr_name );
}
}
}

Expand Down Expand Up @@ -701,6 +704,9 @@ private function delegated_sanitize_disallowed_attribute_values_in_node( $node,
$node->setAttribute( $attr_name, '' );
} else {
$node->removeAttribute( $attr_name );
if ( isset( $this->args['mutation_callback'] ) ) {
call_user_func( $this->args['mutation_callback'], $node, 'removed_attr', $attr_name );
}
}
}
}
Expand Down Expand Up @@ -1279,12 +1285,18 @@ private function remove_node( $node ) {
$parent = $node->parentNode;
if ( $node && $parent ) {
$parent->removeChild( $node );
if ( isset( $this->args['mutation_callback'] ) ) {
call_user_func( $this->args['mutation_callback'], $node, 'removed' );
}
}
while ( $parent && ! $parent->hasChildNodes() && 'body' !== $parent->nodeName ) {
$node = $parent;
$parent = $parent->parentNode;
if ( $parent ) {
$parent->removeChild( $node );
if ( isset( $this->args['mutation_callback'] ) ) {
call_user_func( $this->args['mutation_callback'], $node, 'removed' );
}
}
}
}
Expand Down
3 changes: 3 additions & 0 deletions includes/sanitizers/class-amp-video-sanitizer.php
Original file line number Diff line number Diff line change
Expand Up @@ -94,6 +94,9 @@ public function sanitize() {
*/
if ( 0 === $new_node->childNodes->length && empty( $new_attributes['src'] ) ) {
$node->parentNode->removeChild( $node );
if ( isset( $this->args['mutation_callback'] ) ) {
call_user_func( $this->args['mutation_callback'], $node, 'removed' );
}
} else {
$node->parentNode->replaceChild( $new_node, $node );
}
Expand Down
212 changes: 212 additions & 0 deletions includes/utils/class-amp-mutation-utils.php
Original file line number Diff line number Diff line change
@@ -0,0 +1,212 @@
<?php
/**
* Class AMP_Mutation_Utils
*
* @package AMP
*/

/**
* Class AMP_Mutation_Utils
*
* @since 0.7
*/
class AMP_Mutation_Utils {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does Mutation stands for here, it could because I am native French but AMP_Validation_Utils would make more sense to me.

Copy link
Contributor

@kienstra kienstra Feb 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right that AMP_Validation_Utils would be better. Mutation refers to the mutation_callback, which tracks when the sanitizer removes an attribute or node.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commit changes the name of the classes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From an architectural perspective, it is interesting that all methods are public static. Are they really all meant to but used in that purposes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be much better to not have all of these methods static. This might mean bootstrapping the class somewhere else.

It's currently calling AMP_Validation_Utils::init(); here in amp.php. It might help if we could instantiate the class somewhere, like:

$validation_utils = new AMP_Validation_Utils();
validation_utils->init();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

amp_post_meta_box() might be a good model for this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, this is not really a pressing need or a blocker though, it doesn't have to be addressed in this PR.


/**
* The argument if an attribute was removed.
*
* @var array.
*/
const ATTRIBUTE_REMOVED = 'removed_attr';

/**
* The argument if a node was removed.
*
* @var array.
*/
const NODE_REMOVED = 'removed';

/**
* Key for the markup value in the REST API endpoint.
*
* @var string.
*/
const MARKUP_KEY = 'markup';

/**
* The attributes that the sanitizer removed.
*
* @var array.
*/
public static $removed_attributes;

/**
* The nodes that the sanitizer removed.
*
* @var array.
*/
public static $removed_nodes;

/**
* Tracks when a sanitizer removes an attribute or node.
*
* @param DOMNode|DOMElement $node The node in which there was a removal.
* @param string $removal_type The removal: 'removed_attr' for an attribute, or 'removed' for a node or element.
* @param string $attr_name The name of the attribute removed (optional).
* @return void.
*/
public static function track_removed( $node, $removal_type, $attr_name = null ) {
if ( ( self::ATTRIBUTE_REMOVED === $removal_type ) && isset( $attr_name ) ) { // phpcs:ignore WordPress.NamingConventions.ValidVariableName.NotSnakeCaseMemberVar
self::$removed_attributes = self::increment_count( self::$removed_attributes, $attr_name );
} elseif ( ( self::NODE_REMOVED === $removal_type ) && isset( $node->nodeName ) ) {
self::$removed_nodes = self::increment_count( self::$removed_nodes, $node->nodeName ); // phpcs:ignore WordPress.NamingConventions.ValidVariableName.NotSnakeCaseMemberVar
}
}

/**
* Tracks when a sanitizer removes an attribute or node.
*
* @param array $histogram The count of attributes or nodes removed.
* @param string $key The attribute or node name removed.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a wonder Travis didn't pickup alignment comments miss alignment, it might only sniff the variables alignment actually.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this also align the comments, like:

@param array  $histogram The count of attributes or nodes removed.
@param string $key       The attribute or node name removed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commit aligns the spacing of the comments. The pre-commit hook blocked aligning this in two places, possibly because there was only one @param and a @return.

* @return array $histogram The incremented histogram.
*/
public static function increment_count( $histogram, $key ) {
$current_value = isset( $histogram[ $key ] ) ? $histogram[ $key ] : 0;
$histogram[ $key ] = $current_value + 1;
return $histogram;
}

/**
* Gets whether a node was removed in a sanitizer.
*
* @return boolean.
*/
public static function was_node_removed() {
return ! empty( self::$removed_nodes );
}

/**
* Processes markup, to determine AMP validity.
*
* Passes $markup through the AMP sanitizers.
* Also passes a 'mutation_callback' to keep track of stripped attributes and nodes.
*
* @param string $markup The markup to process.
* @return void.
*/
public static function process_markup( $markup ) {
$args = array(
'content_max_width' => ! empty( $content_width ) ? $content_width : AMP_Post_Template::CONTENT_MAX_WIDTH,
'mutation_callback' => 'AMP_Mutation_Utils::track_removed',
);
AMP_Content_Sanitizer::sanitize( $markup, amp_get_content_sanitizers(), $args );
}

/**
* Registers the REST API endpoint for validation.
*
* @return void.
*/
public static function amp_rest_validation() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't currently used, though it would be helpful for Gutenberg block validation.

register_rest_route( 'amp-wp/v1', '/validate', array(
'methods' => 'POST',
'callback' => array( __CLASS__, 'validate_markup' ),
'args' => array(
self::MARKUP_KEY => array(
'validate_callback' => array( __CLASS__, 'validate_arg' ),
),
),
'permission_callback' => array( __CLASS__, 'permission' ),
) );
}

/**
* The permission callback for the REST request.
*
* @return boolean $has_permission Whether the current user has the permission.
*/
public static function permission() {
return current_user_can( 'edit_posts' );
}

/**
* Validate the markup passed to the REST API.
*
* @param WP_REST_Request $request The REST request.
* @return array|WP_Error.
*/
public static function validate_markup( WP_REST_Request $request ) {
$json = $request->get_json_params();
if ( empty( $json[ self::MARKUP_KEY ] ) ) {
return new WP_Error( 'no_markup', 'No markup passed to validator', array(
'status' => 404,
) );
}

return self::get_response( $json[ self::MARKUP_KEY ] );
}

/**
* Gets the AMP validation response.
*
* If $markup isn't passed,
* It will return the validation errors the sanitizers found in rendering the page.
*
* @param string $markup To validate for AMP compatibility (optional).
* @return array $response The AMP validity of the markup.
*/
public static function get_response( $markup = null ) {
if ( isset( $markup ) ) {
self::process_markup( $markup );
}
$response = array(
'has_error' => self::was_node_removed(),
'removed_nodes' => self::$removed_nodes,
'removed_attributes' => self::$removed_attributes,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be useful if the processed markup were also returned here. It could then be used for previewing, for example.

Copy link
Contributor

@kienstra kienstra Feb 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @westonruter, that would help. Do you think we should return the processed markup only if we're validating a limited amount of markup, like a single Gutenberg block? We could detect this by whether get_response() is called with a $markup argument.

if ( isset( $markup ) ) {
        $response['processed_markup'] = $markup;
};

If get_response() is called without a $markup argument, it's validating the entire document.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if markup is supplied to validate, then it should get returned.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @westonruter.

These commits output the 'processed_markup', when $markup is passed to the function.
request-postman-validate

);
self::reset_removed();
return $response;
}

/**
* Reset the stored removed nodes and attributes.
*
* After testing if the markup is valid,
* these static values will remain.
* So reset them in case another test is needed.
*
* @return void.
*/
public static function reset_removed() {
self::$removed_nodes = null;
self::$removed_attributes = null;
}

/**
* Validate the argument in the REST API request.
*
* It would be ideal to simply pass 'is_string' in register_rest_route().
* But it always returned false.
*
* @param mixed $arg The argument to validate.
* @return boolean $is_valid Whether the argument is valid.
*/
public static function validate_arg( $arg ) {
return is_string( $arg );
}

/**
* Output AMP validation data in the response header of a frontend GET request.
*
* This must be called before the document output begins.
* Because the document is buffered,
* The sanitizers run after the 'send_headers' action.
* So it's not possible to call this function on that hook.
*
* @return void. $header The filtered response header.
*/
public static function add_header() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't actually used yet. I think it was a foundation for validating plugins on activation (#842)

header( sprintf( 'AMP-Validation-Error: %s', wp_json_encode( self::get_response() ) ) );
}

}
3 changes: 3 additions & 0 deletions phpcs.xml
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,9 @@
</properties>
</rule>

<rule ref="WordPress.Files.FileName.InvalidClassFileName">
<exclude-pattern>tests/*</exclude-pattern>
</rule>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost all of the test files begin with test-, while this rule states that they should begin with class-.

<rule ref="WordPress.Arrays.MultipleStatementAlignment.LongIndexSpaceBeforeDoubleArrow">
<exclude-pattern>tests/test-tag-and-attribute-sanitizer.php</exclude-pattern>
</rule>
Expand Down
Loading