Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mustache templates for tables using script[type=text/plain] are corrupted #4254

Closed
westonruter opened this issue Feb 7, 2020 · 8 comments · Fixed by #4333 or #4521
Closed

Mustache templates for tables using script[type=text/plain] are corrupted #4254

westonruter opened this issue Feb 7, 2020 · 8 comments · Fixed by #4333 or #4521
Labels
Bug Something isn't working QA passed Has passed QA and is done
Milestone

Comments

@westonruter
Copy link
Member

westonruter commented Feb 7, 2020

Bug Description

As reported on a support forum topic by @spasibych, a workaround for Mustache templates for tables is broken by the DOM parser in the AMP plugin.

Please refer to the amp.dev docs on Mustache templates for tables.

In particular, given this input:

<script type="text/plain" template="amp-mustache">
  <table>
    <tr>
{{#foo}}<td></td>{{/foo}}
    </tr>
  </table>
</script>

The AMP plugin is outputting this as:

<script type="text/plain" template="amp-mustache">
  <table>
    <tr>
{{#foo}}<td>{{/foo}}		
</script>

However, the use of a script to contain the template is one of the workarounds mentioned:

Workarounds include wrapping Mustache sections in HTML comments (e.g. <!-- {{#bar}} -->), using non-table elements like <div> instead or using a <script type="text/plain"> tag to define your templates.

It turns out that that the HTML comment workaround does work. Given this input:

<template type="amp-mustache">
  <table>
    <tr>
<!--{{#foo}}--><td></td><!--{{/foo}}-->
    </tr>
  </table>
</template>

The AMP plugin makes no modifications in the output.

Expected Behaviour

The mustache template for a table using the <script type="text/plain"> workaround should work.

Steps to reproduce

  1. Create a Custom HTML block with the above <script type="text/plain" template="amp-mustache">...</script> element.
  2. View the AMP page.
  3. Notice that the closing tags are dropped.

Screenshots

Additional context

  • WordPress version:
  • Plugin version:
  • Gutenberg plugin version (if applicable):
  • AMP plugin template mode:
  • PHP version:
  • OS:
  • Browser: [e.g. chrome, safari]
  • Device: [e.g. iPhone6]

Do not alter or remove anything below. The following sections will be managed by moderators only.

Acceptance criteria

Implementation brief

As noted in ampproject/amphtml#26656 (comment):

We'll have to figure out some other mechanism to workaround libxml failing to parse the HTML. Something that comes to mind is doing a search/replace to convert <script type="text/plain" template="amp-mustache"> into some other element (e.g. tmp-script) during parsing, and then replace that placeholder with the original script once the DOM has been constructed.

This turns out to work: https://3v4l.org/0qq5S

<?php

$script = '
    <script type="text/plain" template="amp-mustache">
      <table>
        <tr>
            {{#foo}}<td></td>{{/foo}}
        </tr>
      </table>
    </script>
';

$dom = new DOMDocument();

$tmp_script = preg_replace( 
    '#<script(\s[^>]*template="amp-mustache"[^>]*)>(.*?)</script>#s', 
    '<tmp-script$1>$2</tmp-script>',
    $script
);

@$dom->loadHTML( '<!DOCTYPE html><html><head><meta charset="utf-8"></head><body>' . $tmp_script . '</body></html>' );

$tmp_script_elements = iterator_to_array( $dom->getElementsByTagName( 'tmp-script' ) );
foreach ( $tmp_script_elements as $tmp_script_element ) {
    $script = $dom->createElement( 'script' );
    foreach ( $tmp_script_element->attributes as $attr ) {
        $script->setAttribute( $attr->nodeName, $attr->nodeValue );
    }
    while ( $tmp_script_element->firstChild ) {
        $script->appendChild( $tmp_script_element->firstChild );
    }
    $tmp_script_element->parentNode->replaceChild( $script, $tmp_script_element );
}

echo $dom->saveHTML();

Note the regex for doing the search/replace will need to be hardened, in particular the regex for matching the attributes (in any order).

The logic should go into the Document class and called similarly to the maybe_replace_noscript_elements method:

amp-wp/src/Dom/Document.php

Lines 602 to 639 in ea28148

/**
* Maybe replace noscript elements with placeholders.
*
* This is done because libxml<2.8 might parse them incorrectly.
* When appearing in the head element, a noscript can cause the head to close prematurely
* and the noscript gets moved to the body and anything after it which was in the head.
* See <https://stackoverflow.com/questions/39013102/why-does-noscript-move-into-body-tag-instead-of-head-tag>.
* This is limited to only running in the head element because this is where the problem lies,
* and it is important for the AMP_Script_Sanitizer to be able to access the noscript elements
* in the body otherwise.
*
* @see maybe_restore_noscript_elements() Reciprocal function.
*
* @param string $html HTML string to adapt.
* @return string Adapted HTML string.
*/
private function maybe_replace_noscript_elements( $html ) {
if ( ! version_compare( LIBXML_DOTTED_VERSION, '2.8', '<' ) ) {
return $html;
}
return preg_replace_callback(
'#^.+?(?=<body)#is',
function ( $head_matches ) {
return preg_replace_callback(
'#<noscript[^>]*>.*?</noscript>#si',
function ( $noscript_matches ) {
$placeholder = sprintf( '<!--noscript:%s-->', (string) wp_rand() );
$this->noscript_placeholder_comments[ $placeholder ] = $noscript_matches[0];
return $placeholder;
},
$head_matches[0]
);
},
$html
);
}

Note that this replacement must be done immediately after parsing, since the Document will then need to have the script replaced to successfully go through the sanitizers.

QA testing instructions

Demo

Changelog entry

@westonruter westonruter added the Bug Something isn't working label Feb 7, 2020
@westonruter
Copy link
Member Author

However, the use of a script to contain the template is one of the workarounds mentioned:

Workarounds include wrapping Mustache sections in HTML comments (e.g. <!-- {{#bar}} -->), using non-table elements like <div> instead or using a <script type="text/plain"> tag to define your templates.

It turns out that that the HTML comment workaround does work.

Actually, it only works on origin. The Google AMP Cache and the AMP Optimizer both strip out the HTML comments. See ampproject/amphtml#26656.

@kienstra
Copy link
Contributor

Weston's Steps To Reproduce should be good for testing instructions.

@csossi
Copy link

csossi commented Mar 16, 2020

Verified in QA:
image

@csossi csossi added the QA passed Has passed QA and is done label Mar 16, 2020
@grapheon
Copy link

grapheon commented Apr 3, 2020

Bug is present with plugin version 1.5 (1.5.0 and 1.5.1).

@westonruter
Copy link
Member Author

Bizzare. We did fix this. But I'm not seeing it now.

@westonruter
Copy link
Member Author

@spasibych Here’s a 1.5.2-RC1 pre-release build with the fix for you to test: amp.zip (v1.5.2-RC1-20200403T193031Z-3102c5dea)

@westonruter
Copy link
Member Author

We've just published the 1.5.2 release to WordPress.org: https://wordpress.org/plugins/amp/

You can update now via the WordPress admin.

Release notes: https://github.com/ampproject/amp-wp/releases/tag/1.5.2

@grapheon
Copy link

grapheon commented Apr 3, 2020

@westonruter now everything is fine, thank you very much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working QA passed Has passed QA and is done
Projects
None yet
5 participants