Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser: Remove specific support for <!--more--> tag #5061

Merged
merged 2 commits into from
Feb 24, 2018

Conversation

mcsf
Copy link
Contributor

@mcsf mcsf commented Feb 14, 2018

Description

Fixes #2973. This pull request:

  • removes the exceptions to the block parser (<!--more--> and <!--teaser-->), thus keeping the parser "Gutenberg-focus", as @pento put it;
  • treats the More block more or less as any other block, meaning it surrounds the more and optional noteaser tags with regular comment demarcations;
  • retains backwards compatibilty through regular block conversion mechanisms (from freeform via "Convert to Blocks");
  • paves the way for implementing a Pagination (<!--nextpage-->; see Introduce a Next Page Block to allow post pagination #4930) block without introducing further parser exceptions.

How Has This Been Tested?

Pasting hasn't yet been tested. For conversion of legacy content, try:

  1. Using the classic editor, build a post (a lorem ipsum with six paragraphs should suffice). Sprinkle it with the following:
<!--more-->

and

<!--more-->
<!--noteaser-->

and

<!--more Read all about it!-->
<!--noteaser-->
  1. Save it as a draft.
  2. Open the post in Gutenberg.
  3. Find the freeform (Classic) blocks corresponding to the more tags.
  4. Convert them to blocks (advanced block settings > convert to blocks).
  5. Make sure the new More blocks feature the right attributes (customText and noTeaser). The customText attribute can be observed by looking at the text on the More block's visual representation; the noTeaser attribute can be observed with the inspector controls.

Caveat

  • rawHandler's transformers will get rid of HTML comments, and will attempt to merge or eliminate many elements (e.g. paragraphs, line breaks). In order to get around this without making special exceptions for certain comments, commentRemover will detect more and noteaser comments and replace them with a custom element <wp-block>. This element has no particular definition, is never rendered, and is solely used in order to pass block information through rawHandler. The More block's raw transform will then match against its shape to replace it with a proper block.
  • commentRemover happened to be the place where these operations were initially introduced, but the transformer can be split in two.
  • The unit tests for the serializer were kept and amended but, as explained in a comment, they should probably be removed. Right now, they illustrate the overall changes of this PR.

Screenshots (jpeg or gifs if applicable):

Types of changes

Checklist:

  • My code is tested.
  • My code follows the WordPress code style.
  • My code has proper inline documentation.

@mcsf mcsf added [Feature] Blocks Overall functionality of blocks [Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f labels Feb 14, 2018
@mcsf mcsf requested review from pento and mtias February 14, 2018 19:24
@mcsf mcsf force-pushed the remove/parser-more-and-noteaser-exception branch from de3b48e to 68aa510 Compare February 15, 2018 13:20
@mcsf mcsf requested a review from dmsnell February 15, 2018 16:56
@@ -8,5 +8,51 @@ export default function( node ) {
return;
}

if ( node.nodeValue.indexOf( 'more' ) === 0 ) {
// Grab any custom text in the comment
const matches = node.nodeValue.match( /more(.*)/ );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this RegExp is pretty much asking for trouble. could we whitelist characters instead of blacklisting them?

<!--moredanghtmlcomments-->

Copy link
Contributor Author

@mcsf mcsf Feb 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for having a look. I'm not sure what you mean by whitelisting, though. FWIW, this is how core does it (permalink):

preg_match( '/<!--more(.*?)?-->/', $post, $matches )

Copy link
Member

@dmsnell dmsnell Feb 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a valid more tag?

<!--moreYou should continue reading below!-->

The PEG didn't allow this - there had to be whitespace following more if there was custom text in there. If core allows it then we've been wrong so far. At least core was non-greedy on the match.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is valid. :)

screen shot 2018-02-15 at 18 17 25

Agreed on the greed, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@dmsnell dmsnell Feb 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, /more(.*)/ is redundant. it matches the same things that /more/ matches because both zero of everything and one of everything and many of everything is .*

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zero of everything and one of everything and many of everything is .*

philosoraptor

/more(.*)/ is redundant

Fair enough. We already check for indexOf( 'more' ) === 0, so why don't we just grab the substring starting after 'more'? Then we can .trim() it. No RegExp needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nodeValue.slice(4).trim()

@dmsnell
Copy link
Member

dmsnell commented Feb 15, 2018

Love this! Thank you!

@mcsf mcsf force-pushed the remove/parser-more-and-noteaser-exception branch 4 times, most recently from 857e08a to a1c273f Compare February 19, 2018 17:16
from: [
{
type: 'raw',
isMatch: ( node ) => node.dataset.block === 'core/more',
Copy link
Contributor

@youknowriad youknowriad Feb 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the dataset is used here and it's not a generic API right? it's just a specific attribute created by the special-comment-converter.js which is also specific to this bloc.

This makes me wonder if this converter's code shouldn't be written somewhere in this transform instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not a generic API right? it's just a specific attribute created by the special-comment-converter.js which is also specific to this bloc.

Correct. It is a bit of a fake separation. However, I named it special-comment-converter instead of more-converter because I'm anticipating its use for <!--nextpage--> later.

This makes me wonder if this converter's code shouldn't be written somewhere in this transform instead?

Hm. Maybe I looked at it the wrong way, but basically dataset + custom tag was what I came up with to preserve the more tag and not let it get destroyed by the rest of rawHandler. This seems like a better trade-off than e.g. making exceptions in the other transformers to ignore more. What do you think? Also pinging @iseulde for ideas. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the separation is weird, but can't think of anything better right now... Hm or maybe? The raw handler itself is supposed to return clean HTML which blocks can hook into to transform. We already have shortcodes in this mix too though, but the transform is shortcode instead of raw. Maybe we could follow the same pattern and have a comment transform? If no block transforms the comment, then we should just drop the comment (not pass through raw transforms). What do you think? This removes the need for the special case comment converter.

* @param {Node} node The node to be processed.
* @return {void}
*/
export default function( node ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we should unit test this or is it already covered elsewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some tests are in order 👍

Copy link
Contributor

@youknowriad youknowriad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍 This surfaces this though #4958

@ellatrix
Copy link
Member

Okay, After looking a bit at the code, The approach in #5061 (comment) might be simpler too? Also no isMatch function needed.

@mcsf
Copy link
Contributor Author

mcsf commented Feb 21, 2018

Maybe we could follow the same pattern and have a comment transform?

This made sense to me initially, but less so after giving more thought. The main complexity is that, unlike with shortcode transforms, we don't have a 1:1 relationship between comments and blocks, due to the way more compounds with noteaser. That is, a transform interface like:

{ type: 'comment', /* isMatch, */, transform( commentNode ) { } }

isn't enough, since we need to look further than just the single node. Now, the transform could be clever and look at commentNode's siblings and selectively destroy them, but that doesn't seem like a sound approach—that is, I can expect functions within rawHandler's deepFilterHTML calls to be destructive in breadth, but not the handlers of individual block types.

My complementary question is: how many more comments can we expect to have to parse in Gutenberg to justify the introduction of a new public transform type? Right now we can only expect WordPress legacy markings: more, noteaser, nextpage; ideally, new block types will only care about proper HTML (raw transforms) and shortcodes (shortcode). I'd say that, if an author wants to add support for a very specific input format that includes complex shapes with HTML comments, then they'll need something more powerful, like a hook to access the filter sets of rawHandler, further discarding the need for a new comment-type transform. Does this reasoning make sense, @iseulde?

@mcsf mcsf force-pushed the remove/parser-more-and-noteaser-exception branch from a1c273f to 750422f Compare February 21, 2018 15:34
}

// Grab any custom text in the comment
const customText = node.nodeValue.slice( 4 ).trim();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amusing side note: this fixes a bug in the more handling currently in the parser, which requires a space between "more" and the custom text. Core doesn't require that space. 🙂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I missed the previous discussion wherein @dmsnell learned yet another wonderful weird thing about Core. 🙃

Copy link
Member

@pento pento left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻 on this.

// Find the first ancestor to which the More element can be appended;
// appending to the closer P parents fails
let parent = node.parentNode;
while ( parent.nodeName.toLowerCase() === 'p' && parent.parentNode ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming that the intention is to ensure that the more element is at the top level (to convert into blocks later), could we loop through parent nodes until the body node is the parent of that node?

while ( parent.nodeName !== 'BODY' ) {
     parent = parent.parentNode;
}

Also not completely sure what "appending to the closer P parents fails" means.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming that the intention is to ensure that the more element is at the top level (to convert into blocks later), could we loop through parent nodes until the body node is the parent of that node?

This is more straightforward, I like it.

Also not completely sure what "appending to the closer P parents fails" means.

The <wp-block> element couldn't be appended to any <p>; inspecting the parent would show that no children had been added.

}

function createMore( customText, noTeaser ) {
const node = document.createElement( 'wp-block' );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting thing :) I wonder if we can also do this for shortcakes... Might enable us to get rid of the "pieces" of HTML and parse it all in one go.

@mcsf mcsf force-pushed the remove/parser-more-and-noteaser-exception branch from 750422f to f5db665 Compare February 23, 2018 11:39
@mcsf
Copy link
Contributor Author

mcsf commented Feb 23, 2018

I'll wait until we have #4958 fixed before merging this one. Thanks for the reviews!


Worth pointing out that, in an impromptu discussion with @mtias, it was brought up that in the long run the <!-- wp:more --/> block shouldn't need to contain the legacy comments, as it itself would be the primitive that WordPress would understand for partitioning a post. However, core WP doesn't really offer the needed hooks to change that, so this would have to be handled in one of two ways:

  1. Wait for the merge into core to change core itself.
  2. Add Gutenberg-only support for <!-- wp:more --/> on the server—different approaches are possible, from adding a specific rule to handle that block (pseudo-block?), to getting creative with More's render_callback (which I'm not convinced is good).

I'm leaning towards number 1.

@mcsf mcsf force-pushed the remove/parser-more-and-noteaser-exception branch from f5db665 to f30caee Compare February 23, 2018 11:55
- No exceptions in serializer either
- Use dedicated rawHandling transformer for `more`, `noteaser`
- Add tests
@mcsf mcsf force-pushed the remove/parser-more-and-noteaser-exception branch from 5876177 to de32975 Compare February 24, 2018 12:39
@mcsf mcsf merged commit c42086f into master Feb 24, 2018
@mcsf mcsf deleted the remove/parser-more-and-noteaser-exception branch February 24, 2018 12:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Feature] Blocks Overall functionality of blocks [Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parser: Move support for the <!--more--> tag out of the parser
5 participants