Parser: Remove specific support for <!--more--> tag #5061

mcsf · 2018-02-14T19:24:23Z

Description

Fixes #2973. This pull request:

removes the exceptions to the block parser (<!--more--> and <!--noteaser-->), thus keeping the parser "Gutenberg-focus", as @pento put it;
treats the More block more or less as any other block, meaning it surrounds the more and optional noteaser tags with regular comment demarcations;
retains backwards compatibility through regular block conversion mechanisms (from freeform via "Convert to Blocks");
paves the way for implementing a Pagination block (<!--nextpage-->; see Introduce a Next Page Block to allow post pagination #4930) without introducing further parser exceptions.

How Has This Been Tested?

Pasting hasn't yet been tested. For conversion of legacy content, try:

Using the classic editor, build a post (a lorem ipsum with six paragraphs should suffice). Sprinkle it with the following:

<!--more-->

and

<!--more-->
<!--noteaser-->

and

<!--more Read all about it!-->
<!--noteaser-->

Save it as a draft.
Open the post in Gutenberg.
Find the freeform (Classic) blocks corresponding to the more tags.
Convert them to blocks (advanced block settings > convert to blocks).
Make sure the new More blocks feature the right attributes (customText and noTeaser). The customText attribute can be observed by looking at the text on the More block's visual representation; the noTeaser attribute can be observed with the inspector controls.
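
As a checklist aid, the three legacy samples above should map to the attribute values printed below. This is a minimal sketch: `expectedMoreAttributes` is a hypothetical helper working on the raw string, not Gutenberg's converter, and representing "no custom text" as an empty string is an assumption.

```javascript
// Hypothetical helper (not part of Gutenberg): derive the attributes a
// converted More block is expected to carry from a legacy snippet.
function expectedMoreAttributes( snippet ) {
	// Non-greedy capture of anything between "more" and the closing "-->".
	const more = snippet.match( /<!--more(.*?)-->/ );
	if ( ! more ) {
		return null;
	}
	return {
		// Assumption: an empty string stands for "no custom text".
		customText: more[ 1 ].trim(),
		noTeaser: /<!--noteaser-->/.test( snippet ),
	};
}

const samples = [
	'<!--more-->',
	'<!--more-->\n<!--noteaser-->',
	'<!--more Read all about it!-->\n<!--noteaser-->',
];

for ( const sample of samples ) {
	console.log( JSON.stringify( expectedMoreAttributes( sample ) ) );
}
```

The first sample should yield no custom text and noTeaser off, the second noTeaser on, and the third both the custom text and noTeaser on.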

Caveat

rawHandler's transformers will get rid of HTML comments, and will attempt to merge or eliminate many elements (e.g. paragraphs, line breaks). In order to get around this without making special exceptions for certain comments, commentRemover will detect more and noteaser comments and replace them with a custom element <wp-block>. This element has no particular definition, is never rendered, and is solely used in order to pass block information through rawHandler. The More block's raw transform will then match against its shape to replace it with a proper block.
commentRemover happened to be the place where these operations were initially introduced, but the transformer can be split in two.
The unit tests for the serializer were kept and amended but, as explained in a comment, they should probably be removed. Right now, they illustrate the overall changes of this PR.
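
The placeholder idea described in this caveat can be sketched without a DOM. The code below is an illustration under stated assumptions, not the PR's implementation: nodes are plain objects, `convertSpecialComments` and `isMoreBlock` are hypothetical names, `dataset` is an ordinary object (a real `dataset` only stores strings), and a noteaser comment anywhere later among the siblings is assumed to belong to the preceding more comment.

```javascript
// Plain-object stand-ins for DOM nodes:
//   { type: 'text' | 'comment' | 'element', value?, name?, dataset? }
function convertSpecialComments( siblings ) {
	const pending = siblings.slice();
	const result = [];

	while ( pending.length > 0 ) {
		const node = pending.shift();

		// Non-comment nodes pass through untouched.
		if ( node.type !== 'comment' ) {
			result.push( node );
			continue;
		}

		// Ordinary comments (and stray noteaser comments) are dropped.
		if ( node.value.indexOf( 'more' ) !== 0 ) {
			continue;
		}

		// Assumption: the first later noteaser comment pairs with this more tag.
		const teaserIndex = pending.findIndex(
			( other ) => other.type === 'comment' && other.value === 'noteaser'
		);
		if ( teaserIndex !== -1 ) {
			pending.splice( teaserIndex, 1 );
		}

		// Replace the comment(s) with a placeholder element carrying block data.
		result.push( {
			type: 'element',
			name: 'wp-block',
			dataset: {
				block: 'core/more',
				customText: node.value.slice( 4 ).trim(),
				noTeaser: teaserIndex !== -1,
			},
		} );
	}

	return result;
}

// A raw transform can later recognise the placeholder by its shape.
const isMoreBlock = ( node ) =>
	node.type === 'element' && !! node.dataset && node.dataset.block === 'core/more';

const converted = convertSpecialComments( [
	{ type: 'text', value: 'A' },
	{ type: 'comment', value: 'more Read on' },
	{ type: 'text', value: '\n' },
	{ type: 'comment', value: 'noteaser' },
	{ type: 'comment', value: ' just a note ' },
	{ type: 'text', value: 'B' },
] );
console.log( converted.filter( isMoreBlock ) );
```

Because the placeholder is an element rather than a comment, later filters that strip comments leave it alone, which is the point of the approach.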

Screenshots (jpeg or gifs if applicable):

Types of changes

Checklist:

My code is tested.
My code follows the WordPress code style.
My code has proper inline documentation.

dmsnell · 2018-02-15T18:09:23Z

blocks/api/raw-handling/comment-remover.js

@@ -8,5 +8,51 @@ export default function( node ) {
 		return;
 	}

+	if ( node.nodeValue.indexOf( 'more' ) === 0 ) {
+		// Grab any custom text in the comment
+		const matches = node.nodeValue.match( /more(.*)/ );


this RegExp is pretty much asking for trouble. could we whitelist characters instead of blacklisting them?



Thanks for having a look. I'm not sure what you mean by whitelisting, though. FWIW, this is how core does it (permalink):

preg_match( '/<!--more(.*?)?-->/', $post, $matches )

Is this a valid more tag?



The PEG didn't allow this - there had to be whitespace following more if there was custom text in there. If core allows it then we've been wrong so far. At least core was non-greedy on the match.

It is valid. :)

Agreed on the greed, though.
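
To make the greed concern concrete, here is a small sketch comparing a greedy and a non-greedy capture on content that contains a second, unrelated comment. The sample string and variable names are made up for illustration.

```javascript
// Content with a more tag followed by an unrelated comment.
const content = 'Intro<!--more Keep reading--><p>Body</p><!-- unrelated -->';

// Greedy: (.*) runs to the LAST "-->" in the string.
const greedy = content.match( /<!--more(.*)-->/ )[ 1 ];

// Non-greedy: (.*?) stops at the FIRST "-->" after "more".
const lazy = content.match( /<!--more(.*?)-->/ )[ 1 ];

console.log( JSON.stringify( greedy ) );
console.log( JSON.stringify( lazy ) );
```

The greedy capture swallows the paragraph and the start of the second comment, while the non-greedy one stops at the end of the more tag.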

also, /more(.*)/ is redundant. it matches the same things that /more/ matches because both zero of everything and one of everything and many of everything is .*

zero of everything and one of everything and many of everything is .*

/more(.*)/ is redundant

Fair enough. We already check for indexOf( 'more' ) === 0, so why don't we just grab the substring starting after 'more'? Then we can .trim() it. No RegExp needed.

nodeValue.slice(4).trim()
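
A minimal sketch of that suggestion, wrapped in a hypothetical `getCustomText` helper so the edge cases are easy to see. It also checks the redundancy point made earlier in this thread: `/more(.*)/` and `/more/` accept exactly the same strings.

```javascript
// Hypothetical helper: returns the custom text of a more comment,
// or null when the comment is not a more tag at all.
function getCustomText( nodeValue ) {
	if ( nodeValue.indexOf( 'more' ) !== 0 ) {
		return null;
	}
	// 'more'.length === 4, so everything after it is the custom text.
	return nodeValue.slice( 4 ).trim();
}

console.log( getCustomText( 'more' ) ); // empty string
console.log( getCustomText( 'more Read all about it!' ) );
console.log( getCustomText( 'moreRead on' ) ); // no space required
console.log( getCustomText( 'noteaser' ) ); // null

// /more(.*)/ matches whenever /more/ does, since .* may match nothing.
const inputs = [ 'more', 'moreover', 'some more text', 'no match' ];
const sameMatches = inputs.every(
	( input ) => /more(.*)/.test( input ) === /more/.test( input )
);
console.log( sameMatches );
```

Note that the missing-space case is accepted here, in line with how the thread describes core's behaviour.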

dmsnell · 2018-02-15T18:10:33Z

Love this! Thank you!

youknowriad · 2018-02-20T07:42:20Z

blocks/library/more/index.js

+		from: [
+			{
+				type: 'raw',
+				isMatch: ( node ) => node.dataset.block === 'core/more',


So the dataset is used here and it's not a generic API, right? It's just a specific attribute created by the special-comment-converter.js, which is also specific to this block.

This makes me wonder if this converter's code shouldn't be written somewhere in this transform instead?

it's not a generic API, right? It's just a specific attribute created by the special-comment-converter.js, which is also specific to this block.

Correct. It is a bit of a fake separation. However, I named it special-comment-converter instead of more-converter because I'm anticipating its use for <!--nextpage--> later.

This makes me wonder if this converter's code shouldn't be written somewhere in this transform instead?

Hm. Maybe I looked at it the wrong way, but basically dataset + custom tag was what I came up with to preserve the more tag and not let it get destroyed by the rest of rawHandler. This seems like a better trade-off than e.g. making exceptions in the other transformers to ignore more. What do you think? Also pinging @iseulde for ideas. :)

I agree that the separation is weird, but can't think of anything better right now... Hm or maybe? The raw handler itself is supposed to return clean HTML which blocks can hook into to transform. We already have shortcodes in this mix too though, but the transform is shortcode instead of raw. Maybe we could follow the same pattern and have a comment transform? If no block transforms the comment, then we should just drop the comment (not pass through raw transforms). What do you think? This removes the need for the special case comment converter.
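
The comment-transform idea floated here could look roughly like the sketch below. Everything in it is hypothetical: no `comment` transform type exists in the code under review, and `convertComment`, the transform list, and the returned `{ name, attributes }` objects are illustrative stand-ins rather than Gutenberg APIs.

```javascript
// Hypothetical registry of 'comment' transforms, mirroring how shortcode
// transforms are declared per block.
const commentTransforms = [
	{
		type: 'comment',
		isMatch: ( value ) => value.indexOf( 'more' ) === 0,
		transform: ( value ) => ( {
			name: 'core/more',
			attributes: { customText: value.slice( 4 ).trim() },
		} ),
	},
];

// Returns a block-like object if some transform claims the comment,
// or null to signal that the comment should simply be dropped.
function convertComment( value, transforms ) {
	const match = transforms.find(
		( candidate ) => candidate.type === 'comment' && candidate.isMatch( value )
	);
	return match ? match.transform( value ) : null;
}

console.log( convertComment( 'more Read on', commentTransforms ) );
console.log( convertComment( ' just a note ', commentTransforms ) );
```

Each transform in this shape sees exactly one comment, so it has no natural way to account for a neighbouring noteaser comment.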

youknowriad · 2018-02-20T07:44:18Z

blocks/api/raw-handling/special-comment-converter.js

+ * @param {Node} node The node to be processed.
+ * @return {void}
+ */
+export default function( node ) {


Do you think we should unit test this or is it already covered elsewhere?

I think some tests are in order 👍

youknowriad

LGTM 👍 This surfaces #4958, though.

ellatrix · 2018-02-20T11:08:35Z

Okay, after looking a bit at the code, the approach in #5061 (comment) might be simpler too? Also, no isMatch function would be needed.

mcsf · 2018-02-21T15:14:48Z

Maybe we could follow the same pattern and have a comment transform?

This made sense to me initially, but less so after giving it more thought. The main complexity is that, unlike with shortcode transforms, we don't have a 1:1 relationship between comments and blocks, due to the way more compounds with noteaser. That is, a transform interface like:

{ type: 'comment', /* isMatch, */ transform( commentNode ) { } }

isn't enough, since we need to look further than just the single node. Now, the transform could be clever and look at commentNode's siblings and selectively destroy them, but that doesn't seem like a sound approach—that is, I can expect functions within rawHandler's deepFilterHTML calls to be destructive in breadth, but not the handlers of individual block types.

My complementary question is: how many more comments can we expect to have to parse in Gutenberg to justify the introduction of a new public transform type? Right now we can only expect WordPress legacy markings: more, noteaser, nextpage; ideally, new block types will only care about proper HTML (raw transforms) and shortcodes (shortcode). I'd say that, if an author wants to add support for a very specific input format that includes complex shapes with HTML comments, then they'll need something more powerful, like a hook to access the filter sets of rawHandler, further discarding the need for a new comment-type transform. Does this reasoning make sense, @iseulde?

pento · 2018-02-22T01:49:47Z

blocks/api/raw-handling/special-comment-converter.js

+	}
+
+	// Grab any custom text in the comment
+	const customText = node.nodeValue.slice( 4 ).trim();


Amusing side note: this fixes a bug in the more handling currently in the parser, which requires a space between "more" and the custom text. Core doesn't require that space. 🙂

Oh, I missed the previous discussion wherein @dmsnell learned yet another wonderful weird thing about Core. 🙃

pento

👍🏻 on this.

ellatrix · 2018-02-22T15:32:12Z

blocks/api/raw-handling/special-comment-converter.js

+	// Find the first ancestor to which the More element can be appended;
+	// appending to the closer P parents fails
+	let parent = node.parentNode;
+	while ( parent.nodeName.toLowerCase() === 'p' && parent.parentNode ) {


Assuming that the intention is to ensure that the more element is at the top level (to convert into blocks later), could we loop through parent nodes until the body node is the parent of that node?

while ( parent.nodeName !== 'BODY' ) { parent = parent.parentNode; }

Also not completely sure what "appending to the closer P parents fails" means.

Assuming that the intention is to ensure that the more element is at the top level (to convert into blocks later), could we loop through parent nodes until the body node is the parent of that node?

This is more straightforward, I like it.

Also not completely sure what "appending to the closer P parents fails" means.

The <wp-block> element couldn't be appended to any <p>; inspecting the parent would show that no children had been added.
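
The suggested walk up to the top level can be sketched as follows. This is an illustration only: `getTopLevelAncestor` is a hypothetical name, the nodes are plain objects with just `nodeName` and `parentNode`, and where the placeholder is then inserted relative to that ancestor is left out.

```javascript
// Climb from a node to the ancestor that sits directly under BODY.
// Returns the node itself when it is already at the top level.
function getTopLevelAncestor( node ) {
	let current = node;
	while ( current.parentNode && current.parentNode.nodeName !== 'BODY' ) {
		current = current.parentNode;
	}
	return current;
}

// Plain-object stand-ins for: <body><p><em><!--more--></em></p></body>
const body = { nodeName: 'BODY', parentNode: null };
const paragraph = { nodeName: 'P', parentNode: body };
const emphasis = { nodeName: 'EM', parentNode: paragraph };
const comment = { nodeName: '#comment', parentNode: emphasis };

console.log( getTopLevelAncestor( comment ).nodeName );
```

Stopping one step before BODY, rather than at BODY itself, gives a reference node next to which a top-level element can be placed.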

ellatrix · 2018-02-22T15:44:01Z

blocks/api/raw-handling/special-comment-converter.js

+}
+
+function createMore( customText, noTeaser ) {
+	const node = document.createElement( 'wp-block' );


Interesting thing :) I wonder if we can also do this for shortcodes... Might enable us to get rid of the "pieces" of HTML and parse it all in one go.

mcsf · 2018-02-23T11:48:55Z

I'll wait until we have #4958 fixed before merging this one. Thanks for the reviews!

Worth pointing out that, in an impromptu discussion with @mtias, it was brought up that in the long run the <!-- wp:more /--> block shouldn't need to contain the legacy comments, as it would itself be the primitive that WordPress understands for partitioning a post. However, core WP doesn't really offer the hooks needed to change that, so this would have to be handled in one of two ways:

Wait for the merge into core to change core itself.
Add Gutenberg-only support for <!-- wp:more /--> on the server. Different approaches are possible, from adding a specific rule to handle that block (pseudo-block?) to getting creative with More's render_callback (which I'm not convinced is good).

I'm leaning towards number 1.

mcsf added the [Feature] Blocks and [Feature] Parsing labels Feb 14, 2018

mcsf requested review from pento and mtias February 14, 2018 19:24

mcsf mentioned this pull request Feb 14, 2018

Parser: Move support for the <!--more--> tag out of the parser #2973

Closed

mcsf force-pushed the remove/parser-more-and-noteaser-exception branch from de3b48e to 68aa510 February 15, 2018 13:20

mcsf requested a review from dmsnell February 15, 2018 16:56

dmsnell reviewed Feb 15, 2018

mcsf force-pushed the remove/parser-more-and-noteaser-exception branch 4 times, most recently from 857e08a to a1c273f February 19, 2018 17:16

youknowriad reviewed Feb 20, 2018

youknowriad approved these changes Feb 20, 2018

mcsf force-pushed the remove/parser-more-and-noteaser-exception branch from a1c273f to 750422f February 21, 2018 15:34

pento reviewed Feb 22, 2018

pento approved these changes Feb 22, 2018

ellatrix reviewed Feb 22, 2018

ellatrix approved these changes Feb 22, 2018

mcsf force-pushed the remove/parser-more-and-noteaser-exception branch from 750422f to f5db665 February 23, 2018 11:39

mcsf force-pushed the remove/parser-more-and-noteaser-exception branch from f5db665 to f30caee February 23, 2018 11:55

mcsf added 2 commits February 24, 2018 12:33

Parser: Remove support for <!--more-->

d07da01

- No exceptions in serializer either
- Use dedicated rawHandling transformer for `more`, `noteaser`
- Add tests

Serializer: Remove More-specific tests

de32975

mcsf force-pushed the remove/parser-more-and-noteaser-exception branch from 5876177 to de32975 February 24, 2018 12:39

mcsf merged commit c42086f into master Feb 24, 2018

mcsf deleted the remove/parser-more-and-noteaser-exception branch February 24, 2018 12:49

This was referenced Mar 1, 2018

Add pagination block #1467

Merged

More tag from classic editor displays warning in Gutenberg #3963

Closed

Content with only "More" block reloads as invalid #4038

Closed
