Parser: Add new list of HTML fragments to parse output #11334

dmsnell · 2018-10-31T21:14:32Z

Attempt three at including positional information from the parse to enable isomorphic reconstruction of the source post_content after parsing.

See alternate attempts: #11082, #11309
Motivated by: #7247, #8760, Automattic/jetpack#10256
Enables: #10463, #10108

Abstract

Add new innerContent property to each block in parser output indicating where in the innerHTML each innerBlock was found.

Status

will update fixtures after design review indicates this is the desired approach
all parsers passing new tests for fragment behavior

Summary

Inner blocks, or nested blocks, or blocks-within-blocks, can exist in Gutenberg posts. They are serialized in post_content in place as normal blocks which exist in between another block's comment delimiters.

<!-- wp:outerBlock -->
Check out my
<!-- wp:voidInnerBlock /-->
and my other
<!-- wp:innerBlock -->
with its own content.
<!-- /wp:innerBlock -->
<!-- /wp:outerBlock -->

The way this gets parsed leaves us in a quandary: we cannot reconstruct the original post_content after parsing because we lose the origin location information for each inner block since they are only passed as an array of inner blocks.

{
	"blockName": "core/outerBlock",
	"attrs": {},
	"innerBlocks": [
		{
			"blockName": "core/voidInnerBlock",
			"attrs": {},
			"innerBlocks": [],
			"innerHTML": ""
		},
		{
			"blockName": "core/innerBlock",
			"attrs": {},
			"innerBlocks": [],
			"innerHTML": "\nwith its own content.\n"
		}
	],
	"innerHTML": "\nCheck out my\n\nand my other\n\n"
}

At this point we have parsed the blocks and prepared them for attaching into the JavaScript block code that interprets them but we have lost our reverse transformation.

In this PR I'd like to introduce a new mechanism which shouldn't break existing functionality but which will enable us to go back and forth isomorphically between the post_content and first stage of parsing. If we can tear apart a Gutenberg post and reassemble then it will let us to structurally-informed processing of the posts without needing to be aware of all the block JavaScript.

The proposed mechanism is a new property as a list of HTML fragments with null values interspersed between those fragments where the blocks were found.

{
	"blockName": "core/outerBlock",
	"attrs": {},
	"innerBlocks": [
		{
			"blockName": "core/voidInnerBlock",
			"attrs": {},
			"innerBlocks": [],
			"blockMarkers": [],
			"innerHTML": ""
		},
		{
			"blockName": "core/innerBlock",
			"attrs": {},
			"innerBlocks": [],
			"blockMarkers": [],
			"innerHTML": "\nwith its own content.\n"
		}
	],
	"innerHTML": "\nCheck out my\n\nand my other\n\n",
	"innerContent": [ "\nCheck out my\n", null, "\n and my other\n", null, "\n" ],
}

Doing this allows us to replace those null values with their associated block (sequentially) from innerBlocks.

Questions

Why not use a string token instead of an array?
- See WIP: Parser: Add tokens for inner blocks in inner HTML #11309. The fundamental problem with the token is that it could be valid content input from a person and so there's a probability that we would fail to split the content accurately.
Why add the null instead of leaving basic array splits like [ 'before', 'after' ]?
- By inspection we can see that without an explicit marker we don't know if the block came before or after or between array elements. We could add empty strings '' and say that blocks exist only between array elements but the parser code would have to be more complicated to make sure we appropriately add those empty strings. The empty strings are a bit odd anyway.
Why add a new property?
- Code already depends on innerHTML and innerBlocks; I don't want to break any existing behaviors and adding is less risky than changing.

aduth · 2018-11-02T14:32:03Z

To clarify, a point of having the null is so that we could potentially make sense of:

Input

<!-- wp:block -->before<!-- wp:void /-->between<!-- wp:void /-->after<!-- /wp:block -->

Output

innerContent: [ 'before', null, 'between', null, 'after' ]

Or even:

Input

<!-- wp:block -->before<!-- wp:void /--><!-- wp:void /-->between<!-- wp:void /-->after<!-- /wp:block -->

Output

innerContent: [ 'before', null, null, 'between', null, 'after' ]

?

Not speaking to any practical implications of the parser implementation, but the editor implementation of nested blocks was very much based on the assumption that inner blocks occur at most once in a single sequence order, so it is not aware of (and could not easily support) interspersed inner blocks.

This can be seen in the documentation of the InnerBlocks component:

Note: A block can render at most a single InnerBlocks and InnerBlocks.Content element in edit and save respectively. To create distinct arrangements of nested blocks, create a separate block type which renders its own InnerBlocks and assign as the sole allowedBlocks type.

https://github.com/WordPress/gutenberg/blob/master/packages/editor/src/components/inner-blocks/README.md

As far as the parser is concerned, I could grant that it not care to be tailored to this editor implementation detail, even going so far as to applaud that it could support some future editor enhancement toward this goal. All the same, it's a misalignment worth pointing out, and I haven't really seen compelling evidence that the InnerBlocks limitations provide an insufficient base upon which to build nested blocks (where more advanced cases can create nested arrangements, e.g. Columns -> Column).

In all then, and even in acknowledging some redundancy between innerContent and innerHTML (where the latter is effectively innerContent.filter( Boolean ).join( '' )?), this seems an agreeable direction.

The editor today would have to assume that the first and last strings are the content surrounding InnerBlocks, and thus disregard any non-block members between (or perhaps parse as of type "fallback"?).

aduth

There are test failures (which, at a glance, appear mostly to do with mere addition of the property where relevant) but conceptually and code-wise this would have my approval.

dmsnell · 2018-11-02T16:10:01Z

Thanks @aduth - to summarize a conversation we had about having one location vs having each inner block location marked: I'm not going to enforce any behaviors through the parser. Even though this change means that the parser is capable of handling the case where inner blocks are interspersed with HTML it doesn't mean they have to. Having separate locations is the "default" behavior of the code whereas forcing a single location for the list of inner blocks would require more special-casing and complexity in the parser. I'd like to leave this simpler version in especially since it gives us the options later to support broader innerBlock behaviors.

aduth

Some merge conflicts to resolve.

aduth · 2018-11-05T16:12:15Z

test/integration/full-content/fixtures/core__columns.parsed.json

                    }
                ],
-                "innerHTML": "\n\t<div class=\"wp-block-column\">\n\t\t\n\t\t\n\t</div>\n\t"
+                "innerHTML": "\n\t<div class=\"wp-block-column\">\n\t\t\n\t\t\n\t</div>\n\t",
+                "innerContent": [


aduth · 2018-11-05T16:13:53Z

test/integration/full-content/fixtures/core__columns.parsed.json

+                "innerContent": [
+                    "\n\t<div class=\"wp-block-column\">\n\t\t",
+                    null,
+                    "\n\t\t",


Trying to think if the serializer behavior of newline-delimited blocks could have an impact in the ability to re-serialize, but I'm assuming the editor will disregard this as far as parsing/rehydrating the top-level columns block (which does and will continue to operate on innerHTML?).

good point. we may have to take some additional care to make sure that the serializer matches behavior with the parser, but since it's been working so far I'm guessing it will continue to work for a while.

dmsnell · 2018-11-05T16:29:22Z

rebased

aduth

LGTM 👍 Nice work here.

aduth · 2018-11-06T21:01:04Z

packages/block-serialization-default-parser/parser.php

 		$this->blockName   = $name;
 		$this->attrs       = $attrs;
 		$this->innerBlocks = $innerBlocks;
 		$this->innerHTML   = $innerHTML;
+		$this->innerContent = $innerContent;


New assignment introduces misalignment on the =.

will follow-up in a new PR, thanks for spotting this!

dmsnell · 2018-11-07T01:58:15Z

A little slower all around but tolerably so given that we're expanding the functionality.

enejb · 2018-11-08T03:53:31Z

❤️

aduth · 2018-11-20T17:47:20Z

test/integration/full-content/fixtures/core__audio.parsed.json

@@ -5,12 +5,14 @@
            "align": "right"
        },
        "innerBlocks": [],
-        "innerHTML": "\n<figure class=\"wp-block-audio alignright\">\n    <audio controls=\"\" src=\"https://media.simplecast.com/episodes/audio/80564/draft-podcast-51-livePublish2.mp3\"></audio>\n</figure>\n"
+        "innerHTML": "\n<figure class=\"wp-block-audio alignright\">\n    <audio controls=\"\" src=\"https://media.simplecast.com/episodes/audio/80564/draft-podcast-51-livePublish2.mp3\"></audio>\n</figure>\n",
+        "innerContent": [ "\n<figure class=\"wp-block-audio alignright\">\n    <audio controls=\"\" src=\"https://media.simplecast.com/episodes/audio/80564/draft-podcast-51-livePublish2.mp3\"></audio>\n</figure>\n" ]


I'm guessing that these were manually edited?

It's meant to be generated via npm run fixtures:regenerate, which on master is producing local changes because of stylistic differences of the generated result.

yeah that's right @aduth - I was unaware of fixtures:regenerate when I wrote this

dmsnell added the [Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f label Oct 31, 2018

dmsnell requested review from mcsf and aduth October 31, 2018 21:14

dmsnell force-pushed the parser/add-inner-content branch from 6c62b3f to 7a39478 Compare October 31, 2018 21:18

aduth reviewed Nov 2, 2018

View reviewed changes

This was referenced Nov 2, 2018

WIP: Parser: Add tokens for inner blocks in inner HTML #11309

Closed

Update post grammar to include markers for inner blocks #11082

Closed

dmsnell force-pushed the parser/add-inner-content branch 2 times, most recently from 327f01c to 857b02f Compare November 3, 2018 14:38

dmsnell changed the title ~~WIP: Parser: Add new list of HTML fragments to parse output~~ Parser: Add new list of HTML fragments to parse output Nov 3, 2018

enejb mentioned this pull request Nov 5, 2018

Gutenpack: Add/contact form to the gutenberg editor Automattic/jetpack#10537

Merged

aduth reviewed Nov 5, 2018

View reviewed changes

dmsnell added 8 commits November 5, 2018 10:25

Parser: Add new list of HTML fragments to parse output

c9ba2b3

update parser to new parser output format

132d495

fix typo in test name

7267297

add PHP doc comment for innerContent property

269da3b

updating tests

3a22427

oops how'd that get there

05db954

fix that thing there the default parser was broken

d490167

update fixtures

84ba8d5

dmsnell force-pushed the parser/add-inner-content branch from 9a584f6 to 84ba8d5 Compare November 5, 2018 16:28

aduth approved these changes Nov 6, 2018

View reviewed changes

dmsnell merged commit 1014389 into master Nov 7, 2018

dmsnell deleted the parser/add-inner-content branch November 7, 2018 01:58

catehstn mentioned this pull request Nov 9, 2018

Contact Form Gutenblock Automattic/jetpack#10256

Closed

5 tasks

mtias added this to the 4.3 milestone Nov 12, 2018

aduth reviewed Nov 20, 2018

View reviewed changes

dmsnell mentioned this pull request Feb 1, 2019

Return inner HTML before and after inner blocks when parsing and fix … #8760

Closed

4 tasks

aduth mentioned this pull request Feb 4, 2019

gutenberg_render_block does not render inner blocks #8214

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parser: Add new list of HTML fragments to parse output #11334

Parser: Add new list of HTML fragments to parse output #11334

dmsnell commented Oct 31, 2018 •

edited

Loading

aduth commented Nov 2, 2018 •

edited

Loading

aduth left a comment

dmsnell commented Nov 2, 2018

aduth left a comment

aduth Nov 5, 2018

aduth Nov 5, 2018

dmsnell Nov 5, 2018

dmsnell commented Nov 5, 2018

aduth left a comment

aduth Nov 6, 2018

dmsnell Nov 7, 2018

dmsnell commented Nov 7, 2018

enejb commented Nov 8, 2018

aduth Nov 20, 2018

aduth Nov 20, 2018

dmsnell Nov 20, 2018

Parser: Add new list of HTML fragments to parse output #11334

Parser: Add new list of HTML fragments to parse output #11334

Conversation

dmsnell commented Oct 31, 2018 • edited Loading

Abstract

Status

Summary

Questions

aduth commented Nov 2, 2018 • edited Loading

aduth left a comment

Choose a reason for hiding this comment

dmsnell commented Nov 2, 2018

aduth left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dmsnell commented Nov 5, 2018

aduth left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dmsnell commented Nov 7, 2018

enejb commented Nov 8, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dmsnell commented Oct 31, 2018 •

edited

Loading

aduth commented Nov 2, 2018 •

edited

Loading