Return inner HTML before and after inner blocks when parsing and fix … #8760

brucepearson · 2018-08-09T08:30:24Z

Description

Fix rendering of inner blocks - #8214

gutenberg_render_block does not render inner blocks

To Reproduce
Use the Gutenberg editor to add 2 columns. You end up with something like this:

<!-- wp:columns -->
<div class="wp-block-columns has-2-columns"><!-- wp:column -->
<div class="wp-block-column"><!-- wp:paragraph -->
<p>This is a column</p>
<!-- /wp:paragraph --></div>
<!-- /wp:column -->

<!-- wp:column -->
<div class="wp-block-column"><!-- wp:paragraph -->
<p>This is the other column</p>
<!-- /wp:paragraph --></div>
<!-- /wp:column --></div>
<!-- /wp:columns -->

Use gutenberg_parse_blocks and then gutenberg_render_block to re-generate the content.

$blocks = gutenberg_parse_blocks( $content );

$new_content = '';
foreach( $blocks as $block ) {
   $new_content .= gutenberg_render_block( $block );
}

Expected behaviour
$new_content should be the same as $content but it's not. The inner blocks of the columns block are not rendered.

How has this been tested?

This code has been tested as described above.

This has been covered by PHP Unit class Test_Render.

Types of changes

Introduced new innerHTMLBeforeInnerBlocks and innerHTMLAfterInnerBlocks to block parser. This data is set only when there are inner blocks.
Updated gutenberg_render_block to use this data to properly render the inner blocks.
Added tests to cover this.

Checklist:

My code is tested.
My code follows the WordPress code style.
My code follows the accessibility standards.
My code has proper inline documentation.

…rendering of inner blocks - WordPress#8214

brucepearson · 2018-08-09T11:29:11Z

@youknowriad @mtias @gziolo @aduth Could you please look over this and let me know whether it can be considered? The JS parsing fails for column and columns blocks and will need some work to produce the same output as the PHP parser. I can have a look if you think it's worth continuing with.

chrisvanpatten · 2018-08-09T13:27:03Z

How does this work with many layers of nested blocks? I'm not too familiar with the inner workings of the parser (or any parser, for that matter) but it seems like maybe an approach where blocks are rendered from the inside out, with the rendered blocks passed up the chain, might be a more stable way of handling it?

brucepearson · 2018-08-09T13:44:13Z

This will work fine with many layers of nested blocks. It uses recursion to render any inner blocks. There's a test that includes 3 layers of nesting with a paragraph block inside a column block inside a columns block.

dmsnell · 2018-08-09T14:35:29Z

lib/parser.php

+          $block['innerHTMLBeforeInnerBlocks']  = implode( '', $innerHTMLBeforeInnerBlocks );
+          $block['innerHTMLAfterInnerBlocks']  = implode( '', $innerHTMLAfterInnerBlocks );
+	    }
+	    return $block;


hm…it seems like we are trying to manually edit automatically-generated code here. @brucepearson did you get a chance to look at the spec grammar which is run through php-pegjs to generate this file?

brucepearson · 2018-08-13

Thanks @dmsnell. Is the concept I've applied here OK? If so, I can then translate it to the grammar file.

dmsnell · 2018-08-09T14:41:47Z

lib/blocks.php

+	if ( isset( $block['innerBlocks'] )  && count( $block['innerBlocks'] ) ) {
+		$raw_content = $block['innerHTMLBeforeInnerBlocks'];
+		foreach ( $block['innerBlocks'] as $inner_block ) {
+			$raw_content .= gutenberg_render_block( $inner_block );


although we shouldn't expect deeply nested blocks we may want to consider the possibility of their existence. suppose someone made a post with 1000 levels of nesting - that could theoretically cause WordPress to crash on account of overflowing the call stack.

I'd recommend considering one of two changes to the behavior here:

set a limit on the recursion and abort with some defined behavior when we're too deep

flip the recursion on its head and recurse using a trampoline structure to track the recursion in an alternate data structure or control structure to eliminate the risk of stack overflow
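
The two options above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `MAX_BLOCK_DEPTH`, `render_block_limited` and `render_block_iterative` are hypothetical names, the limit of 100 is arbitrary, and the block arrays are assumed to carry the `innerHTMLBeforeInnerBlocks` / `innerHTMLAfterInnerBlocks` keys introduced by this patch. Dynamic blocks are ignored.

```php
<?php
// Hypothetical limit; the thread does not propose a specific value.
const MAX_BLOCK_DEPTH = 100;

/**
 * Option 1: recurse, but stop descending once $depth exceeds the limit.
 * Blocks nested too deeply are rendered as an empty string.
 */
function render_block_limited( array $block, int $depth = 0 ): string {
	if ( $depth > MAX_BLOCK_DEPTH ) {
		return '';
	}
	if ( empty( $block['innerBlocks'] ) ) {
		return $block['innerHTML'];
	}
	$html = $block['innerHTMLBeforeInnerBlocks'];
	foreach ( $block['innerBlocks'] as $inner ) {
		$html .= render_block_limited( $inner, $depth + 1 );
	}
	return $html . $block['innerHTMLAfterInnerBlocks'];
}

/**
 * Option 2: no recursion. An explicit stack holds either blocks still to
 * be expanded or plain strings ready to be appended to the output.
 */
function render_block_iterative( array $block ): string {
	$output = '';
	$stack  = array( $block );
	while ( $stack ) {
		$item = array_pop( $stack );
		if ( is_string( $item ) ) {
			$output .= $item;
			continue;
		}
		if ( empty( $item['innerBlocks'] ) ) {
			$output .= $item['innerHTML'];
			continue;
		}
		// Push in reverse so that items pop in document order.
		$stack[] = $item['innerHTMLAfterInnerBlocks'];
		foreach ( array_reverse( $item['innerBlocks'] ) as $inner ) {
			$stack[] = $inner;
		}
		$stack[] = $item['innerHTMLBeforeInnerBlocks'];
	}
	return $output;
}
```

The iterative version trades call-stack depth for heap memory, so nesting depth is bounded by available memory rather than by the call stack.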

it should be noted that this work overlaps what's happening in #8083. if we get a fast internal PHP parser we might dramatically rewrite this function.

brucepearson · 2018-08-10

@dmsnell Is it worth continuing with this, considering you might change the PHP parser? The intention of this change is to fix the rendering of inner blocks, which doesn't happen at all right now.

dmsnell · 2018-08-20T12:32:13Z

sorry for the delay here @brucepearson - I don't want to say too much about this method in the current state of the PR because I think we have to step back and make a change to the way we wrote the code up - actually it can help to take a few steps back and write up from a high level what you are doing and how it solves the problem (no code 😉)

unfortunately we can't just change the generated parser code - if we do that it will immediately disappear once we regenerate it. additionally it's hard to make assertions about how those changes will affect the parse. since the PEG parser places constraints on how we parse we might end up triggering unrelated bugs if we do a kind of deep surgery on it.

instead we need to think about what changes we want to make from a semantic point of view - what is wrong with the specification (the grammar .pegjs file) that prevents us from rendering inner content well? once we have that figured out we can update the grammar if needed and regenerate the parser.

on the other hand, if we adopt the parser implementation in #8083 then it may not be an issue. I don't believe that we will discover that the problem lies in the grammar but rather in the limited parser implementation in PHP that has been serving us well up until now. we haven't had a spec-compliant parser in PHP because the auto-generated one was too slow to be practical. if we change that we should be able to handle full nesting and dynamic blocks without any exceptional cases or code snippets.

does that help? I'll try to be more responsive with this PR this week.

brucepearson · 2018-08-21T03:49:45Z

> I think we have to step back and make a change to the way we wrote the code up - actually it can help to take a few steps back and write up from a high level what you are doing and how it solves the problem (no code 😉)

@dmsnell I'm working on adding translation support for Gutenberg to WPML.

Basically what happens is:

The user saves the page
WPML hooks this and looks for strings that need translation, e.g. the text inside the p tag of a paragraph block
WPML does this by using gutenberg_parse_blocks to get the blocks
For each block we then look for strings using rules for each block type.

Once the strings are extracted out of the blocks we can translate each string separately and the translator does not need to know about blocks at all.

When translation is complete we need to build the translated page.

WPML again parses the original by using gutenberg_parse_blocks to get the blocks
The strings in each block are then replaced with the translated strings
We need to rebuild the page by looping over the blocks and rendering them
The translated page is then saved

This works fine except for the inner blocks. To render the inner blocks we need to render the pre-inner html, then the inner blocks and then the post-inner-html.
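
The extract and replace steps described above might look something like the sketch below. It is not WPML's code: `extract_strings` and `apply_translations` are hypothetical helpers, only the paragraph block is handled, and the `<p>` regular expression is a deliberate simplification of the per-block rules mentioned above.

```php
<?php
/** Collects the text of every paragraph block, descending into inner blocks. */
function extract_strings( array $blocks ): array {
	$strings = array();
	foreach ( $blocks as $block ) {
		if ( 'core/paragraph' === $block['blockName']
			&& preg_match( '#<p>(.*?)</p>#s', $block['innerHTML'], $m ) ) {
			$strings[] = $m[1];
		}
		$strings = array_merge( $strings, extract_strings( $block['innerBlocks'] ) );
	}
	return $strings;
}

/**
 * Returns a copy of $blocks with paragraph text replaced.
 * $translations maps original string => translated string; strings
 * without a translation are left unchanged.
 */
function apply_translations( array $blocks, array $translations ): array {
	foreach ( $blocks as &$block ) {
		if ( 'core/paragraph' === $block['blockName'] ) {
			$block['innerHTML'] = preg_replace_callback(
				'#<p>(.*?)</p>#s',
				function ( $m ) use ( $translations ) {
					return '<p>' . ( $translations[ $m[1] ] ?? $m[1] ) . '</p>';
				},
				$block['innerHTML']
			);
		}
		$block['innerBlocks'] = apply_translations( $block['innerBlocks'], $translations );
	}
	unset( $block ); // break the reference left over from the loop
	return $blocks;
}
```

The translated tree still has to be turned back into post content, which is where the missing pre-inner and post-inner HTML becomes a problem.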

dmsnell · 2018-08-25T23:37:59Z

@brucepearson - that's a great description and really helps me (and I'm sure others) understand your use-case.

I'd like to see if we can't get a new and performant parser merged into the project - such as in #8083 - so that you won't have to do any adjustments to make the server-side parse work for you.

additionally and sadly we're going to have to address something else in here that after some inspection I think may be a bit difficult: we can't assume anything about the ordering of inner blocks. in the patch here I see us trying to scan linearly in the following sequence: HTML before first inner block, list of inner blocks, HTML after last inner block. we very much expect a variety of sequences however including HTML between inner blocks.

one of the earliest discussions around nesting involved what to do with inner blocks. as it stands today we cannot infer after the parse in what positions the inner blocks were found - we only get an array of them in the order we found them but no placemarkers.

I know @mtias and @aduth discussed this point as well. as it stands we may still be able to somehow add those markers without breaking the existing format of the parser - say by adding new attributes like indicesOfInnerBlocks though with a terser name of course.

though this may sound frustrating it's not our only option even. there are a number of ways we should be able to address a need like this including one thing I'd like to see at some point: a block visitor where we can register a function to run over each block for additional processing.
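
To make the block-visitor idea concrete, here is one possible shape for it. Nothing like this exists in the PR; `visit_blocks` is a hypothetical name, and the choice to visit inner blocks before their parent is an assumption.

```php
<?php
/**
 * Applies $visitor to every block, inner blocks first, and returns the
 * transformed tree. $visitor receives a block array and returns one.
 */
function visit_blocks( array $blocks, callable $visitor ): array {
	$result = array();
	foreach ( $blocks as $block ) {
		$block['innerBlocks'] = visit_blocks( $block['innerBlocks'] ?? array(), $visitor );
		$result[]             = $visitor( $block );
	}
	return $result;
}

// Example: rewrite paragraph text as a stand-in for real processing.
$blocks = array(
	array(
		'blockName'   => 'core/column',
		'innerHTML'   => '<div></div>',
		'innerBlocks' => array(
			array( 'blockName' => 'core/paragraph', 'innerHTML' => '<p>hi</p>', 'innerBlocks' => array() ),
		),
	),
);

$visited = visit_blocks(
	$blocks,
	function ( array $block ): array {
		if ( 'core/paragraph' === $block['blockName'] ) {
			$block['innerHTML'] = str_replace( 'hi', 'hola', $block['innerHTML'] );
		}
		return $block;
	}
);
```

Because the visitor only sees one block at a time, it does not need to know how the blocks are nested.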

I think I'll pause now and let you catch up and respond 😄

brucepearson · 2018-09-13T10:57:16Z

@dmsnell I see there's a new parser in Gutenberg 3.8. Are there plans to improve handling of the inner blocks any time soon?

dmsnell · 2018-09-13T19:35:11Z

@brucepearson thanks for popping in again here!

> Are there plans to improve handling of the inner blocks any time soon?

yes! I'd expect the process to work like this…

the new parser is already in as of 3.8.0
we'll make sure nothing major is broken in the parser
the new default parser will replace the top-level-only get_dynamic_blocks_regex() in do_blocks
dynamic blocks on the server will have access to the inner blocks
we'll update the grammar specification to include some indication of where in the inner text the inner blocks existed.

I've played around and had working code to provide a new property to the blocks alongside innerBlocks which would be an array of indices matching the array of inner blocks and that index would reference the remaining innerHTML without the inner blocks…

parse( '<!-- wp:my/block -->a<!-- wp:my/inner -->b<!-- /wp:my/inner -->c<!-- /wp:my/block -->' ) === [
	{
		blockName: 'my/block',
		attrs: {},
		innerHTML: 'ac',
		innerBlocks: [
			{ blockName: 'my/inner', attrs: {}, innerHTML: 'b', innerBlocks: [] }
		],
		innerBlockIndices: [
			1
		]
	}
]

The idea is that when the render callback in PHP has access to not only the inner blocks but also to the indices from whence they came then PHP is free to process and reassemble the original block with the limitations that are currently there.
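
As a rough sketch of how such indices could be used to reassemble a block, assuming each entry of `innerBlockIndices` is a byte offset into the remaining `innerHTML` and that the entries are in ascending order (that is my reading of the example, not a defined format). `reassemble_inner` and `serialize_block_sketch` are hypothetical names, and block attributes are left out for brevity.

```php
<?php
/**
 * Rebuilds a block's inner content by splicing each serialized inner
 * block back into innerHTML at its recorded offset.
 */
function reassemble_inner( array $block ): string {
	$html    = $block['innerHTML'];
	$indices = $block['innerBlockIndices'] ?? array();
	$output  = '';
	$cursor  = 0;
	foreach ( $block['innerBlocks'] as $i => $inner ) {
		$offset  = $indices[ $i ];
		$output .= substr( $html, $cursor, $offset - $cursor ); // HTML before this inner block
		$output .= serialize_block_sketch( $inner );
		$cursor  = $offset;
	}
	return $output . substr( $html, $cursor ); // HTML after the last inner block
}

/** Wraps the reassembled inner content in the block's comment delimiters. */
function serialize_block_sketch( array $block ): string {
	$name = $block['blockName'];
	return '<!-- wp:' . $name . ' -->' . reassemble_inner( $block ) . '<!-- /wp:' . $name . ' -->';
}

// The block from the example above, expressed as a PHP array.
$parsed = array(
	'blockName'         => 'my/block',
	'innerHTML'         => 'ac',
	'innerBlocks'       => array(
		array( 'blockName' => 'my/inner', 'innerHTML' => 'b', 'innerBlocks' => array() ),
	),
	'innerBlockIndices' => array( 1 ),
);

$rebuilt = serialize_block_sketch( $parsed );
```

With these assumptions `$rebuilt` is the same string that was passed to `parse` in the example.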

Thoughts?

brucepearson · 2018-09-14T01:48:21Z

@dmsnell The plan looks good. The innerBlockIndices should be all that is required to rebuild the original block.

One further point...
There's currently no function to rebuild the original content from the parsed data. gutenberg_render_block renders it for display but we need something that renders it so that it will load into the Gutenberg editor.

I'm currently using 'hackish' code to do this:

	public function string_translated(
		$package_kind,
		$translated_post_id,
		$original_post,
		$string_translations,
		$lang
	) {

		if ( self::PACKAGE_ID === $package_kind ) {
			$blocks = gutenberg_parse_blocks( $original_post->post_content );

			$blocks = $this->update_block_translations( $blocks, $string_translations, $lang );

			$content = '';
			foreach ( $blocks as $block ) {
				$content .= $this->render_block( $block );
			}

			wp_update_post( array( 'ID' => $translated_post_id, 'post_content' => $content ) );

		}

	}

	/**
	 * @param array|WP_Block_Parser_Block $block
	 *
	 * @return string
	 */
	private function render_block( $block ) {
		$content = '';

		if ( $block instanceof WP_Block_Parser_Block ) {
			$block_type = preg_replace( '/^core\//', '', $block->blockName );

			$block_attributes = '';
			if ( $block->attrs ) {
				$block_attributes = ' ' . json_encode( $block->attrs );
			}
			$content .= '<!-- wp:' . $block_type . $block_attributes . ' -->';

			$content .= $this->render_inner_HTML( $block );

			$content .= '<!-- /wp:' . $block_type . ' -->';

		} else {
			$content .= $block['innerHTML'];
		}

		return $content;

	}

Do you already have this planned?

There are numerous needs to process posts and block content from its structured form without demanding that plugin authors implement their own parsing systems. Since the new default parser was implemented in #8083 the server-side parse is now fast enough to consider doing full parses of our documents and with that brings the idea that we can filter block content from the parser itself.

In this patch I'm exploring an API to allow extending the parser's behavior by post-processing blocks as they enter the parser's output array.

This new filter gives the ability to transform all of the block's properties as they finish parsing. In the case of inner blocks the filter runs as the inner blocks have finished their own nesting. In the case of top-level blocks the filter runs after all inner content has finished parsing.

One use case is in #8760 where we want to replace the HTML parts of blocks while preserving other structure. Another use case could be removing specific inner blocks or content based on the current user requesting a post. This filter exposes a kind of visitor pattern for the nested parse.

> **THIS IS AN INCOMPLETE PATCH DO NOT MERGE**

georgestephanis · 2018-10-10T14:15:33Z

This would address (I believe) #6751 and #7247 (in case it's worth closing those in favor of this)

dmsnell · 2018-10-10T15:01:49Z

@brucepearson I've started working on this in #10463 such that the new parser is used for do_blocks. some normal snags are holding it up but it's also revealing to me some of the tricks we have to pull to make this happen.

in #10108 I started exploring a block-level filter which I think also comes into play.

my idea is that we'll have to end up doing the full parse in do_blocks which lets us actually address the inner blocks and then use a registered filter to process those block data structures from the bottom up.

if you don't mind could you look at the imagined interface in #10108 and provide your feedback on that API, how you would imagine wanting to hook into the block process, how you would imagine writing your plugin to do the replacements on render?

brucepearson · 2018-10-11T09:44:21Z

@dmsnell

> how you would imagine writing your plugin to do the replacements on render?

I don't think replace on render would work for us. It has a few problems:

Performance can be slow if there are many strings
Often users want to edit their translated pages manually
It would be a big workflow change compared to the classic editor where a user could edit the translated post.

I do believe that the filter could work as long as there's a function to render the parsed blocks back to the post content in the format for the Gutenberg editor.

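e.g.

// 'block_post_parse' and gutenberg_render_blocks_for_editor() are proposed names, not existing APIs.
add_filter( 'block_post_parse', function( $block ) {
    // ... Do some magic with the block.
    return $block; // filter callbacks return the (possibly modified) value
} );

$blocks = gutenberg_parse_blocks( $post->post_content );
$translated_post->post_content = gutenberg_render_blocks_for_editor( $blocks );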

brucepearson · 2018-10-18T08:44:06Z

@dmsnell I noticed in 4.0.0 that gutenberg_parse_blocks now returns an array of arrays instead of an array of WP_Block_Parser_Block objects. Will it stay that way in the future so we can rely on it returning the same data format?

Attempt three at including positional information from the parse to enable isomorphic reconstruction of the source `post_content` after parsing.

See alternate attempts: #11082, #11309
Motivated by: #7247, #8760, Automattic/jetpack#10256
Enables: #10463, #10108

## Abstract

Add new `innerContent` property to each block in parser output indicating where in the innerHTML each innerBlock was found.

## Status

- will update fixtures after design review indicates this is the desired approach
- all parsers passing new tests for fragment behavior

## Summary

Inner blocks, or nested blocks, or blocks-within-blocks, can exist in Gutenberg posts. They are serialized in `post_content` in place as normal blocks which exist in between another block's comment delimiters.

```html
<!-- wp:outerBlock -->
Check out my
<!-- wp:voidInnerBlock /-->
and my other
<!-- wp:innerBlock -->
with its own content.
<!-- /wp:innerBlock -->
<!-- /wp:outerBlock -->
```

The way this gets parsed leaves us in a quandary: we cannot reconstruct the original `post_content` after parsing because we lose the origin location information for each inner block since they are only passed as an array of inner blocks.

```json
{
	"blockName": "core/outerBlock",
	"attrs": {},
	"innerBlocks": [
		{
			"blockName": "core/voidInnerBlock",
			"attrs": {},
			"innerBlocks": [],
			"innerHTML": ""
		},
		{
			"blockName": "core/innerBlock",
			"attrs": {},
			"innerBlocks": [],
			"innerHTML": "\nwith its own content.\n"
		}
	],
	"innerHTML": "\nCheck out my\n\nand my other\n\n"
}
```

At this point we have parsed the blocks and prepared them for attaching into the JavaScript block code that interprets them but we have lost our reverse transformation.

In this PR I'd like to introduce a new mechanism which shouldn't break existing functionality but which will enable us to go back and forth isomorphically between the `post_content` and first stage of parsing. If we can tear apart a Gutenberg post and reassemble it then it will let us do structurally-informed processing of the posts without needing to be aware of all the block JavaScript.

The proposed mechanism is a new property as a **list of HTML fragments with `null` values interspersed between those fragments where the blocks were found**.

```json
{
	"blockName": "core/outerBlock",
	"attrs": {},
	"innerBlocks": [
		{
			"blockName": "core/voidInnerBlock",
			"attrs": {},
			"innerBlocks": [],
			"blockMarkers": [],
			"innerHTML": ""
		},
		{
			"blockName": "core/innerBlock",
			"attrs": {},
			"innerBlocks": [],
			"blockMarkers": [],
			"innerHTML": "\nwith its own content.\n"
		}
	],
	"innerHTML": "\nCheck out my\n\nand my other\n\n",
	"innerContent": [ "\nCheck out my\n", null, "\nand my other\n", null, "\n" ]
}
```

Doing this allows us to replace those `null` values with their associated block (sequentially) from `innerBlocks`.

## Questions

- Why not use a string token instead of an array?
  - See #11309. The fundamental problem with the token is that it could be valid content input from a person and so there's a probability that we would fail to split the content accurately.
- Why add the `null` instead of leaving basic array splits like `[ 'before', 'after' ]`?
  - By inspection we can see that without an explicit marker we don't know if the block came before or after or between array elements. We could add empty strings `''` and say that blocks exist only _between_ array elements but the parser code would have to be more complicated to make sure we appropriately add those empty strings. The empty strings are a bit odd anyway.
- Why add a new property?
  - Code already depends on `innerHTML` and `innerBlocks`; I don't want to break any existing behaviors and adding is less risky than changing.
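
A minimal sketch of the reverse transformation this property allows, written against plain arrays so it runs without WordPress. `serialize_from_inner_content` is a hypothetical name, not a function from the PR or from core. It assumes every block carries `innerContent`, strips the `core/` prefix the same way the WPML snippet earlier in the thread does, writes attributes with plain `json_encode`, and treats a named block with empty `innerContent` as a void block, which cannot actually be distinguished from an empty non-void block after parsing.

```php
<?php
/**
 * Serializes a parsed block back to post_content form by walking
 * innerContent: strings are emitted as-is, and each null is replaced by
 * the next entry of innerBlocks.
 */
function serialize_from_inner_content( array $block ): string {
	$inner = '';
	$next  = 0;
	foreach ( $block['innerContent'] as $chunk ) {
		$inner .= is_string( $chunk )
			? $chunk
			: serialize_from_inner_content( $block['innerBlocks'][ $next++ ] );
	}

	// HTML outside of any block has no name and therefore no delimiters.
	if ( null === ( $block['blockName'] ?? null ) ) {
		return $inner;
	}

	$name  = preg_replace( '/^core\//', '', $block['blockName'] );
	$attrs = empty( $block['attrs'] ) ? '' : ' ' . json_encode( $block['attrs'] );

	// Assumption: no inner content at all means the block was self-closing.
	if ( array() === $block['innerContent'] ) {
		return '<!-- wp:' . $name . $attrs . ' /-->';
	}

	return '<!-- wp:' . $name . $attrs . ' -->' . $inner . '<!-- /wp:' . $name . ' -->';
}
```

Applied to the outer block above, with `innerContent` of `[]` for the void block and `[ "\nwith its own content.\n" ]` for the inner block, this walks the five entries of the outer `innerContent` in order and substitutes the two inner blocks for the two `null` values.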

gziolo · 2019-02-01T08:55:19Z

@brucepearson and @dmsnell - what's the status of this one? lib/blocks.php is almost entirely moved to WordPress core now and lib/parser.php was moved to a new location. This PR needs to be refreshed for sure, so I'm marking it as Stale to make the review process of 200+ PRs easier.

dmsnell · 2019-02-01T18:36:51Z

@gziolo @brucepearson the parser in core splits the inner HTML into a list of HTML and null values which indicate where the inner blocks are. we should be able to close this issue but I want to let @brucepearson make that call.

in #11334 we added the change which breaks apart the innerHTML so now it should be possible to rearrange or work with inner content using the filters, notably render_block and soon render_block_data (introduced in https://core.trac.wordpress.org/ticket/45451)

by default do_blocks() and render_block() should properly render the nesting content.

aduth · 2019-02-04T19:10:40Z

Let's err on the side of closing. I've tested and commented as such at #8214 (comment). If further work is needed, the issue can be reopened or new ones created.

Return inner HTML before and after inner blocks when parsing and fix …

43ba24a

…rendering of inner blocks - WordPress#8214

gziolo requested review from aduth, pento and dmsnell August 9, 2018 11:35

gziolo added [Feature] Block API API that allows to express the block paradigm. [Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f labels Aug 9, 2018

dmsnell requested changes Aug 9, 2018

aduth added the [Feature] Nested / Inner Blocks Anything related to the experience of nested/inner blocks inside a larger container, like Group or P label Aug 20, 2018

dmsnell self-assigned this Aug 21, 2018

dmsnell mentioned this pull request Sep 22, 2018

Block API: Add pre_render and post_render block filters #10108

Closed

dmsnell mentioned this pull request Oct 10, 2018

Parser: Replace dynamic-block regex in do_blocks #10463

Closed

7 tasks

aduth mentioned this pull request Oct 10, 2018

ServerSideRender does not expose block's content #10478

Open

This was referenced Oct 25, 2018

Update post grammar to include markers for inner blocks #11082

Closed

WIP: Parser: Add tokens for inner blocks in inner HTML #11309

Closed

dmsnell mentioned this pull request Nov 2, 2018

Parser: Add new list of HTML fragments to parse output #11334

Merged

gziolo added the [Status] Stale Gives the original author opportunity to update before closing. Can be reopened as needed. label Feb 1, 2019

aduth closed this Feb 4, 2019
