Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return inner HTML before and after inner blocks when parsing and fix … #8760

Closed

Conversation

brucepearson
Copy link

Description

Fix rendering of inner blocks - #8214

gutenberg_render_block does not render inner blocks

To Reproduce
Use Gutenberg editor to add 2 columns. You end up with something like this

<!-- wp:columns -->
<div class="wp-block-columns has-2-columns"><!-- wp:column -->
<div class="wp-block-column"><!-- wp:paragraph -->
<p>This is a column</p>
<!-- /wp:paragraph --></div>
<!-- /wp:column -->

<!-- wp:column -->
<div class="wp-block-column"><!-- wp:paragraph -->
<p>This is the other column</p>
<!-- /wp:paragraph --></div>
<!-- /wp:column --></div>
<!-- /wp:columns -->

Use gutenberg_parse_blocks and then gutenberg_render_block to re-generate the content.

$blocks = gutenberg_parse_blocks( $content );

$new_content = '';
foreach( $blocks as $block ) {
   $new_content .= gutenberg_render_block( $block );
}

Expected behaviour
$new_content should be the same as $content but it's not. The inner blocks of columns is not rendered.

How has this been tested?

This code has been tested as described above.

This has been covered by PHP Unit class Test_Render.

Types of changes

Introduced new innerHTMLBeforeInnerBlocks and innerHTMLAfterInnerBlocks to block parser. This data is set only when there are inner blocks.
Updated gutenberg_render_block to use this data to properly render the inner blocks.
Added tests to cover this.

Checklist:

  • My code is tested.
  • My code follows the WordPress code style.
  • My code follows the accessibility standards.
  • My code has proper inline documentation.

@brucepearson
Copy link
Author

@youknowriad @mtias @gziolo @aduth Could you please look over this and let me know if this can be considered. The js parsing fails for column and columns blocks and will need some work to produce the same as the PHP parser. I can have a look if you think it's worth continuing with.

@gziolo gziolo requested review from aduth, pento and dmsnell August 9, 2018 11:35
@gziolo gziolo added [Feature] Block API API that allows to express the block paradigm. [Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f labels Aug 9, 2018
@chrisvanpatten
Copy link
Contributor

How does this work with many layers of nested blocks? I'm not too familiar with the inner workings of the parser (or any parser, for that matter) but it seems like maybe an approach where blocks are rendered from the inside out, with the rendered blocks passed up the chain, might be a more stable way of handling it?

@brucepearson
Copy link
Author

This will work fine with many layers of nested blocks. It uses recursion to render any inner blocks. There's a test that includes 3 layers of nesting with a paragraph block inside a column block inside a columns block.

$block['innerHTMLBeforeInnerBlocks'] = implode( '', $innerHTMLBeforeInnerBlocks );
$block['innerHTMLAfterInnerBlocks'] = implode( '', $innerHTMLAfterInnerBlocks );
}
return $block;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm…it seems like we are trying to manually edit automatically-generated code here. @brucepearson did you get a chance to look at the spec grammar which is run through php-pegjs to generate this file?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dmsnell. Is the concept OK that I've applied here? I can then translate to the grammar file.

if ( isset( $block['innerBlocks'] ) && count( $block['innerBlocks'] ) ) {
$raw_content = $block['innerHTMLBeforeInnerBlocks'];
foreach ( $block['innerBlocks'] as $inner_block ) {
$raw_content .= gutenberg_render_block( $inner_block );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

although we shouldn't expect deeply nested blocks we may want to consider the possibility of their existence. suppose someone made a post with 1000 levels of nesting - that could theoretically cause WordPress to crash on account of overflowing the call stack.

I'd recommend considering one of two changes to the behavior here:

  • set a limit on the recursion and abort with some defined behavior when we're too deep
  • flip the recursion on its head and recurse using a trampoline structure to track the recursion in an alternate data structure or control structure to eliminate the risk of stack overflow

it should be noted that this work overlaps what's happening in #8083. if we get a fast internal PHP parser we might dramatically rewrite this function.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmsnell Is it worth continuing with this considering you might change the PHP parser. The intention of this change is to fix rendering of inner blocks that doesn't happen at all right now.

@dmsnell
Copy link
Member

dmsnell commented Aug 20, 2018

sorry for the delay here @brucepearson - I don't want to say too much about this method in the current state of the PR because I think we have to step back and make a change to the way we wrote the code up - actually it can help to take a few steps back and write up from a high level what you are doing and how it solves the problem (no code 😉)

unfortunately we can't just change the generated parser code - if we do that it will immediately disappear once we regenerate it. additionally it's hard to make assertions about how those changes will affect the parse. since the PEG parser places constraints on how we parse we might end up triggering unrelated bugs if we do a kind of deep surgery on it.

instead we need to think about how what changes we want to make from a semantic point of view - what is wrong with the specification (the grammar .pegjs file) that prevents us from rendering inner content well? once we have that figured out we can update the grammar if needed and regenerate the parser.


on the other hand, if we adopt the parser implementation in #8083 then it may not be an issue. I don't believe that we will discover that the problem lies in the grammar but rather in the limited parser implementation in PHP that has been serving us well up until now. we haven't had a spec-compliant parser in PHP because the auto-generated one was too slow to be practical. if we change that we should be able to handle full nesting and dynamic blocks without any exceptional cases or code snippets.


does that help? I'll try to be more responsive with this PR this week.

@aduth aduth added the [Feature] Nested / Inner Blocks Anything related to the experience of nested/inner blocks inside a larger container, like Group or P label Aug 20, 2018
@brucepearson
Copy link
Author

I think we have to step back and make a change to the way we wrote the code up - actually it can help to take a few steps back and write up from a high level what you are doing and how it solves the problem (no code wink)

@dmsnell I'm working on adding translation support for Gutenberg to WPML.

Basically what happens is:

  1. The user saves the page
  2. WPML hooks this and looks for strings that need translation. eg. We look for text inside the p tag for the paragraph block
  3. WPML does this by using gutenberg_parse_blocks to get the blocks
  4. For each block we then look for strings using rules for each block type.

Once the strings are extracted out of the blocks we can translate each string separately and the translator does not need to know about blocks at all.

When translation is complete we need to build the translated page.

  1. WPML again parses the original by using gutenberg_parse_blocks to get the blocks
  2. The strings in each block are then replaced with the translated strings
  3. We need to rebuild the page by looping over the blocks and rendering them
  4. The translated page is then saved

This works fine except for the inner blocks. To render the inner blocks we need to render the pre-inner html, then the inner blocks and then the post-inner-html.

@dmsnell dmsnell self-assigned this Aug 21, 2018
@dmsnell
Copy link
Member

dmsnell commented Aug 25, 2018

@brucepearson - that's a great description and really helps me (and I'm sure others) understand your use-case.

I'd like to see if we can't get a new and performant parser merged into the project - such as in #8083 - so that you won't have to do any adjustments to make the server-side parse work for you.

additionally and sadly we're going to have to address something else in here that after some inspection I think may be a bit difficult: we can't assume anything about the ordering of inner blocks. in the patch here I see us trying to scan linearly in the following sequence: HTML before first inner block, list of inner blocks, HTML after last inner block. we very much expect a variety of sequences however including HTML between inner blocks.

one of the earliest discussion around nesting involved what to do with inner blocks. as it stands today we cannot infer after the parse in what positions the inner blocks were found - we only get an array of them in the order we found them but no placemarkers.

@mtias and @aduth I know discussed this point as well. as it stands we may still be able to somehow add those markers without breaking the existing format of the parser - say by adding new attributes like indicesOfInnerBlocks though with a terser name of course.

though this may sound frustrating it's not our only option even. there are a number of ways we should be able to address a need like this including one thing I'd like to see at some point: a block visitor where we can register a function to run over each block for additional processing.

I think I'll pause now and let you catch up and respond 😄

@brucepearson
Copy link
Author

@dmsnell I see there's a new parser in Gutenberg 3.8. Is there plans to improve handling of the inner blocks any time soon?

@dmsnell
Copy link
Member

dmsnell commented Sep 13, 2018

@brucepearson thanks for popping in again here!

Is there plans to improve handling of the inner blocks any time soon?

yes! I'd expect the process to work like this…

  • the new parser is already in as of 3.8.0
  • we'll make sure nothing major is broken in the parser
  • the new default parser will replace the top-level-only get_dynamic_blocks_regex() in do_blocks
  • dynamic blocks on the server will have access to the inner blocks
  • we'll update the grammar specification to include some indication of where in the inner text the inner blocks existed.

I've played around and had working code to provide a new property to the blocks alongside innerBlocks which would be an array of indices matching the array of inner blocks and that index would reference the remaining innerHTML without the inner blocks…

parse( '<!-- wp:my/block -->a<!-- wp:my/inner -->b<!-- /wp:my/inner -->c<!-- /wp:my/block -->' ) === [
	{
		blockName: 'my/block',
		attrs: {},
		innerHTML: 'ac',
		innerBlocks: [
			{ blockName: 'my/inner', attrs: {}, innerHTML: 'b', innerBlocks: [] }
		],
		innerBlockIndices: [
			1
		]
	}
]

The idea is that when the render callback in PHP has access to not only the inner blocks but also to the indices from whence they came then PHP is free to process and reassemble the original block with the limitations that are currently there.

Thoughts?

@brucepearson
Copy link
Author

@dmsnell The plan looks good. The innerBlockIndices should be all that is required to rebuild the original block.

One further point...
There's currently no function to rebuild the original content from the parsed data. gutenberg_render_block renders it for display but we need something that renders it so that it will load into the Gutenberg editor.

I'm currently using 'hackish' code to do this:

	public function string_translated(
		$package_kind,
		$translated_post_id,
		$original_post,
		$string_translations,
		$lang
	) {

		if ( self::PACKAGE_ID === $package_kind ) {
			$blocks = gutenberg_parse_blocks( $original_post->post_content );

			$blocks = $this->update_block_translations( $blocks, $string_translations, $lang );

			$content = '';
			foreach ( $blocks as $block ) {
				$content .= $this->render_block( $block );
			}

			wp_update_post( array( 'ID' => $translated_post_id, 'post_content' => $content ) );

		}

	}

	/**
	 * @param array|WP_Block_Parser_Block $block
	 *
	 * @return string
	 */
	private function render_block( $block ) {
		$content = '';

		if ( $block instanceof WP_Block_Parser_Block ) {
			$block_type = preg_replace( '/^core\//', '', $block->blockName );

			$block_attributes = '';
			if ( $block->attrs ) {
				$block_attributes = ' ' . json_encode( $block->attrs );
			}
			$content .= '<!-- wp:' . $block_type . $block_attributes . ' -->';

			$content .= $this->render_inner_HTML( $block );

			$content .= '<!-- /wp:' . $block_type . ' -->';

		} else {
			$content .= $block['innerHTML'];
		}

		return $content;

	}

Do you already have this planned?

dmsnell added a commit that referenced this pull request Sep 22, 2018
There are numerous needs to process posts and block content from its
structured form without demanding that plugin authors implement their
own parsing systems.

Since the new default parser was implemented in #8083 the server-side
parse is now fast enough to consider doing full parses of our documents
and with that brings the idea that we can filter block content from the
parser itself.

In this patch I'm exploring an API to allow extending the parser's
behavior by post-processing blocks as they enter the parser's output
array. This new filter gives the ability to transform all of the block's
properties as they finish parsing.

In the case of inner blocks the filter runs as the inner blocks have
finished their own nesting. In the case of top-level blocks the filter
runs after all inner content has finished parsing.

One use case is in #8760 where we want to replace the HTML parts of
blocks while preserving other structure. Another use case could be
removing specific inner blocks or content based on the current user
requesting a post.

This filter exposes a kind of visitor pattern for the nested parse.

> **THIS IS AN INCOMPLETE PATCH DO NOT MERGE**
@georgestephanis
Copy link
Contributor

georgestephanis commented Oct 10, 2018

This would address (I believe) #6751 and #7247 (in case it's worth closing those in favor of this)

@dmsnell
Copy link
Member

dmsnell commented Oct 10, 2018

@brucepearson I've started working on this in #10463 such that the new parser is used for do_blocks. some normal snags are holding it up but it's also revealing to me some of the tricks we have to pull to make this happen.

in #10108 I started exploring a block-level filter which I think also comes into play.

my idea is that we'll have to end up doing the full parse in do_blocks which lets us actually address the inner blocks and then use a registered filter to process those block data structures from the bottom up.

if you don't mind could you look at the imagined interface in #10108 and provide your feedback on that API, how you would imagine wanting to hook into the block process, how you would imagine writing your plugin to do the replacements on render?

@brucepearson
Copy link
Author

brucepearson commented Oct 11, 2018

@dmsnell

how you would imagine writing your plugin to do the replacements on render?

I don't think replace on render would work for us. It has a few problems:

  • Performance can be slow if there are many strings
  • Often users want to edit their translated pages manually
  • It would be a big workflow change compared to the classic editor where a user could edit the translated post.

I do believe that the filter could work as long as there's a function to render the parsed blocks back to the post content in the format for the Gutenberg editor.

eg.

add_filter( 'block_post_parse', function( &$block ) {
    ... Do some magic with the block.
} );

$blocks = gutenberg_parse_blocks( $post->content );
$translated_post->content = gutenberg_render_blocks_for_editor( $blocks );

@brucepearson
Copy link
Author

@dmsnell I noticed in 4.0.0 that gutenberg_parse_blocks now returns an array of arrays instead of an array of WP_Block_Parser_Block objects. Will it stay that way in the future so we can rely on it return the same data format?

dmsnell added a commit that referenced this pull request Nov 7, 2018
Attempt three at including positional information from the parse to enable isomorphic reconstruction of the source `post_content` after parsing.

See alternate attempts: #11082, #11309
Motivated by: #7247, #8760, Automattic/jetpack#10256
Enables: #10463, #10108

## Abstract

Add new `innerContent` property to each block in parser output indicating where in the innerHTML each innerBlock was found.

## Status

 - will update fixtures after design review indicates this is the desired approach
 - all parsers passing new tests for fragment behavior

## Summary

Inner blocks, or nested blocks, or blocks-within-blocks, can exist in Gutenberg posts. They are serialized in `post_content` in place as normal blocks which exist in between another block's comment delimiters.

```html
<!-- wp:outerBlock -->
Check out my
<!-- wp:voidInnerBlock /-->
and my other
<!-- wp:innerBlock -->
with its own content.
<!-- /wp:innerBlock -->
<!-- /wp:outerBlock -->
```

The way this gets parsed leaves us in a quandary: we cannot reconstruct the original `post_content` after parsing because we lose the origin location information for each inner block since they are only passed as an array of inner blocks.

```json
{
	"blockName": "core/outerBlock",
	"attrs": {},
	"innerBlocks": [
		{
			"blockName": "core/voidInnerBlock",
			"attrs": {},
			"innerBlocks": [],
			"innerHTML": ""
		},
		{
			"blockName": "core/innerBlock",
			"attrs": {},
			"innerBlocks": [],
			"innerHTML": "\nwith its own content.\n"
		}
	],
	"innerHTML": "\nCheck out my\n\nand my other\n\n"
}
```

At this point we have parsed the blocks and prepared them for attaching into the JavaScript block code that interprets them but we have lost our reverse transformation.

In this PR I'd like to introduce a new mechanism which shouldn't break existing functionality but which will enable us to go back and forth isomorphically between the `post_content` and first stage of parsing. If we can tear apart a Gutenberg post and reassemble then it will let us to structurally-informed processing of the posts without needing to be aware of all the block JavaScript.

The proposed mechanism is a new property as a **list of HTML fragments with `null` values interspersed between those fragments where the blocks were found**.

```json
{
	"blockName": "core/outerBlock",
	"attrs": {},
	"innerBlocks": [
		{
			"blockName": "core/voidInnerBlock",
			"attrs": {},
			"innerBlocks": [],
			"blockMarkers": [],
			"innerHTML": ""
		},
		{
			"blockName": "core/innerBlock",
			"attrs": {},
			"innerBlocks": [],
			"blockMarkers": [],
			"innerHTML": "\nwith its own content.\n"
		}
	],
	"innerHTML": "\nCheck out my\n\nand my other\n\n",
	"innerContent": [ "\nCheck out my\n", null, "\n and my other\n", null, "\n" ],
}
```

Doing this allows us to replace those `null` values with their associated block (sequentially) from `innerBlocks`.

## Questions

 - Why not use a string token instead of an array?
    - See #11309. The fundamental problem with the token is that it could be valid content input from a person and so there's a probability that we would fail to split the content accurately.

 - Why add the `null` instead of leaving basic array splits like `[ 'before', 'after' ]`?
    - By inspection we can see that without an explicit marker we don't know if the block came before or after or between array elements. We could add empty strings `''` and say that blocks exist only _between_ array elements but the parser code would have to be more complicated to make sure we appropriately add those empty strings. The empty strings are a bit odd anyway.

 - Why add a new property?
    - Code already depends on `innerHTML` and `innerBlocks`; I don't want to break any existing behaviors and adding is less risky than changing.
@gziolo
Copy link
Member

gziolo commented Feb 1, 2019

@brucepearson and @dmsnell - what's the status of this one? lib/blocks.php is almost entirely moved to WordPress core now and lib/parser.php was moved to a new location. This PR needs to be refreshed for sure so I'm marking it as Stale to make review process of 200+ PRs easier.

@gziolo gziolo added the [Status] Stale Gives the original author opportunity to update before closing. Can be reopened as needed. label Feb 1, 2019
@dmsnell
Copy link
Member

dmsnell commented Feb 1, 2019

@gziolo @brucepearson the parser in core splits the inner HTML into a list of HTML and null values which indicate where the inner blocks are. we should be able to close this issue but I want to let @brucepearson make that call.

in #11334 we added the change which breaks apart the innerHTML so now it should be possible to rearrange or work with inner content using the filters, notably render_block and soon render_block_data (introduced in https://core.trac.wordpress.org/ticket/45451)

by default do_blocks() and render_block() should properly render the nesting content.

@aduth
Copy link
Member

aduth commented Feb 4, 2019

Let's err on the side of closing. I've tested and commented as such at #8214 (comment) . If further work is needed, the issue can be reopened or new ones created.

@aduth aduth closed this Feb 4, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Feature] Block API API that allows to express the block paradigm. [Feature] Nested / Inner Blocks Anything related to the experience of nested/inner blocks inside a larger container, like Group or P [Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f [Status] Stale Gives the original author opportunity to update before closing. Can be reopened as needed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants