Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser: Client-side wpautop #4005

Merged
merged 2 commits into from
Jan 30, 2018
Merged

Parser: Client-side wpautop #4005

merged 2 commits into from
Jan 30, 2018

Conversation

aduth
Copy link
Member

@aduth aduth commented Dec 14, 2017

Related: #3745, #2806

This pull request seeks to explore solutions for migrating classic content transparently, while also considering paragraph block serialization omitting comment demarcations. In #3745, it was found that the server serialization behavior was insufficient to respect nested blocks. This was originally implemented to selectively apply wpautop to non-block content. The approach here instead assumes this to occur within Gutenberg when loading a post with non-block content. Simultaneously it adds parser awareness of the paragraph tag to upgrade to a paragraph block.

An advantage of implementing wpautop upgrading in the Gutenberg editor is that it allows the the_content filter for automatic paragraphs to be skipped altogether for posts containing blocks (already exists) and eliminates the need for server processing to this effect on save.

Caveats:

There are a few uncertainties with the changes proposed here:

  • We don't want to call on the wp.oldEditor.autop global for applying autop, not just because it's a global, but because it is not functional in test environments. Instead, we should consider extracting these utilities (autop, removep) to a separate @wordpress/autop package. Edit: Updated to use the new @wordpress/autop package
  • To this point, I had started the work here, and found in porting the PHP wpautop tests to JavaScript that there are many differences between the PHP and JavaScript implementations. We should seek to improve this. Edit: Accounted for in the new @wordpress/autop package
  • The undemarcated paragraph block is certainly less verbose, but I worry that this is a post-grammar-parse step to convert a freeform string including paragraph tags to a segmented set of freeform and paragraph blocks. This would also mean the server parser would not detect undemarcated paragraphs as paragraph blocks, even if within the Gutenberg editor they were treated as such. Edit: No longer pursuing the undemarcated paragraphs as part of the changes here

Testing instructions:

Verify that saving a post in the classic editor with removep'd content can be loaded into Gutenberg respecting paragraph separations. Example:

classic text

extra p

<div>foo</div>

more p

This text, when loaded in Gutenberg, should produce the following sequence: 2 paragraph blocks, 1 freeform block (the div), and a last paragraph block. Saving the post should save with <p> tags. Edit: No longer pursuing the undemarcated paragraphs as part of the changes here

Applying background color or alignment to a paragraph should apply comment demarcations, since these attributes are serialized in comment syntax.Edit: No longer pursuing the undemarcated paragraphs as part of the changes here

Ensure that this content loads into the Gutenberg editor with paragraph blocks applied, and that even after inserting new block content and saving the post that it appears on the front-end with paragraphs still existing.

@aduth aduth added [Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f [Status] In Progress Tracking issues with work in progress labels Dec 14, 2017
@aduth
Copy link
Member Author

aduth commented Jan 4, 2018

cc @dmsnell re: The note/worry about parsing undemarcated paragraphs outside of the grammar itself. I might plan to split this out into a separate pull request all the same.

@aduth
Copy link
Member Author

aduth commented Jan 4, 2018

Saving commit references planned to be dropped in rebase:

@aduth aduth changed the title Try: Client-side wpautop, undemarcated paragraph blocks Try: Client-side wpautop Jan 5, 2018
@aduth aduth changed the title Try: Client-side wpautop Parser: Client-side wpautop Jan 5, 2018
@aduth aduth removed the [Status] In Progress Tracking issues with work in progress label Jan 5, 2018
@aduth
Copy link
Member Author

aduth commented Jan 5, 2018

This is now ready for a proper review. I've updated the intent of the pull request to work exclusively toward porting wpautop reapplication into the client. Work toward undemarcated paragraphs and fragmenting a freeform block into paragraph blocks will be pursued separately.

@aduth aduth requested a review from mtias January 5, 2018 14:51
@youknowriad
Copy link
Contributor

Code changes look good to me. Is this supposed to fix the switching between "classic editor" and "gutenberg"?

If that's the case, it's working properly, aside one issue:

  • This adds empty paragraphs (free form block) between Gutenberg blocks when:

  • writing in Gutenberg

  • Updating in classic editor

@aduth
Copy link
Member Author

aduth commented Jan 8, 2018

@youknowriad I wouldn't expect this should fix any outstanding issues so much as recreating the existing behavior in a way that accommodates block nesting, considered as the favorable alternative to server-side parsing enhancements proposed in #3545.

@aduth
Copy link
Member Author

aduth commented Jan 8, 2018

If that's the case, it's working properly, aside one issue:

I'm seeing this in master as well. Can you confirm? Investigating why this might be happening. It's creating a new separate paragraph tag (presumably based on the double newline between blocks), which I wouldn't expect to happen as of #2708.

@aduth
Copy link
Member Author

aduth commented Jan 8, 2018

Related: #3901

Copy link
Contributor

@youknowriad youknowriad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

About the empty paragraphs issue, one way to solve it is to drop the empty lines between blocks. Granted it makes the generated HTML less readable but I prefer less readable over broken when switching between classic and Gutenberg

Copy link
Contributor

@mcsf mcsf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dmsnell
Copy link
Member

dmsnell commented Jan 14, 2018

@aduth I hear your concerns about a post-grammar parse but I can also sympathize here the same way I sympathize with the more tag. What I mean is that I think we're hitting a legacy wall where we have no where further to run to escape the past. That being said the code looks fine to me and the comments are clear.

One idea I had long ago but it never picked up much momentum was a pre-parse converter so that we could handle inconsistencies in a more isolated way. We could run the document through the preprocessor which would look for very specific and searchable patterns then replace them. In the case of the <!-- more --> tag we could search for <!--\s*more(?!-->)*--> and replace it with a full more block. This would leave the gutenberg parser purer because it wouldn't have to make those exceptions. To date we've had very few exceptions though so it's not terrible.

The same could apply to autop and freeform HTML and maybe could help with the known performance issue (#3799) which chokes on long sequences of non-block content. The preprocessor could quickly scan the document and look for anything not a proper block then perform its transformation - maybe turning everything into big free-form HTML blocks, maybe splitting by paragraph via autop, maybe something else.

Easier said than done I guess but it's an idea at least.

In the meantime I support this PR especially if it removes an obstacle to nested blocks.

A parsed block has no awareness of where inner blocks exist in its innerHTML, so it cannot safely reserialize.

There are a few options:

- Since we merely skip wpautop for known blocks, we could avoid reserialization and return the block's original HTML verbatim if we had access to its outerHTML. See nylen/phpegjs#3

- Move wpautop behavior for freeform content to the editor client. This may align well with desires to transparently upgrade legacy paragraph content to paragraph blocks. This would also allow the server to avoid any preprocessing before showing a post on the front-end, assuming that the saved content has had wpautop applied already.

Acknowledging that this effectively reverts large parts of #2806
@aduth aduth merged commit c6e8798 into master Jan 30, 2018
@aduth aduth deleted the try/client-wpautop branch January 30, 2018 21:19
@ntwb ntwb mentioned this pull request Mar 6, 2018
3 tasks
@ntwb ntwb mentioned this pull request Mar 9, 2018
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants