Editable: separate out content ops and switch to using tinymce tree output #3713

spocke · 2017-11-29T10:12:51Z

Description

Creating this PR more as discussion point. Can make a separate PR later depending on what comes out of this one.

The benefits of using the filtering in tiny is:

Removes invalid HTML elements it follows the HTML5 specification by default.
Removes internal elements such as placeholders, UI elements etc, zwnbsp:s.
Removes XSS elements and attributes such as scripts or event handlers.
Removes or pads invisible elements for example empty spans.
Removes or pads empty block elements such as an empty paragraph or h1.
Converts legacy elements like font and u to span elements.
Normalizes urls based on config absolute/relative etc.

Did some performance comparisons and the change is barely noticeable and since the blocks in gutenberg are mostly smaller chunks I don't think it will be a problem.

The performance is very content and browser dependent some browsers are slower at traversing the dom than parsing the html. IE/Edge/Firefox has a slower dom Chrome has a faster dom for example.

We have done some major optimizations in the tinymce 4.7.3 core and added the ability to get the intermediate node tree out this bypasses a few steps in the filtering process.

How Has This Been Tested?

I ported over the tests for domreact but if we expose the attributes and style to json in that dom-react package we don't need to move those over.
Other than that I did some manual testing of pressing enter in various blocks and saving contents. This is where the content operations occur.

Screenshots (jpeg or gifs if applicable):

Types of changes

Separated out content operations from editable.js to a separate file content.js.
Normalized the output of split operations to a single data structure.
Moved domreact into gutenberg. This might not be needed.
Swapped out the gutenberg specific filtering of nodes with tinymce filtering.

Checklist:

[Y] My code is tested.
[Y] My code follows the WordPress code style.
[N] My code follows has proper inline documentation.

codecov · 2017-11-29T10:12:54Z

Codecov Report

Merging #3713 into master will decrease coverage by 0.17%.
The diff coverage is 35.57%.

@@            Coverage Diff             @@
##           master    #3713      +/-   ##
==========================================
- Coverage   37.45%   37.28%   -0.18%     
==========================================
  Files         277      281       +4     
  Lines        6707     6899     +192     
  Branches     1222     1279      +57     
==========================================
+ Hits         2512     2572      +60     
- Misses       3536     3626      +90     
- Partials      659      701      +42

Impacted Files	Coverage Δ
blocks/editable/index.js	`10.63% <0%> (+0.46%)`	⬆️
blocks/editable/content.js	`0% <0%> (ø)`
utils/domreact/attrs.js	`100% <100%> (ø)`
blocks/api/matchers.js	`92.85% <100%> (ø)`	⬆️
blocks/editable/tree.js	`9.09% <9.09%> (ø)`
utils/domreact/index.js	`97.43% <97.43%> (ø)`
blocks/api/raw-handling/shortcode-converter.js	`46.15% <0%> (-6.23%)`	⬇️
blocks/library/shortcode/index.js	`30% <0%> (-3.34%)`	⬇️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cac31b0...2010ef1. Read the comment docs.

youknowriad

Overall, I think we should use the tree format. Interested in learning more about TinyMCE's tree format and whether it could be used as our "Editable"'s value format (even outside TinyMCE).

youknowriad · 2017-12-01T10:48:06Z

blocks/editable/content.js

+}
+
+export function getContent( editor ) {
+	return childrenToReact( editor.getContent( { format: 'tree' } ) );


What shape has this "tree" format? Can you share an example?

Do you think it makes sense to avoid the "react" element format at all and just use the tree format in the state as well (outside Editable), in which case, it requires some refactoring but might be worth it (same as #3048)

I created a simple gist doc on the structure of this tree.
https://gist.github.com/spocke/b009c5ef460c188c8995eb46b064a6b7

I guess using it for tinymce related things makes sense but unsure if it's a general enough format for usage for other things since each node has methods on it for mutation. The react format seems simpler in that regard at the least it's input format.

Can you serialize this "tree" using JSON.stringify and recreate later after calling JSON.parse? The root cause why we don't want to store React objects in the Redux state is that rehydration doesn't work out of the box.

We could convert the tree into the simple json format that was suggested else where. It's pretty fast to just convert a one tree structure to another if it's just simple objects. We just need to implement a way to set the tree back in tiny just before the filters apply.

youknowriad · 2017-12-01T10:48:29Z

blocks/editable/content.js

+	}
+
+	content = renderToString( content );
+	editor.setContent( content, { format: 'raw' } );


Can we use the "tree" format to set content as well?

Currently we only support getting the contents out as the tree but since it's an internal thing we just need to expose I see no reason why we couldn't support setting contents as a tree as well that would by pass the parsing step in the setContent call so it would be more efficient obviously.

That would be awesome, if we could send the same tree back 💯

aduth · 2017-12-04T14:42:32Z

My general feelings on this are:

Per RichText State Structure #771, the structure of how we represent rich content should ideally not be dependent on any specific library, neither TinyMCE nor React (i.e. not TinyMCE tree nodes, not React elements)
We should avoid recreating the TinyMCE clean-up behaviors if we can instead receive this data directly from the source (i.e. getting content from TinyMCE APIs as tree)
We should be conscious of and limiting how often we're converting between formats (i.e. parsing blocks -> WP tree shape -> TinyMCE HTML -> TinyMCE tree -> WP tree shape -> serialize block)

And a question:

Does this benefit TinyMCE as a project in how it internally represents data or is this just another serialization (getContent) target? Thinking either in terms of: Would TinyMCE maintain a tree shape as the contenteditable changes over time? Or in other serialization targets (getContent( { format: 'html' } ) use the tree shape as the starting point?

Overall I'm feeling positive about a change like this, particularly because it feels wrong for us to be cleaning up TinyMCE markup ourselves in Editable's createTinyMCEElement.

spocke · 2017-12-04T15:53:50Z

You could have a wordpress specific lightweight tree format that just does what you need and then convert dom or tinymce trees to that. I guess it depends on how you want to store contents having it in js only model makes it super fast to work with compared to the real dom.

But you would have to recreate a bunch of mutation logic if you want to alter the tree in memory that is something we already provide in the tinymce tree model but unsure if you need to filter the contents on the gutenberg side?

TinyMCE just exposes the tree that we already have bypassing the final step of taking the tree and serializing it back to html. If we where to support setting a tree as well we would just bypass the parser step and go straight to the filtering steps. So we kind of just exposed the internals though this tree format.

Afraithe · 2017-12-05T13:40:48Z

I guess this boils down to a more broad discussion on how you want to store the format for Gutenberg specifically. As you say, converting between different formats should be avoided. I believe @iseulde came to a similar conclusion when this was last discussed, if you plan to use a string in/out then this doesn't make much sense.

So, what is the desired spec for the internal storage format?

gziolo · 2017-12-14T08:53:27Z

So, what is the desired spec for the internal storage format?

@Afraithe, we did some explorations in #3048. Let me share one of the examples from the unit test. It mirrors how React.createElement(component, props, ...children) works. Every DOM node is represented as one of the following:

node text (string)
element node (array) - array contains node name, object with attributes and an unlimited number of children nodes

Assuming you have the following HTML code:

<div>This is a <strong>paragraph</strong> with a <a href="https://w.org/">link with <b>bold <i>and italics</i></b></a>.</div>

Here is the internal representation of the tree created:

[
	'This is a ',
	[ 'strong', {}, 'paragraph' ],
	' with a ',
	[ 'a', { href: 'https://w.org/' }, 'link with ', [
		'b', {}, 'bold ', [
			'i', {}, 'and italics',
		],
	] ],
	'.',
]

spocke · 2017-12-15T18:56:02Z

So we could provide a way in tinymce to set the tree using editor.setContent(tree, { format: 'tree' }) then provide in a PR to gutenberg functions that convert from this suggested format that you would want to and from the internal tree representation in tinymce we could do a separate PR for that if you guys feel that it's a good idea. This PR was more to get the ball rolling in term of having the filtering being done by tiny but the storage done by gutenberg.

gziolo · 2017-12-15T19:21:35Z

@aduth, @iseulde, should we iterate on this approach rather then continue efforts on #3048? Do you see any drawbacks assuming we store a plain JS data in Redux store?

aduth · 2017-12-18T14:45:21Z

One potential issue is that we'd want to still be able to set the content of the Editable prior to TinyMCE initializing to avoid content flashing in after the editor loads. See existing logic:

gutenberg/blocks/editable/tinymce.js

Lines 105 to 111 in 227a93f

    
           // If a default value is provided, render it into the DOM even before 
        
           // TinyMCE finishes initializing. This avoids a short delay by allowing 
        
           // us to show and focus the content before it's truly ready to edit. 
        
           let children; 
        
           if ( defaultValue ) { 
        
           	children = Children.toArray( defaultValue ); 
        
           }

Which means we'd still need a way to convert from the plain JS shape to a React element (or dangerously set inner HTML).

spocke · 2017-12-19T18:54:56Z

Converting the json format that was suggested to html or dom might a solution. However on editor initialization isn’t the content already a html string since from what I understand it’s stored as html in the db.

aduth · 2018-06-27T15:11:41Z

Brain dump:

This pull request is probably beyond salvaging at this point, but I've recently started coming back around to the idea that it'd be in our interest to use TinyMCE's tree format as the base source from which to retrieve content and convert it into the block attribute value shape. I'd explored some of this as well in #6467, where the main hiccup was in that we'd still need some filtering to handle the case of splitting blocks. However, I think if we refactored the implementation of RichText to use the default forced_root_block behavior, we could instead split blocks on TinyMCE's NewBlock event, call getContent( { format: 'tree' } ), and expect tree.firstChild and tree.lastChild to be the respective equivalents of our current before and after fragments. There's probably a decent amount of effort involved here to avoid breakage with multiline and tagName behaviors on RichText. Also, it may require a new wrapper div to use in place of the default tagName for the TinyMCE component, since TinyMCE's enter behavior won't allow the new block behavior to be triggered when the root element is the p; this is expected: rather, we want a div for what TinyMCE to consider as its root and its first child as what we consider as the root for the block attribute value.

aduth · 2018-06-28T16:11:48Z

FYI I explored using getContent( { format: 'tree' } ) in #7583 and considered it a failed experiment. More information at that pull request. Namely, I would have hoped TinyMCE would construct its tree by walking the actual editor DOM, but in fact it runs the parse on the body's innerHTML, which is prohibitively more expensive an operation.

spocke added 4 commits November 28, 2017 13:02

Editor: moved dom-react into utils

95e925f

Editor: moved content logic into a separate module

a66fecc

Editor: switched extract to clone to make splitAtCaret a read op

a8590c2

Editor: fixed so contents gets extracted though tinymce tree format

2010ef1

spocke requested review from youknowriad, ephox-mogran, aduth, ellatrix and tg-ephox November 29, 2017 10:12

spocke added the [Feature] Rich Text Related to the Rich Text component that allows developers to render a contenteditable label Nov 29, 2017

youknowriad reviewed Dec 1, 2017

View reviewed changes

aduth mentioned this pull request Dec 18, 2017

Try: Custom save serializer, Editable as array tree #4049

Closed

aduth mentioned this pull request Mar 3, 2018

Try: Block API: Remove children source in favor of HTML string #5380

Closed

aduth closed this Jun 27, 2018

aduth deleted the try/tinymce-tree branch June 27, 2018 15:12

aduth mentioned this pull request Jun 27, 2018

Experiment: RichText: Format content from tree value #7583

Closed

MaxMood96 mentioned this pull request Jun 24, 2023

[Snyk] Fix for 1 vulnerabilities MaxMood96/gutenberg#103

Open

MaxMood96 mentioned this pull request Dec 1, 2023

[Snyk] Fix for 1 vulnerabilities MaxMood96/gutenberg#118

Open

naiba4 mentioned this pull request Dec 1, 2023

[Snyk] Fix for 1 vulnerabilities naiba4/gutenberg#26

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Editable: separate out content ops and switch to using tinymce tree output #3713

Editable: separate out content ops and switch to using tinymce tree output #3713

spocke commented Nov 29, 2017

codecov bot commented Nov 29, 2017 •

edited

Loading

youknowriad left a comment

youknowriad Dec 1, 2017

spocke Dec 4, 2017 •

edited

Loading

gziolo Dec 14, 2017

spocke Dec 15, 2017

youknowriad Dec 1, 2017

spocke Dec 4, 2017

gziolo Dec 14, 2017

aduth commented Dec 4, 2017

spocke commented Dec 4, 2017

Afraithe commented Dec 5, 2017

gziolo commented Dec 14, 2017 •

edited

Loading

spocke commented Dec 15, 2017

gziolo commented Dec 15, 2017

aduth commented Dec 18, 2017

spocke commented Dec 19, 2017

aduth commented Jun 27, 2018

aduth commented Jun 28, 2018 •

edited

Loading

Editable: separate out content ops and switch to using tinymce tree output #3713

Editable: separate out content ops and switch to using tinymce tree output #3713

Conversation

spocke commented Nov 29, 2017

Description

How Has This Been Tested?

Screenshots (jpeg or gifs if applicable):

Types of changes

Checklist:

codecov bot commented Nov 29, 2017 • edited Loading

Codecov Report

youknowriad left a comment

Choose a reason for hiding this comment

youknowriad Dec 1, 2017

Choose a reason for hiding this comment

spocke Dec 4, 2017 • edited Loading

Choose a reason for hiding this comment

gziolo Dec 14, 2017

Choose a reason for hiding this comment

spocke Dec 15, 2017

Choose a reason for hiding this comment

youknowriad Dec 1, 2017

Choose a reason for hiding this comment

spocke Dec 4, 2017

Choose a reason for hiding this comment

gziolo Dec 14, 2017

Choose a reason for hiding this comment

aduth commented Dec 4, 2017

spocke commented Dec 4, 2017

Afraithe commented Dec 5, 2017

gziolo commented Dec 14, 2017 • edited Loading

spocke commented Dec 15, 2017

gziolo commented Dec 15, 2017

aduth commented Dec 18, 2017

spocke commented Dec 19, 2017

aduth commented Jun 27, 2018

aduth commented Jun 28, 2018 • edited Loading

codecov bot commented Nov 29, 2017 •

edited

Loading

spocke Dec 4, 2017 •

edited

Loading

gziolo commented Dec 14, 2017 •

edited

Loading

aduth commented Jun 28, 2018 •

edited

Loading