Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rich Text: Formalize RichText children value abstraction #7934

Merged
merged 3 commits into from
Jul 20, 2018

Conversation

aduth
Copy link
Member

@aduth aduth commented Jul 12, 2018

Related: #7476 , #7890

This pull request seeks to work toward detaching the dependency between RichText value and the @wordpress/element React element shape. It formalizes what we already offer in the children source as a proper value shape of its own.

Implementation notes:

Some observations made in the implementation herein:

  • "Element" is an implementation detail. The only requirements of our RichText implementation is being able to source from and write to HTML. Representing the value as an object shape (vs. a plain markup string) has been the subject of much past discussion (Try: Block API: Remove children source in favor of HTML string #5380, Editable: Treat value as HTML string, eliminate children source #2750) where the conclusion is that we want some of the benefits which could be offered through element (e.g. safety, content analysis) but it does not strictly need to be element itself. Our current need for manipulation is concatenation (merging), which can be offered through an abstraction native to the "children" type.
    • Our need for dom-react hinges on this fact that we expect to need to convert a DOM node back to an element for serialization. But our custom serializer (Framework: Use custom serializer for texturize compatibility #5897) is more tolerant to non-"element" props (e.g. class, style). It is fine if this becomes non-true in the future, so long as we have the abstractions in place for children and node source types to work from and to the HTML.
  • This does not intend to propose any specific object shape for the RichText value. Initial explorations here did, and they introduced needless complexity, when the most critical piece is providing the consistent abstraction. I'll leave value shape to efforts like in RichText state structure for value manipulation #7890 (issue at RichText State Structure #771).
  • The implementation here includes some more stability:
    • Concatenating two children types consisting only of strings has a more consistent result...
      • In master, we produce [ 'First paragraph', 'Second paragraph' ] where here we generate the simpler and consistent [ 'First paragraphSecond paragraph' ]
    • Empty node deletion traverses deeply, resolving more issues with zero-width space character presence (see included E2E test case)
  • I wasn't entirely sure about RichText manually crafting its own BlockChildren objects, vs. something like with dom-react in passing a map/reduce callback (or object of "visitors") into the block API itself.
    • Many complications:
      • We may need to clean either TEXT_NODE and ELEMENT_NODE
      • We may need to omit nodes altogether
      • Doing so in a way that is performant (i.e. single pass)
    • Since this value is aimed at RichText, it seemed reasonable that it is fully aware of the object shape

Testing instructions:

Verify there are no regressions in the display, manipulation (splitting, merging, transforming), and save+reload of blocks leveraging RichText (e.g. paragraph, quote). A particular focus should be made on areas of low test coverage (e.g. block transforms).

@aduth aduth added [Feature] Block API API that allows to express the block paradigm. Framework Issues related to broader framework topics, especially as it relates to javascript [Feature] Rich Text Related to the Rich Text component that allows developers to render a contenteditable labels Jul 12, 2018
@@ -618,6 +626,11 @@ export function renderAttributes( props ) {
* @return {string} Style attribute value.
*/
export function renderStyle( style ) {
// Only generate from object, e.g. tolerate string value.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was conflicted about this change, since it's largely aimed at the use-case being proposed here with children sourcing from DOM. In general, our custom serializer is much more tolerant to things which React would otherwise barf about. This has made the implementation slightly simpler in some regards too (though arguably not in this instance). Here, I saw no reason why style must be an object, toward a goal where the fallback behavior is to always assign an attribute as given (and if it's given as a string, great: write it as-is). Other similar projects seem tolerant as well, e.g. preact-render-to-string, which happily handles the string style. The main downside is a mixed expectation of what could be passed to renderToString vs. render alone.

If it's an issue, there's an alternative: We can detect the string from the child API's fromDOM and parse it back into an object.

@aduth aduth requested a review from a team July 16, 2018 14:36
@gziolo gziolo requested a review from hypest July 16, 2018 14:38
@hypest
Copy link
Contributor

hypest commented Jul 16, 2018

... we want some of the benefits which could be offered through element (e.g. safety, content analysis) but it does not strictly need to be element itself. Our current need for manipulation is concatenation (merging), which can be offered through an abstraction native to the "children" type.

This is most probably a naive question, stemming from lack of insight: Could we perhaps move the analysis step closer to the operations that need it? Like, instead of always keeping an already analyzed tree in memory (and in state), only keep the serialized html and produce the tree on the spot, when merging needs to happen.

@aduth
Copy link
Member Author

aduth commented Jul 16, 2018

Like, instead of always keeping an already analyzed tree in memory (and in state)

What do you have in mind as representing the analyzed tree? Would something like what's being explored in #7890 satisfy what you're talking about?

@gziolo
Copy link
Member

gziolo commented Jul 16, 2018

Aztec takes HTML as an input (similar to TinyMCE) and also generates HTML as an output (TinyMCE is different). I think, question is if we could use HTML string to store value and parse it to the proposed format, when doing all transformations and convertions.

@youknowriad
Copy link
Contributor

While I see the value of this abstraction/type and I like how this makes the distinction between the element abstraction and the rich text value format, I have a question, could we guarantee that the abstraction will cover all the use-cases of transforming the rich text value?

My hunch is "no". For example we need to transform li to p when transforming from lists to paragraphs or get only the content of li etc... So I still think we need to formalize the shape of the value somehow and not treat it as a black box.

@youknowriad
Copy link
Contributor

(It looks like there's unrelated changes in the diff, probably need a rebase or something)

@hypest
Copy link
Contributor

hypest commented Jul 17, 2018

What do you have in mind as representing the analyzed tree?

Nothing in particular @aduth , I'm still trying to understand if the in-memory object tree is just a nice-to-have (but super useful) that enables some features (like merging) or it is perhaps super-needed for the baseline functionality Gutenberg offers.

If nice-to-have, perhaps it would make sense to only perform the conversion on-demand, and keep passing/saving html from/to the state.

Besides, as @gziolo mentioned in his comment, there's an incentive to avoid requiring Aztec (the current native mobile editor) to expose an object model instead of just html. If we can find a way to continue using html, at least as the interface deeper in the RichText component, it will be welcomed for this first attempt at integrating Aztec.

@aduth
Copy link
Member Author

aduth commented Jul 17, 2018

I think I should reiterate that this pull request does not intend to express any proposal for the presence of or a particular shape of the RichText value. It primarily seeks to stop presenting it as though it were the @wordpress/element value.

I'm still trying to understand if the in-memory object tree is just a nice-to-have (but super useful) that enables some features (like merging) or it is perhaps super-needed for the baseline functionality Gutenberg offers.

There's substantial context / prior discussion for this in #5380 and #2750.

There's nothing that technically forbids us from tracking the state value as an HTML string, but for all of the reasons mentioned previously (operations upon state value primarily), the abstraction has stuck. Performance is also a very real consideration, as I would expect that on-demand transformations would be very costly.

I'm open to reconsidering #2750, but we need movement now, and I'd still consider this to be an incremental step forward.

@aduth aduth added this to the 3.3 milestone Jul 18, 2018
@tofumatt
Copy link
Member

Looks like there's still a lot of stuff in this branch that is unrelated, which makes it tough to review. Maybe it needs another rebase?

@aduth
Copy link
Member Author

aduth commented Jul 19, 2018

@tofumatt It's not a small pull request in its own right, but yes, there was something awry with a previous rebase that pulled in some unrelated commits unintentionally. Should be cleared up now.

/**
* Internal dependencies
*/
import * as node from './node';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like there's a cyclic dependency between node and children, Should we just merge them in the same file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like there's a cyclic dependency between node and children, Should we just merge them in the same file?

Well-spotted. There is, though Webpack seems to handle it without issue. I personally found some elegance in having independent files with overlapping APIs (matcher, fromDOM, toHTML) mapping one-to-one with the offered source type. I don't feel too strongly about it.

try {
result.push( node.fromDOM( domNodes[ i ] ) );
} catch ( error ) {
// Simply ignore if DOM node could not be converted.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When can this happen?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When can this happen?

The child matcher throws on anything other than nodeType of TEXT_NODE or ELEMENT_NODE, so if a block uses a children selector on an element which includes a comment node, for example, this would prevent its inclusion.

<div class="my-block"><!-- foo -->hello world</div>
{
  "type": "array",
  "source": "children",
  "selector": "div"
},

https://developer.mozilla.org/en-US/docs/Web/API/Node/nodeType#Constants

Copy link
Contributor

@youknowriad youknowriad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested this, it works well. I also like that it includes the shape simplification from #7476

Let's get this in, it's definitely an improvement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Feature] Block API API that allows to express the block paradigm. [Feature] Rich Text Related to the Rich Text component that allows developers to render a contenteditable Framework Issues related to broader framework topics, especially as it relates to javascript
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants