Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RichText state structure for value manipulation #7890

Merged
merged 44 commits into from
Oct 1, 2018
Merged

Conversation

ellatrix
Copy link
Member

@ellatrix ellatrix commented Jul 11, 2018

Description

Read about the functions for manipulation.

Why a different RichText internal state structure?

Currently RichText values is internally structured as tree of HTML nodes that can map to the displayed live DOM. When the value needs to be manipulated, we are either looking at the live DOM or the internal array structure.

Current parsing and serialisation

Raw parsed blocks have a blob of HTML as their content, from which attributes are sourced according to the block’s attribute schema. This is done by parsing the HTML with the browser. For RichText values, the in memory DOM tree is then converted to an array structure of the same shape. In the RichText editor, the values are applied by setting the HTML (element.innerHTML). On Input, the live DOM is converted as well to an array structure. Any manipulation that is done by us happens by manipulation the live DOM directly, or by manipulating the array structure, which is then set as HTML in the editor.

Proposed parsing and serialisation: controlled component

Ideally, we’d like to avoid manipulating the live DOM directly, be it though TinyMCE APIs or not. It is both very fragile (prone to errors) and not so easy to do. Say you’d like to automatically format some text that matches @user, and complete it based on suggestion, you’d have to merge all text children of element nodes in the tree before the selection to be able to search, split elements and text nodes in order to format the matching text, and move selection using the Selection API. Formatting etc. is similarly complex within TinyMCE. See https://github.com/tinymce/tinymce/tree/master/src/core/main/ts/fmt. A tree shape is not so easy to manipulate, wether it is the live DOM or an array structure. A flat structure would be easier, where text and formats are separated. I’m not proposing an alternative format for saving or displaying the data. HTML remains perfect to store the value, and the DOM needs to be used, in the browser, to display the data in a contentEditable container.

The best structure I’ve found was one where the text and formats are separated. The text would be just a string, which becomes easy to search. The formats are stored in a sparse array of the same length as the text. Each index of the array maps to a character in the string, and formats are reference to a unique format variation. RichText values become very easy to manipulate like this, imagine for example splitting, merging, slicing, splicing, applying and removing formats. Empty checking is simply a matter of checking the string. Things like an empty HTML element and empty text nodes are not possible.

Example:

<p>one <em><a href="https://w.org">two</a></em></p>
// All formats are referenced, this is to illustrate.
const format1 = { name: 'italic' };
const format2 = { name: 'link', url: 'https://w.org' };

{
	formats: [
		,
		,
		,
		,
		[ format1, format2 ],
		[ format1, format2 ],
		[ format1, format2 ],
	],
	text: 'one two',
}

New formats could be registered with their own view:

<p>one <span class="mention">two</span></em></p>
const format1 = { name: 'mention' };
{
	formats: [
		,
		,
		,
		,
		[ format1 ],
		[ format1 ],
		[ format1 ],
	],
	text: 'one two',
}

We could also support some meta date here if we need to save extra information e.g. attributes or a user ID.

Because there is a textual representation, RichText instances could be searched and formatted easily, e.g. think about auto-formatting @mentions and marking pieces of text for spell checking or comments.

Another benefit is that there is only one right way to map this to HTML. Merging

const format1 = { name: 'italic' };

{
	formats: [
		[ format1 ],
		[ format1 ],
		[ format1 ],
	],
	text: 'one',
}
const format1 = { name: 'italic' };

{
	formats: [
		[ format1 ],
		[ format1 ],
		[ format1 ],
	],
	text: 'two',
}

will result in <em>onetwo</em>. Not <em>one</em><em>two</em>.

How has this been tested?

Gutenberg should load without errors. Test splitting and merging, formatting, inline tokens, text patterns, blocks transformations.

Screenshots

Types of changes

Checklist:

  • My code is tested.
  • My code follows the WordPress code style.
  • My code follows the accessibility standards.
  • My code has proper inline documentation.

@brandonpayton
Copy link
Member

Instead I'd like to propose an alternative structure where the text content and formatting are separated.

Thanks so much for working on this. We really need it. It would allow us to style autocompletions like @-mentions based on a pattern match rather than modifying the actual content like #6577 does. This would be cleaner, more flexible, and less error-prone.

Copy link
Member

@brandonpayton brandonpayton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a really good direction if some pattern transforms can be shared across all RichText instances for things like @-mentions. In general, the changes felt like they were simplifying. I like how this is consolidating text-related logic in rich-text-structure.

@@ -157,6 +159,11 @@ export function parseWithAttributeSchema( innerHTML, attributeSchema ) {
*/
export function getBlockAttribute( attributeKey, attributeSchema, innerHTML, commentAttributes ) {
let value;

if ( attributeSchema.source === 'rich-text' ) {
attributeSchema.default = create();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth aliasing the create import with a more specific name to clarify what is being created.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can rename all this sure, atm it's maybe a bit less important while it's in progress.

} = settings;

return Array.from( element.attributes ).reduce( ( acc, { name, value } ) => {
if ( ! removeAttributeMatch( name ) ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Real, non-rhetorical question:
Would there be any value in providing more context like the element's name to removeAttributeMatch?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, maybe. Atm the goal is just to be able to strip mce specific stuff.

formats.splice( start, end - start + 1 );
record = { formats, text };
} else {
// Delete from highest to lowest.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be good to say why we're deleting from highest to lowest.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if there are any poor comments, it's a WIP, this particular one will actually be removed again.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the nitpicks. :) The intent was to share thoughts as I had them, but it wasn't super helpful for a WIP.

@@ -0,0 +1,271 @@
/**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests were very helpful in understanding how the rich-text-structure functions work. Thank you.

this.savedContent = this.getContent();
const record = this.getContent();

this.savedContent = this.patterns.reduce( ( accu, transform ) => transform( accu ), record );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that transforms due to matched patterns will be persisted in the post content? That seems clear here but is different than I've been mentally modeling this.

Here's my understanding of how we would apply this for @-mentions:

  • Adding formatting: Text matching a pattern like /@\w+/ would be formatted as an @-mention by wrapping it like <span class="user-mention">.
  • Removing formatting: A <span class="user-mention"> whose text content does not begin with an @ would have the formatting removed by replacing the span with its text content.

Is that roughly correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one currently is just for the patterns without the mentions in mind. They're a bit different. The patterns expect a piece of text to be transformed from one thing to another. The mentions merely expect some piece of text to be displayed differently (and maybe allow to have some meta info). I'll look at the mentions shortly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Thanks for clarifying.

@ellatrix
Copy link
Member Author

@brandonpayton I update the autocompleter to use the rich text structure. I think this simplifies it a lot, but it needs some more polishing for sure there. Also wondering if it should maybe be moved in the rich text folder...

Copy link
Member

@brandonpayton brandonpayton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love these changes, and it's great to see the Autocomplete component simplified. Thank you.

Also wondering if it should maybe be moved in the rich text folder...

That seems like a good move to me. Autocomplete already had things that made it more editor-related than generic. Even before this PR, its implementation assumed wrapping something like a contenteditable and had the onReplace prop which implies use in a block context. IMHO, bringing it even closer to rich-text to work with a more consistent and reliable editor layer is a win.

inputEvent.initEvent( 'input', true /* bubbles */, false /* cancelable */ );
}
range.commonAncestorContainer.closest( '[contenteditable=true]' ).dispatchEvent( inputEvent );
replacement = replacement.slice( 1 + this.state.query.length );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this to account for the prefix character that triggers the completer? If so, what do you think about multi-character triggers?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's to replace the prefix and the already typed characters to minimise updates. So a multi character trigger can still work, in fact, there's no longer such a thing, you can complete customise what to match and how to match it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, OK. That helps.

There are a couple of assumptions here that may not hold for all completers.

This line relies on completions always finishing what is being typed (e.g., +doto being completed with +dotorg), but completions may be anything and aren't constrained to what the user typed. For example, a fruit completer could complete "!apple" with '🍎'. This means we need to replace the entire substring corresponding to the completer query rather than simply adding the part that isn't there.

It's also possible for completers to return wp.element declarations, so eventually this method needs to to account for that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a very good point, thank you!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The apple example seems to be something that closer to a pattern like `test`... I wonder if we can somehow unify all of these little functionalities.

Copy link
Member

@brandonpayton brandonpayton Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can somehow unify all of these little functionalities.

I like the direction you're going with the token configuration here. I'm not sure if the current completer interface will ultimately have a role here and need to sleep on it to give my mind some time to adjust. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also possible for completers to return wp.element declarations, so eventually this method needs to to account for that.

Since we're allowing matched tokens to be either selected from a list or typed in full, maybe we should reconsider allowing completions to be wp.element declarations. If we restrict to text completions, completions would be simpler, and there could be a single place to declare token formatting and a single execution path for applying that formatting.

if ( richTextStructure.isCollapsed( record ) ) {
const text = richTextStructure.getTextContent( record );
const prevText = richTextStructure.getTextContent( prevRecord );
const textToSearch = text.slice( 0, record.selection.start );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took some time to build and play with this PR and noticed typing "Hello, @-exampl" led to completer's test and getQuery functions being called with "Hello, @-exampl" instead of "@-exampl". This gives more power to completers, but I wonder whether it would be simpler for authors if we only passed "@-exampl" which is the text between the cursor and the nearest preceding space. What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, for sure, that sounds good. Not sure about keeping the trigger, let's see when it all works properly. Atm the autocomplete also pops up for something like text @so|omeone with an item for e.g. someone. We'll need to avoid this somehow. Unsure how this worked before. This PR touches so many components that I'm wondering if you'd be able to help out a bit with the autocomplete component, mainly making sure all current functionality is present?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'm happy to help update the autocomplete component.

My family is moving in a little over a week, and I'll be taking a few days off then. Hopefully, that won't be an issue, but I'll try to keep a close eye on things and do what I can around that time.

@@ -58,7 +58,17 @@ export function createBlockCompleter( {
return {
name: 'blocks',
className: 'editor-autocompleters__block',
triggerPrefix: '/',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The power of the new test and getQuery functions is cool, but I'm wondering if it would be good to continue supporting a trigger property for people getting started or for those who don't need the power.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let's leave the API alone until we work on the formatting API in the next PR. We should just focus on replacing the content change detection and manipulation logic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it would be another breaking change. What we could instead as an enhancement to allow a function to be passed as triggerPrefix - this would be quite similar to what Webpack does with its options.

@ellatrix

This comment has been minimized.

@ellatrix

This comment has been minimized.

@ellatrix

This comment has been minimized.

@ellatrix ellatrix force-pushed the try/rich-text-record branch 2 times, most recently from 754bf06 to 91dbec9 Compare August 7, 2018 11:08
@ellatrix

This comment has been minimized.

@jasmussen

This comment has been minimized.

@jasmussen

This comment has been minimized.

@jasmussen

This comment has been minimized.

@jasmussen

This comment has been minimized.

@ellatrix

This comment has been minimized.

@ellatrix

This comment has been minimized.

@ellatrix ellatrix deleted the try/rich-text-record branch October 1, 2018 23:03
@youknowriad
Copy link
Contributor

Fantastic @iseulde

@tofumatt
Copy link
Member

tofumatt commented Oct 1, 2018

So good!

nyan

@gziolo
Copy link
Member

gziolo commented Oct 2, 2018

Love the changes added in 94352e7. Ace work on this PR @iseulde 👏 Thanks for being patient with all our feedback 😃

@hypest
Copy link
Contributor

hypest commented Oct 2, 2018

🎉 Thanks for the effort here @iseulde and thanks for "minding the mobile" and working closely with the mobile team! 🙇

@ellatrix ellatrix mentioned this pull request Oct 3, 2018
ellatrix added a commit that referenced this pull request Oct 3, 2018
"react-native": "src/index",
"dependencies": {
"@babel/runtime": "^7.0.0",
"@wordpress/escape-html": "file:../escape-html",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This introduces uncommitted changes to the local branch after npm install. Please review #10234

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean package lock? See #10304.

ellatrix added a commit that referenced this pull request Oct 3, 2018
export { insert } from './insert';
export { slice } from './slice';
export { split } from './split';
export { apply, toDom as unstableToDom } from './to-dom';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ktmn
Copy link

ktmn commented Oct 9, 2018

Quick question about this rich text, previously if I had a template like this:

<wp.editor.InnerBlocks template={[
	['core/paragraph', {
		content: ['Default content'],
	}],
]} />

Now with array it doesn't work, string does:

<wp.editor.InnerBlocks template={[
	['core/paragraph', {
		content: 'Default content',
	}],
]} />

And adding a line break, for example, worked like this:

<InnerBlocks template={[
    ['core/paragraph', {
        content: [
            'Hello',
            {
                type: 'br',
                props: {
                    children: [],
                },
            },
            'world',
        ],
    }],
]} />

How would I do that now? I tried content: wp.richText.create('Hello<br>world') but that was not it. Do I need like an object with formats and text or something?

The docs don't really say much, create( ?input: Element | string, ?range: Range, ?multilineTag: string, ?settings: Object ): Object, whats the range, what are the settings?

@youknowriad
Copy link
Contributor

@ktmn We're still in the process of ensuring the best upgrade experience without breakage for existing templates. See #10370

@ktmn
Copy link

ktmn commented Oct 10, 2018

@youknowriad Ok I see, thanks. Another thing that seems broken for me is when a block previously had an attribute

content: {
	type: 'array',
},

that is used in wp.editor.RichText then it throws

Do all the richtext attribute types need to be changed, and if so, to what type? And do they need a default value? I tried rendering wp.editor.RichText without any value prop and it still threw it, so does it need something from me?

Basically I shouldn't worry about current version of Gutenberg at all, it's supposed to be broken? I wanted to develop some blocks that I put aside waiting for the non-paragraph block inserter, now that it's merged I grabbed the latest version but I should have installed something earlier, this is completely WIP?

@youknowriad
Copy link
Contributor

Basically I shouldn't worry about current version of Gutenberg at all, it's supposed to be broken?

Short answer, yes, we're working on it. The migration path is not fully decided.

*
* @return {string} Escaped HTML element value.
*/
export function escapeHTML( value ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The module is called escape-html but all it handles is specifically an attribute value? Isn't that a bit misleading?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also exports escapeHTML?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also exports escapeHTML?

I think I misinterpreted this to only be handling attribute values.


append( tree, '' );

for ( let i = 0; i < formatsLength; i++ ) {
Copy link
Member

@aduth aduth Nov 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since formats is a sparse array, would it be worth considering using forEach to skip over the unformatted segments?

forEach calls callbackfn [...] only for elements of the array which actually exist; it is not called for missing elements of the array.

https://www.ecma-international.org/ecma-262/5.1/#sec-15.4.4.18

When a format is encountered and at the end of the text, we could simply append the unformatted text in a single go. This would also be much more performant since append is a non-trivial operation.

Benchmarks:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, maybe we can do this. I'd have to explore it, so far didn't have the time. Formats that stay equal could also be appended in one go. Also note that appendData, not append, which probably doesn't make it any faster.

}

function createEmpty( type ) {
const { body } = document.implementation.createHTMLDocument( '' );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has frequently turned up in performance profiling as an expensive operation, which on reflection is understandable given that it's creating an entirely new document. I think we should hold on to a constant reference of a document body and reset it each time we call createEmpty.

Benchmark: Creating a "fresh" body using innerHTML to reset is 646x faster than via document.implementation.createHTMLDocument

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I saw the other PR. We can do the same here.

const formatsLength = formats.length + 1;
const tree = createEmpty( tag );

append( tree, '' );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we append an empty string?

@phpbits
Copy link
Contributor

phpbits commented May 23, 2019

@ellatrix I hope it's fine to ask you a question here. Can you point me to the right direction on adding custom markdown or patterns? Inline code markdown is registered on getPatterns in which I'm interested on extending.

I've also figured out that value.text on formats returns the current RichText contents but there's no way to manipulate it directly. If you could help me out, that'll be really great. Thanks!

@ellatrix
Copy link
Member Author

@phpbits Currently there's no way to do this as a plugin as we do it in core. We still need an API for this. :) That's why the code is not in the format library yet. Feel free to create a separate issue.

@phpbits
Copy link
Contributor

phpbits commented May 28, 2019

Thanks @ellatrix! I've finally figured it out after couple of days trying :) I've created my own component following what you already have on code formatting. Here's what I've got so far : https://twitter.com/phpbits/status/1132988500196773888 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Status] In Progress Tracking issues with work in progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.