Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle whitespaces #2651

Closed
Reinmar opened this issue Oct 26, 2016 · 12 comments
Closed

Handle whitespaces #2651

Reinmar opened this issue Oct 26, 2016 · 12 comments
Labels
package:clipboard resolution:expired This issue was closed due to lack of feedback. status:stale type:improvement This issue reports a possible enhancement of an existing feature.

Comments

@Reinmar
Copy link
Member

Reinmar commented Oct 26, 2016

Edit: I decided to generalise this ticket to handling copied and pasted whitespaces. See my 2nd comment to understand better the scope. In general, we need to:

  1. Make sure that when pasting into the editor (content from outside or from the editor), then we shouldn't pollute the content with hundreds of nbsps (the usual Chrome's behaviour).
  2. Try not to lose nbsps when copying and pasting within the editor.
  3. The solution that we'll find for copying content from the editor should make sure that this content can be pasted into some other place (so it can't be too editor specific).

The problem is that here we start dealing with various technical limitations and browser quirks.


Webkit&Blink have terrible nature to put <span classs="Apple-converted-space"> </span> in many places in the clipboard content. They replace what was a normal space in the original content, polluting the content in a result.

We're currently stripping those spans using a stupid regexp, but yeah... it's stupid. We're also replacing singular &nbsps with normal spaces in order to fix the most popular case when there's only a single space inside such span.

However, something like this would be nicer:

/**
 * @license Copyright (c) 2003-2016, CKSource - Frederico Knabben. All rights reserved.
 * For licensing, see LICENSE.md.
 */

import Range from '../../engine/view/range.js';

/**
 * Removes some popular browser quirks out of the clipboard data (HTML).
 *
 * @param {engine.view.DocumentFragment} data The HTML data to normalize.
 * @returns {String} Normalized HTML.
 */
export default function normalizeClipboardData( data ) {
    const range = Range.createIn( data );
    const spans = [];

    for ( const item of range.getItems( { ignoreElementEnd: true } ) ) {
        if ( item.name == 'span' && item.hasClass( 'Apple-converted-space' ) ) {
            spans.push( item );
        }
    }

    spans.forEach( span => {
        const text = span.getChild( 0 );

        if ( text && text.data ) {
            text.data = text.data.replace( /\s/g, ' ' );
        }
    } );

    return data;
}

At the view stage we can freely replace every space inside those spans with a normal space, without breaking series of them (because in the view they are not encoded).

Unfortunately, the above code doesn't work because upon conversion to view spans are emptied ;|. This looks like some bug in the engine, but I don't have time to debug it now.

@Reinmar
Copy link
Member Author

Reinmar commented Oct 27, 2016

Damn... this is tricky, because the DOM -> view converter has here a really tricky job to do.

Simple case:

  1. you're pasting a single space,
  2. in the clipboard it's &nbsp; or... e.g. <h2> </h2> or <em><span class="A-c-s">&nbsp;</span></em> or something equally crazy,
  3. on conversion to the view you want this to become a normal space.

I'm thinking now that every nbsp in a <span class="A-c-s"> should be converted into a normal space and that we need an option in the DOM -> view converter to always preserve such spaces. That's because we're not able to tell what was supposed to be a normal space and what not in the original content. So it's better to lose some nbsps when pasting from outside of the editor than to insert too many nbsps.

The situation when copy-pasting within an editor is a bit different, because we could tell that the HTML in the clipboard exactly represents the data that we have. Unfortunately, that wouldn't work when copying from the editor and pasting somewhere else ;|.

Does it mean that we should mark our spaces as well? And write in a data- attribute what that was supposed to be? :D I really don't see a different option. If we don't want to lose nbsps which were supposed to be nbsps, then such a solution could be the only one we'll have.

@Reinmar Reinmar changed the title Handle Apple-converted-space spans Handle whitespaces upon paste Oct 27, 2016
@Reinmar Reinmar changed the title Handle whitespaces upon paste Handle whitespaces Oct 27, 2016
@fredck
Copy link
Contributor

fredck commented Oct 28, 2016

If we don't want to lose nbsps which were supposed to be nbsps

What's is this case? A nbsp in the model? If yes, then I think that mark them is not a big deal, after all, we should not expect to have too many of them.

@Reinmar
Copy link
Member Author

Reinmar commented Oct 28, 2016

What's is this case? A nbsp in the model? If yes, then I think that mark them is not a big deal, after all, we should not expect to have too many of them.

Yes, that's the case.

And I agree that it shouldn't be a big deal. I imagine that on view -> DOM conversion we'd wrap them with spans and that would allow to preserve them on DOM -> view conversion.

Initially, I thought that we'd also need to write in a data- attribute of such a span what was the original content of that span, but then I've been thinking about wrapping every space (which doesn't make much sense). If we'd wrap only &nbsp; then we should be fine.

@fredck
Copy link
Contributor

fredck commented Oct 28, 2016

I'm good to go with that in that way... actually, I always imagined we would do so, because I don't see another way, as well :)

@Reinmar
Copy link
Member Author

Reinmar commented Oct 28, 2016

Still, I'm worried a lot about content pasted from outside, in which we won't know what was supposed to be an &nbps; and in which you've got hundreds of them. I guess there are two options:

  • We ignore those cases and convert everything that we find in the clipboard into a normal view space (so space which will be preserved by the editor). This is tricky, cause replacing <span class="A-c-s"> must then be done in a context of what's around and how it's going to be understood by the DOM -> view converter. I can't imagine doing this inside the DOM -> view converter because it shouldn't be worried about such things.
  • We ignore this even more and convert every <span class="A-c-s"> into normal DOM spaces and agree that multiple ones will be turned into single ones. Of course, this would happen only when pasting from outside of the editor. This is drastic, but very simple to implement. May be a good first step.

@Reinmar
Copy link
Member Author

Reinmar commented Jun 22, 2017

I have a feeling that something's changed in Chrome's behaviour and I also see that we may have some problem with losing spaces.

When I copy this content:

image

I get this results:

image

The spaces in the "paste" section (which represents a content from the native data transfer) are &nbsp;s of course (it's Chrome after all...), but it's strange that in the next step there are no spaces at all.

Perhaps I forgot something, but I don't recall us cleaning up these spaces which means that we may be losing them somewhere.

I guess it'd be good to check that as pasting content into the editor works really bad now in Chrome. I think it worked much better a few months ago and this may be the missing Apple-converted-space spans.

E.g. this is the result in Safari which still produces them:

image

@Reinmar Reinmar self-assigned this Jul 3, 2017
@Reinmar
Copy link
Member Author

Reinmar commented Jul 3, 2017

OK, it's definitely something in the engine, because this test fails:

			it.only( 'does not lose whitespaces in Chrome\'s paste-like content', () => {
				const fragment = dataProcessor.toView(
					'<meta charset=\'utf-8\'>' +
					'<span>This is the<span>\u00a0</span></span>' +
					'<a href="url">third developer preview</a>' +
					'<span><span>\u00a0</span>of<span>\u00a0</span></span>' +
					'<strong>CKEditor\u00a05</strong>' +
					'<span>.</span>'
				);

				expect( stringify( fragment ) ).to.equal(
					'<span>This is the<span>\u00a0</span></span>' +
					'<a href="url">third developer preview</a>' +
					'<span><span>\u00a0</span>of<span>\u00a0</span></span>' +
					'<strong>CKEditor\u00a05</strong>' +
					'<span>.</span>'
				);

				// Just to be sure... stringify() uses conversion and the browser extensively,
				// so it's not entirely safe.
				expect( fragment.getChild( 0 ).getChild( 1 ).getChild( 0 ).data ).to.equal( '\u00a0' );
				expect( fragment.getChild( 2 ).getChild( 0 ).getChild( 0 ).data ).to.equal( '\u00a0' );
				expect( fragment.getChild( 2 ).getChild( 2 ).getChild( 0 ).data ).to.equal( '\u00a0' );
			} );

But I can't write a view-level test which would prove this ;| Strangely, they pass.

@Reinmar
Copy link
Member Author

Reinmar commented Jul 3, 2017

Actually, we have two issues here:

  1. That the spaces are lost on DOM to view conversion. This is something which I should look into just to be sure what's broken.

  2. But, most importantly, we also need to handle <span>\u00a0</span> somehow. Otherwise, after I'd fix the issue described in 1. we'd have only non-breaking spaces after pasting some content in Chrome.

    So far, we've been converting <span class="Apple-converted-space">\u00a0</span> straight to a normal space. That worked in the past on Chrome and works now in Safari. However, Chrome doesn't produce that class any more so we're left with noname spans. We pass them to the view and lose those spaces somewhere. However, we don't want them to become \u00a0 in the content becase, as I said, this leads to having the entire content polluted with non breaking spaces . So, the fix here will be to treat spans with no attribute just like those with Apple-converted-space.

@Reinmar
Copy link
Member Author

Reinmar commented Jul 3, 2017

That the spaces are lost on DOM to view conversion. This is something which I should look into just to be sure what's broken.

This turned out to be https://github.com/ckeditor/ckeditor5-engine/issues/404. Not something that we can quickly fix so I'm not going to work on that now. However, as the case in that engine's ticket shows – this issue may lead to losing spaces in some situations. E.g., when something like this was pasted: <p>foo<b>&nbsp;</b>bar</p>.

@Reinmar Reinmar removed their assignment Jul 3, 2017
@Reinmar
Copy link
Member Author

Reinmar commented Jul 3, 2017

@mlewand mlewand transferred this issue from ckeditor/ckeditor5-clipboard Oct 9, 2019
@mlewand mlewand added this to the unknown milestone Oct 9, 2019
@mlewand mlewand added status:confirmed type:improvement This issue reports a possible enhancement of an existing feature. package:clipboard labels Oct 9, 2019
@pomek pomek removed this from the unknown milestone Feb 21, 2022
@CKEditorBot
Copy link
Collaborator

There has been no activity on this issue for the past year. We've marked it as stale and will close it in 30 days. We understand it may be relevant, so if you're interested in the solution, leave a comment or reaction under this issue.

@CKEditorBot
Copy link
Collaborator

We've closed your issue due to inactivity over the last year. We understand that the issue may still be relevant. If so, feel free to open a new one (and link this issue to it).

@CKEditorBot CKEditorBot added the resolution:expired This issue was closed due to lack of feedback. label Nov 2, 2023
@CKEditorBot CKEditorBot closed this as not planned Won't fix, can't repro, duplicate, stale Nov 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
package:clipboard resolution:expired This issue was closed due to lack of feedback. status:stale type:improvement This issue reports a possible enhancement of an existing feature.
Projects
None yet
Development

No branches or pull requests

5 participants