Keeps line and indentation on remove() #86

Closed
sindresorhus opened this issue Aug 17, 2012 · 8 comments

Comments

@sindresorhus
Contributor

When I remove an indented element that makes up a whole line, the line and its indentation are preserved. This makes a document with many remove() calls look dirty, since it ends up filled with otherwise empty lines that contain only indentation.

    <span>lorem ipsum</span>

It should IMO remove the line if all of its contents are gone.

@sindresorhus
Contributor Author

We use Cheerio in Yeoman to do HTML manipulations in our scaffolder and it's great, but because of this, it leaves a lot of empty lines and trailing whitespace, which is annoying to the end-user. Hopefully this can be fixed soon :)

@matthewmueller
Member

I'm not sure I understand the example, but a while ago we removed the tidying features. IMO it was feature creep and should be left to a tidy library.

@sindresorhus
Contributor Author

If I remove() the div#test

<div>
    <div id="test">dsf</div>
</div>

The resulting HTML is (with trailing whitespace):

<div>

</div>

The resulting HTML should be:

<div>
</div>

Or even

<div></div>

In this situation.

@ironchefpython

<div>
    <div id="test">dsf</div>
</div>

This is a div element with three children. A textnode containing a newline and 4 spaces, a div element with an id of "test", and a textnode containing a newline. Removing the div child element does not (and should not) remove the text nodes.
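
To make the three-children point concrete, here is a minimal sketch that uses plain JavaScript objects as a stand-in for the parsed tree. The node shapes and the helper names `removeById` and `serialize` are made up for illustration; they are not Cheerio's actual internals.

```javascript
// Toy model of the outer <div>'s children (an assumption, not Cheerio's real nodes).
const children = [
  { type: 'text', data: '\n    ' }, // newline plus 4 spaces of indentation
  { type: 'tag', name: 'div', attribs: { id: 'test' }, text: 'dsf' },
  { type: 'text', data: '\n' },     // newline before the closing </div>
];

// Removing the element drops exactly one node; its text-node siblings stay.
function removeById(nodes, id) {
  return nodes.filter(n => !(n.type === 'tag' && n.attribs.id === id));
}

// Serialize the toy children back into a string.
function serialize(nodes) {
  return nodes
    .map(n => (n.type === 'text'
      ? n.data
      : `<${n.name} id="${n.attribs.id}">${n.text}</${n.name}>`))
    .join('');
}

const remaining = removeById(children, 'test');
const outer = `<div>${serialize(remaining)}</div>`;

// JSON.stringify makes the leftover whitespace visible.
console.log(JSON.stringify(outer));
```

The two text nodes that survive are what produce the indented, otherwise empty line in the output.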

I do understand what you want: you want the HTML resulting from Cheerio to be reformatted according to your preferred style. However, this is not (currently) a functional goal of Cheerio, and it is functionality that can best be achieved by processing the output of Cheerio with another function. I would recommend the mature and stable js-beautify for HTML post-processing. It provides a number of options to format HTML to your standards.

@sindresorhus
Contributor Author

Ok, didn't think Cheerio concerned itself with text nodes.

I do however think that Cheerio should have an option in $.html() or something to run the html-beautify. I can't think of any scenario where I would want trailing spaces left in the source.

@matthewmueller
Member

Right, but most node modules use cheerio to do screen scraping, where content, not source, is most important.

From a quick look at js-beautify and node-beautifier, it looks to be as simple as:

// Assumes the js-beautify npm package is installed
var beautify = require('js-beautify');

var html = $.html(),
    beauty = beautify.html_beautify;

html = beauty(html);

Would be super simple to add to yeoman, and it would do a better job than something we hack together for cheerio.

Closing this issue, unless a more compelling reason to add a tidy comes up.

@sindresorhus
Contributor Author

I'm not saying it would be hard to add to Yeoman; obviously it's not. It would just be a nice convenience in Cheerio: not having to evaluate the options, add a dependency, import it, look up the API, and then finally beautify. But whatever.

@hanjo

hanjo commented Aug 11, 2022

While this is quite an old issue, I just came across it while facing the same situation. Beautification is obviously one way to go, but I didn't like the overhead, so I came up with the following solution, which removes empty (whitespace-only) lines. It iterates over all child nodes, selects the text nodes, and removes each one that contains only whitespace and is directly followed by another text node. Maybe it is helpful to somebody else:

$("something").contents().filter(function() {
	var ns = this.nextSibling;
	// Select whitespace-only text nodes (nodeType 3 is Node.TEXT_NODE)
	// that are directly followed by another text node.
	if (ns != null) {
		return this.nodeType === 3 && ns.nodeType === 3 && /^\s+$/.test(this.nodeValue);
	}
	return false;
}).remove();
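
For comparison, a different and cruder technique is to leave the tree alone and drop whitespace-only lines from the serialized string instead. The sketch below is only that: a plain string filter with no HTML awareness, and the function name `stripBlankLines` is made up for this example. It would also remove blank lines inside elements such as pre or textarea, where whitespace is significant, so it may only suit simple scaffolding output.

```javascript
// Remove lines that are empty or contain only whitespace from an HTML string.
// Assumption: blank lines carry no meaning in the input (no <pre>, <textarea>, etc.).
function stripBlankLines(html) {
  return html
    .split('\n')
    .filter(line => line.trim() !== '')
    .join('\n');
}

// The output from the example earlier in the thread, after remove():
const dirty = '<div>\n    \n</div>';
const clean = stripBlankLines(dirty);

console.log(JSON.stringify(clean));
```

With Cheerio this would presumably be applied to the result of $.html(), in the same spot where the beautifier call would otherwise go.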
