text() on tree object not passing down "strip" parameter #35

Closed
phoerious opened this issue Mar 25, 2021 · 1 comment

phoerious commented Mar 25, 2021

When I call text(strip=True) on the root tree object, the strip parameter is not being passed on to the body tag object. Here's the code in parser.pyx:

    def text(self, bool deep=True, str separator='', bool strip=False):
        return self.body.text(deep=deep, separator=separator, strip=False)

So tree.text(strip=True) isn't working, but an explicit tree.body.text(strip=True) is.
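For illustration, here is a minimal pure-Python sketch of the forwarding I'd expect. The `Node` and `Tree` classes are hypothetical stand-ins that I made up for this example, not the real Cython classes from parser.pyx; only the `text()` signature is taken from the snippet above.

```python
class Node:
    """Hypothetical stand-in for a parsed element; not selectolax's real Node."""

    def __init__(self, chunks):
        # Raw contents of each text node, in document order (assumed layout).
        self.chunks = chunks

    def text(self, deep=True, separator='', strip=False):
        # `deep` is accepted only to mirror the signature quoted above.
        parts = [c.strip() if strip else c for c in self.chunks]
        return separator.join(parts)


class Tree:
    """Hypothetical stand-in for the root tree object."""

    def __init__(self, body):
        self.body = body

    def text(self, deep=True, separator='', strip=False):
        # Forward the caller's value instead of hard-coding strip=False.
        return self.body.text(deep=deep, separator=separator, strip=strip)


tree = Tree(Node(["  hello \n", "world\n\n"]))
print(repr(tree.text(strip=True)))
```

With the argument forwarded, the call on the tree and the call on the body give the same result in this sketch.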

Moreover, the whole behaviour of this parameter is somewhat wonky and unpredictable depending on where white space appears.

HTMLParser("<body><p>sfsdf\n\n\n\n          xxxx\n\n\n\n\n\n\n</p>aaa</body>\n\n\n\n").body.text(strip=True)

gives

'sfsdf\n\n\n\n          xxxxaaa'

where only trailing white space is clipped, whereas

HTMLParser("<body><p>sfsdf\n\n\n\n          \n\n\n\n\n\n\n</p>aaa</body>\n\n\n\n").body.text(strip=True)

gives

'sfsdfaaa'

which is more like what I'd expect, but it strips the whitespace completely instead of collapsing it to a single space.
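To make "collapse to a single space" concrete, this is roughly the result I have in mind, sketched as a post-processing step in plain Python. The `collapse_whitespace` helper is hypothetical and not part of the library, and the `raw` string is only my assumption of what the unstripped text of the first example looks like.

```python
def collapse_whitespace(text):
    # str.split() with no arguments splits on any run of whitespace and
    # ignores leading/trailing whitespace, so joining the pieces with a
    # single space collapses every run to one space.
    return ' '.join(text.split())


# Assumed unstripped text of the first example (paragraph text, then "aaa").
raw = "sfsdf\n\n\n\n          xxxx\n\n\n\n\n\n\naaa\n\n\n\n"
print(repr(collapse_whitespace(raw)))  # 'sfsdf xxxx aaa'
```

Note that this also puts a space between the paragraph text and "aaa", because whitespace separates them in the input.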

rushter added the bug label Apr 7, 2021

rushter commented Apr 16, 2021

> Moreover, the whole behaviour of this parameter is somewhat wonky and unpredictable depending on where white space appears.

It behaves similarly to Python's str.strip(), but it is applied to each node separately. This prevents extra spaces in a lot of common cases.
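A rough pure-Python emulation of that per-node behaviour reproduces both outputs from the report. This is only a sketch: `text_with_strip` is a hypothetical helper, not the library's code, and the lists of text nodes are an assumption about how the parser splits the two inputs (paragraph text first, then "aaa" together with the trailing newlines).

```python
def text_with_strip(chunks, separator=''):
    # Apply str.strip() to each text node on its own, then join the results.
    # Whitespace inside a node is untouched; only its two ends are trimmed.
    return separator.join(chunk.strip() for chunk in chunks)


# Assumed text nodes for the two inputs from the report.
first = ["sfsdf\n\n\n\n          xxxx\n\n\n\n\n\n\n", "aaa\n\n\n\n"]
second = ["sfsdf\n\n\n\n          \n\n\n\n\n\n\n", "aaa\n\n\n\n"]

print(repr(text_with_strip(first)))   # 'sfsdf\n\n\n\n          xxxxaaa'
print(repr(text_with_strip(second)))  # 'sfsdfaaa'
```

In the first input the whitespace sits between "sfsdf" and "xxxx", in the middle of one node, so it survives. In the second input the same whitespace is at the end of the node, so it is trimmed entirely.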
