Node.attrs causes segfault and has buggy items() method #39

Closed
phoerious opened this issue Jul 9, 2021 · 0 comments · Fixed by #41

phoerious commented Jul 9, 2021

This issue is probably similar to #9, but the fix for that issue did not resolve it. The bug is present in version 0.2.12.

Evaluating Node.attrs directly in the REPL results in a segmentation fault. Calling methods on the returned object works fine, but displaying its dict representation crashes the Python interpreter. The same does not happen with Node.attributes.

Repro case:

>>> from selectolax.parser import HTMLParser
>>> HTMLParser('<body foo="bar">abc</body></html>').css('body')[0].attrs['foo']
'bar'
>>> HTMLParser('<body foo="bar">abc</body></html>').css('body')[0].attrs.get('foo')
'bar'
>>> list(HTMLParser('<body foo="bar">abc</body></html>').css('body')[0].attrs.values())
['bar']
>>> HTMLParser('<body foo="bar">abc</body></html>').css('body')[0].attrs
[1]    3625999 segmentation fault (core dumped)  python

Also, while keys() and values() work fine, items() is buggy:

>>> list(HTMLParser('<body foo="bar">abc</body></html>').css('body')[0].attrs.items())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "selectolax/node.pxi", line 91, in items
TypeError: 'selectolax.parser._Attributes' object is not callable
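
Until items() is fixed, the pairs can be rebuilt from the two operations the report shows working: keys() and lookup by name. The sketch below is only an illustration. `attrs_to_pairs` and `FakeAttrs` are hypothetical names, and `FakeAttrs` is a pure-Python stand-in for the attribute object so that the example runs without selectolax installed.

```python
def attrs_to_pairs(attrs):
    """Build (name, value) pairs using only keys() and item lookup."""
    return [(name, attrs[name]) for name in attrs.keys()]


class FakeAttrs:
    """Hypothetical stand-in: keys() and [] work, items() raises."""

    def __init__(self, data):
        self._data = dict(data)

    def keys(self):
        return self._data.keys()

    def __getitem__(self, name):
        return self._data[name]

    def items(self):
        # Mimics the failure mode from the report, not the real cause.
        raise TypeError("object is not callable")


pairs = attrs_to_pairs(FakeAttrs({"foo": "bar"}))
print(pairs)
```

With the real library, the same helper would presumably be called as `attrs_to_pairs(node.attrs)`, given that the session above shows `attrs.keys()` and `attrs['foo']` behaving correctly.
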
phoerious added a commit to phoerious/selectolax that referenced this issue Jul 10, 2021
When calling attrs on a temporary instance, the tree may already be
deallocated when the tag name is being retrieved, which causes a null pointer
dereference. This fix doesn't really solve the problem, but at least it
prevents the crash.

Also fixes a bug in items() that made the method useless.

Fixes rushter#39
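
The lifetime problem described in this commit message can be illustrated with a pure-Python analogy. This is a sketch under assumptions, not selectolax's actual Cython code: `Tree` and `Node` are hypothetical classes, and a weak reference stands in for the raw pointer into the parsed tree. The guarded `__repr__` mirrors the idea of the fix, which is to check whether the tree is still there before reading the tag name.

```python
import weakref


class Tree:
    """Hypothetical owner of the parsed document."""

    def __init__(self, tag):
        self.tag = tag

    def root(self):
        return Node(self)


class Node:
    """Hypothetical node that does not keep its tree alive."""

    def __init__(self, tree):
        self._tree = weakref.ref(tree)  # stand-in for a raw pointer

    def __repr__(self):
        tree = self._tree()
        if tree is None:
            # Guard: the owning tree has already been deallocated.
            return "<Node of deallocated tree>"
        return "<Node %s>" % tree.tag


# Temporary tree: on CPython it is freed as soon as root() returns.
dangling = Tree("body").root()

# Named tree: the variable keeps it alive while the node is used.
tree = Tree("body")
alive = tree.root()
```

In the analogy, keeping the owner in a named variable avoids the dangling case. Whether the same workaround is sufficient with selectolax itself is not confirmed by this thread.
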
rushter pushed a commit that referenced this issue Jul 10, 2021
* Apply `decode_errors` to encoding as well, fixes #40

* Fix __repr__() segfault and buggy items() on _Attributes

When calling attrs on a temporary instance, the tree may already be
deallocated when the tag name is being retrieved, which causes a null pointer
dereference. This fix doesn't really solve the problem, but at least it
prevents the crash.

Also fixes a bug in items() that made the method useless.

Fixes #39

* Add test for unencodable strings