Selectolax hangs because of bad CSS selector #36

Closed
Thewildweb opened this issue Apr 6, 2021 · 12 comments

@Thewildweb

Hi,

First, thanks for selectolax; it helps me greatly.

If I give selectolax a bad CSS selector like "span[itemprop='example", it hangs indefinitely. It would be great if that raised an exception instead.

If you need some help with the project, I could see if I can free up some time.

@rushter added the bug label Apr 7, 2021
@BarryThrill
Contributor

BarryThrill commented Apr 9, 2021

Hello @Thewildweb !

Just curious. I tried to reproduce the issue you had by doing:

#!/usr/bin/python3
# -*- coding: utf-8 -*-

import requests
from selectolax.parser import HTMLParser

response = requests.get("https://edition.cnn.com/")
bs4 = HTMLParser(response.text)

test = bs4.css_first("span[itemprop='example")
print(test)

and the return was:

  File "selectolax\parser.pyx", line 101, in selectolax.parser.HTMLParser.css_first
  File "selectolax\node.pxi", line 458, in selectolax.parser.Node.css_first
  File "selectolax\node.pxi", line 441, in selectolax.parser.Node.css
  File "selectolax\selector.pxi", line 16, in selectolax.parser.Selector.__init__
  File "selectolax\selector.pxi", line 57, in selectolax.parser.Selector._prepare_selector
ValueError: Bad CSS Selectors: span[itemprop='example

Also, if you close the quote in the selector, it returns None:

bs4.css_first("span[itemprop='example'")

None

I'm not quite sure how you were able to produce this bug, but I'm very interested to know, as I might have had a similar issue. I wasn't sure whether it's related to the same thing, however.

@Thewildweb
Author

Thewildweb commented Apr 14, 2021

Hi, sorry for the late reply. Here is an example.

import requests
from selectolax.parser import HTMLParser

resp = requests.get("https://www.python.org/")
tree = HTMLParser(resp.text)

# bad CSS selector: the attribute name 'href' is wrapped in quotes
a_hrefs = tree.css("a['href']")
# now it hangs indefinitely

@BarryThrill
Contributor

I see! That's probably due to the incorrect CSS selector. I agree that it is not good and should trigger an exception. There is a way to avoid it by doing:

a_hrefs = tree.css('a[href^=""')

I'm no expert with CSS selectors, though :D
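
For reference, in standard CSS the attribute name inside `[...]` is a bare identifier and only the value may be quoted, so "anchors that have an href" is written `a[href]`. Below is a minimal, standard-library-only sketch of a pre-check that rejects malformed attribute blocks before a selector is handed to the parser. `check_attribute_selectors` is a hypothetical helper, not part of selectolax, and the pattern is deliberately simplified (no escapes, no namespace prefixes), so treat it as an illustration rather than a full CSS validator.

```python
import re

# One well-formed attribute block: [name], [name=value], [name^="value"], ...
# Simplified on purpose: no escape sequences, no namespace prefixes.
_ATTR_BLOCK = re.compile(
    r"""\[\s*
        [A-Za-z_][\w-]*                      # bare attribute name, never quoted
        \s*
        (?:
            [~|^$*]?=\s*                     # =, ~=, |=, ^=, $=, *=
            (?:"[^"]*"|'[^']*'|[\w-]+)       # quoted or bare value
            \s*(?:[iIsS]\s*)?                # optional case-sensitivity flag
        )?
        \]""",
    re.VERBOSE,
)


def check_attribute_selectors(selector: str) -> str:
    """Return the selector unchanged, or raise ValueError on a malformed [...] block."""
    pos = 0
    while True:
        start = selector.find("[", pos)
        if start == -1:
            return selector
        match = _ATTR_BLOCK.match(selector, start)
        if match is None:
            raise ValueError(
                f"Bad attribute selector at offset {start}: {selector!r}"
            )
        # Continue scanning after the block that was just validated.
        pos = match.end()
```

Calling `check_attribute_selectors(sel)` before `tree.css(sel)` would turn both selectors reported in this thread, "span[itemprop='example" and "a['href']", into a `ValueError` without ever reaching the parser.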

@rushter
Owner

rushter commented Apr 14, 2021

We need to fix this in Modest.
It hangs when parsing a CSS selector.

https://github.com/lexborisov/Modest/blob/393338d994c921705ff71dfbd1d98ceb31328f14/source/mycss/selectors/init.c#L153

@BarryThrill
Contributor

BarryThrill commented Apr 14, 2021

https://github.com/lexbor/lexbor - I assume Modest won't be a priority anymore? It seems like the dev is working on something new. Have you seen it?

@rushter
Owner

rushter commented Apr 14, 2021

> https://github.com/lexbor/lexbor - I assume Modest won't be a priority anymore? It seems like the dev is working on something new. Have you seen it?

It's a faster engine, but it does not support everything that we need yet. For example, its CSS engine can only parse queries; it can't execute them yet.

> Does it only happen when using CSS selector?

It hangs on some malformed examples, but I don't know why this happens. I'm not very familiar with the source code of Modest.
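
Because the hang happens inside Modest's C code, it generally cannot be interrupted from Python code running in the same process. One way to probe an untrusted selector is to run it in a child interpreter with a timeout. The sketch below uses only the standard library; `run_isolated`, `probe_selector` and the probe script are hypothetical names for illustration, and the probe assumes selectolax is importable in the child.

```python
import subprocess
import sys

# Hypothetical probe script: parses a tiny document and runs one selector.
# It assumes selectolax is installed for the child interpreter.
_PROBE = (
    "import sys\n"
    "from selectolax.parser import HTMLParser\n"
    "HTMLParser('<a href=\"#\">x</a>').css(sys.argv[1])\n"
)


def run_isolated(code: str, arg: str, timeout: float = 2.0) -> str:
    """Run `code` in a child interpreter; return 'ok', 'error' or 'timeout'."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code, arg],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so a hung
        # parser cannot block the calling process.
        return "timeout"
    return "ok" if proc.returncode == 0 else "error"


def probe_selector(selector: str, timeout: float = 2.0) -> str:
    """Classify a selector by what happens when selectolax is given it."""
    return run_isolated(_PROBE, selector, timeout)
```

Under these assumptions, a selector that hangs the parser would be reported as "timeout", while one that raises (such as the `ValueError` shown earlier in this thread) would be reported as "error". Starting an interpreter per selector is slow, so this suits one-off checks of user-supplied selectors rather than hot loops.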

@BarryThrill
Contributor

Thank you for the information :) I see. Maybe it will support that soon. selectolax is already pretty fast, so I'm excited to see if it can be even faster.

@Thewildweb
Author

It is not a huge issue. I thought it would be nice to throw an exception if that was an easy job.

An even faster parser would be awesome. I'm coming from bs4, so selectolax feels instant...

@lexborisov

Hi,

I can deal with this tomorrow: lexborisov/Modest#84.

@lexborisov

lexborisov commented Apr 16, 2021

@rushter

It seems to be fixed in Modest.

@rushter
Owner

rushter commented Apr 16, 2021

@lexborisov Thanks!

@BarryThrill
Contributor

> It's a faster engine, but it does not support everything that we need yet. For example, CSS engine can only parse queries. It can't execute them yet.

Hi man! I just got a comment about a new update!

lexbor/lexbor#96 (comment)

Is that something you plan to add to selectolax? 😁
