
CSS file of 320 bytes takes 10+ seconds to parse #18

Open
vdmit opened this issue Nov 2, 2021 · 4 comments
Labels
help wanted Extra attention is needed

Comments

@vdmit

vdmit commented Nov 2, 2021

I've encountered a CSS file that the cssutils library takes an extremely long time to parse (in practice, it never finishes).
After some investigation I minimized the problematic sample as far as I could and prepared a reproducible code example.
On my laptop (Core i7-8550U) it takes 14 seconds to parse a file consisting of a single line of 320 bytes.

The original file has 20 kB of text, and the parser does not finish it within several hours.

Code:
https://gist.github.com/vdmit/ef9007170fa1c616cf5aba1fcebfce87
I'm using:

  • the latest stable cssutils release, 2.3.0, from PyPI
  • Python 3.8.10
  • Ubuntu 20.04

Notes:

  • The original CSS file was malformed and even contained non-printable characters. My minimized example is still not a valid CSS file, but it is plain ANSI text.
  • Adding a line break anywhere fixes the problem.
  • Removing the last style expression (.s11...) reduces execution time from 14 seconds to 3 seconds.
  • Adding one more style expression increases execution time from 14 seconds to 50; adding two more yields ~260 seconds.

So there is exponential complexity somewhere, which is unexpected.
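Taking the reported timings at face value (the last one is approximate), each added style expression multiplies the parse time by a roughly constant factor, which is what exponential growth looks like. A quick check of that arithmetic, using only the numbers from the notes above:

```python
# Parse times reported above, keyed by the number of style expressions added
# relative to the 320-byte sample (-1 means the last one was removed).
reported_seconds = {-1: 3.0, 0: 14.0, 1: 50.0, 2: 260.0}

counts = sorted(reported_seconds)
# Ratio of each timing to the previous one: roughly constant means exponential.
ratios = [reported_seconds[b] / reported_seconds[a] for a, b in zip(counts, counts[1:])]
# Geometric-mean growth factor per added style expression.
span = counts[-1] - counts[0]
factor = (reported_seconds[counts[-1]] / reported_seconds[counts[0]]) ** (1 / span)

print("per-rule ratios:", [round(r, 2) for r in ratios])
print(f"average growth factor per rule: {factor:.2f}")
```

The per-rule ratios land between about 3.6 and 5.2, for an average factor of roughly 4.4 per style expression.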

Traceback of the interrupted script:

Traceback (most recent call last):
  File "cssutils_infinite_loop_bug_example.py", line 10, in <module>
    sheet = css_parser.parseString(cssText=css_text)
  File "/home/vdmit/.local/lib/python3.8/site-packages/cssutils/parse.py", line 147, in parseString
    sheet._setCssTextWithEncodingOverride(
  File "/home/vdmit/.local/lib/python3.8/site-packages/cssutils/css/cssstylesheet.py", line 408, in _setCssTextWithEncodingOverride
    self.cssText = cssText
  File "/home/vdmit/.local/lib/python3.8/site-packages/cssutils/css/cssstylesheet.py", line 331, in _setCssText
    wellformed, expected = self._parse(
  File "/home/vdmit/.local/lib/python3.8/site-packages/cssutils/util.py", line 484, in _parse
    expected = p(expected, seq, token, tokenizer)
  File "/home/vdmit/.local/lib/python3.8/site-packages/cssutils/css/cssstylesheet.py", line 313, in ruleset
    rule.cssText = self._tokensupto2(tokenizer, token)
  File "/home/vdmit/.local/lib/python3.8/site-packages/cssutils/util.py", line 343, in _tokensupto2
    for token in tokenizer:
  File "/home/vdmit/.local/lib/python3.8/site-packages/cssutils/tokenize2.py", line 172, in tokenize
    match = matcher(text, pos)  # if no match try next production
KeyboardInterrupt
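The traceback ends in the tokenizer loop in tokenize2.py, which, per its own comment, tries each token production at the current position and moves on to the next production when one fails. Below is a simplified, hypothetical sketch of that kind of loop; the production names and patterns are made up and are not cssutils's. It shows why one slow production stalls everything: a failing attempt costs however long its regex takes to give up, and that cost is paid again at every position where the production is tried.

```python
import re
from typing import Iterator, List, Tuple

# Hypothetical, heavily simplified production table; cssutils's real one is
# larger and its patterns differ. Order matters: the first match wins.
PRODUCTIONS: List[Tuple[str, re.Pattern]] = [
    ("S", re.compile(r"[ \t\r\n\f]+")),
    ("STRING", re.compile(r"'[^'\n]*'|\"[^\"\n]*\"")),
    ("IDENT", re.compile(r"-?[_a-zA-Z][-_a-zA-Z0-9]*")),
    ("CHAR", re.compile(r".", re.DOTALL)),  # fallback: any single character
]


def tokenize(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (production name, token text) pairs, trying productions in order."""
    pos = 0
    while pos < len(text):
        for name, pattern in PRODUCTIONS:
            # A failing attempt costs however long the regex takes to give up,
            # and it is paid again at every position this production is tried.
            match = pattern.match(text, pos)  # if no match try next production
            if match:
                yield name, match.group()
                pos = match.end()
                break
```

For example, `list(tokenize("a 'b'"))` yields `[('IDENT', 'a'), ('S', ' '), ('STRING', "'b'")]`, while an unterminated `'x` falls through to the fallback and yields `[('CHAR', "'"), ('IDENT', 'x')]`.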

jaraco added the help wanted label on Feb 23, 2022
@jaraco
Owner

jaraco commented Feb 23, 2022

Thanks for the report.

I welcome investigation or a patch.

@InconceivableVizzini

No patch at the moment, but for the input shared here I believe the slowdown comes from backtracking in the regular expressions that match strings. The slow call is:

match = matcher(text, pos) # if no match try next production

It is slow when the text at the current position starts with ' and the matcher is the one for the string production:

'string': r'{string1}|{string2}',
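The exact string1 and string2 macros are not quoted in this thread, so the following is a toy reproduction under an assumption: that, as in the CSS 2.1 grammar, a string body is a sequence of plain characters and hex escapes of one to six digits. Because an escape may stop after any of its digits and leave the rest to the plain-character branch, an unterminated string of k such escapes can be parsed in about 6^k ways, and Python's backtracking `re` engine tries them all before failing. `AMBIGUOUS_STRING` and `time_failed_match` are hypothetical names, not cssutils code.

```python
import re
import time

# Hypothetical toy production modeled on the CSS 2.1 grammar (not cssutils's
# actual macro): a single-quoted string whose body is plain characters and
# hex escapes of one to six digits. Because an escape may stop after any of
# its digits and leave the rest to the plain-character branch, "\abcdef" can
# be read in six different ways.
AMBIGUOUS_STRING = re.compile(r"'(?:[^\n\r\f\\']|\\[0-9a-f]{1,6})*'")


def time_failed_match(blocks: int):
    """Match an unterminated string of `blocks` escapes; return (matched, seconds)."""
    text = "'" + r"\abcdef" * blocks  # no closing quote, so every parse fails
    start = time.perf_counter()
    matched = AMBIGUOUS_STRING.match(text) is not None
    return matched, time.perf_counter() - start


if __name__ == "__main__":
    for blocks in range(1, 7):
        # Roughly 6**blocks parses are tried before the match gives up.
        matched, seconds = time_failed_match(blocks)
        print(f"{blocks} escapes: matched={matched}, {seconds:.4f} s")
```

Each extra escape should multiply the time by a roughly constant factor, the same shape as the timings in the original report.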

@InconceivableVizzini

This can be fixed by rewriting the string productions (r"'(\\.|[^'])*'") or using an alternative regular expression library that does not rely on backtracking, such as re2, which may be used as a drop-in replacement for re.
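Taken as written, the suggested pattern r"'(\\.|[^'])*'" still has two overlapping branches: [^'] matches a backslash too, so a run of n backslashes in an unterminated string can be split into escapes and plain characters in a Fibonacci number of ways, and a failing match still backtracks exponentially. A sketch with the backslash excluded from the plain-character class (a hypothetical variant, not a tested cssutils patch) removes the overlap:

```python
import re
import time

# The pattern suggested above: "[^']" also matches a backslash, so a run of
# backslashes can be split into "\\." escapes and plain characters in many ways.
SUGGESTED = re.compile(r"'(\\.|[^'])*'")

# Hypothetical variant: the backslash is excluded from the plain-character
# class, so every "\" starts an escape and each input has a single parse.
DISJOINT = re.compile(r"'(?:[^'\\]|\\.)*'")


def time_failed_match(pattern: re.Pattern, backslashes: int) -> float:
    """Time one match attempt on an unterminated string of backslashes."""
    text = "'" + "\\" * backslashes  # no closing quote
    start = time.perf_counter()
    pattern.match(text)
    return time.perf_counter() - start


if __name__ == "__main__":
    for n in (16, 20, 24):
        # SUGGESTED explores a Fibonacci number of splits; DISJOINT explores one.
        print(f"{n} backslashes: suggested {time_failed_match(SUGGESTED, n):.4f} s, "
              f"disjoint {time_failed_match(DISJOINT, n):.6f} s")
```

The variant also changes behavior on input like `'\'`: the suggested pattern accepts it by reading the backslash as a plain character, while the variant treats `\'` as an escaped quote and rejects the unterminated string.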

@jaraco
Owner

jaraco commented Jun 2, 2022

> This can be fixed by rewriting the string productions (r"'(\\.|[^'])*'") or using an alternative regular expression library that does not rely on backtracking, such as re2, which may be used as a drop-in replacement for re.

I'd prefer to see a rewrite of the string productions, especially if it can be done in a fairly straightforward way without adding too much complication to the approach.
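A sketch of what a straightforward rewrite might look like for the hex-escape case, assuming the string macros follow the CSS 2.1 escape grammar. HEX, ESCAPE, STRING1, STRING2 and match_string are hypothetical stand-ins, not cssutils's own names, and the optional whitespace after a hex escape and backslash-newline continuations are left out. The idea is to make the branches disjoint: a hex escape always takes the longest run of hex digits (at most six), which is also how CSS reads escapes, so each input has exactly one parse and a failed match gives up in linear time.

```python
import re
import time

# Hypothetical macro names; cssutils's real macros may differ in detail.
HEX = r"[0-9a-fA-F]"
# A hex escape takes the longest run of hex digits, at most six: either exactly
# six digits, or one to five digits not followed by another hex digit.
# Any other escaped character (except a newline) is a one-character escape.
ESCAPE = rf"\\{HEX}{{6}}|\\{HEX}{{1,5}}(?!{HEX})|\\[^\n\r\f0-9a-fA-F]"
STRING1 = rf'"(?:[^\n\r\f\\"]|{ESCAPE})*"'
STRING2 = rf"'(?:[^\n\r\f\\']|{ESCAPE})*'"
STRING = re.compile(rf"{STRING1}|{STRING2}")


def match_string(text: str, pos: int = 0):
    """Return the quoted string token starting at pos, or None."""
    match = STRING.match(text, pos)
    return match.group() if match else None


if __name__ == "__main__":
    print(match_string(r"'\41 bc' rest"))  # hex escape \41, then plain " bc"
    unterminated = "'" + r"\abcdef" * 5000
    start = time.perf_counter()
    print(match_string(unterminated), f"{time.perf_counter() - start:.4f} s")
```

On Python 3.11 and later, a possessive quantifier (\\[0-9a-fA-F]{1,6}+) would be a shorter way to stop the engine from re-splitting an escape; the lookahead form above also works on Python 3.8.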

Projects
None yet
Development

No branches or pull requests

3 participants