Bug in regex used to detect robots noindex directive in page header #110

Closed
cicirello opened this issue Oct 5, 2023 · 0 comments · Fixed by #109
Labels
bug Something isn't working

Comments

@cicirello (Owner)
Summary

The current regular expression used to detect whether a page's header contains a meta tag with a robots noindex directive (e.g., so such pages can be excluded from the sitemap) has a potential bug. The pattern uses \s* in a couple of places to account for sequences of whitespace characters, but because it is written as a normal (non-raw) string literal, \s is treated as an invalid escape sequence in the string rather than being passed through to Python's regular expression processor. The backslash needs to be escaped (or the pattern written as a raw string). This was revealed when upgrading to Python 3.12, which emits a warning; earlier versions of Python do not warn on this, and the behavior appears to be correct regardless (not entirely sure why, though it is likely because unrecognized escape sequences are left in the string unchanged), but it should be fixed nonetheless.
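A minimal sketch of the problem follows; the pattern and variable names here are illustrative, not the project's actual regex. It shows a \s escape written inside a normal string literal, which Python 3.12 flags with a SyntaxWarning, and the raw-string form that passes the backslash through to the re module explicitly.

```python
import re

# Hypothetical pattern for illustration only (not the project's actual regex).
# In a normal string literal, \s is an invalid escape sequence. Python 3.12
# reports it with a SyntaxWarning shown by default; earlier versions only
# issued a DeprecationWarning, hidden unless warnings were enabled. Behavior
# was still correct because unrecognized escapes are left in the string
# unchanged, so the two characters "\s" still reached the regex engine.
buggy = "<meta\s+name=.robots.\s+content=.noindex."   # SyntaxWarning on 3.12

# Fix: a raw string (or doubling the backslashes as \\s) makes the intent explicit.
fixed = r"<meta\s+name=.robots.\s+content=.noindex."

head = '<meta name="robots" content="noindex">'
print(bool(re.search(fixed, head, re.IGNORECASE)))  # True
```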
