[ruff] re and regex calls with unraw string as first argument (RUF039) #14446

Merged
merged 11 commits into from
Nov 19, 2024

Conversation

InSyncWithFoo
Contributor

Summary

Resolves #11167.

Test Plan

cargo nextest run and cargo insta test.

Contributor

github-actions bot commented Nov 19, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+199 -0 violations, +0 -0 fixes in 14 projects; 40 projects unchanged)

RasaHQ/rasa (+10 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ rasa/utils/io.py:222:9: RUF039 First argument to `re.compile()` is not raw string
+ rasa/utils/io.py:223:9: RUF039 First argument to `re.compile()` is not raw string
+ rasa/utils/io.py:224:9: RUF039 First argument to `re.compile()` is not raw string
+ rasa/utils/io.py:225:9: RUF039 First argument to `re.compile()` is not raw string
+ rasa/utils/io.py:226:9: RUF039 First argument to `re.compile()` is not raw string
+ rasa/utils/io.py:227:9: RUF039 First argument to `re.compile()` is not raw string
+ rasa/utils/io.py:228:9: RUF039 First argument to `re.compile()` is not raw string
+ rasa/utils/io.py:229:9: RUF039 First argument to `re.compile()` is not raw string
+ rasa/utils/io.py:230:9: RUF039 First argument to `re.compile()` is not raw string
+ rasa/utils/io.py:231:9: RUF039 First argument to `re.compile()` is not raw string

apache/airflow (+36 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

+ dev/breeze/src/airflow_breeze/params/build_prod_params.py:81:21: RUF039 First argument to `re.match()` is not raw string
+ dev/breeze/src/airflow_breeze/utils/run_tests.py:109:19: RUF039 First argument to `re.sub()` is not raw string
+ dev/perf/dags/elastic_dag.py:73:19: RUF039 First argument to `re.sub()` is not raw string
+ docs/exts/docs_build/lint_checks.py:46:46: RUF039 First argument to `re.findall()` is not raw string
+ helm_tests/airflow_aux/test_pod_template_file.py:358:26: RUF039 First argument to `re.search()` is not raw string
+ helm_tests/airflow_aux/test_pod_template_file.py:370:26: RUF039 First argument to `re.search()` is not raw string
+ helm_tests/airflow_aux/test_pod_template_file.py:407:26: RUF039 First argument to `re.search()` is not raw string
+ helm_tests/airflow_aux/test_pod_template_file.py:59:26: RUF039 First argument to `re.search()` is not raw string
+ helm_tests/airflow_aux/test_pod_template_file.py:97:26: RUF039 First argument to `re.search()` is not raw string
+ providers/src/airflow/providers/apache/spark/hooks/spark_submit.py:602:35: RUF039 First argument to `re.search()` is not raw string
... 26 additional changes omitted for project

apache/superset (+71 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

+ RELEASING/changelog.py:276:26: RUF039 First argument to `re.match()` is not raw string
+ scripts/build_docker.py:66:23: RUF039 First argument to `re.sub()` is not raw string
+ scripts/build_docker.py:68:23: RUF039 First argument to `re.sub()` is not raw string
+ scripts/build_docker.py:70:23: RUF039 First argument to `re.sub()` is not raw string
+ scripts/build_docker.py:70:51: RUF039 First argument to `re.sub()` is not raw string
+ superset/db_engine_specs/athena.py:30:5: RUF039 First argument to `re.compile()` is not raw string
+ superset/db_engine_specs/bigquery.py:76:5: RUF039 First argument to `re.compile()` is not raw string
+ superset/db_engine_specs/bigquery.py:77:5: RUF039 First argument to `re.compile()` is not raw string
+ superset/db_engine_specs/bigquery.py:90:5: RUF039 First argument to `re.compile()` is not raw string
+ superset/db_engine_specs/denodo.py:29:46: RUF039 First argument to `re.compile()` is not raw string
+ superset/db_engine_specs/denodo.py:30:48: RUF039 First argument to `re.compile()` is not raw string
+ superset/db_engine_specs/denodo.py:32:9: RUF039 First argument to `re.compile()` is not raw string
+ superset/db_engine_specs/denodo.py:35:9: RUF039 First argument to `re.compile()` is not raw string
+ superset/db_engine_specs/denodo.py:37:46: RUF039 First argument to `re.compile()` is not raw string
+ superset/db_engine_specs/denodo.py:39:9: RUF039 First argument to `re.compile()` is not raw string
+ superset/db_engine_specs/denodo.py:41:43: RUF039 First argument to `re.compile()` is not raw string
+ superset/db_engine_specs/denodo.py:43:9: RUF039 First argument to `re.compile()` is not raw string
... 54 additional changes omitted for project

bokeh/bokeh (+9 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

+ src/bokeh/util/strings.py:91:19: RUF039 First argument to `re.sub()` is not raw string
+ src/bokeh/util/strings.py:92:19: RUF039 First argument to `re.sub()` is not raw string
+ tests/unit/bokeh/core/test_templates.py:48:19: RUF039 First argument to `re.sub()` is not raw bytes literal
+ tests/unit/bokeh/io/test_export.py:203:9: RUF039 First argument to `re.compile()` is not raw string
+ tests/unit/bokeh/io/test_export.py:204:13: RUF039 First argument to `re.compile()` is not raw string
+ tests/unit/bokeh/io/test_export.py:205:13: RUF039 First argument to `re.compile()` is not raw string
+ tests/unit/bokeh/io/test_export.py:206:9: RUF039 First argument to `re.compile()` is not raw string
+ tests/unit/bokeh/server/test_server__server.py:211:28: RUF039 First argument to `re.compile()` is not raw string
+ tests/unit/bokeh/server/test_server__server.py:219:36: RUF039 First argument to `re.compile()` is not raw string

ibis-project/ibis (+8 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ ibis/backends/__init__.py:1396:17: RUF039 First argument to `re.match()` is not raw string
+ ibis/backends/flink/__init__.py:320:17: RUF039 First argument to `re.search()` is not raw string
+ ibis/backends/sql/compilers/pyspark.py:362:27: RUF039 First argument to `re.sub()` is not raw string
+ ibis/backends/tests/test_client.py:1047:13: RUF039 First argument to `re.search()` is not raw string
+ ibis/backends/tests/test_client.py:1068:13: RUF039 First argument to `re.search()` is not raw string
+ ibis/common/tests/test_patterns.py:246:31: RUF039 First argument to `re.compile()` is not raw string
+ ibis/common/tests/test_patterns.py:246:65: RUF039 First argument to `re.compile()` is not raw string
+ ibis/tests/expr/test_selectors.py:123:39: RUF039 First argument to `re.compile()` is not raw string

latchbio/latch (+4 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ src/latch_cli/centromere/ctx.py:447:26: RUF039 First argument to `re.match()` is not raw string
+ src/latch_cli/services/init/init.py:309:18: RUF039 First argument to `re.search()` is not raw string
+ src/latch_cli/services/init/init.py:316:18: RUF039 First argument to `re.search()` is not raw string
+ src/latch_cli/services/register/register.py:59:36: RUF039 First argument to `re.compile()` is not raw string

lnbits/lnbits (+1 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ lnbits/db.py:151:34: RUF039 First argument to `re.compile()` is not raw string

pandas-dev/pandas (+21 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ pandas/io/formats/excel.py:420:35: RUF039 First argument to `re.search()` is not raw string
+ pandas/io/formats/style_render.py:2509:28: RUF039 First argument to `re.findall()` is not raw string
+ pandas/io/formats/style_render.py:2511:28: RUF039 First argument to `re.findall()` is not raw string
+ pandas/io/formats/style_render.py:2514:32: RUF039 First argument to `re.findall()` is not raw string
+ pandas/io/formats/style_render.py:2516:32: RUF039 First argument to `re.findall()` is not raw string
+ pandas/tests/dtypes/test_inference.py:462:44: RUF039 First argument to `re.compile()` is not raw string
+ pandas/tests/dtypes/test_inference.py:473:43: RUF039 First argument to `re.compile()` is not raw string
+ pandas/tests/extension/test_arrow.py:1788:25: RUF039 First argument to `re.compile()` is not raw string
+ pandas/tests/frame/methods/test_replace.py:1362:28: RUF039 First argument to `re.compile()` is not raw string
+ pandas/tests/indexes/datetimes/test_date_range.py:787:29: RUF039 First argument to `re.split()` is not raw string
... 11 additional changes omitted for project

pypa/build (+2 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ tests/test_integration.py:31:21: RUF039 First argument to `re.compile()` is not raw string
+ tests/test_integration.py:32:21: RUF039 First argument to `re.compile()` is not raw string

python-poetry/poetry (+16 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ src/poetry/console/logging/formatters/builder_formatter.py:11:26: RUF039 First argument to `re.sub()` is not raw string
+ src/poetry/console/logging/formatters/builder_formatter.py:13:26: RUF039 First argument to `re.sub()` is not raw string
+ src/poetry/console/logging/formatters/builder_formatter.py:15:26: RUF039 First argument to `re.sub()` is not raw string
+ src/poetry/console/logging/formatters/builder_formatter.py:18:17: RUF039 First argument to `re.sub()` is not raw string
+ src/poetry/mixology/solutions/providers/python_requirement_solution_provider.py:22:13: RUF039 First argument to `re.match()` is not raw string
+ src/poetry/mixology/solutions/providers/python_requirement_solution_provider.py:23:13: RUF039 First argument to `re.match()` is not raw string
+ src/poetry/puzzle/provider.py:736:21: RUF039 First argument to `re.sub()` is not raw string
+ src/poetry/utils/dependency_specification.py:192:13: RUF039 First argument to `re.sub()` is not raw string
+ tests/config/test_config.py:58:46: RUF039 First argument to `re.sub()` is not raw string
+ tests/conftest.py:378:20: RUF039 First argument to `re.compile()` is not raw string
... 6 additional changes omitted for project

... Truncated remaining completed project reports due to GitHub comment length restrictions

Changes by rule (1 rules affected)

| code | total | + violation | - violation | + fix | - fix |
| --- | --- | --- | --- | --- | --- |
| RUF039 | 199 | 199 | 0 | 0 | 0 |

Member

@MichaReiser MichaReiser left a comment

Thanks. This looks good overall. I'm interested in your and @AlexWaygood's opinion on whether we should flag non-raw strings for all regex patterns or make the rule slightly more clever, e.g. by allowing regex patterns that contain an escape sequence that can't be written in a raw string (e.g. \n).

crates/ruff_linter/src/codes.rs (outdated, resolved)
@MichaReiser added the `rule` (Implementing or modifying a lint rule) and `preview` (Related to preview mode features) labels on Nov 19, 2024
@InSyncWithFoo
Contributor Author

I think this should be a blanket enforcement.

# It's too easy to use a normal string as the pattern...
re.compile('uv')       # `uv` anywhere (`uv`, `uvicorn`, `dhruv`, `juvenile`, etc.)

# ...then forget to switch to a raw string when the pattern changes.
re.compile('\buv\b')   # `uv` not within a word?
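
A minimal sketch of the pitfall described above, using only the standard `re` module. The sample text and variable names are assumptions chosen for illustration; they are not taken from the PR.

```python
import re

text = "uv and uvicorn"  # hypothetical sample input

# In a normal string, "\b" is the backspace character (U+0008),
# so this pattern looks for backspace + "uv" + backspace.
non_raw = re.compile('\buv\b')

# In a raw string, the backslash reaches the regex engine,
# where \b means a word boundary.
raw = re.compile(r'\buv\b')

non_raw_matches = non_raw.findall(text)  # the text contains no backspaces
raw_matches = raw.findall(text)          # only the standalone "uv"
```

The non-raw pattern compiles without any error, which is what makes the mistake easy to miss.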

@MichaReiser
Member

I don't disagree with this overall, but it does mean that it gives you false positives for e.g.

https://github.com/RasaHQ/rasa/blob/7807b19ad5fffab73ca1a04dc710f812115a9288/rasa/utils/io.py#L223-L230

@MichaReiser
Member

So maybe that's specific to escape sequences. Python's regex engine supports all other common escape sequences itself (with the exception of \b, which it treats as a word boundary rather than a backspace). It's therefore not required to use a raw string for \n.
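
As a small illustration of that point (a sketch with an assumed sample string, not code from the PR), both spellings of `\n` end up matching a newline, although the two patterns are different strings:

```python
import re

text = "first\nsecond"  # hypothetical sample input

# Non-raw: Python converts "\n" to a newline before the regex engine sees it.
non_raw_match = re.search('first\nsecond', text)

# Raw: the regex engine receives backslash + "n" and interprets it as a newline.
raw_match = re.search(r'first\nsecond', text)

both_match = non_raw_match is not None and raw_match is not None
pattern_lengths = (len('\n'), len(r'\n'))  # one character vs. two characters
```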

@MichaReiser
Member

Oh, \U escape sequences are now also supported. I then agree that it should always flag.

Changed in version 3.3: The '\u' and '\U' escape sequences have been added.

Changed in version 3.6: Unknown escapes consisting of '\' and an ASCII letter now are errors.

Changed in version 3.8: The '\N{name}' escape sequence has been added. As in string literals, it expands to the named Unicode character (e.g. '\N{EM DASH}').
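
The changelog entries above can be checked with a short sketch like the following. It assumes Python 3.8 or later (needed for `\N{name}` in patterns); the sample strings are made up for illustration.

```python
import re

# The regex engine itself understands \u and \N{...},
# so these patterns work unchanged as raw strings.
u_escape_match = re.search(r'caf\u00e9', 'caf\u00e9')   # \u00e9 is "é"
named_escape_match = re.search(r'\N{EM DASH}', 'a \u2014 b')

# An unknown escape made of a backslash and an ASCII letter is rejected.
try:
    re.compile(r'\q')
    unknown_escape_rejected = False
except re.error:
    unknown_escape_rejected = True
```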

crates/ruff_python_ast/src/nodes.rs (outdated, resolved)
@AlexWaygood
Member

I haven't studied this in depth, but it overall seems reasonable to me, both in concept and implementation.

I wondered if we needed to take account of this footgun to do with raw strings (they cannot end with an odd number of backslashes). But I think we should be fine, since that documentation states:

Raw strings were designed to ease creating input for processors (chiefly regular expression engines) that want to do their own backslash escape processing. Such processors consider an unmatched trailing backslash to be an error anyway, so raw strings disallow that. In return, they allow you to pass on the string quote character by escaping it with a backslash. These rules work well when r-strings are used for their intended purpose.
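
A rough sketch of why the trailing-backslash restriction should not matter for regex patterns, assuming only the standard library. The built-in `compile()` is used here just to show that the raw literal is rejected at parse time.

```python
import re

# r'\' cannot be written: the backslash escapes the closing quote,
# so the literal never terminates.
try:
    compile("r'\\'", "<example>", "eval")
    raw_literal_is_syntax_error = False
except SyntaxError:
    raw_literal_is_syntax_error = True

# The regex engine rejects a pattern ending in a lone backslash anyway.
try:
    re.compile('\\')  # a one-character pattern: a single backslash
    trailing_backslash_rejected = False
except re.error:
    trailing_backslash_rejected = True

# An even number of trailing backslashes is fine in a raw string
# and matches one literal backslash.
matched = re.search(r'\\', 'a\\b') is not None
```

In other words, a pattern that could not be expressed as a raw string for this reason would already be an invalid regex.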

@InSyncWithFoo changed the title from "[ruff] re and regex calls with unraw string as first argument (RUF051)" to "[ruff] re and regex calls with unraw string as first argument (RUF039)" on Nov 19, 2024
@MichaReiser MichaReiser merged commit 5f09d4a into astral-sh:main Nov 19, 2024
20 checks passed
@InSyncWithFoo InSyncWithFoo deleted the RUF051 branch November 20, 2024 03:46
Labels
preview Related to preview mode features rule Implementing or modifying a lint rule
Successfully merging this pull request may close these issues.

Rule proposal: require raw strings for regex patterns
3 participants