[flake8-simplify] Implementation for split-of-static-string (SIM905) #14008

Merged

Conversation

Contributor

@sbrugman sbrugman commented Oct 30, 2024

Summary

Closes #13944
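
For context, a minimal sketch of the pattern the rule targets, based on the diagnostic message in the ecosystem results ("Consider using a list literal instead of `str.split`"). The variable names are illustrative only and are not taken from the PR.

```python
# Flagged by SIM905: str.split is called on a string literal, so the
# resulting list is fully known before the program runs.
flagged = "itemA itemB itemC".split()

# Suggested form: write the list literal out directly.
suggested = ["itemA", "itemB", "itemC"]

# Both spellings produce the same value.
assert flagged == suggested
```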

Test Plan

Standard snapshot testing

flake8-simplify surprisingly only has a single test case

@sbrugman sbrugman force-pushed the rule-ruff-split-of-static-string branch from 721fd96 to 05e0a1a Compare October 30, 2024 19:45
Contributor

github-actions bot commented Oct 30, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+31 -0 violations, +0 -0 fixes in 6 projects; 48 projects unchanged)

apache/airflow (+2 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

+ airflow/example_dags/example_params_ui_tutorial.py:101:22: SIM905 [*] Consider using a list literal instead of `str.split`
+ scripts/ci/pre_commit/check_min_python_version.py:29:35: SIM905 [*] Consider using a list literal instead of `str.split`

apache/superset (+9 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

+ superset/utils/date_parser.py:511:9: SIM905 [*] Consider using a list literal instead of `str.split`
+ tests/integration_tests/db_engine_specs/hive_tests.py:31:11: SIM905 [*] Consider using a list literal instead of `str.split`
+ tests/integration_tests/db_engine_specs/hive_tests.py:39:11: SIM905 [*] Consider using a list literal instead of `str.split`
+ tests/integration_tests/db_engine_specs/hive_tests.py:46:11: SIM905 [*] Consider using a list literal instead of `str.split`
+ tests/integration_tests/db_engine_specs/hive_tests.py:54:11: SIM905 [*] Consider using a list literal instead of `str.split`
+ tests/integration_tests/db_engine_specs/hive_tests.py:63:11: SIM905 [*] Consider using a list literal instead of `str.split`
+ tests/integration_tests/db_engine_specs/hive_tests.py:73:11: SIM905 [*] Consider using a list literal instead of `str.split`
+ tests/integration_tests/db_engine_specs/hive_tests.py:84:11: SIM905 [*] Consider using a list literal instead of `str.split`
+ tests/integration_tests/db_engine_specs/hive_tests.py:97:11: SIM905 [*] Consider using a list literal instead of `str.split`

bokeh/bokeh (+4 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

+ scripts/hooks/install.py:5:20: SIM905 [*] Consider using a list literal instead of `str.split`
+ scripts/hooks/uninstall.py:5:20: SIM905 [*] Consider using a list literal instead of `str.split`
+ scripts/sri.py:20:11: SIM905 [*] Consider using a list literal instead of `str.split`
+ src/bokeh/resources.py:674:11: SIM905 [*] Consider using a list literal instead of `str.split`

freedomofpress/securedrop (+12 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ admin/tests/test_integration.py:542:27: SIM905 [*] Consider using a list literal instead of `str.split`
+ admin/tests/test_integration.py:547:27: SIM905 [*] Consider using a list literal instead of `str.split`
+ admin/tests/test_integration.py:642:27: SIM905 [*] Consider using a list literal instead of `str.split`
+ admin/tests/test_integration.py:643:27: SIM905 [*] Consider using a list literal instead of `str.split`
+ admin/tests/test_integration.py:646:27: SIM905 [*] Consider using a list literal instead of `str.split`
+ admin/tests/test_integration.py:675:27: SIM905 [*] Consider using a list literal instead of `str.split`
+ securedrop/pretty_bad_protocol/_parsers.py:1194:16: SIM905 [*] Consider using a list literal instead of `str.split`
+ securedrop/pretty_bad_protocol/_parsers.py:1233:16: SIM905 [*] Consider using a list literal instead of `str.split`
+ securedrop/pretty_bad_protocol/_parsers.py:1292:24: SIM905 [*] Consider using a list literal instead of `str.split`
+ securedrop/pretty_bad_protocol/_parsers.py:1400:24: SIM905 [*] Consider using a list literal instead of `str.split`
+ securedrop/pretty_bad_protocol/_parsers.py:671:30: SIM905 [*] Consider using a list literal instead of `str.split`
+ securedrop/pretty_bad_protocol/gnupg.py:565:26: SIM905 [*] Consider using a list literal instead of `str.split`

scikit-build/scikit-build-core (+3 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ tests/test_skbuild_settings.py:400:12: SIM905 [*] Consider using a list literal instead of `str.split`
+ tests/test_skbuild_settings.py:440:12: SIM905 [*] Consider using a list literal instead of `str.split`
+ tests/test_skbuild_settings.py:752:12: SIM905 [*] Consider using a list literal instead of `str.split`

indico/indico (+1 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ indico/modules/rb/api.py:310:17: SIM905 [*] Consider using a list literal instead of `str.split`

Changes by rule (1 rules affected)

code    total  + violation  - violation  + fix  - fix
SIM905  31     31           0            0      0

@MichaReiser MichaReiser added the rule (Implementing or modifying a lint rule) and preview (Related to preview mode features) labels Oct 31, 2024
Member

@MichaReiser MichaReiser left a comment

Nice! This is really good.

I think this mainly needs more tests.

The touch with maxsplit is nice but I haven't seen a single example in the ecosystem results. That's why I'm not sure if it's worth the complexity or if we should just bail on providing a fix in this case.

Comment on lines 6 to 10
"""
itemA
itemB
itemC
""".split()
Member

A few more examples would be great:

  1. Where the left is an empty string: "".split()
  2. An all whitespace string: " ".split() ideally where the whitespace is a mixture of spaces, tabs, etc
  3. Where the left contains no split points: "/abc/".split() (found in the ecosystem results)
  4. Multiline with different indents: see https://github.com/scikit-build/scikit-build-core/blob/f6e60f41e46ce13ddbd344542d1bf45743b74514/tests/test_skbuild_settings.py#L440-L443
  5. Strings with unicode flag
  6. Examples with comments in various positions
       ("a,b,c"
       # comment
       .split()
       )

We should also add examples for the following strings where I think it's okay if we don't support them

  1. Raw strings
  2. Implicit concatenated strings
  3. Implicit concatenated strings with commented parts (where there are comments between the parts)
  4. Implicit concatenated strings with different prefixes
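
The cases listed above can be checked directly against Python's own behaviour. The following is only a sketch of what each case evaluates to at runtime; the concrete strings are made up for illustration and are not the PR's test fixtures.

```python
# Empty and all-whitespace strings yield an empty list with the default separator.
assert "".split() == []
assert " \t\n ".split() == []

# No split points: the whole string comes back as the single item.
assert "/abc/".split() == ["/abc/"]

# Multiline string with uneven indentation: runs of whitespace collapse.
multiline = """
    itemA
  itemB
        itemC
""".split()
assert multiline == ["itemA", "itemB", "itemC"]

# The unicode prefix does not change the value being split.
assert u"a,b,c".split(",") == ["a", "b", "c"]

# A comment between the string and the call; the default separator
# finds no whitespace, so nothing is split.
commented = ("a,b,c"
    # comment
    .split()
)
assert commented == ["a,b,c"]

# Raw strings keep the backslash, so there is no newline to split on.
assert r"a\nb".split() == ["a\\nb"]

# Implicit concatenation is resolved before the method call.
implicit = ("a,b" ",c").split(",")
assert implicit == ["a", "b", "c"]
```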

Contributor Author

Added these cases, as well as maxsplit=-1, which I found people also use.

Contributor Author

Implicit concatenated strings are supported, but the fix is unsafe because of the comments. I could also exclude them from fixing and mark the fix as safe if you prefer.

crates/ruff_linter/src/checkers/ast/analyze/expression.rs (outdated)
if let Some(ref replacement_expr) = split_replacement {
    // Construct replacement list
    let replacement = checker.generator().expr(replacement_expr);
    diagnostic.set_fix(Fix::unsafe_edit(Edit::range_replacement(
Member

What's your reasoning for making this an unsafe edit?

Contributor Author

@sbrugman sbrugman Oct 31, 2024

I had not yet exhausted all edge cases (comments, implicit string concatenation, etc.), so I marked it unsafe to be on the safe side. Comments within ISC are not preserved, so in that case it's unsafe.

Comment on lines 80 to 81
// Autofix for maxsplit without separator not yet implemented
None
Member

What makes maxsplit with the default separator difficult to implement?

Contributor Author

There is no equivalent to split for split_whitespace that accepts a maximum number of splits. Implementing it is not that much work, but it would be greatly simplified once this experimental API is stabilised (https://doc.rust-lang.org/std/str/struct.SplitWhitespace.html#method.remainder). I don't think it's worth implementing the logic now. I'll add a comment.
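
To illustrate what such a fix would have to reproduce, here is a rough Python sketch of whitespace splitting with a split limit. The helper name `split_whitespace_maxsplit` is hypothetical and this is not the PR's Rust code; it only mirrors what `str.split(maxsplit=...)` does with the default separator.

```python
def split_whitespace_maxsplit(text: str, maxsplit: int) -> list:
    """Emulate text.split(maxsplit=maxsplit) without calling str.split."""
    items = []
    i, n = 0, len(text)
    while i < n:
        # Skip the run of whitespace in front of the next item.
        while i < n and text[i].isspace():
            i += 1
        if i == n:
            break
        if maxsplit >= 0 and len(items) == maxsplit:
            # Split budget used up: the remainder is kept verbatim,
            # including any trailing whitespace.
            items.append(text[i:])
            break
        # Consume one run of non-whitespace characters.
        j = i
        while j < n and not text[j].isspace():
            j += 1
        items.append(text[i:j])
        i = j
    return items


# The remainder keeps its inner and trailing whitespace once the limit is hit.
assert split_whitespace_maxsplit("  a  b   c  ", 1) == ["a", "b   c  "]
```

The "remainder" step is the part that the experimental `remainder` method mentioned above would provide on the Rust side.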

Member

@AlexWaygood AlexWaygood left a comment

This rule exists in the flake8-simplify linter as SIM905: MartinThoma/flake8-simplify#86

So I think it would be good for us to put it in our flake8-simplify category with code SIM905.

@sbrugman
Contributor Author

Thanks for the review @MichaReiser, will make another pass.

Good catch @AlexWaygood, I'll remap to that category.

@sbrugman sbrugman changed the title from "[ruff] Implementation for split-of-static-string (RUF035)" to "[flake8-simplify] Implementation for split-of-static-string (SIM905)" Oct 31, 2024
}

fn split_sep(str_value: &str, sep_value: &str, max_split: usize, direction_left: bool) -> Expr {
    let list_items: Vec<&str> = if direction_left && max_split > 0 {
Contributor Author

The touch with maxsplit is nice but I haven't seen a single example in the ecosystem results. That's why I'm not sure if it's worth the complexity or if we should just bail on providing a fix in this case.

maxsplit=1 is quite common (arguably people should use str.partition instead).

Another reason to keep the logic is that it can be useful to generalise for this pylint rule:
https://pylint.readthedocs.io/en/stable/user_guide/messages/convention/use-maxsplit-arg.html
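
For readers unfamiliar with the comparison, a small sketch of how `maxsplit=1` relates to `str.partition`. The example strings are invented for illustration.

```python
text = "key=value=with=equals"

# split with maxsplit=1 returns a list of at most two items.
head, tail = text.split("=", maxsplit=1)
assert (head, tail) == ("key", "value=with=equals")

# partition always returns a 3-tuple: before, separator, after.
before, sep, after = text.partition("=")
assert (before, sep, after) == ("key", "=", "value=with=equals")

# The difference shows when the separator is missing: split returns a
# single item, while partition still returns three.
assert "novalue".split("=", maxsplit=1) == ["novalue"]
assert "novalue".partition("=") == ("novalue", "", "")
```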

Member

@MichaReiser MichaReiser left a comment

This is awesome. Thank you. It's up to you if you want to address either of the two comments. I'll merge whenever you tell me the PR's good to go :)

crates/ruff_linter/src/codes.rs (outdated)
    op: UnaryOp::USub,
    operand,
    ..
}) if matches!(**operand, Expr::NumberLiteral { .. }) => 0,
Member

I think we should bail here if maxsplit is a negative number other than -1.

Contributor Author

@sbrugman sbrugman Oct 31, 2024

It's rare, but maxsplit=-2 behaves as if maxsplit is omitted.
From that perspective, any negative number other than -1 could be flagged as a non-idiomatic default, but that would be a different rule.

Working code in the wild:

https://github.com/localstack/localstack/blob/master/localstack-core/localstack/services/kms/utils.py#L17

(This is valid code, although likely a mistake)

@sbrugman
Contributor Author

@MichaReiser processed the comments, thanks for the pointers

Member

@MichaReiser MichaReiser left a comment

Could we add a test for maxsplit=0? I suspect that we're incorrectly treating it the same as maxsplit=-1, which it isn't.

>>> "a,b,c".split(',', maxsplit=0)
['a,b,c']
>>> "a,b,c".split(',', maxsplit=-1)
['a', 'b', 'c']
>>> "a,b,c".split(',', maxsplit=-2)
['a', 'b', 'c']

  • negative values => same as omitting maxsplit
  • 0 => no split

@MichaReiser
Member

To make things more interesting:

>>> "a,b,c".split(',', maxsplit=-0)
['a,b,c']
>>> "a,b,c".split(',', maxsplit=-1)
['a', 'b', 'c']
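
The maxsplit cases discussed in this thread, collected into one runnable sketch. The variable names are illustrative; the `split`/`rsplit` with limit 1 lines are added for comparison and are not quoted from the PR.

```python
text = "a,b,c"

# 0 means "perform no splits": the whole string is the only item.
no_split = text.split(",", maxsplit=0)
assert no_split == ["a,b,c"]

# -0 is just the integer 0, so it behaves the same way.
negative_zero = text.split(",", maxsplit=-0)
assert negative_zero == no_split

# Any negative value removes the limit, exactly like omitting maxsplit.
omitted = text.split(",")
assert text.split(",", maxsplit=-1) == omitted
assert text.split(",", maxsplit=-2) == omitted
assert omitted == ["a", "b", "c"]

# A positive limit counts splits from the left for split and from the
# right for rsplit.
assert text.split(",", maxsplit=1) == ["a", "b,c"]
assert text.rsplit(",", maxsplit=1) == ["a,b", "c"]
```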

@sbrugman
Contributor Author

sbrugman commented Nov 2, 2024

You're right, the maxsplit=0 case wasn't handled correctly. (And nice touch on -0.)

@charliermarsh charliermarsh force-pushed the rule-ruff-split-of-static-string branch from 2901dff to 56047ff Compare November 2, 2024 17:04
@charliermarsh
Member

I decided to make it safe unless there are comments in the range (in which case, we still fix, but as unsafe).
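
A small illustration of why comments inside the replaced range matter. The snippet is hypothetical and not taken from the PR's test fixtures; it only shows that the values match while the comments have nowhere to go.

```python
# Before the fix: an implicitly concatenated string with inline comments.
before = (
    "a,"   # first half, kept separate for readability
    "b,c"  # second half
).split(",")

# After the fix the whole expression collapses into one list literal,
# so the two inline comments above would be dropped.
after = ["a", "b", "c"]

# The runtime value is unchanged; only the comments are lost.
assert before == after
```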

@charliermarsh charliermarsh enabled auto-merge (squash) November 2, 2024 17:10
@charliermarsh charliermarsh merged commit f837428 into astral-sh:main Nov 2, 2024
18 checks passed
@sbrugman sbrugman deleted the rule-ruff-split-of-static-string branch November 3, 2024 13:32
Labels
preview: Related to preview mode features
rule: Implementing or modifying a lint rule
Development

Successfully merging this pull request may close these issues.

New Rule + fixer: Use a list instead of calling split with string literals
4 participants