Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[refurb] Parse more exotic decimal strings in verbose-decimal-constructor (FURB157) #14098

Merged
merged 6 commits into from
Nov 5, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 20 additions & 0 deletions crates/ruff_linter/resources/test/fixtures/refurb/FURB157.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,3 +15,23 @@
Decimal(0)
Decimal("Infinity")
decimal.Decimal(0)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we please also include cases for Decimal("-inf") and Decimal("inf") (both should error).

# Handle Python's Decimal parsing
# See https://github.com/astral-sh/ruff/issues/13807

# Errors
Decimal("1_000")
Decimal("__1____000")

# Ok
Decimal("2e-4")
Decimal("2E-4")
Decimal("_1.234__")
Decimal("2e4")
Decimal("2e+4")
Decimal("2E4")
Decimal("1.2")
# Ok: even though this is equal to `Decimal(123)`,
# we assume that a developer would
# only write it this way if they meant to.
Decimal("١٢٣")
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@ use ruff_macros::{derive_message_formats, violation};
use ruff_python_ast::{self as ast, Expr};
use ruff_python_trivia::PythonWhitespace;
use ruff_text_size::Ranged;
use std::borrow::Cow;

use crate::checkers::ast::Checker;

Expand Down Expand Up @@ -75,13 +76,20 @@ pub(crate) fn verbose_decimal_constructor(checker: &mut Checker, call: &ast::Exp
value: str_literal, ..
}) => {
// Parse the inner string as an integer.
let trimmed = str_literal.to_str().trim_whitespace();

//
// For reference, a string argument to `Decimal` is parsed in CPython
// using this regex:
// https://github.com/python/cpython/blob/ac556a2ad1213b8bb81372fe6fb762f5fcb076de/Lib/_pydecimal.py#L6060-L6077
// _after_ trimming whitespace from the string and removing all occurrences of "_".
let mut trimmed = Cow::from(str_literal.to_str().trim_whitespace());
if memchr::memchr(b'_', trimmed.as_bytes()).is_some() {
trimmed = Cow::from(trimmed.replace('_', ""));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I don't think memchr gives us much here, considering that most literals are very short. We also don't have to use replace and allcoate a Cow::Owned. Instead, I would use Cow::from(trimmed.trim_start_matches('_')) for better readability (with the added benefit that it doesn't allocate)

Copy link
Collaborator Author

@dylwil3 dylwil3 Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine not using memchr, but I do think we need to use replace because underscores may appear anywhere in the string. For example- Decimal("1_____00") is valid.

Edit: whoops, didn't see that was already addressed below

}
// Extract the unary sign, if any.
let (unary, rest) = if let Some(trimmed) = trimmed.strip_prefix('+') {
("+", trimmed)
("+", Cow::from(trimmed))
} else if let Some(trimmed) = trimmed.strip_prefix('-') {
("-", trimmed)
("-", Cow::from(trimmed))
} else {
("", trimmed)
};
Expand All @@ -90,7 +98,7 @@ pub(crate) fn verbose_decimal_constructor(checker: &mut Checker, call: &ast::Exp
let rest = rest.trim_start_matches('0');

// Verify that the rest of the string is a valid integer.
if !rest.chars().all(|c| c.is_ascii_digit()) {
if !rest.bytes().all(|c| c.is_ascii_digit()) {
return;
};

Expand Down Expand Up @@ -159,3 +167,26 @@ pub(crate) fn verbose_decimal_constructor(checker: &mut Checker, call: &ast::Exp

checker.diagnostics.push(diagnostic);
}

// // Slightly modified from [CPython regex] to ignore https://github.com/python/cpython/blob/ac556a2ad1213b8bb81372fe6fb762f5fcb076de/Lib/_pydecimal.py#L6060-L6077
// static DECIMAL_PARSER_REGEX: LazyLock<Regex> = LazyLock::new(|| {
// Regex::new(
// r"(?x) # Verbose mode for comments
// ^ # Start of string
// (?P<sign>[-+])? # Optional sign
// (?:
// (?P<int>\d*) # Integer part (can be empty)
// (\.(?P<frac>\d+))? # Optional fractional part
// (E(?P<exp>[-+]?\d+))? # Optional exponent
// |
// Inf(inity)? # Infinity
// |
// (?P<signal>s)? # Optional signal
// NaN # NaN
// (?P<diag>\d*) # Optional diagnostic info
// )
// $ # End of string
// ",
// )
// .unwrap()
// });
Original file line number Diff line number Diff line change
Expand Up @@ -166,3 +166,43 @@ FURB157.py:12:17: FURB157 [*] Verbose expression in `Decimal` constructor
13 13 |
14 14 | # OK
15 15 | Decimal(0)

FURB157.py:23:9: FURB157 [*] Verbose expression in `Decimal` constructor
|
22 | # Errors
23 | Decimal("1_000")
| ^^^^^^^ FURB157
24 | Decimal("__1____000")
|
= help: Replace with `1000`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be nice to preserve the thousand separator in that case and suggest 1_000 instead of 1000. (If it's not too annoying to do)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good suggestion, and I think it would be a nice followup! I think one would want to trim leading/trailing underscores, and then replace every internal, contiguous sequence of underscores with a single underscore. At that point it may be worth a little refactoring to just walk through the string once and abort early as appropriate...


ℹ Safe fix
20 20 | # See https://github.com/astral-sh/ruff/issues/13807
21 21 |
22 22 | # Errors
23 |-Decimal("1_000")
23 |+Decimal(1000)
24 24 | Decimal("__1____000")
25 25 |
26 26 | # Ok

FURB157.py:24:9: FURB157 [*] Verbose expression in `Decimal` constructor
|
22 | # Errors
23 | Decimal("1_000")
24 | Decimal("__1____000")
| ^^^^^^^^^^^^ FURB157
25 |
26 | # Ok
|
= help: Replace with `1000`

ℹ Safe fix
21 21 |
22 22 | # Errors
23 23 | Decimal("1_000")
24 |-Decimal("__1____000")
24 |+Decimal(1000)
25 25 |
26 26 | # Ok
27 27 | Decimal("2e-4")
Loading