Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid parsing other parts of a format specification if replacements are present #6858

Merged
merged 7 commits into from
Aug 25, 2023
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -15,9 +15,11 @@
"{:s} {:y}".format("hello", "world") # [bad-format-character]

"{:*^30s}".format("centered") # OK
"{:{s}}".format("hello", s="s") # OK (nested replacement value not checked)

"{:{s:y}}".format("hello", s="s") # [bad-format-character] (nested replacement format spec checked)
"{:{s}}".format("hello", s="s") # OK (nested placeholder value not checked)
"{:{s:y}}".format("hello", s="s") # [bad-format-character] (nested placeholder format spec checked)
"{0:.{prec}g}".format(1.23, prec=15) # OK (cannot validate after nested placeholder)
"{0:.{foo}{bar}{foobar}y}".format(...) # OK (cannot validate after nested placeholders)
"{0:.{foo}x{bar}y{foobar}g}".format(...) # OK (all nested placeholders are consumed without considering in between chars)

## f-strings

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -63,13 +63,14 @@ pub(crate) fn call(checker: &mut Checker, string: &str, range: TextRange) {
));
}
Err(_) => {}
Ok(format_spec) => {
for replacement in format_spec.replacements() {
let FormatPart::Field { format_spec, .. } = replacement else {
Ok(FormatSpec::Static(_)) => {}
Ok(FormatSpec::Dynamic(format_spec)) => {
for placeholder in format_spec.placeholders {
let FormatPart::Field { format_spec, .. } = placeholder else {
continue;
};
if let Err(FormatSpecError::InvalidFormatType) =
FormatSpec::parse(format_spec)
FormatSpec::parse(&format_spec)
{
checker.diagnostics.push(Diagnostic::new(
BadStringFormatCharacter {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -51,14 +51,14 @@ bad_string_format_character.py:15:1: PLE1300 Unsupported format character 'y'
17 | "{:*^30s}".format("centered") # OK
|

bad_string_format_character.py:20:1: PLE1300 Unsupported format character 'y'
bad_string_format_character.py:19:1: PLE1300 Unsupported format character 'y'
|
18 | "{:{s}}".format("hello", s="s") # OK (nested replacement value not checked)
19 |
20 | "{:{s:y}}".format("hello", s="s") # [bad-format-character] (nested replacement format spec checked)
17 | "{:*^30s}".format("centered") # OK
18 | "{:{s}}".format("hello", s="s") # OK (nested placeholder value not checked)
19 | "{:{s:y}}".format("hello", s="s") # [bad-format-character] (nested placeholder format spec checked)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ PLE1300
21 |
22 | ## f-strings
20 | "{0:.{prec}g}".format(1.23, prec=15) # OK (cannot validate after nested placeholder)
21 | "{0:.{foo}{bar}{foobar}y}".format(...) # OK (cannot validate after nested placeholders)
|


160 changes: 81 additions & 79 deletions crates/ruff_python_literal/src/format.rs
Original file line number Diff line number Diff line change
Expand Up @@ -190,30 +190,35 @@ impl FormatParse for FormatType {
/// "hello {name:<20}".format(name="test")
/// ```
///
/// Format specifications allow nested replacements for dynamic formatting.
/// Format specifications allow nested placeholders for dynamic formatting.
/// For example, the following statements are equivalent:
/// ```python
/// "hello {name:{fmt}}".format(name="test", fmt="<20")
/// "hello {name:{align}{width}}".format(name="test", align="<", width="20")
/// "hello {name:<20{empty}>}".format(name="test", empty="")
/// ```
///
/// Nested replacements can include additional format specifiers.
/// Nested placeholders can include additional format specifiers.
/// ```python
/// "hello {name:{fmt:*>}}".format(name="test", fmt="<20")
/// ```
///
/// However, replacements can only be singly nested (preserving our sanity).
/// However, placeholders can only be singly nested (preserving our sanity).
/// A [`FormatSpecError::PlaceholderRecursionExceeded`] will be raised while parsing in this case.
/// ```python
/// "hello {name:{fmt:{not_allowed}}}".format(name="test", fmt="<20") # Syntax error
/// ```
///
/// When replacements are present in a format specification, we will parse them and
/// store them in [`FormatSpec`] but will otherwise ignore them if they would introduce
/// an invalid format specification at runtime.
/// When placeholders are present in a format specification, parsing will return a [`DynamicFormatSpec`]
/// and avoid attempting to parse any of the clauses. Otherwise, a [`StaticFormatSpec`] will be used.
#[derive(Debug, PartialEq)]
pub struct FormatSpec {
pub enum FormatSpec {
Static(StaticFormatSpec),
Dynamic(DynamicFormatSpec),
}

#[derive(Debug, PartialEq)]
pub struct StaticFormatSpec {
// Ex) `!s` in `'{!s}'`
conversion: Option<FormatConversion>,
// Ex) `*` in `'{:*^30}'`
Expand All @@ -232,8 +237,12 @@ pub struct FormatSpec {
precision: Option<usize>,
// Ex) `f` in `'{:+f}'`
format_type: Option<FormatType>,
}

#[derive(Debug, PartialEq)]
pub struct DynamicFormatSpec {
// Ex) `x` and `y` in `'{:*{x},{y}b}'`
replacements: Vec<FormatPart>,
pub placeholders: Vec<FormatPart>,
}

#[derive(Copy, Clone, Debug, PartialEq, Default)]
Expand Down Expand Up @@ -320,41 +329,61 @@ fn parse_precision(text: &str) -> Result<(Option<usize>, &str), FormatSpecError>

/// Parses a format part within a format spec
fn parse_nested_placeholder<'a>(
parts: &mut Vec<FormatPart>,
placeholders: &mut Vec<FormatPart>,
text: &'a str,
) -> Result<&'a str, FormatSpecError> {
match FormatString::parse_spec(text, AllowPlaceholderNesting::No) {
// Not a nested replacement, OK
// Not a nested placeholder, OK
Err(FormatParseError::MissingStartBracket) => Ok(text),
Err(err) => Err(FormatSpecError::InvalidPlaceholder(err)),
Ok((format_part, text)) => {
parts.push(format_part);
placeholders.push(format_part);
Ok(text)
}
}
}

/// Parse and consume all placeholders in a format spec
/// This will also consume any intermediate characters such as `x` and `y` in `"x{foo}y{bar}z"`
/// If no placeholders are present, the text will be returned unmodified.
fn consume_all_placeholders<'a>(
placeholders: &mut Vec<FormatPart>,
text: &'a str,
) -> Result<&'a str, FormatSpecError> {
let mut chars = text.chars();
let mut text = text;
let mut placeholder_count = placeholders.len();

while chars.clone().contains(&'{') {
text = parse_nested_placeholder(placeholders, chars.as_str())?;
zanieb marked this conversation as resolved.
Show resolved Hide resolved
chars = text.chars();
// If we did not parse a placeholder, consume a character
if placeholder_count == placeholders.len() {
chars.next();
} else {
placeholder_count = placeholders.len();
}
}
Ok(text)
}
zanieb marked this conversation as resolved.
Show resolved Hide resolved

impl FormatSpec {
pub fn parse(text: &str) -> Result<Self, FormatSpecError> {
let mut replacements = vec![];
// get_integer in CPython
let text = parse_nested_placeholder(&mut replacements, text)?;
let mut placeholders = vec![];
let text = consume_all_placeholders(&mut placeholders, text)?;

if !placeholders.is_empty() {
return Ok(FormatSpec::Dynamic(DynamicFormatSpec { placeholders }));
}
zanieb marked this conversation as resolved.
Show resolved Hide resolved

let (conversion, text) = FormatConversion::parse(text);
let text = parse_nested_placeholder(&mut replacements, text)?;
let (mut fill, mut align, text) = parse_fill_and_align(text);
let text = parse_nested_placeholder(&mut replacements, text)?;
let (sign, text) = FormatSign::parse(text);
let text = parse_nested_placeholder(&mut replacements, text)?;
let (alternate_form, text) = parse_alternate_form(text);
let text = parse_nested_placeholder(&mut replacements, text)?;
let (zero, text) = parse_zero(text);
let text = parse_nested_placeholder(&mut replacements, text)?;
let (width, text) = parse_number(text)?;
let text = parse_nested_placeholder(&mut replacements, text)?;
let (grouping_option, text) = FormatGrouping::parse(text);
let text = parse_nested_placeholder(&mut replacements, text)?;
let (precision, text) = parse_precision(text)?;
let text = parse_nested_placeholder(&mut replacements, text)?;

let (format_type, _text) = if text.is_empty() {
(None, text)
Expand All @@ -376,7 +405,7 @@ impl FormatSpec {
align = align.or(Some(FormatAlign::AfterSign));
}

Ok(FormatSpec {
Ok(FormatSpec::Static(StaticFormatSpec {
conversion,
fill,
align,
Expand All @@ -386,12 +415,7 @@ impl FormatSpec {
grouping_option,
precision,
format_type,
replacements,
})
}

pub fn replacements(&self) -> &[FormatPart] {
return self.replacements.as_slice();
}))
}
}

Expand Down Expand Up @@ -437,7 +461,7 @@ impl std::fmt::Display for FormatParseError {
std::write!(fmt, "unescaped start bracket in literal")
}
Self::PlaceholderRecursionExceeded => {
std::write!(fmt, "multiply nested replacement not allowed")
std::write!(fmt, "multiply nested placeholder not allowed")
}
Self::UnknownConversion => {
std::write!(fmt, "unknown conversion")
Expand Down Expand Up @@ -730,7 +754,7 @@ mod tests {

#[test]
fn test_width_only() {
let expected = Ok(FormatSpec {
let expected = Ok(FormatSpec::Static(StaticFormatSpec {
conversion: None,
fill: None,
align: None,
Expand All @@ -740,14 +764,13 @@ mod tests {
grouping_option: None,
precision: None,
format_type: None,
replacements: vec![],
});
}));
assert_eq!(FormatSpec::parse("33"), expected);
}

#[test]
fn test_fill_and_width() {
let expected = Ok(FormatSpec {
let expected = Ok(FormatSpec::Static(StaticFormatSpec {
conversion: None,
fill: Some('<'),
align: Some(FormatAlign::Right),
Expand All @@ -757,45 +780,26 @@ mod tests {
grouping_option: None,
precision: None,
format_type: None,
replacements: vec![],
});
}));
assert_eq!(FormatSpec::parse("<>33"), expected);
}

#[test]
fn test_format_part() {
let expected = Ok(FormatSpec {
conversion: None,
fill: None,
align: None,
sign: None,
alternate_form: false,
width: None,
grouping_option: None,
precision: None,
format_type: None,
replacements: vec![FormatPart::Field {
let expected = Ok(FormatSpec::Dynamic(DynamicFormatSpec {
placeholders: vec![FormatPart::Field {
field_name: "x".to_string(),
conversion_spec: None,
format_spec: String::new(),
}],
});
}));
assert_eq!(FormatSpec::parse("{x}"), expected);
}

#[test]
fn test_format_parts() {
let expected = Ok(FormatSpec {
conversion: None,
fill: None,
align: None,
sign: None,
alternate_form: false,
width: None,
grouping_option: None,
precision: None,
format_type: None,
replacements: vec![
fn test_dynamic_format_spec() {
let expected = Ok(FormatSpec::Dynamic(DynamicFormatSpec {
placeholders: vec![
FormatPart::Field {
field_name: "x".to_string(),
conversion_spec: None,
Expand All @@ -812,34 +816,25 @@ mod tests {
format_spec: String::new(),
},
],
});
}));
assert_eq!(FormatSpec::parse("{x}{y:<2}{z}"), expected);
}

#[test]
fn test_format_part_with_others() {
let expected = Ok(FormatSpec {
conversion: None,
fill: None,
align: Some(FormatAlign::Left),
sign: None,
alternate_form: false,
width: Some(20),
grouping_option: None,
precision: None,
format_type: Some(FormatType::Binary),
replacements: vec![FormatPart::Field {
fn test_dynamic_format_spec_with_others() {
let expected = Ok(FormatSpec::Dynamic(DynamicFormatSpec {
placeholders: vec![FormatPart::Field {
field_name: "x".to_string(),
conversion_spec: None,
format_spec: String::new(),
}],
});
}));
assert_eq!(FormatSpec::parse("<{x}20b"), expected);
}
Comment on lines 806 to 816
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we lose all these additional fields which were correct in this test but could be wrong depending on the dynamic content of the placeholder.


#[test]
fn test_all() {
let expected = Ok(FormatSpec {
let expected = Ok(FormatSpec::Static(StaticFormatSpec {
conversion: None,
fill: Some('<'),
align: Some(FormatAlign::Right),
Expand All @@ -849,8 +844,7 @@ mod tests {
grouping_option: Some(FormatGrouping::Comma),
precision: Some(11),
format_type: Some(FormatType::Binary),
replacements: vec![],
});
}));
assert_eq!(FormatSpec::parse("<>-#23,.11b"), expected);
}

Expand All @@ -877,7 +871,7 @@ mod tests {
}

#[test]
fn test_format_parse_nested_replacement() {
fn test_format_parse_nested_placeholder() {
let expected = Ok(FormatString {
format_parts: vec![
FormatPart::Literal("abcd".to_owned()),
Expand Down Expand Up @@ -966,7 +960,15 @@ mod tests {
);
assert_eq!(
FormatSpec::parse("{}}"),
Err(FormatSpecError::InvalidFormatType)
// Note this should be an `InvalidFormatType` but we give up
// on all other parsing validation when we see a placeholder
Ok(FormatSpec::Dynamic(DynamicFormatSpec {
placeholders: vec![FormatPart::Field {
field_name: String::new(),
conversion_spec: None,
format_spec: String::new()
}]
}))
Comment on lines -969 to +954
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's an example case where we no longer detect invalid format type specifications because there was a replacement earlier — this is a trade-off for generally safer analysis.

);
assert_eq!(
FormatSpec::parse("{{x}}"),
Expand Down