-
Notifications
You must be signed in to change notification settings - Fork 2
fix the ranges of constants inside f-strings #33
fix the ranges of constants inside f-strings #33
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is heading in the right direction, but there are some remaining incorrect ranges. We'll have to review every single StringParser::range()
call and replace it with the correct range.
The Cursor
(used for lexing in the Ruff repository) has this concept where you can start
a token. What it does is that the Cursor
remembers where the start of the current token is. I think something similar would be helpful. The parser could define a start
function that records the current location
, and we introduce a new current_range()
(and current_range_and_start_new()
method, which returns the range of the current value.
@@ -167,7 +167,7 @@ expression: parse_ast | |||
values: [ | |||
Constant( | |||
ExprConstant { | |||
range: 114..133, | |||
range: 116..123, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The range of the following FormattedValue
must be incorrect because it overlaps with this Constant
.
parser/src/string.rs
Outdated
@@ -516,7 +522,7 @@ impl<'a> StringParser<'a> { | |||
ast::ExprConstant { | |||
value: content.into(), | |||
kind: None, | |||
range: self.range(), | |||
range: TextRange::new(content_start, self.location), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: The code so far consistently uses self.get_pos()
over location
range: TextRange::new(content_start, self.location), | |
range: TextRange::new(content_start, self.get_pos()), |
nice catch. will keep investigating
|
no i'm back to thinking this isn't what we want i think this is currently following python's ast
which is fine for parsing but not for preserving the source ( probably makes sense to extract this into a separate issue |
raised astral-sh/ruff#5970 |
self-documenting f-strings (`f"{foo=}"`) are still outstanding see astral-sh/ruff#5970
updated. afict we can't store the position on self (like cursor) because the ranges are nested |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good overall. The complexity of this code is rather high. I hope that we could rewrite it at some point to use a Cursor
based lexer (or use a dedicated lexer even).
There are a few tests that have, what seems to me, invalid ranges. I assume that's because of the TODO. Do you think it would be reasonable to fix these ranges as part of this PR as well without fixing the representation?
parser/src/snapshots/rustpython_parser__string__tests__fstring_parse_self_documenting_base.snap
Show resolved
Hide resolved
...rc/snapshots/rustpython_parser__string__tests__fstring_parse_self_documenting_base_more.snap
Show resolved
Hide resolved
parser/src/snapshots/rustpython_parser__string__tests__parse_fstring_self_doc_prec_space.snap
Show resolved
Hide resolved
parser/src/string.rs
Outdated
@@ -191,7 +183,7 @@ impl<'a> StringParser<'a> { | |||
let mut conversion = ConversionFlag::None; | |||
let mut self_documenting = false; | |||
let mut trailing_seq = String::new(); | |||
let location = self.get_pos(); | |||
let location = self.get_pos() - TextSize::from(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it guarnanteed that the character one position before is a ASCII character? I think it is because it is always the }
token. Would it make sense to instead pass the start location as an argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for prodding me on these. think i have a better solution (though it does add a new dependency. happy to discuss)
parser/src/string.rs
Outdated
@@ -472,12 +470,13 @@ impl<'a> StringParser<'a> { | |||
} | |||
} | |||
if !content.is_empty() { | |||
let range = self.current_range(location).sub_end(1.into()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for substracting one character here? Is it safe to assume that the preceding character is ASCII? Should we instead pass the fstring_start_location
as a function argument?
parser/src/string.rs
Outdated
@@ -528,6 +528,7 @@ impl<'a> StringParser<'a> { | |||
|
|||
fn parse_bytes(&mut self) -> Result<Expr, LexicalError> { | |||
let mut content = String::new(); | |||
let location = self.get_pos(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: What would you think of renaming location
to start_location
(same in other places)? I would find it useful because it encodes the kind of location.
parser/src/string.rs
Outdated
@@ -583,25 +585,23 @@ impl<'a> StringParser<'a> { | |||
} else if self.kind.is_any_bytes() { | |||
self.parse_bytes().map(|expr| vec![expr]) | |||
} else { | |||
self.parse_string().map(|expr| vec![expr]) | |||
dbg!(self.parse_string().map(|expr| vec![expr])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dbg!(self.parse_string().map(|expr| vec![expr])) | |
self.parse_string().map(|expr| vec![expr]) |
i saw you have a lexer rewrite pr open (but not read it yet). does that make this redundant? |
I don't think so. It brings in the |
This is ready to merge after we've resolved the remaining nits (either close as won't do or apply the change) |
thanks. will take a view on (probably fix) the comments not related to ranges. just wanted to resolve that first. |
use multipeek so we can check for `{{` without having to consume the first `{` and can instead leave that to the sub-parsers. this also means we can get the location for the previous (now ending) range before advancing the cursor
parser/src/string.rs
Outdated
@@ -21,7 +22,7 @@ use rustpython_parser_core::{ | |||
const MAX_UNICODE_NAME: usize = 88; | |||
|
|||
struct StringParser<'a> { | |||
chars: std::iter::Peekable<std::str::Chars<'a>>, | |||
chars: MultiPeek<std::str::Chars<'a>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer if we don't have to use MultiPeek
. We can instead clone the chars
iterator. Cloning is cheap because it only requires coping the start end end slice pointers (and I'm somewhat certain that LLVM can figure out that it only needs to copy the start pointer).
Using clone
would allow us to remove the Peekable
altogether, similar to what we do in the Lexer's Cursor:
RustPython-Parser/parser/src/lexer/cursor.rs
Lines 30 to 42 in 7cbec09
/// Peeks the next character from the input stream without consuming it. | |
/// Returns [EOF_CHAR] if the file is at the end of the file. | |
pub(super) fn first(&self) -> char { | |
self.chars.clone().next().unwrap_or(EOF_CHAR) | |
} | |
/// Peeks the second character from the input stream without consuming it. | |
/// Returns [EOF_CHAR] if the position is past the end of the file. | |
pub(super) fn second(&self) -> char { | |
let mut chars = self.chars.clone(); | |
chars.next(); | |
chars.next().unwrap_or(EOF_CHAR) | |
} |
Note: It does make sense to use MultiPeek
if the Iterable is not cloneable or cloning is expensive. I think this doesn't apply here because storing an Option<char>
is about the same size as copying the slice start position.
Excellent. Thank you so much. |
first stab at astral-sh/ruff/issues/5907