Skip to content
This repository has been archived by the owner on Jul 27, 2023. It is now read-only.

fix the ranges of constants inside f-strings #33

Merged

Conversation

davidszotten
Copy link

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is heading in the right direction, but there are some remaining incorrect ranges. We'll have to review every single StringParser::range() call and replace it with the correct range.

The Cursor (used for lexing in the Ruff repository) has this concept where you can start a token. What it does is that the Cursor remembers where the start of the current token is. I think something similar would be helpful. The parser could define a start function that records the current location, and we introduce a new current_range() (and current_range_and_start_new() method, which returns the range of the current value.

@@ -167,7 +167,7 @@ expression: parse_ast
values: [
Constant(
ExprConstant {
range: 114..133,
range: 116..123,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The range of the following FormattedValue must be incorrect because it overlaps with this Constant.

@@ -516,7 +522,7 @@ impl<'a> StringParser<'a> {
ast::ExprConstant {
value: content.into(),
kind: None,
range: self.range(),
range: TextRange::new(content_start, self.location),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: The code so far consistently uses self.get_pos() over location

Suggested change
range: TextRange::new(content_start, self.location),
range: TextRange::new(content_start, self.get_pos()),

@davidszotten
Copy link
Author

davidszotten commented Jul 21, 2023

nice catch. will keep investigating

i think there might be other bugs here too, e.g. f'cc{dd=}' doesn't look right (the constant becomes ccdd ) update: maybe this is expected

@davidszotten
Copy link
Author

davidszotten commented Jul 22, 2023

f'cc{dd=}'

no i'm back to thinking this isn't what we want

i think this is currently following python's ast

>>> ast.dump(ast.parse("f'b{a=}'"))
"Module(body=[Expr(value=JoinedStr(values=[Constant(value='ba='), FormattedValue(value=Name(id='a', ctx=Load()), conversion=114)]))], type_ignores=[])"

which is fine for parsing but not for preserving the source

(f'{a=}' is short hand for f'a={a}')

probably makes sense to extract this into a separate issue

@davidszotten
Copy link
Author

raised astral-sh/ruff#5970

self-documenting f-strings (`f"{foo=}"`) are still outstanding
see astral-sh/ruff#5970
@davidszotten
Copy link
Author

updated. afict we can't store the position on self (like cursor) because the ranges are nested

@davidszotten davidszotten requested a review from MichaReiser July 24, 2023 08:42
Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good overall. The complexity of this code is rather high. I hope that we could rewrite it at some point to use a Cursor based lexer (or use a dedicated lexer even).

There are a few tests that have, what seems to me, invalid ranges. I assume that's because of the TODO. Do you think it would be reasonable to fix these ranges as part of this PR as well without fixing the representation?

@@ -191,7 +183,7 @@ impl<'a> StringParser<'a> {
let mut conversion = ConversionFlag::None;
let mut self_documenting = false;
let mut trailing_seq = String::new();
let location = self.get_pos();
let location = self.get_pos() - TextSize::from(1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it guarnanteed that the character one position before is a ASCII character? I think it is because it is always the } token. Would it make sense to instead pass the start location as an argument?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for prodding me on these. think i have a better solution (though it does add a new dependency. happy to discuss)

parser/src/string.rs Outdated Show resolved Hide resolved
@@ -472,12 +470,13 @@ impl<'a> StringParser<'a> {
}
}
if !content.is_empty() {
let range = self.current_range(location).sub_end(1.into());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for substracting one character here? Is it safe to assume that the preceding character is ASCII? Should we instead pass the fstring_start_location as a function argument?

@@ -528,6 +528,7 @@ impl<'a> StringParser<'a> {

fn parse_bytes(&mut self) -> Result<Expr, LexicalError> {
let mut content = String::new();
let location = self.get_pos();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: What would you think of renaming location to start_location (same in other places)? I would find it useful because it encodes the kind of location.

@@ -583,25 +585,23 @@ impl<'a> StringParser<'a> {
} else if self.kind.is_any_bytes() {
self.parse_bytes().map(|expr| vec![expr])
} else {
self.parse_string().map(|expr| vec![expr])
dbg!(self.parse_string().map(|expr| vec![expr]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dbg!(self.parse_string().map(|expr| vec![expr]))
self.parse_string().map(|expr| vec![expr])

@davidszotten
Copy link
Author

I hope that we could rewrite it at some point to use a Cursor based lexer (or use a dedicated lexer even).

i saw you have a lexer rewrite pr open (but not read it yet). does that make this redundant?

@MichaReiser
Copy link
Member

i saw you have a lexer rewrite pr open (but not read it yet). does that make this redundant?

I don't think so. It brings in the Cursor abstraction that could help to simplify some of the code here.

@MichaReiser
Copy link
Member

This is ready to merge after we've resolved the remaining nits (either close as won't do or apply the change)

@davidszotten
Copy link
Author

thanks. will take a view on (probably fix) the comments not related to ranges. just wanted to resolve that first.

use multipeek so we can check for `{{` without having to consume the
first `{` and can instead leave that to the sub-parsers. this also means
we can get the location for the previous (now ending) range before
advancing the cursor
@@ -21,7 +22,7 @@ use rustpython_parser_core::{
const MAX_UNICODE_NAME: usize = 88;

struct StringParser<'a> {
chars: std::iter::Peekable<std::str::Chars<'a>>,
chars: MultiPeek<std::str::Chars<'a>>,
Copy link
Member

@MichaReiser MichaReiser Jul 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if we don't have to use MultiPeek. We can instead clone the chars iterator. Cloning is cheap because it only requires coping the start end end slice pointers (and I'm somewhat certain that LLVM can figure out that it only needs to copy the start pointer).

Using clone would allow us to remove the Peekable altogether, similar to what we do in the Lexer's Cursor:

/// Peeks the next character from the input stream without consuming it.
/// Returns [EOF_CHAR] if the file is at the end of the file.
pub(super) fn first(&self) -> char {
self.chars.clone().next().unwrap_or(EOF_CHAR)
}
/// Peeks the second character from the input stream without consuming it.
/// Returns [EOF_CHAR] if the position is past the end of the file.
pub(super) fn second(&self) -> char {
let mut chars = self.chars.clone();
chars.next();
chars.next().unwrap_or(EOF_CHAR)
}

Note: It does make sense to use MultiPeek if the Iterable is not cloneable or cloning is expensive. I think this doesn't apply here because storing an Option<char> is about the same size as copying the slice start position.

@MichaReiser
Copy link
Member

Excellent. Thank you so much.

@MichaReiser MichaReiser merged commit 13196fc into astral-sh:main Jul 25, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants