feat: report line and column on requirements parser errors #2100
Changes from 2 commits
@@ -317,6 +317,27 @@ pub struct RequirementsTxt {
     pub no_index: bool,
 }

+/// Calculates column and line based on the cursor and content.
+fn calculate_line_column_pair(content: &str, position: usize) -> (usize, usize) {
+    let mut line = 1;
+    let mut column = 1;
+
+    for (index, char) in content.char_indices() {

I think we want byte indices rather than char indices here? CC @BurntSushi

let content = "a💩b";
for (i, ch) in content.char_indices() {
    eprintln!("{}:{}", i, ch);
}

Has this output:

Since

+        if index >= position {
+            break;
+        }
+        // This should work fine for both Windows and Linux line endings
+        if char == '\n' {

So here, if we see

Agreed. I left this as-is on purpose, but I really debated whether to keep it for simplicity or to track the prev_char reference for these kinds of checks. Luckily, it can easily be changed whichever route we go.

I'd prefer to change it just for completeness, in case this gets reused elsewhere. Are you OK to modify it?

Let me know if this is what you had in mind: d0f7161

I was aiming for something more like: if we see

Sorry for the delay (work 😆), hopefully this is closer: 223fd99

+            line += 1;
+            column = 1;
+        } else if char != '\r' {
+            column += 1;

I would suggest using the

I don't think we should be using

Yeah, I agree that we should do whatever editors are likely to use here.

Yeah, I think char_indices is okay here, but I would like to tweak the newline handling slightly in line with the Ruff reference above.

+        }
+    }
+
+    (line, column)
+}
+
 impl RequirementsTxt {
     /// See module level documentation
     #[instrument(skip_all, fields(requirements_txt = requirements_txt.as_ref().as_os_str().to_str()))]

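The thread above asks for newline handling closer to Ruff's. That reference is not reproduced on this page, so the following is only a sketch under one assumption: `\n`, `\r\n`, and a lone `\r` each end a line. Like the diff, it takes a byte offset (the parser passes `s.cursor()`) and counts 1-based columns in Unicode codepoints. The name `line_column` is hypothetical.

```rust
/// Returns the 1-based (line, column) of byte offset `position` in `content`.
///
/// Assumptions: "\n", "\r\n" and a lone "\r" each end a line, and a column
/// is counted in Unicode codepoints, not bytes.
fn line_column(content: &str, position: usize) -> (usize, usize) {
    let mut line = 1;
    let mut column = 1;
    let mut chars = content.char_indices().peekable();

    while let Some((index, ch)) = chars.next() {
        // `char_indices` yields byte offsets, so this compares bytes with bytes.
        if index >= position {
            break;
        }
        match ch {
            '\n' => {
                line += 1;
                column = 1;
            }
            '\r' => {
                // In a "\r\n" pair the '\n' advances the line, so only a lone '\r'
                // (old Mac OS style) counts as a line break by itself.
                if !matches!(chars.peek(), Some(&(_, '\n'))) {
                    line += 1;
                    column = 1;
                }
            }
            _ => column += 1,
        }
    }

    (line, column)
}
```

Compared with the function in the diff, the only behavioral difference is the lone `\r` case: the diff's version treats it as zero-width and stays on the same line, while this sketch starts a new line.
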
@@ -412,9 +433,11 @@ impl RequirementsTxt {
                 }
                 RequirementsTxtStatement::IndexUrl(url) => {
                     if data.index_url.is_some() {
+                        let (line, column) = calculate_line_column_pair(content, s.cursor());
                         return Err(RequirementsTxtParserError::Parser {
                             message: "Multiple `--index-url` values provided".to_string(),
-                            location: s.cursor(),
+                            line,
+                            column,
                         });
                     }
                     data.index_url = Some(url);

@@ -453,36 +476,36 @@ fn parse_entry(
     eat_wrappable_whitespace(s);
     while s.at(['\n', '\r', '#']) {
         // skip comments
-        eat_trailing_line(s)?;
+        eat_trailing_line(content, s)?;
         eat_wrappable_whitespace(s);
     }

     let start = s.cursor();
     Ok(Some(if s.eat_if("-r") || s.eat_if("--requirement") {
-        let requirements_file = parse_value(s, |c: char| !['\n', '\r', '#'].contains(&c))?;
+        let requirements_file = parse_value(content, s, |c: char| !['\n', '\r', '#'].contains(&c))?;
         let end = s.cursor();
-        eat_trailing_line(s)?;
+        eat_trailing_line(content, s)?;
         RequirementsTxtStatement::Requirements {
             filename: requirements_file.to_string(),
             start,
             end,
         }
     } else if s.eat_if("-c") || s.eat_if("--constraint") {
-        let constraints_file = parse_value(s, |c: char| !['\n', '\r', '#'].contains(&c))?;
+        let constraints_file = parse_value(content, s, |c: char| !['\n', '\r', '#'].contains(&c))?;
         let end = s.cursor();
-        eat_trailing_line(s)?;
+        eat_trailing_line(content, s)?;
         RequirementsTxtStatement::Constraint {
             filename: constraints_file.to_string(),
             start,
             end,
         }
     } else if s.eat_if("-e") || s.eat_if("--editable") {
-        let path_or_url = parse_value(s, |c: char| !['\n', '\r'].contains(&c))?;
+        let path_or_url = parse_value(content, s, |c: char| !['\n', '\r'].contains(&c))?;
         let editable_requirement = EditableRequirement::parse(path_or_url, working_dir)
             .map_err(|err| err.with_offset(start))?;
         RequirementsTxtStatement::EditableRequirement(editable_requirement)
     } else if s.eat_if("-i") || s.eat_if("--index-url") {
-        let given = parse_value(s, |c: char| !['\n', '\r'].contains(&c))?;
+        let given = parse_value(content, s, |c: char| !['\n', '\r'].contains(&c))?;
         let url = VerbatimUrl::parse(given)
             .map(|url| url.with_given(given.to_owned()))
             .map_err(|err| RequirementsTxtParserError::Url {

@@ -493,7 +516,7 @@ fn parse_entry(
         })?;
         RequirementsTxtStatement::IndexUrl(url)
     } else if s.eat_if("--extra-index-url") {
-        let given = parse_value(s, |c: char| !['\n', '\r'].contains(&c))?;
+        let given = parse_value(content, s, |c: char| !['\n', '\r'].contains(&c))?;
         let url = VerbatimUrl::parse(given)
             .map(|url| url.with_given(given.to_owned()))
             .map_err(|err| RequirementsTxtParserError::Url {

@@ -506,7 +529,7 @@ fn parse_entry(
     } else if s.eat_if("--no-index") {
         RequirementsTxtStatement::NoIndex
     } else if s.eat_if("--find-links") || s.eat_if("-f") {
-        let path_or_url = parse_value(s, |c: char| !['\n', '\r'].contains(&c))?;
+        let path_or_url = parse_value(content, s, |c: char| !['\n', '\r'].contains(&c))?;
         let path_or_url = FindLink::parse(path_or_url, working_dir).map_err(|err| {
             RequirementsTxtParserError::Url {
                 source: err,

@@ -524,11 +547,13 @@ fn parse_entry(
             editable: false,
         })
     } else if let Some(char) = s.peek() {
+        let (line, column) = calculate_line_column_pair(content, s.cursor());
         return Err(RequirementsTxtParserError::Parser {
             message: format!(
                 "Unexpected '{char}', expected '-c', '-e', '-r' or the start of a requirement"
             ),
-            location: s.cursor(),
+            line,
+            column,
         });
     } else {
         // EOF

@@ -549,7 +574,7 @@ fn eat_wrappable_whitespace<'a>(s: &mut Scanner<'a>) -> &'a str {
 }

 /// Eats the end of line or a potential trailing comma
-fn eat_trailing_line(s: &mut Scanner) -> Result<(), RequirementsTxtParserError> {
+fn eat_trailing_line(content: &str, s: &mut Scanner) -> Result<(), RequirementsTxtParserError> {
     s.eat_while([' ', '\t']);
     match s.eat() {
         None | Some('\n') => {} // End of file or end of line, nothing to do

@@ -563,9 +588,11 @@ fn eat_trailing_line(s: &mut Scanner) -> Result<(), RequirementsTxtParserError>
             }
         }
         Some(other) => {
+            let (line, column) = calculate_line_column_pair(content, s.cursor());
             return Err(RequirementsTxtParserError::Parser {
                 message: format!("Expected comment or end-of-line, found '{other}'"),
-                location: s.cursor(),
+                line,
+                column,
             });
         }
     }

@@ -669,8 +696,8 @@ fn parse_requirement_and_hashes(
         }
     })?;
     let hashes = if has_hashes {
-        let hashes = parse_hashes(s)?;
-        eat_trailing_line(s)?;
+        let hashes = parse_hashes(content, s)?;
+        eat_trailing_line(content, s)?;
         hashes
     } else {
         Vec::new()

@@ -679,32 +706,35 @@ fn parse_requirement_and_hashes(
 }

 /// Parse `--hash=... --hash ...` after a requirement
-fn parse_hashes(s: &mut Scanner) -> Result<Vec<String>, RequirementsTxtParserError> {
+fn parse_hashes(content: &str, s: &mut Scanner) -> Result<Vec<String>, RequirementsTxtParserError> {
     let mut hashes = Vec::new();
     if s.eat_while("--hash").is_empty() {
+        let (line, column) = calculate_line_column_pair(content, s.cursor());
         return Err(RequirementsTxtParserError::Parser {
             message: format!(
                 "Expected '--hash', found '{:?}'",
                 s.eat_while(|c: char| !c.is_whitespace())
             ),
-            location: s.cursor(),
+            line,
+            column,
         });
     }
-    let hash = parse_value(s, |c: char| !c.is_whitespace())?;
+    let hash = parse_value(content, s, |c: char| !c.is_whitespace())?;
     hashes.push(hash.to_string());
     loop {
         eat_wrappable_whitespace(s);
         if !s.eat_if("--hash") {
             break;
         }
-        let hash = parse_value(s, |c: char| !c.is_whitespace())?;
+        let hash = parse_value(content, s, |c: char| !c.is_whitespace())?;
         hashes.push(hash.to_string());
     }
     Ok(hashes)
 }

 /// In `-<key>=<value>` or `-<key> value`, this parses the part after the key
 fn parse_value<'a, T>(
+    content: &str,
     s: &mut Scanner<'a>,
     while_pattern: impl Pattern<T>,
 ) -> Result<&'a str, RequirementsTxtParserError> {

@@ -716,9 +746,11 @@ fn parse_value<'a, T>(
         s.eat_whitespace();
         Ok(s.eat_while(while_pattern).trim_end())
     } else {
+        let (line, column) = calculate_line_column_pair(content, s.cursor());
         Err(RequirementsTxtParserError::Parser {
             message: format!("Expected '=' or whitespace, found {:?}", s.peek()),
-            location: s.cursor(),
+            line,
+            column,
         })
     }
 }

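The diff converts a byte offset by rescanning the file from the start at each error site. An alternative conversion strategy, sketched here with a hypothetical `LineIndex` type, records the byte offset of every line start once and maps any offset with a binary search. It assumes offsets fall on char boundaries (as a scanner cursor does) and, for brevity, only treats `\n` as a line break; a lone `\r` is counted as an ordinary character.

```rust
/// Hypothetical helper: byte offsets of line starts, computed once per file.
/// Only '\n' starts a new line; a lone '\r' is counted as an ordinary character.
struct LineIndex<'a> {
    content: &'a str,
    line_starts: Vec<usize>,
}

impl<'a> LineIndex<'a> {
    fn new(content: &'a str) -> Self {
        // Line 1 starts at byte 0; every other line starts right after a '\n'.
        let mut line_starts = vec![0];
        line_starts.extend(content.match_indices('\n').map(|(i, _)| i + 1));
        Self {
            content,
            line_starts,
        }
    }

    /// 1-based (line, column) of a byte offset that lies on a char boundary.
    fn line_column(&self, offset: usize) -> (usize, usize) {
        let offset = offset.min(self.content.len());
        // Count the line starts at or before `offset`; always >= 1 because of the leading 0.
        let line = self.line_starts.partition_point(|&start| start <= offset);
        let line_start = self.line_starts[line - 1];
        // Columns count codepoints, matching the diff's definition.
        let column = self.content[line_start..offset].chars().count() + 1;
        (line, column)
    }
}
```

Because the parser returns on the first error, the diff's single rescan is already linear in the file size, so an index like this mainly pays off when many positions need converting, for example if diagnostics were collected without stopping.
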
@@ -746,7 +778,8 @@ pub enum RequirementsTxtParserError {
     MissingEditablePrefix(String),
     Parser {
         message: String,
-        location: usize,
+        line: usize,
+        column: usize,
     },
     UnsupportedRequirement {
         source: Pep508Error,

@@ -786,9 +819,14 @@ impl RequirementsTxtParserError {
             Self::UnsupportedUrl(url) => Self::UnsupportedUrl(url),
             Self::MissingRequirementPrefix(given) => Self::MissingRequirementPrefix(given),
             Self::MissingEditablePrefix(given) => Self::MissingEditablePrefix(given),
-            Self::Parser { message, location } => Self::Parser {
+            Self::Parser {
+                message,
+                line,
+                column,
+            } => Self::Parser {
                 message,
-                location: location + offset,
+                line,
+                column,
             },
             Self::UnsupportedRequirement { source, start, end } => Self::UnsupportedRequirement {
                 source,

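After this change, `with_offset` passes `line` and `column` through unchanged, which appears safe because the parser computes them against the whole file. If a `Parser` error were ever raised while parsing a slice, shifting it would need the byte offset again. A hypothetical alternative keeps the byte `location` in the error, as before this change, and converts it only when rendering, where the full text is at hand. `ParserError`, `render`, and `line_column` are illustrative names, and `line_column` here only treats `\n` as a line break.

```rust
/// Hypothetical parser error that keeps a byte offset instead of line/column.
#[derive(Debug)]
struct ParserError {
    message: String,
    /// Byte offset into the text the error was raised against.
    location: usize,
}

impl ParserError {
    /// Shifts the error by `offset` bytes, e.g. when it was raised while parsing a slice.
    fn with_offset(self, offset: usize) -> Self {
        Self {
            location: self.location + offset,
            ..self
        }
    }

    /// Converts the byte offset to a 1-based line:column only when the full text is available.
    fn render(&self, content: &str) -> String {
        let (line, column) = line_column(content, self.location);
        format!("{} at position {line}:{column}", self.message)
    }
}

/// 1-based (line, column) of byte `position`; only '\n' ends a line, columns count codepoints.
fn line_column(content: &str, position: usize) -> (usize, usize) {
    let before = &content[..position.min(content.len())];
    let line = before.matches('\n').count() + 1;
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let column = before[line_start..].chars().count() + 1;
    (line, column)
}
```

The cost of this design is that every place that displays the error needs access to the file contents, which is the threading the diff avoids at display time by doing it at parse time instead.
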
@@ -831,8 +869,12 @@ impl Display for RequirementsTxtParserError {
                     "Requirement `{given}` looks like a directory but was passed as a package name. Did you mean `-e {given}`?"
                 )
             }
-            Self::Parser { message, location } => {
-                write!(f, "{message} at position {location}")
+            Self::Parser {
+                message,
+                line,
+                column,
+            } => {
+                write!(f, "{message} at position {line}:{column}")
             }
             Self::UnsupportedRequirement { start, end, .. } => {
                 write!(f, "Unsupported requirement in position {start} to {end}")

@@ -903,10 +945,14 @@ impl Display for RequirementsTxtFileError {
                     self.file.simplified_display(),
                 )
             }
-            RequirementsTxtParserError::Parser { message, location } => {
+            RequirementsTxtParserError::Parser {
+                message,
+                line,
+                column,
+            } => {
                 write!(
                     f,
-                    "{message} in `{}` at position {location}",
+                    "{message} in `{}` at position {line}:{column}",

Could you make that format

Changed in 223fd99

Note: mileage may vary. I noticed that my IDE highlights full paths, but not relative paths like

I think using

                     self.file.simplified_display(),
                 )
             }

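The exact format requested in the review thread is truncated on this page. One widespread convention, used in diagnostics from compilers such as rustc and gcc and recognized by many editors and terminals as a jump-to-location link, is `path:line:column`. A sketch with a hypothetical `FileParserError` type:

```rust
use std::fmt;
use std::path::PathBuf;

/// Hypothetical error that pairs a parser message with its file and position.
struct FileParserError {
    file: PathBuf,
    message: String,
    line: usize,
    column: usize,
}

impl fmt::Display for FileParserError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // `path:line:column` keeps the location in one token that tools can parse.
        write!(
            f,
            "{} at {}:{}:{}",
            self.message,
            self.file.display(),
            self.line,
            self.column
        )
    }
}
```

As noted in the thread, whether the location is clickable still depends on the tool; some only resolve absolute paths.
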
@@ -1279,4 +1325,34 @@ mod test {
             Some(("../editable[", "[dev]"))
         );
     }
+
+    #[test]
+    fn parser_error_line_and_column() -> Result<()> {
+        let temp_dir = assert_fs::TempDir::new()?;
+        let requirements_txt = temp_dir.child("requirements.txt");
+        requirements_txt.write_str(indoc! {"
+            numpy>=1,<2
+            --borken
+            tqdm
+        "})?;
+
+        let error = RequirementsTxt::parse(requirements_txt.path(), temp_dir.path()).unwrap_err();
+        let errors = anyhow::Error::new(error).chain().join("\n");
+
+        let requirement_txt =
+            regex::escape(&requirements_txt.path().simplified_display().to_string());
+        let filters = vec![
+            (requirement_txt.as_str(), "<REQUIREMENTS_TXT>"),
+            (r"\\", "/"),
+        ];
+        insta::with_settings!({
+            filters => filters
+        }, {
+            insta::assert_display_snapshot!(errors, @r###"
+                Unexpected '-', expected '-c', '-e', '-r' or the start of a requirement in `<REQUIREMENTS_TXT>` at position 2:3
+            "###);
+        });
+
+        Ok(())
+    }

Could you add a couple of tests that include non-ASCII codepoints? More concretely, one test with a non-ASCII codepoint like, say,

Done in 223fd99; I included one with two codepoints and your example with three.

 }
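The reviewers asked for tests with non-ASCII codepoints. Independently of the snapshot test above, the function from this diff can be copied unchanged and checked directly. The inputs below are illustrative and are not the tests added in 223fd99.

```rust
/// Copied unchanged from the diff: columns count codepoints, '\r' is skipped.
fn calculate_line_column_pair(content: &str, position: usize) -> (usize, usize) {
    let mut line = 1;
    let mut column = 1;

    for (index, char) in content.char_indices() {
        if index >= position {
            break;
        }
        // This should work fine for both Windows and Linux line endings
        if char == '\n' {
            line += 1;
            column = 1;
        } else if char != '\r' {
            column += 1;
        }
    }

    (line, column)
}

#[cfg(test)]
mod tests {
    use super::calculate_line_column_pair;

    #[test]
    fn non_ascii_and_line_endings() {
        // 'é' is 2 bytes but one codepoint, so byte offset 2 is column 2.
        assert_eq!(calculate_line_column_pair("é-x", 2), (1, 2));
        // Each '💩' is 4 bytes; byte offset 8 follows two of them.
        assert_eq!(calculate_line_column_pair("💩💩-", 8), (1, 3));
        // CRLF: the '\r' is skipped and the '\n' starts line 2.
        assert_eq!(calculate_line_column_pair("a\r\n-", 3), (2, 1));
        // A lone '\r' is neither a line break nor a column.
        assert_eq!(calculate_line_column_pair("a\r-", 2), (1, 2));
    }
}
```

The last case is the one the newline discussion above is about: with this version, a file using lone `\r` line endings would report everything on line 1.
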

Assuming you use unicode-width per my suggestion below, can you just add a quick note here documenting that we define column in this context as the "offset according to the visual width of each codepoint."

If unicode-width is not the right thing to use, then just add "offset according to the number of Unicode codepoints."

I left it as a comment in 223fd99.
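The thread weighs counting columns by visual width (the `unicode-width` crate) against counting codepoints. For comparison, here is a std-only sketch of three other ways to measure how far into a line an offset is: bytes (what `char_indices` yields), codepoints (what the diff counts), and UTF-16 code units (the default unit for positions in the Language Server Protocol). Display width needs a crate such as `unicode-width`, so it is left out. `offsets` is a hypothetical name.

```rust
/// Measures the prefix `line[..byte_offset]` in three units, each 0-based.
/// Panics if `byte_offset` is not on a char boundary.
fn offsets(line: &str, byte_offset: usize) -> (usize, usize, usize) {
    let prefix = &line[..byte_offset];
    let bytes = prefix.len();
    let codepoints = prefix.chars().count();
    // Characters outside the Basic Multilingual Plane take two UTF-16 code units.
    let utf16_units = prefix.encode_utf16().count();
    (bytes, codepoints, utf16_units)
}
```

For `"a💩b"`, the `b` sits at byte 5, codepoint 2, and UTF-16 unit 3, so the diff's 1-based column would be 3, while an editor counting in UTF-16 would place it one unit further along.
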