Avoid calling normalize_path with relative paths that extend beyond the current directory #3013

Merged: 2 commits, Apr 12, 2024

Conversation

Member

@charliermarsh charliermarsh commented Apr 12, 2024

Summary

It turns out that normalize_path (sourced from Cargo) has a subtle bug: if you pass it a relative path that traverses above its starting directory, it silently drops components. For example, given ../foo/bar, it drops the leading .. and returns foo/bar.

This PR encodes that behavior as a Result and avoids using it in such cases.
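
To make the failure mode concrete, here is a minimal sketch of a Cargo-style lexical normalizer. It is an illustration written for this discussion, not the code from the PR, and the name `normalize_lossy` is hypothetical. The key detail is that `PathBuf::pop` returns `false` when there is nothing left to pop, and the sketch ignores that return value.

```rust
use std::path::{Component, Path, PathBuf};

/// Lexically normalize a path by resolving `.` and `..` components.
/// Hypothetical sketch: it deliberately reproduces the lossy behavior
/// described above and never touches the filesystem.
pub fn normalize_lossy(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for component in path.components() {
        match component {
            // `.` contributes nothing to the result.
            Component::CurDir => {}
            // `..` removes the last component. If `out` is empty, `pop`
            // returns `false` and the `..` is silently discarded.
            Component::ParentDir => {
                out.pop();
            }
            // Prefixes, the root directory, and normal components are kept.
            other => out.push(other.as_os_str()),
        }
    }
    out
}
```

Under these assumptions, `normalize_lossy(Path::new("../foo/bar"))` yields `foo/bar`, which names a different location than the input did.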

Closes #3012.

@charliermarsh charliermarsh marked this pull request as ready for review April 12, 2024 14:25
@charliermarsh charliermarsh added the bug Something isn't working label Apr 12, 2024

let cache_entry = cache.entry(
CacheBucket::Interpreter,
"",
- format!("{}.msgpack", digest(&executable_bytes)),
+ format!("{}.msgpack", digest(&canonical)),
Member Author

@konstin - I think it's okay to cache under the canonical path here rather than the executable path... since we already use the canonical path for the timestamp. It could cause issues with executables that are symlinks (that's a common source of bugs here), but I think it's still fine since we're not using the canonical path to query the executable, which is what tends to cause issues.

Comment on lines 84 to 86
/// Normalize a path, removing things like `.` and `..`.
///
/// Assumes that the path is already absolute.
Member

Why would we want to remove those? Should we add a debug assertion that they're not present at the beginning of the path?

Member Author

Why would we want to remove ..?

Member Author

Is that what you're asking?

@@ -488,16 +488,15 @@ impl InterpreterInfo {
/// unless the Python executable changes, so we use the executable's last modified
/// time as a cache key.
pub(crate) fn query_cached(executable: &Path, cache: &Cache) -> Result<Self, Error> {
let executable_bytes = executable.as_os_str().as_encoded_bytes();
Member Author

cc @BurntSushi - I changed this to just hash the path directly. Not sure if it's equivalent?

Member

I'm trying to grok why you need the hashes to be the same here. I'm not understanding that part. Like, even if this generates a different hash, that seems okay, it just means you'll get a cache miss? Or is there something deeper here that I'm missing?

Member Author

They don't need to be the same!

Member Author

It was mostly a Chesterton's Fence situation: I was trying to understand why we did this in the first place.
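
For readers wondering how the two key derivations differ, the sketch below hashes the same path both ways. The `digest` helper here is a hypothetical stand-in built on the standard library's `DefaultHasher`, not the project's actual `digest` function, and `key_from_path` and `key_from_bytes` are illustrative names. It suggests that the two approaches can yield different keys for the same path, which, as noted above, would only cost a cache miss.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::Path;

/// Hypothetical stand-in for a `digest` helper: hash any `Hash` value
/// and render the result as fixed-width hex.
pub fn digest<T: Hash + ?Sized>(value: &T) -> String {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Cache key derived by hashing the `Path` itself (component-wise).
pub fn key_from_path(path: &Path) -> String {
    format!("{}.msgpack", digest(path))
}

/// Cache key derived by hashing the raw encoded bytes of the path.
pub fn key_from_bytes(path: &Path) -> String {
    format!("{}.msgpack", digest(path.as_os_str().as_encoded_bytes()))
}
```

`Path`'s `Hash` implementation works on components, so `bin//python` and `bin/python` hash identically, while their raw bytes do not. `DefaultHasher` keeps the sketch dependency-free, but its algorithm is not guaranteed to stay the same across Rust releases, so a persistent cache would likely want a fixed hash function.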

@@ -36,6 +36,8 @@ impl VerbatimUrl {
}

/// Create a [`VerbatimUrl`] from a file path.
///
/// Assumes that the path is absolute.
Member

Can we panic if it isn't?

Member Author

Yeah.

pub fn normalize_path(path: impl AsRef<Path>) -> PathBuf {
let mut components = path.as_ref().components().peekable();
///
/// CAUTION: Assumes that the path is already absolute.
Member

Same here. Can we panic if it isn't absolute?

Member Author

Yeah.

@charliermarsh
Member Author

I guess normalize_path doesn't actually require an absolute path. The only error case is a path with more .. segments than it has parent components. I'll change this to return a Result and error out if we try to pop in that case.
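
One way to express that change, as a rough sketch rather than the merged implementation: check the return value of `PathBuf::pop` and bail out when it reports that nothing was removed. The function name `normalize_checked` and the error type `EscapesBase` are hypothetical.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical error: the path has more `..` segments than parents.
#[derive(Debug, PartialEq, Eq)]
pub struct EscapesBase;

/// Lexically normalize a path, failing instead of dropping a `..`
/// that has no preceding component to cancel out.
pub fn normalize_checked(path: &Path) -> Result<PathBuf, EscapesBase> {
    let mut out = PathBuf::new();
    for component in path.components() {
        match component {
            // `.` contributes nothing to the result.
            Component::CurDir => {}
            Component::ParentDir => {
                // `pop` returns `false` when there is no parent to step
                // back into; surface that as an error.
                if !out.pop() {
                    return Err(EscapesBase);
                }
            }
            // Prefixes, the root directory, and normal components are kept.
            other => out.push(other.as_os_str()),
        }
    }
    Ok(out)
}
```

With this sketch, `foo/../bar` normalizes to `bar`, while `../foo/bar` is rejected. It also rejects an absolute path such as `/..`; whether that case should be an error or resolve to `/` is a design choice the comment above does not settle.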

@charliermarsh charliermarsh marked this pull request as draft April 12, 2024 16:01
@charliermarsh charliermarsh changed the title from "Avoid calling normalize_path on possibly-relative paths" to "Avoid calling normalize_path with relative paths that extend beyond the current directory" Apr 12, 2024
@charliermarsh charliermarsh marked this pull request as ready for review April 12, 2024 18:31
@charliermarsh
Member Author

Okay, could use another review here.

Member

@BurntSushi BurntSushi left a comment

LGTM

@charliermarsh charliermarsh merged commit c43757a into main Apr 12, 2024
37 checks passed
@charliermarsh charliermarsh deleted the charlie/abs branch April 12, 2024 18:48
Labels
bug Something isn't working
Successfully merging this pull request may close these issues.

normalize_path breaks relative python interpreters
3 participants