Avoid calling normalize_path with relative paths that extend beyond the current directory #3013

Merged: 2 commits, Apr 12, 2024
Changes from 1 commit
6 changes: 4 additions & 2 deletions crates/pep508-rs/src/verbatim_url.rs
@@ -36,6 +36,8 @@ impl VerbatimUrl {
}

/// Create a [`VerbatimUrl`] from a file path.
+ ///
+ /// Assumes that the path is absolute.
Member

Can we panic if it isn't?

Member Author

Yeah.

pub fn from_path(path: impl AsRef<Path>) -> Self {
let path = path.as_ref();

@@ -76,7 +78,7 @@ impl VerbatimUrl {
};

// Normalize the path.
- let path = normalize_path(path);
+ let path = normalize_path(&path);

// Extract the fragment, if it exists.
let (path, fragment) = split_fragment(&path);
@@ -110,7 +112,7 @@ impl VerbatimUrl {
};

// Normalize the path.
- let path = normalize_path(path);
+ let path = normalize_path(&path);

// Extract the fragment, if it exists.
let (path, fragment) = split_fragment(&path);
14 changes: 11 additions & 3 deletions crates/uv-fs/src/path.rs
@@ -84,8 +84,16 @@ pub fn normalize_url_path(path: &str) -> Cow<'_, str> {
/// Normalize a path, removing things like `.` and `..`.
///
/// Source: <https://github.com/rust-lang/cargo/blob/b48c41aedbd69ee3990d62a0e2006edbb506a480/crates/cargo-util/src/paths.rs#L76C1-L109C2>
- pub fn normalize_path(path: impl AsRef<Path>) -> PathBuf {
- let mut components = path.as_ref().components().peekable();
+ ///
+ /// CAUTION: Assumes that the path is already absolute.
Member

Same here. Can we panic if it isn't absolute?

Member Author

Yeah.

+ ///
+ /// CAUTION: This does not resolve symlinks (unlike
+ /// [`std::fs::canonicalize`]). This may cause incorrect or surprising
+ /// behavior at times. This should be used carefully. Unfortunately,
+ /// [`std::fs::canonicalize`] can be hard to use correctly, since it can often
+ /// fail, or on Windows returns annoying device paths.
+ pub fn normalize_path(path: &Path) -> PathBuf {
+ let mut components = path.components().peekable();
let mut ret = if let Some(c @ Component::Prefix(..)) = components.peek().copied() {
components.next();
PathBuf::from(c.as_os_str())
@@ -235,7 +243,7 @@ mod tests {
use super::*;

#[test]
- fn normalize() {
+ fn normalize_url() {
if cfg!(windows) {
assert_eq!(
normalize_url_path("/C:/Users/ferris/wheel-0.42.0.tar.gz"),
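Review note: a minimal sketch of the failure mode named in the PR title, assuming the cargo-style algorithm linked in the doc comment above. `normalize_lexically` and `normalize_absolute` are hypothetical names used for illustration, not functions from `uv-fs`. The `assert!` mirrors the reviewer's suggestion to panic on relative input; whether the merged code does this is not shown in the diff.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper: purely lexical normalization in the style of
/// cargo's `normalize_path`. It never touches the filesystem, so
/// symlinks are not resolved.
pub fn normalize_lexically(path: &Path) -> PathBuf {
    let mut components = path.components().peekable();
    // Keep a Windows prefix (e.g. `C:`) intact if one is present.
    let mut ret = if let Some(c @ Component::Prefix(..)) = components.peek().copied() {
        components.next();
        PathBuf::from(c.as_os_str())
    } else {
        PathBuf::new()
    };

    for component in components {
        match component {
            Component::Prefix(..) => unreachable!(),
            Component::RootDir => ret.push(component.as_os_str()),
            Component::CurDir => {}
            // On an empty buffer `pop` is a no-op, so a leading `..`
            // in a relative path is silently dropped.
            Component::ParentDir => {
                ret.pop();
            }
            Component::Normal(c) => ret.push(c),
        }
    }
    ret
}

/// Hypothetical wrapper: refuse relative input instead of returning a
/// path that points somewhere else.
pub fn normalize_absolute(path: &Path) -> PathBuf {
    assert!(path.is_absolute(), "expected an absolute path: {}", path.display());
    normalize_lexically(path)
}
```

With this sketch, an absolute path such as `/a/b/../c` normalizes as expected, while the relative path `../foo` comes back as `foo`, which refers to a different location. That is why callers need to make the path absolute before normalizing.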
5 changes: 1 addition & 4 deletions crates/uv-interpreter/src/find_python.rs
@@ -6,7 +6,6 @@ use std::path::PathBuf;
use tracing::{debug, instrument};

use uv_cache::Cache;
- use uv_fs::normalize_path;
use uv_toolchain::PythonVersion;

use crate::interpreter::InterpreterInfoError;
@@ -52,9 +51,7 @@ pub fn find_requested_python(request: &str, cache: &Cache) -> Result<Option<Inte
Interpreter::query(executable, cache).map(Some)
} else {
// `-p /home/ferris/.local/bin/python3.10`
- let executable = normalize_path(request);
-
- Interpreter::query(executable, cache).map(Some)
+ Interpreter::query(request, cache).map(Some)
}
}

7 changes: 3 additions & 4 deletions crates/uv-interpreter/src/interpreter.rs
@@ -488,16 +488,15 @@ impl InterpreterInfo {
/// unless the Python executable changes, so we use the executable's last modified
/// time as a cache key.
pub(crate) fn query_cached(executable: &Path, cache: &Cache) -> Result<Self, Error> {
- let executable_bytes = executable.as_os_str().as_encoded_bytes();
Member Author

cc @BurntSushi - I changed this to just hash the path directly. Not sure if it's equivalent?

Member

I'm trying to grok why you need the hashes to be the same here. I'm not understanding that part. Like, even if this generates a different hash, that seems okay, it just means you'll get a cache miss? Or is there something deeper here that I'm missing?

Member Author

They don't need to be the same!

Member Author

It was mostly a Chesterton's Fence question: I was trying to understand why we did this in the first place.

+ let canonical = uv_fs::canonicalize_executable(executable)?;
+ let modified = Timestamp::from_path(&canonical)?;

let cache_entry = cache.entry(
CacheBucket::Interpreter,
"",
- format!("{}.msgpack", digest(&executable_bytes)),
+ format!("{}.msgpack", digest(&canonical)),
Member Author

@konstin - I think it's okay to cache under the canonical path here rather than the executable path... since we already use the canonical path for the timestamp. It could cause issues with executables that are symlinks (that's a common source of bugs here), but I think it's still fine since we're not using the canonical path to query the executable, which is what tends to cause issues.

);

- let modified = Timestamp::from_path(uv_fs::canonicalize_executable(executable)?)?;

// Read from the cache.
if cache
.freshness(&cache_entry, None)
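Review note: a rough sketch of the caching idea discussed in the thread above, where both the cache file name and the timestamp come from the canonical path. It uses only the standard library. `digest_path` and `interpreter_cache_key` are hypothetical names; `std::fs::canonicalize` and `DefaultHasher` are stand-ins assumed here for `uv_fs::canonicalize_executable` and `digest`, whose exact behavior is not shown in the diff.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::{Hash, Hasher};
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// Hypothetical stand-in for the `digest` helper used in the diff.
fn digest_path(path: &Path) -> String {
    let mut hasher = DefaultHasher::new();
    path.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Hypothetical helper: derive the cache file name and the freshness
/// timestamp from the same canonical path.
fn interpreter_cache_key(executable: &Path) -> io::Result<(String, SystemTime)> {
    // Assumption: plain `fs::canonicalize` stands in for
    // `uv_fs::canonicalize_executable`.
    let canonical = fs::canonicalize(executable)?;
    let modified = fs::metadata(&canonical)?.modified()?;
    Ok((format!("{}.msgpack", digest_path(&canonical)), modified))
}
```

In this sketch, different spellings of the same executable (for example through a `..` segment or a symlink) map to one cache entry. The original, non-canonical path would still be the one used to run the interpreter, which matches the concern raised in the comment about not querying through the canonical path.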