-
Notifications
You must be signed in to change notification settings - Fork 808
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Win Trampoline: Use Python executable path encoded in binary #1803
Conversation
ff62bd7
to
9110f3c
Compare
9110f3c
to
d88ffdf
Compare
crates/uv-trampoline/src/bounce.rs
Outdated
|
||
// Start with a size of 1024 bytes which should be enough for most paths but avoids reading the | ||
// entire file. | ||
let mut buffer: Vec<u8> = vec![0; 1024]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested the "incremental" reading by setting the capacity to 10
73f0bff
to
e772c05
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
w00t! So much Windows API. Nice work.
I do think there is at least one thing worth changing here before merging, which is the logic for reading the file path from the end of the binary. The main issue I think with the current implementation is that it trusts the path length, which could lead to the program allocating a huge amount of memory if something went wrong (whether intentional or not).
} else { | ||
b"python.exe" | ||
expect_result(unsafe { CloseHandle(file_handle) }, 0, || { | ||
String::from("Failed to close file handle") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so reading the above, I think it works like this:
- It first tries to read 1KB from the end of a file.
- If it finds the magic number, path length and path all within that 1KB, it stops.
- Otherwise, it resizes the buffer's capacity to whatever the decoded
path_len
is. - It then goes back to the first step again, but read
{path_len}KB
instead.
I think there might be a couple issues with this approach, both of which center around trusting path_len
:
- If
path_len
isu32::MAX
, then this will allocate 4GB. - If the file is being mutated while we're doing this in a very specific way, then it's possible this loop isn't guaranteed to terminate.
(2) is kind of a stretch, but (1) I think is a real enough issue.
My suggestion here would be to read 4KB from the end of the file, and if you can't find the path in there, consider it malformed and give up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the thorough review. I'll carefully go over the implementation again tomorrow.
- shouldn't be a problem because we open the file with
FILE_SHARE_READ
only, prohibiting other processes from mutating or deleting the file while we're using it.
Not trusting path_len
makes sense as well does limiting. I'll probably go with a higher default, e.g. even URl have an upper limit of 2MB. Agreed, URLs may contain application data which is different but I don't think it hurts to go somewhat higher than 4KB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah good catch on (2). Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry:
The Windows API has many functions that also have Unicode versions to permit an extended-length path for a maximum total path length of 32,767 characters.
The maximum path of 32,767 characters is approximate, because the "\?" prefix may be expanded to a longer string by the system at run time, and this expansion applies to the total length.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So my read of that doc is that there isn't a pre-defined upper bound on how long a path can be. Does that match your understanding? If that's true, I still think we need to place some kind of reasonable upper bound on what we're willing to accept. I think 4KB is probably sufficient, but 32KB would be fine with me too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I limited it to 32KB
@charliermarsh / @zanieb If I want to add a test case as outlined in the test plan, how would I go about it? Can I install any python package with a binary script as part of the test or do we have a stub package that has a script entry point that I can use? |
@MichaReiser you could generate a stub package and add it to the repository or use some "small" real world package with a pinned version |
Do you have a link with some resources on how I would do that? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @konstin for providing me with a test wheel!
c4c98f3
to
c13eef2
Compare
e6e011d
to
ed98eeb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thank you!
@@ -0,0 +1,3 @@ | |||
[unstable] | |||
build-std = ["core", "panic_abort", "alloc", "std"] | |||
build-std-features = ["compiler-builtins-mem"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean to include this in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's intentional, considering that I won't land my std
branch anytime soon. It requires me to only type cargo build --release --target x86_64-pc-windows-msvc
instead of that plus the -Z
compiler flags.
if i64::from(bytes_to_read) > file_size { | ||
eprintln!("The length of the python executable path exceeds the file size. Verify that the path length is appended to the end of the launcher script as a u32 in little endian."); | ||
exit_with_status(1); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first I was unsure if this was limiting heap memory since bytes_to_read
was still being calculated based on path_len
, but I see above that path_len
is limited to a maximum. So I think that's right.
Summary
Fixes #1766
The first commit replaces the
check!
macro with a standalone function that can be reused for non-boolean returning functions to get nice error messages in case a syscall fails.The second commit changes the python discovery to work with symlinks too. It also changes a few existing calls to use safe APIs as a surprise for @BurntSushi ;)
The way this is implemented is that I changed the launcher file layout to:
Windows ignores any content after the executable, which is a fact that the launcher already makes use of today. Python ignores everything after the zip end, which allows us to pack additional data after the zip file. I decided to use the magic number
"UVUV"
at the end of the file as a "safety" mechanism that the file indeed has the right format rather than just assuming that the last u32 (little endian) is the length of the UTF8 encoded python path.Alternatives
Resolving symlinks
The downside of this approach is that we now need to read the executable which adds some complexity. An alternative to solve #1766 would be to extend our existing relative resolution to resolve symlinks before searching the python executable.
I intended that this also addresses #1779 where we use a global python installation instead, where simply following symlinks isn't sufficient anymore (assuming it is something we want to support). I've set up a repro PR that uses the new
uv
binary (the good news, is the error messages are much better). The job still fails withThe problem is that our wheel installer assumes that
python
is in.\Scripts\python.exe
but that's not the case for a regular Python installation where the binary is in the root folder. We would need to find the python installation when runninguv pip install
and pass the instance through to the wheel building code. This feels out of scope for this PR and probably requires input from someone more familiar withuv
than I. What I don't understand is why this works for unix where we, presumably, have the same problem (or are we just lucky because the binary in global install happens to be in abin
directory?)I'm open to changing the implementation to resolve symlinks instead, but this approach seemed more flexible.
Shebang parsing
The launcher used by
distutil
searches for the python script and then parses the shebang line to retrieve the Python executable name. This is kind of nice because it doesn't require a custom data format, it just works similarly to unix.The main downside that I'm seeing is that it requires a bit more parsing (and navigating) than the current approach. The launcher first needs to find the end of the zip file entry. From there, find the start of the script, and then parse the shebang.
I decided against this approach (but open to changing) because it is more involved and our launcher isn't intended to be used without
uv
where the extra ergonomics of simply having to write a shebang brings us much benefit.Binary size increase
One of the main reasons for the binary increase is that the binary now contains more static strings with possible error messages.
Test Plan
I followed the instructions in #1766 and the command now runs successfully