Win Trampoline: Use Python executable path encoded in binary #1803

MichaReiser · 2024-02-21T11:21:02Z

Summary

The first commit replaces the check! macro with a standalone function that can be reused for non-boolean returning functions to get nice error messages in case a syscall fails.
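
As a rough illustration of the idea, here is a minimal sketch of such a helper. The name `expect_result` and the argument order are taken from the call visible in the diff further down; the body is an assumption. In particular, this sketch returns a `Result` so it can be exercised in isolation, whereas the real helper may print the message and exit the process instead.

```rust
/// Hypothetical stand-in for the launcher's helper: compares the value a
/// syscall returned against the value that signals failure and, on failure,
/// builds the error message lazily.
fn expect_result<T, F>(actual: T, error_value: T, error_message: F) -> Result<T, String>
where
    T: PartialEq,
    F: FnOnce() -> String,
{
    if actual == error_value {
        // The closure only runs on the failure path, so the success path
        // does not pay for formatting the message.
        Err(error_message())
    } else {
        Ok(actual)
    }
}
```

Because the helper is generic over `T`, the same function covers calls that signal failure with `0`, with a null pointer, or with an invalid handle value.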

The second commit changes the python discovery to work with symlinks too. It also changes a few existing calls to use safe APIs as a surprise for @BurntSushi ;)

The way this is implemented is that I changed the launcher file layout to:

| --------------------------------|
|           <launcher.exe>        |
| --------------------------------|
|      <zipped python script>     |
| --------------------------------|
|           <python_path>         |
| --------------------------------|
|        len(<python_path>)       |
| --------------------------------|
|              "UVUV"             |
| --------------------------------|

Windows ignores any content after the executable, which is a fact that the launcher already makes use of today. Python ignores everything after the zip end, which allows us to pack additional data after the zip file. I decided to use the magic number "UVUV" at the end of the file as a safety check that the file indeed has the expected format, rather than just assuming that the last u32 (little endian) is the length of the UTF-8 encoded python path.
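
To make the layout concrete, here is a minimal sketch of how the trailer could be decoded from an in-memory copy of the file. The function name `parse_python_path`, the constant `MAGIC`, and the error strings are made up for illustration; the actual launcher reads from a file handle and is not structured like this.

```rust
/// Magic number that terminates a launcher file in the layout above.
const MAGIC: &[u8; 4] = b"UVUV";

/// Extracts the Python path from a file laid out as
/// `... <python_path> <len(python_path) as u32, little endian> "UVUV"`.
/// (Hypothetical helper, not the launcher's real code.)
fn parse_python_path(file: &[u8]) -> Result<&str, String> {
    // The last four bytes must be the magic number.
    let magic_start = file
        .len()
        .checked_sub(MAGIC.len())
        .ok_or("file is too short to contain the magic number")?;
    if file[magic_start..] != MAGIC[..] {
        return Err(String::from("magic number not found at the end of the file"));
    }

    // The four bytes before the magic number hold the path length.
    let len_start = magic_start
        .checked_sub(4)
        .ok_or("file is too short to contain the path length")?;
    let path_len = u32::from_le_bytes([
        file[len_start],
        file[len_start + 1],
        file[len_start + 2],
        file[len_start + 3],
    ]) as usize;

    // The path sits directly in front of its length; reject lengths that
    // would reach past the start of the file.
    let path_start = len_start
        .checked_sub(path_len)
        .ok_or("path length exceeds the file size")?;

    std::str::from_utf8(&file[path_start..len_start])
        .map_err(|err| format!("python path is not valid UTF-8: {}", err))
}
```

Reading the trailer from the end backwards means the parser never has to understand the executable or the zip archive that precede it.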

Alternatives

Resolving symlinks

The downside of this approach is that we now need to read the executable, which adds some complexity. An alternative to solve #1766 would be to extend our existing relative resolution to resolve symlinks before searching for the python executable.
I intended that this also addresses #1779 where we use a global python installation instead, where simply following symlinks isn't sufficient anymore (assuming it is something we want to support). I've set up a repro PR that uses the new uv binary (the good news is that the error messages are much better). The job still fails with

Failed to spawn the python child process with command '"C:\hostedtoolcache\windows\Python\3.12.2\x64\Scripts\python.exe" "C:\hostedtoolcache\windows\Python\3.12.2\x64\Scripts\tox.exe" -vvvv --notest'

The problem is that our wheel installer assumes that python is in .\Scripts\python.exe but that's not the case for a regular Python installation where the binary is in the root folder. We would need to find the python installation when running uv pip install and pass the instance through to the wheel building code. This feels out of scope for this PR and probably requires input from someone more familiar with uv than I. What I don't understand is why this works for unix where we, presumably, have the same problem (or are we just lucky because the binary in global install happens to be in a bin directory?)

I'm open to changing the implementation to resolve symlinks instead, but this approach seemed more flexible.
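
For comparison, the symlink-resolving alternative could look roughly like the sketch below. It uses `std::fs::canonicalize` for brevity, and it assumes the relative lookup means "python.exe in the same directory as the real launcher file"; the function name `python_next_to_launcher` is hypothetical, and the trampoline itself works with raw Windows APIs rather than `std::fs`.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Resolves any symlinks in `launcher` and returns the `python.exe` path
/// that would sit next to the real file. (Sketch only; it does not check
/// that the interpreter exists.)
fn python_next_to_launcher(launcher: &Path) -> io::Result<PathBuf> {
    // `canonicalize` follows symlinks and returns an absolute path.
    let real_launcher = fs::canonicalize(launcher)?;
    let directory = real_launcher.parent().ok_or_else(|| {
        io::Error::new(io::ErrorKind::NotFound, "launcher has no parent directory")
    })?;
    Ok(directory.join("python.exe"))
}
```

This keeps the file format untouched, but it only helps when the interpreter really lives next to the launcher, which is the limitation described above for a global installation.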

Shebang parsing

The launcher used by distutil searches for the python script and then parses the shebang line to retrieve the Python executable name. This is kind of nice because it doesn't require a custom data format, it just works similarly to unix.

The main downside that I'm seeing is that it requires a bit more parsing (and navigating) than the current approach. The launcher first needs to find the end of the zip file entry. From there, find the start of the script, and then parse the shebang.
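
Only the last of those steps is sketched here, assuming the script text has already been located and decoded. The function name `parse_shebang` is made up, and the sketch deliberately ignores quoting and interpreter arguments, which a real launcher would have to handle.

```rust
/// Returns whatever follows `#!` on the first line of `script`, trimmed,
/// or `None` if the script does not start with a shebang.
/// (Hypothetical helper; quoting and extra arguments are not handled.)
fn parse_shebang(script: &str) -> Option<&str> {
    // `lines` strips the trailing "\n" as well as "\r\n".
    let first_line = script.lines().next()?;
    let interpreter = first_line.strip_prefix("#!")?.trim();
    if interpreter.is_empty() {
        None
    } else {
        Some(interpreter)
    }
}
```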

I decided against this approach (but am open to changing) because it is more involved, and because our launcher isn't intended to be used without uv, so the extra ergonomics of simply having to write a shebang wouldn't bring us much benefit.

Binary size increase

One of the main reasons for the binary increase is that the binary now contains more static strings with possible error messages.

Test Plan

I followed the instructions in #1766 and the command now runs successfully

uv venv
uv pip install pycowsay
New-Item -Path .\pycatsay.exe -ItemType SymbolicLink -Value .\.venv\Scripts\pycowsay.exe
.\pycatsay.exe


<  >

   \   ^__^
    \  (oo)\_______
       (__)\       )\/\
           ||----w |
           ||     ||

MichaReiser · 2024-02-21T11:33:12Z

crates/uv-trampoline/src/bounce.rs

+
+    // Start with a size of 1024 bytes which should be enough for most paths but avoids reading the
+    // entire file.
+    let mut buffer: Vec<u8> = vec![0; 1024];


I tested the "incremental" reading by setting the capacity to 10

BurntSushi

w00t! So much Windows API. Nice work.

I do think there is at least one thing worth changing here before merging, which is the logic for reading the file path from the end of the binary. The main issue I think with the current implementation is that it trusts the path length, which could lead to the program allocating a huge amount of memory if something went wrong (whether intentional or not).

crates/uv-trampoline/src/bounce.rs

crates/install-wheel-rs/src/wheel.rs

BurntSushi · 2024-02-21T17:40:14Z

crates/uv-trampoline/src/bounce.rs

-    } else {
-        b"python.exe"
+    expect_result(unsafe { CloseHandle(file_handle) }, 0, || {
+        String::from("Failed to close file handle")


OK, so reading the above, I think it works like this:

1. It first tries to read 1KB from the end of a file.

2. If it finds the magic number, path length and path all within that 1KB, it stops.

3. Otherwise, it resizes the buffer's capacity to whatever the decoded path_len is.

4. It then goes back to the first step again, but reads {path_len}KB instead.

I think there might be a couple issues with this approach, both of which center around trusting path_len:

1. If path_len is u32::MAX, then this will allocate 4GB.

2. If the file is being mutated while we're doing this in a very specific way, then it's possible this loop isn't guaranteed to terminate.

(2) is kind of a stretch, but (1) I think is a real enough issue.

My suggestion here would be to read 4KB from the end of the file, and if you can't find the path in there, consider it malformed and give up.
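
A minimal sketch of that suggestion, written against `std::io::Read + Seek` rather than the Windows file APIs the launcher actually uses. The function name `read_tail` and the constant `MAX_TAIL` are hypothetical; the point is that the buffer size is bounded by a constant and never by a length decoded from the file.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Upper bound on how much of the file's tail is ever read (4KB, as
/// suggested above).
const MAX_TAIL: u64 = 4096;

/// Reads at most `MAX_TAIL` bytes from the end of `reader`. The caller then
/// looks for the magic number, length and path inside the returned buffer
/// and treats the file as malformed if they do not all fit.
fn read_tail<R: Read + Seek>(reader: &mut R) -> io::Result<Vec<u8>> {
    // Seeking to the end yields the total size.
    let file_size = reader.seek(SeekFrom::End(0))?;
    let tail_len = file_size.min(MAX_TAIL);
    reader.seek(SeekFrom::Start(file_size - tail_len))?;

    // The allocation is bounded by MAX_TAIL no matter what the file says.
    let mut buffer = vec![0u8; tail_len as usize];
    reader.read_exact(&mut buffer)?;
    Ok(buffer)
}
```

With a single bounded read there is no loop, so both concerns (the large allocation and the termination question) go away at once.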

Thanks for the thorough review. I'll carefully go over the implementation again tomorrow.

(2) shouldn't be a problem because we open the file with FILE_SHARE_READ only, prohibiting other processes from mutating or deleting the file while we're using it.

Not trusting path_len makes sense, as does limiting it. I'll probably go with a higher default; e.g., even URLs have an upper limit of 2MB. Agreed, URLs may contain application data, which is different, but I don't think it hurts to go somewhat higher than 4KB.

Ah yeah good catch on (2). Thanks!

From https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry:

The Windows API has many functions that also have Unicode versions to permit an extended-length path for a maximum total path length of 32,767 characters.
The maximum path of 32,767 characters is approximate, because the "\\?\" prefix may be expanded to a longer string by the system at run time, and this expansion applies to the total length.

So my read of that doc is that there isn't a pre-defined upper bound on how long a path can be. Does that match your understanding? If that's true, I still think we need to place some kind of reasonable upper bound on what we're willing to accept. I think 4KB is probably sufficient, but 32KB would be fine with me too.

I limited it to 32KB

MichaReiser · 2024-02-21T21:34:54Z

@charliermarsh / @zanieb If I want to add a test case as outlined in the test plan, how would I go about it? Can I install any python package with a binary script as part of the test or do we have a stub package that has a script entry point that I can use?

zanieb · 2024-02-21T21:38:37Z

@MichaReiser you could generate a stub package and add it to the repository or use some "small" real world package with a pinned version

MichaReiser · 2024-02-21T21:42:20Z

Do you have a link with some resources on how I would do that?

MichaReiser · 2024-02-22T08:59:32Z

scripts/wheels/simple_launcher-0.1.0-py3-none-any.whl

Thanks @konstin for providing me with a test wheel!

crates/uv/tests/pip_install.rs

BurntSushi

Nice, thank you!

BurntSushi · 2024-02-22T13:32:42Z

crates/uv-trampoline/.cargo/config.toml

@@ -0,0 +1,3 @@
+[unstable]
+build-std = ["core", "panic_abort", "alloc", "std"]
+build-std-features = ["compiler-builtins-mem"]


Did you mean to include this in this PR?

Yes that's intentional, considering that I won't land my std branch anytime soon. It requires me to only type cargo build --release --target x86_64-pc-windows-msvc instead of that plus the -Z compiler flags.

BurntSushi · 2024-02-22T13:42:32Z

crates/uv-trampoline/src/bounce.rs

+            if i64::from(bytes_to_read) > file_size {
+                eprintln!("The length of the python executable path exceeds the file size. Verify that the path length is appended to the end of the launcher script as a u32 in little endian.");
+                exit_with_status(1);
+            }


At first I was unsure if this was limiting heap memory since bytes_to_read was still being calculated based on path_len, but I see above that path_len is limited to a maximum. So I think that's right.
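
The reasoning can be spelled out as a small sketch: once path_len is capped, everything derived from it is capped too. The 32KB limit and the `i64::from(bytes_to_read) > file_size` comparison come from this thread and the diff above; the function name, the constant names, and the assumption that bytes_to_read is the path length plus the 8 trailer bytes are illustrative only.

```rust
/// Maximum accepted path length (32KB, the limit settled on in this PR).
const MAX_PATH_LEN: u32 = 32 * 1024;
/// Assumed trailer size: 4 bytes of length plus the 4-byte magic number.
const TRAILER_LEN: u32 = 8;

/// Validates a decoded `path_len` and returns how many bytes need to be
/// read from the end of the file. (Hypothetical helper.)
fn bytes_to_read(path_len: u32, file_size: i64) -> Result<u32, String> {
    // Cap first: after this check the addition below cannot overflow and
    // the later allocation is at most MAX_PATH_LEN + TRAILER_LEN bytes.
    if path_len > MAX_PATH_LEN {
        return Err(format!(
            "path length {} exceeds the maximum of {} bytes",
            path_len, MAX_PATH_LEN
        ));
    }
    let bytes_to_read = path_len + TRAILER_LEN;
    if i64::from(bytes_to_read) > file_size {
        return Err(String::from(
            "the length of the python executable path exceeds the file size",
        ));
    }
    Ok(bytes_to_read)
}
```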

MichaReiser force-pushed the win-trampoline-symlinks branch from ff62bd7 to 9110f3c Compare February 21, 2024 11:29

MichaReiser added bug Something isn't working windows Specific to the Windows platform labels Feb 21, 2024

MichaReiser force-pushed the win-trampoline-symlinks branch from 9110f3c to d88ffdf Compare February 21, 2024 11:30

MichaReiser force-pushed the win-trampoline-symlinks branch 3 times, most recently from 73f0bff to e772c05 Compare February 21, 2024 15:29

MichaReiser marked this pull request as ready for review February 21, 2024 15:30

MichaReiser requested review from konstin and BurntSushi February 21, 2024 15:30

AlexWaygood changed the title ~~Win Trampoline: Use Pyhton executable path encoded in binary~~ Win Trampoline: Use Python executable path encoded in binary Feb 21, 2024

BurntSushi requested changes Feb 21, 2024

MichaReiser added 4 commits February 22, 2024 10:07

Replace check macro with regular function,

ed318dd

Lookup Python executable from binary

649f8c5

Add tests

d93faf8

Distrust path length

c13eef2

MichaReiser force-pushed the win-trampoline-symlinks branch from c4c98f3 to c13eef2 Compare February 22, 2024 09:08

Fix test snapshots

ad59171

konstin reviewed Feb 22, 2024

MichaReiser mentioned this pull request Feb 22, 2024

uv fails to find python in non-venv install on Windows #1779

Closed

Reduce path limit to 32KB, add .cargo/config.toml

a07e5f1

MichaReiser requested a review from BurntSushi February 22, 2024 13:24

Rename tests

ed98eeb

MichaReiser force-pushed the win-trampoline-symlinks branch from e6e011d to ed98eeb Compare February 22, 2024 13:39

BurntSushi approved these changes Feb 22, 2024

MichaReiser merged commit 12a96ad into main Feb 22, 2024
7 checks passed

MichaReiser deleted the win-trampoline-symlinks branch February 22, 2024 15:10
