shared env: executables perms problem on install #5581

Closed
stas00 opened this issue Jul 30, 2024 · 48 comments · Fixed by #5582
Labels
bug Something isn't working

Comments

@stas00

stas00 commented Jul 30, 2024

As a follow up to #5496 - there is another (new?) perms issue.

There are 2 possibly related issues

  1. I have upgraded to 0.2.31
$ uv pip install py-spy
Resolved 1 package in 1m 33s
error: Failed to install: py_spy-0.3.14-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.http.whl (py-spy==0.3.14)
  Caused by: failed to set permissions for file `/env/lib/conda/stas-inference/bin/py-spy`
  Caused by: Operation not permitted (os error 1)

The file has the correct perms, but it is owned by another user, and I think uv copied it from the cache:

-rwxrwxrwx 2 foo foo 9.4M Jul 17 05:13 /env/lib/conda/stas-inference/bin/py-spy

Not sure if it copies it from this file?

-rwxrwxrwx 3 foo foo 9.4M Jul 17 05:13 /env/cache/uv/archive-v0/KXNvF2h2RHusTTHhLSDxY/py_spy-0.3.14.data/scripts/py-spy*

So perhaps the 0666 from #5498 has to be 0777 for executable files in this case.

So currently nobody can use uv on our setup because of that.

  2. Another user reported a similar issue:
error: Failed to install: deepspeed-0.14.4-py3-none-any.whl (deepspeed==0.14.4)
  Caused by: failed to hardlink file from /data/env/cache/uv/archive-v0/v9yFzmI6xnIw5rZHZhlUq/deepspeed-0.14.4.data/scripts/ds_ssh to /env/lib/conda/graphrag_raw/lib/python3.10/site-packages/deepspeed-0.14.4.data/scripts/ds_ssh
  Caused by: Operation not permitted (os error 1)

the source file that the other user is trying to hardlink to via uv pip install deepspeed is owned by me and it has the correct perms:

l /data/env/cache/uv/archive-v0/v9yFzmI6xnIw5rZHZhlUq/deepspeed-0.14.4.data/scripts/ds_ssh
-rwxrwxrwx 2 stas stas 680 Jul 26 23:05 /data/env/cache/uv/archive-v0/v9yFzmI6xnIw5rZHZhlUq/deepspeed-0.14.4.data/scripts/ds_ssh

I think it's the same problem. But I'm not 100% sure, since this is hardlinking.

@charliermarsh
Member

Are the permissions 0o755? And you want 0o777?

@stas00
Author

stas00 commented Jul 30, 2024

The perms of the source file are correct, but uv fails. why is it trying to change perms? I thought perhaps it's trying to force a+rwx (0777) to a+rw (0666)?

@charliermarsh
Member

The file gets moved. We then set the executable bit on it.

@stas00
Author

stas00 commented Jul 30, 2024

Everything works fine for the first user to win the race of being the first to install a new version of a package. The next user installing the same package version into a different conda env fails to install it.

@stas00
Author

stas00 commented Jul 30, 2024

It's intended to be this: https://github.com/pypa/pip/blob/0f21fb920647f07439c3ec9fb29579c0e00072ec/src/pip/_internal/utils/unpacking.py#L88. Perhaps it's wrong as-is.

Perhaps it should first check if this operation is redundant and the right perms have been set already?

@stas00
Author

stas00 commented Jul 30, 2024

Here is what is happening:

sudo su usera
touch /tmp/foo
chmod a+rwx /tmp/foo
l /tmp/foo
-rwxrwxrwx 1 usera usera 0 Jul 30 00:47 /tmp/foo*

then:

sudo su userb
chmod a+rwx /tmp/foo
chmod: changing permissions of '/tmp/foo': Operation not permitted

I think that's what's happening with uv

@charliermarsh
Member

I see... So, in your case, only the first install works, since that user owns the file and can change its permissions? But subsequent installs fail, because they can't change the permissions?

I'll think on it. We can either make it robust to these kinds of failures, or possibly remove the chmod for files that are copied directly from the cache.

@charliermarsh
Member

I need to test it out. It's very hard to get right.

@stas00
Author

stas00 commented Jul 30, 2024

I'm not able to reproduce the hardlinking problem outside of uv - it'd help to know how it does it.

sudo su usera
touch /tmp/foo
chmod a+rwx /tmp/foo
l /tmp/foo
-rwxrwxrwx 1 usera usera 0 Jul 30 00:47 /tmp/foo*

then:

sudo su userb
ln /tmp/foo /tmp/bar 

no Operation not permitted error in this case.

@stas00
Author

stas00 commented Jul 30, 2024

> We can either make it robust to these kinds of failures, or possibly remove the chmod for files that are copied directly from the cache.

Or perhaps you need to check whether the chmod is actually needed when you copy from the cache, by checking the current perms first.

But from #5581 (comment) we now know that it's not the 0666 vs 0777 issue, so this problem is not related to the previous PR I linked to. The problem is doing chmod on a file not owned by the user running it.
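
The "check first" idea can be sketched in shell. This is only an illustration of the approach, not uv's code: `ensure_executable` is a hypothetical helper, and it uses the simpler question "can the calling user already execute this file?" instead of comparing exact mode bits.

```shell
#!/bin/sh
# Hypothetical helper (not uv's code): add execute bits only when needed.
ensure_executable() {
    # Already executable for the calling user: skip chmod entirely, so a
    # non-owner never triggers "Operation not permitted".
    if [ -x "$1" ]; then
        return 0
    fi
    # Execute bits are missing: chmod succeeds only for the owner (or root).
    chmod a+x "$1"
}

# Throwaway file standing in for a script unpacked into the cache.
demo_dir=$(mktemp -d)
: > "$demo_dir/tool"
chmod 644 "$demo_dir/tool"

ensure_executable "$demo_dir/tool"   # first call runs chmod
ensure_executable "$demo_dir/tool"   # second call skips it
ls -l "$demo_dir/tool"
```

With a guard like this, a second user who finds the file already at a+rwx never reaches the chmod call, which is the step that fails for non-owners.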

@charliermarsh
Member

Correct. I see the issue, I believe I can fix it.

@stas00
Author

stas00 commented Jul 30, 2024

Thank you, Charles

If, after you merge the fix PR, you could give me a wheel to test with (or I guess I could try to patch 0.2.31), that would make it easier, because as I said hardlinking is potentially another issue to resolve.

@stas00
Author

stas00 commented Jul 30, 2024

do you want me to create another issue for:

error: Failed to install: deepspeed-0.14.4-py3-none-any.whl (deepspeed==0.14.4)
  Caused by: failed to hardlink file from /data/env/cache/uv/archive-v0/v9yFzmI6xnIw5rZHZhlUq/deepspeed-0.14.4.data/scripts/ds_ssh to /env/lib/conda/graphrag_raw/lib/python3.10/site-packages/deepspeed-0.14.4.data/scripts/ds_ssh
  Caused by: Operation not permitted (os error 1)

as I don't see any treatment of hardlinking in the PR you proposed. Or we could work it out one item at a time. Whatever works for you.

@charliermarsh
Member

Sure. Any more info you can provide would be great, not sure what's going on there. (It doesn't say "Permission denied" so it may not be a permissions issue.)

@charliermarsh
Member

If you want, there are executables available on #5582.

@stas00
Author

stas00 commented Jul 30, 2024

I confirm that this PR/executable resolves the first issue.

I need to figure out next how to reproduce the 2nd one.

@stas00
Author

stas00 commented Jul 30, 2024

ok, I can reproduce this w/ the binary you shared. Placed it under /home/stas/uv which you will see in the examples below.

This works for the first user:

$ sudo su usera
$ /home/stas/uv pip uninstall deepspeed
Uninstalled 1 package in 2.69s
 - deepspeed==0.14.4
$ /home/stas/uv pip install deepspeed
Resolved 34 packages in 517ms
Installed 1 package in 2.42s
 + deepspeed==0.14.4

then 2nd user installing into a different conda env:

$ sudo su userb
$ /home/stas/uv pip uninstall deepspeed
Uninstalled 1 package in 2.65s
 - deepspeed==0.14.4
$ /home/stas/uv pip install deepspeed
Resolved 34 packages in 280ms
error: Failed to install: deepspeed-0.14.4-py3-none-any.whl (deepspeed==0.14.4)
  Caused by: failed to hardlink file from /data/env/cache/uv/archive-v0/3bJMARV6EJsb_nzOq2K_S/deepspeed-0.14.4.data/scripts/ds_ssh to /env/lib/conda/tr058-mistral-7b-ft-fin/lib/python3.10/site-packages/deepspeed-0.14.4.data/scripts/ds_ssh
  Caused by: Operation not permitted (os error 1)

I can repro this:

ln /data/env/cache/uv/archive-v0/3bJMARV6EJsb_nzOq2K_S/deepspeed-0.14.4.data/scripts/ds_ssh /env/lib/conda/tr058-mistral-7b-ft-fin/lib/python3.10/site-packages/deepspeed-0.14.4.data/scripts/ds_ssh
ln: failed to create hard link '/env/lib/conda/tr058-mistral-7b-ft-fin/lib/python3.10/site-packages/deepspeed-0.14.4.data/scripts/ds_ssh' => '/data/env/cache/uv/archive-v0/3bJMARV6EJsb_nzOq2K_S/deepspeed-0.14.4.data/scripts/ds_ssh': Operation not permitted

@charliermarsh
Member

Why would the hard link already exist though?

@stas00
Author

stas00 commented Jul 30, 2024

One sec, I made a mistake. I edited my repro comment above.

@stas00
Author

stas00 commented Jul 30, 2024

So it appears that the hardlink fails if any parent dir in the link's path is not owned by the user who is trying to hardlink (or perhaps it's the ownership of the source file that matters?).

$ ln /data/env/cache/uv/archive-v0/3bJMARV6EJsb_nzOq2K_S/deepspeed-0.14.4.data/scripts/ds_ssh /data/env/lib/conda/tr058-mistral-7b-ft-fin/lib/python3.10/site-packages/deepspeed-0.14.4.data/scripts/ds_ssh
ln: failed to create hard link '/data/env/lib/conda/tr058-mistral-7b-ft-fin/lib/python3.10/site-packages/deepspeed-0.14.4.data/scripts/ds_ssh' => '/data/env/cache/uv/archive-v0/3bJMARV6EJsb_nzOq2K_S/deepspeed-0.14.4.data/scripts/ds_ssh': Operation not permitted

symlinking works fine though:

$ ln -s /data/env/cache/uv/archive-v0/3bJMARV6EJsb_nzOq2K_S/deepspeed-0.14.4.data/scripts/ds_ssh /data/env/lib/conda/tr058-mistral-7b-ft-fin/lib/python3.10/site-packages/deepspeed-0.14.4.data/scripts/ds_ssh

@stas00
Author

stas00 commented Jul 30, 2024

yes, it's the ownership of the source file that matters. The link's path must have the same ownership as the source it seems.

I sudo chown'ed the source to the target's username and then hardlink works.

@stas00
Author

stas00 commented Jul 30, 2024

why not use cp instead of ln if you're concerned the cache may go away? cp works just fine:

$ ln /data/env/cache/uv/archive-v0/3bJMARV6EJsb_nzOq2K_S/deepspeed-0.14.4.data/scripts/ds_ssh /data/env/lib/conda/tr058-mistral-7b-ft-fin/lib/python3.10/site-packages/deepspeed-0.14.4.data/scripts/ds_ssh
ln: failed to create hard link '/data/env/lib/conda/tr058-mistral-7b-ft-fin/lib/python3.10/site-packages/deepspeed-0.14.4.data/scripts/ds_ssh' => '/data/env/cache/uv/archive-v0/3bJMARV6EJsb_nzOq2K_S/deepspeed-0.14.4.data/scripts/ds_ssh': Operation not permitted

$ cp /data/env/cache/uv/archive-v0/3bJMARV6EJsb_nzOq2K_S/deepspeed-0.14.4.data/scripts/ds_ssh /data/env/lib/conda/tr058-mistral-7b-ft-fin/lib/python3.10/site-packages/deepspeed-0.14.4.data/scripts/ds_ssh
# no error

Why do you want the files to be linked? It's not like someone is going to edit those files. In fact, someone might edit a hardlinked file and it'd impact all conda envs at once, potentially breaking other people's envs. I think those should be stand-alone isolated copies.
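
A small sketch of that concern, using throwaway files in a temporary directory (the file names are placeholders, not real uv cache paths): an in-place write through a hard link shows up in the "cache" file, while a copy stays isolated.

```shell
#!/bin/sh
demo_dir=$(mktemp -d)
echo 'original' > "$demo_dir/cache_file"

ln "$demo_dir/cache_file" "$demo_dir/env_linked"   # hard link: same inode
cp "$demo_dir/cache_file" "$demo_dir/env_copied"   # copy: separate inode

# An in-place write through the hard link changes the cached file too.
echo 'patched' >> "$demo_dir/env_linked"

# grep -c exits non-zero on zero matches, hence the "|| true".
cache_hits=$(grep -c 'patched' "$demo_dir/cache_file" || true)
copy_hits=$(grep -c 'patched' "$demo_dir/env_copied" || true)
echo "cache: $cache_hits, copy: $copy_hits"   # prints "cache: 1, copy: 0"
```

This only applies to in-place writes. A tool that saves by writing a new file and renaming it over the old name breaks the link instead of modifying the shared inode.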

@charliermarsh
Member

@stas00 -- The behavior is configurable. You can set --link-mode=copy on the command line, or link-mode = "copy" in uv.toml or pyproject.toml (under [tool.uv] in the latter). Hardlinks are just the default on Linux -- they're much faster and much more space-efficient, since you don't have isolated copies of every file in every project. (On macOS we use copy-on-write links, which is ideal but not supported on most Linux filesystems.)

@charliermarsh charliermarsh added the bug Something isn't working label Jul 30, 2024
@charliermarsh charliermarsh self-assigned this Jul 30, 2024
@stas00
Author

stas00 commented Jul 30, 2024

That's unfortunate. uv is advertising itself as a drop-in replacement. But if one needs to create a config file somewhere whereas pip just works and doesn't use hardlinks - your workaround is not going to work in a shared environment. I have no place to create uv.toml - this is meant to be a system-wide tool.

Will have to revert to the slow pip :(

@zanieb
Member

zanieb commented Jul 30, 2024

Sorry that this is frustrating, but we have made some decisions differently from pip for performance and correctness.

You can set UV_CONFIG_FILE on your system and place the file wherever you please.
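
On a shared machine that could look roughly like the sketch below. The directory is a placeholder (a temporary directory stands in for whatever location every user can read); link-mode and UV_CONFIG_FILE are the names mentioned in this thread. Note the quotes around the value, since TOML strings need them.

```shell
#!/bin/sh
# Placeholder location: substitute a directory that every user can read.
config_dir=$(mktemp -d)

# Minimal uv.toml that switches installs from hard links to copies.
cat > "$config_dir/uv.toml" <<'EOF'
link-mode = "copy"
EOF

# Point uv at the file, e.g. from a system-wide shell profile.
UV_CONFIG_FILE="$config_dir/uv.toml"
export UV_CONFIG_FILE

cat "$UV_CONFIG_FILE"
```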

@stas00
Author

stas00 commented Jul 30, 2024

Also, as I suggested earlier, hardlinking is unsafe in this situation. One user modifying a file in their env will impact all conda envs. If they break it, it'll be broken everywhere else. So I think copy is the correct default in a shared environment.

@stas00
Author

stas00 commented Jul 30, 2024

> You can set UV_CONFIG_FILE on your system and place the file wherever you please.

That's very helpful, @zanieb - thank you. I will try that!

@charliermarsh
Member

Candidly, you seem to have a fairly bespoke setup — this has never come up in millions upon millions of usages. It doesn’t seem like a lot to ask that you set some global configuration, just as you would for setting up a custom index.

Regardless, I’ll probably just make this fall back to copy with a warning like we do for hard linking across drives.
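
The fallback idea, sketched in shell for illustration. `link_or_copy` is a hypothetical helper and this is not uv's actual code; it only shows the "try a hard link, warn and copy if it is refused" shape.

```shell
#!/bin/sh
# Hypothetical helper: try a hard link first, copy if the link is refused.
link_or_copy() {
    src=$1
    dst=$2
    if ln "$src" "$dst" 2>/dev/null; then
        return 0
    fi
    # The link failed (different filesystem, ownership rules, ...): copy.
    echo "warning: hard link failed, copying instead: $dst" >&2
    cp "$src" "$dst"
}

# Throwaway files standing in for a cache entry and its install target.
demo_dir=$(mktemp -d)
echo 'payload' > "$demo_dir/cache_file"
link_or_copy "$demo_dir/cache_file" "$demo_dir/installed_file"
cat "$demo_dir/installed_file"
```

Either way the destination ends up with the same content; only the sharing of the underlying inode differs.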

@charliermarsh
Member

You can also set UV_LINK_MODE=copy if you want to avoid a configuration file altogether.

@stas00
Author

stas00 commented Jul 30, 2024

All we want to do is to avoid each user having a copy of uv cache - instead we share it - what's so special about it? We do the same for conda and pip and many other things - otherwise there will be an insane disk space usage when you multiply each cache by dozens of users.

Thank you for the UV_LINK_MODE=copy suggestion - that would work well. UV_CONFIG_FILE I can easily handle too. I will test it out and let you know if there are still any hiccups remaining.

I appreciate your amazing support, @charliermarsh and @zanieb, and your willingness to meet the needs of outliers too.

@stas00
Author

stas00 commented Jul 30, 2024

@charliermarsh, your suggestion doesn't work:

UV_LINK_MODE=copy /home/stas/uv pip install deepspeed
Resolved 34 packages in 211ms
error: Failed to install: deepspeed-0.14.4-py3-none-any.whl (deepspeed==0.14.4)
  Caused by: failed to copy file from /data/env/cache/uv/archive-v0/3bJMARV6EJsb_nzOq2K_S/deepspeed-0.14.4.data/scripts/ds_elastic to /env/lib/conda/tr058-mistral-7b-ft-fin/lib/python3.10/site-packages/deepspeed-0.14.4.data/scripts/ds_elastic
  Caused by: Operation not permitted (os error 1)

edit: but it's the copy that fails now - looking at why

@zanieb
Member

zanieb commented Jul 30, 2024

The shared cache use-case makes sense, I don't think that's something we've focused on in particular yet — I'm surprised there haven't been more questions about it. I've created another issue to track documenting some recommendations #5611

@stas00
Author

stas00 commented Jul 30, 2024

@charliermarsh, if I do it manually I get a different error:

cp /data/env/cache/uv/archive-v0/3bJMARV6EJsb_nzOq2K_S/deepspeed-0.14.4.data/scripts/ds_elastic /env/lib/conda/tr058-mistral-7b-ft-fin/lib/python3.10/site-packages/deepspeed-0.14.4.data/scripts/ds_elastic
cp: '/data/env/cache/uv/archive-v0/3bJMARV6EJsb_nzOq2K_S/deepspeed-0.14.4.data/scripts/ds_elastic' and '/env/lib/conda/tr058-mistral-7b-ft-fin/lib/python3.10/site-packages/deepspeed-0.14.4.data/scripts/ds_elastic' are the same file

Both files are owned by me and they are the same

-rwxrwxrwx 2 stas stas 0 Jul 30 17:17 /data/env/cache/uv/archive-v0/3bJMARV6EJsb_nzOq2K_S/deepspeed-0.14.4.data/scripts/ds_elastic*
-rwxrwxrwx 2 stas stas 0 Jul 30 17:17 /env/lib/conda/tr058-mistral-7b-ft-fin/lib/python3.10/site-packages/deepspeed-0.14.4.data/scripts/ds_elastic*

and the error happens when I try to run as another user.

I wonder if it's because of the previously created hardlink - let me delete those first and try again.
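
For reference, "are the same file" is what cp prints when the source and destination are two names for one inode, which fits the link count of 2 in the listing above. A sketch with throwaway files (placeholder paths, not the real cache), which may or may not match what uv itself ran into:

```shell
#!/bin/sh
# Print the inode number of a file (POSIX ls -i).
inode_of() { ls -i "$1" | awk '{print $1}'; }

demo_dir=$(mktemp -d)
echo 'payload' > "$demo_dir/cache_file"
ln "$demo_dir/cache_file" "$demo_dir/env_file"   # leftover hard link

# cp refuses to copy a file onto another name of the same inode.
if ! cp "$demo_dir/cache_file" "$demo_dir/env_file" 2>/dev/null; then
    echo "cp refused: both names are the same file"
fi

# Removing the stale link first lets cp create an independent file.
rm "$demo_dir/env_file"
cp "$demo_dir/cache_file" "$demo_dir/env_file"

if [ "$(inode_of "$demo_dir/cache_file")" != "$(inode_of "$demo_dir/env_file")" ]; then
    echo "now separate files"
fi
```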

@stas00
Author

stas00 commented Jul 30, 2024

@charliermarsh, I started to get this error with your latest uv and UV_LINK_MODE=copy:

UV_LINK_MODE=copy /home/stas/uv pip install deepspeed
Resolved 34 packages in 175ms
error: Failed to install: deepspeed-0.14.4-py3-none-any.whl (deepspeed==0.14.4)
  Caused by: failed to fill whole buffer

UV_LINK_MODE=copy /home/stas/uv pip install deepspeed
Audited 1 package in 98ms

it then succeeds on the 2nd attempt.

It happens 100% of the time.

@charliermarsh
Member

Is that a public package? Can I test it somehow?

@stas00
Author

stas00 commented Jul 30, 2024

It's the package you gave me yesterday. #5581 (comment)

@charliermarsh
Member

Sorry, I meant, the package you're installing.

@stas00
Author

stas00 commented Jul 30, 2024

The culprit is UV_LINK_MODE=copy: removing it removes the problem.

Yes, deepspeed is a public pypi package.

@charliermarsh
Member

charliermarsh commented Jul 30, 2024

Hmm, ok. Do you see it with the latest uv on PyPI? I've never seen that reported and can't reproduce it on main on macOS:

cargo run pip install deepspeed==0.14.4 --link-mode=copy  --no-deps

I can try it in GitHub Actions to see if it reproduces there.

@stas00
Author

stas00 commented Jul 30, 2024

I will try shortly and report back.

@stas00
Author

stas00 commented Jul 30, 2024

$ cargo run pip install deepspeed==0.14.4 --link-mode=copy  --no-deps
error: could not find `Cargo.toml` in `/home/stas` or any parent directory

$ touch ~/Cargo.toml
$ cargo run pip install deepspeed==0.14.4 --link-mode=copy  --no-deps
error: failed to parse manifest at `/home/stas/Cargo.toml`

Caused by:
  virtual manifests must be configured with [workspace]

@charliermarsh
Member

Ah sorry, I meant, if you just pip install uv then run uv pip install deepspeed==0.14.4 --link-mode=copy --no-deps in your environment, does that fail too? Trying to understand if it's a new regression from the latest release.

@stas00
Author

stas00 commented Jul 30, 2024

same issue w/ uv==0.2.31

$ pip install uv
Requirement already satisfied: uv in /data/env/lib/conda/stas-inference/lib/python3.10/site-packages (0.2.31)
$ uv pip uninstall deepspeed
Uninstalled 1 package in 1.87s
 - deepspeed==0.14.4
$ uv pip install deepspeed==0.14.4 --link-mode=copy  --no-deps
Resolved 1 package in 61ms
error: Failed to install: deepspeed-0.14.4-py3-none-any.whl (deepspeed==0.14.4)
  Caused by: failed to fill whole buffer

@charliermarsh
Member

It looks like it ran without error here: #5615

@stas00
Author

stas00 commented Jul 30, 2024

You meant this one, right? https://github.com/astral-sh/uv/actions/runs/10167135512/job/28118835463?pr=5615

I was able to overcome this problem by first nuking the cache where that package was residing.

rm -rf /data/env/cache/uv/archive-v0/3bJMARV6EJsb_nzOq2K_S
uv pip uninstall deepspeed
Uninstalled 1 package in 2.14s
 - deepspeed==0.14.4
uv pip install deepspeed==0.14.4 --link-mode=copy  --no-deps
Resolved 1 package in 7ms
Installed 1 package in 2.33s
 + deepspeed==0.14.4

works with UV_LINK_MODE=copy /home/stas/uv pip install deepspeed as well.

@stas00
Author

stas00 commented Jul 30, 2024

Everything seems to work now!

I repeated the install/uninstall many times by different users - not a hitch!

Thank you!

@charliermarsh
Member

Great!!!
