pip should not cache large files in /tmp or TMPDIR #12868
Comments
The data is temporary files (it's the unpacked source of In fact, pip simply uses Python's standard temporary file management functions, which are also not limited to "small files only". So I would suggest that if you disagree with setting |
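To illustrate the point about the standard library: Python's tempfile module picks its default directory from the TMPDIR, TEMP, and TMP environment variables before falling back to platform defaults such as /tmp. The following is a minimal sketch of that behaviour, not pip's own code; the helper name default_tempdir_with is made up for this example.

```python
import os
import tempfile


def default_tempdir_with(tmpdir: str) -> str:
    """Return the directory tempfile would pick if TMPDIR were set to `tmpdir`.

    Hypothetical helper for illustration; it restores the previous state on exit.
    """
    old_env = os.environ.get("TMPDIR")
    old_cached = tempfile.tempdir
    os.environ["TMPDIR"] = tmpdir
    # tempfile caches its choice; clearing it forces a fresh lookup.
    tempfile.tempdir = None
    try:
        return tempfile.gettempdir()
    finally:
        if old_env is None:
            del os.environ["TMPDIR"]
        else:
            os.environ["TMPDIR"] = old_env
        tempfile.tempdir = old_cached


# Use a throwaway directory so the example does not depend on system layout.
with tempfile.TemporaryDirectory() as scratch:
    chosen = default_tempdir_with(scratch)
    print(os.path.samefile(chosen, scratch))
```

Nothing in this lookup distinguishes small files from large ones; the directory is chosen purely from the environment and platform defaults.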
I mean, I literally gave you the definitions, "The place for small temporary files", "/tmp/ should be used for smaller, size-bounded files only; /var/tmp/ should be used for everything else." |
For the filesystem locations, yes. I'm not disputing that. But why is
But realistically, I would say that you should simply set |
Just because I was curious about the spec, here is some reading: |
While systemd defines the two temp directories as you mentioned, I don’t see the distinction being made anywhere by systems that actually create the two directories for software to use. Even the reference they ultimately link to, https://refspecs.linuxfoundation.org/FHS_3.0/fhs-3.0.html, has nothing to distinguish between |
Yea, it sounds like the situation is the exact same as it was back when I'd posted #5816 (comment). I think one thing called out in OP that's worth figuring out how to improve is:
I think it's worth improving this line, or removing it when pip calls itself in a subprocess (the error will contain this message at the relevant place).
As the person who wrote the closing comment, this is not what I wrote or meant. I'd explicitly referenced |
Note the reasoning and context (of Solaris etc.) given here, as well as the advice to fix applications that write large files to use /var/tmp: https://fedoraproject.org/wiki/Features/tmp-on-tmpfs#Comments_and_Discussion |
I’ll repeat my comment from above - there is nothing actionable for pip here (except possibly an improved error message or documentation, as noted by @pradyunsg). We aren’t going to stop using the stdlib functionality, so the only realistic approaches here are a stdlib change (unlikely, IMO) or a user config change (the recommended approach) |
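As a rough sketch of the user config change, a wrapper script could hand pip a different TMPDIR for one invocation only. The helper name env_with_tmpdir is hypothetical and the path is only an example; as discussed elsewhere in this thread, /var/tmp is not guaranteed to have more room than /tmp.

```python
import os


def env_with_tmpdir(tmpdir: str) -> dict:
    """Return a copy of the current environment with TMPDIR overridden.

    Hypothetical helper: the parent process's environment is left untouched.
    """
    env = dict(os.environ)
    env["TMPDIR"] = tmpdir
    return env


# Example path only; choose a directory that actually has enough free space.
env = env_with_tmpdir("/var/tmp")
print(env["TMPDIR"])
```

The resulting mapping can be passed as env=env to subprocess.run([sys.executable, "-m", "pip", "install", "vllm"], env=env), which scopes the override to that single pip run.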
If you reread the FHS citation, you will notice that there is in fact something to distinguish between If you assume a bunch of things:
then /var/tmp may have much more available space than /tmp. It is unlikely to have less space. That is probably the reason why systemd "requires" the rules it has made up -- because systemd likes to assume things based on what its authors would choose to do, and then rule out all other possible scenarios. But, the systemd advice is a very very bad idea to follow if you have 64gb of RAM and your /var/tmp has a quota / dedicated partition and can "only" store 8gb on it -- you'll get 4x less space in /var/tmp than you'll get in /tmp, because /tmp usually allows half your available RAM... |
It would be very, very wrong for pip to declare that its work-in-progress compiled packages are de-facto "temporary files or directories that are preserved between system reboots", unless pip has changed its approach quite a bit... |
The actual, correct solution here is for someone to write a spec such as https://devmanual.gentoo.org/eclass-reference/check-reqs.eclass/index.html It would allow python source code that is automatically built into a wheel, to declare in advance the amount of space it estimates it will probably require, so that tools such as pip can query the underlying filesystem for the directory that has been created by pip could then error out before doing anything at all, with a clever error message such as:
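A minimal sketch of such a pre-flight check, assuming a hypothetical required-space estimate declared by the package (no such metadata field exists today). The function name, exception class, and message wording are all illustrative and are not pip behaviour.

```python
import shutil
import tempfile
from typing import Optional


class InsufficientSpaceError(RuntimeError):
    """Raised when the build directory cannot hold the estimated build."""


def check_build_space(required_bytes: int, build_dir: Optional[str] = None) -> int:
    """Fail early if `build_dir` has less free space than `required_bytes`.

    `required_bytes` stands in for a value the package would declare in advance
    (an assumption for this sketch). Returns the free space in bytes on success.
    """
    build_dir = build_dir or tempfile.gettempdir()
    free = shutil.disk_usage(build_dir).free
    if free < required_bytes:
        raise InsufficientSpaceError(
            f"{build_dir} has {free / 2**30:.1f} GiB free, but this build is "
            f"estimated to need {required_bytes / 2**30:.1f} GiB; "
            "set TMPDIR to a directory with more space and retry."
        )
    return free
```

Checking before any download or unpack step means the failure is reported with the directory and the shortfall named, instead of surfacing later as a generic "no space left on device" error.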
I'm having a look through the issue, as I've experienced it quite a bit too. It's not a pip issue; it's an issue with the user's system being misconfigured on setup, and it's highly specific to Linux. The user's system was misconfigured with partitions that are too small (I've had endless problems in some companies with infra teams provisioning ridiculously small partitions on VMs). The system needs to be resized. There may be a broader issue with some Linux distributions recently defaulting to a smaller in-memory /tmp, which would be an issue to report to those distributions.
There was a similar issue with extremely small disks around 2015 when AWS became popular. VM images had 10GB disks to fit with the free tier, which created a lot of problems with running out of disks all the time and everything crashing. Eventually the provisioning tools became able to resize VM on creation and people learned to use it. Users can run @eli-schwartz what is the output of |
@morotti I think you misunderstood my stake in this. :) I saw a recent issue where someone was spreading FUD about systemd and claiming that pip isn't "compliant with systemd requirements". I like systemd quite well as it happens, I simply don't think that this one specific rationale in the systemd documentation holds water. In fact, I don't use pip to install large packages. I generally regard the need to compile C/C++/Fortran software as a sign that pip is the wrong tool for the job and you should be using conda or a linux distro's prebuilt packages. But I don't think that pip is doing the wrong thing here and I don't think that pip's developers should have to be lectured about how they're doing the wrong thing because systemd. ... All that being said:
No one claimed it is? The claim was that it "usually" is -- and this claim is correct in the sense that it is a popular mechanism for system distributors, and also, if your system happens to use systemd, systemd will mount /tmp as a tmpfs backed by RAM automatically unless you go out of your way to change the defaults by masking it. 1-4 GB of memory is certainly enough memory to hold plenty of things regardless. Especially if you try to optimize for the average desktop user, who doesn't use /tmp interactively but has various programs writing short-lived temporary files there, which on average don't go above 100mb.
It's not limited to single digit size, and it's not recent. Fedora has been doing it since 2012, systemd provided the configs for it since 2010. Early on, Debian disabled this based on initial feedback and hasn't really revisited the topic in 12 years -- but now they're finally changing and will do the same.
This bug report was about COMPILING packages, not using them. Due to pip's policy of COMPILING packages in No usage is implied. And if you can get past that hurdle and install it, you are not installing your workspace to $TMPDIR so the question becomes moot.
It is not specific to Linux, there is an entire industry of Windows software for creating ramdisks and moving |
Correct, this has been repeatedly suggested. :) |
Description
Pip violates system specifications, and therefore essentially only works by accident right now, when it "gets lucky" with packages being small enough for /tmp.
This becomes a serious problem when working with larger pip installations, such as for vllm, pytorch with cuda acceleration, etc.
#5816 was closed with the advice to ensure that /tmp is large enough. #4462 was on the same topic and also closed without action. Pip also blames the user/system, with this output:
However, these analyses are NOT correct.
If one reads the Linux file-hierarchy(7) man page specification (i.e., runs man 7 file-hierarchy) on Linux (Debian 12, at least), it states:
This document also refers readers to:
Which similarly states:
Moreover, the data in question seems to be cached data, not normal temporary files, and should therefore go in /var/cache (probably only if daemon is writing it, I believe), or into the XDG cache directory (e.g., ~/.cache/pip/).
Expected behavior
Pip should follow all relevant specifications when creating files, rather than putting large files in the wrong place and overloading filesystems that are not intended for large files.
pip version
24.0
Python version
3.12.2
OS
Debian 12
How to Reproduce
bin/pip3 install vllm
when /tmp has 1.7GB available.
Output