-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Symlink (and other) handling of archives #5919
Comments
Personal opinion (from a Windows background, so I tend to think of symlinks as a bad thing): we should treat symlinks as errors. I don't honestly see why there's a need for symlinks in a Python sdist at all (I assume that's when this code gets used?) and if we are to support them I'd like to see a compelling use case for having them that can't be handled any other way. Bonus - with a good use case, we'd pretty immediately have a better understanding of "how to handle filesystems that don't support symlinks"[1] for free. [1] Don't forget, that not all filesystems even on Unix support symlinks - consider smbfs, or fat32 on a USB drive - so it's a filesystem issue, not a Windows vs Unix one... |
My only concern with erroring on symlinks, is that it might take packages that are already working and "break" them. it would be useful to try to audit PyPI to see how widespread the breakage would be. If it's bad then maybe we can emulate symlinks by copying the file instead of using a symlink across all platforms. If the breakage is small, then just start erroring. |
Worth checking #5856 for an example of someone with a package trying to get symlinks to work (looks like (some?) package creation tools won't currently handle them, although pip's I might try auditing piwheels in a bit to see if there's any wheels we've built with would-be symlinks in them - shouldn't take too much effort to whip up something that'll scan for 'em. |
@waveform80 Wheels can't contain symlinks, because they are zipfiles and zipfiles don't support symlinks (at least the Python zipfile module doesn't, and the zipfile spec says nothing about them AIUI). I'm a very strong -1 on allowing symlinks in wheels, certainly not without the PEP being updated to explain precisely how they should be handled on all platforms. Source distributions is another matter - while I remain -1, I'm fine with what @dstufft says, that we should review to determine if we risk breaking existing packages. |
@pfmoore I'm rather on the fence about symlinks. Your arguments against them are good (and lead to a consistent treatment on all platforms) but at the same time I can see use-cases for them and there are plenty of packages on PyPI which are only intended for UNIX-like platforms (I'm not so concerned about file-systems that don't support symlinks on such UNIX-like; e.g. I don't think pip's generally concerned about supporting users attempting to install stuff on FAT32 under Linux!). Basically, I'm happy to go with whatever the consensus turns out to be, but I do think the current situation isn't ideal and some work needs to be done on As to whether zips can support symlinks ... the infozip tools on Linux handle them quite happily (storing and re-constructing them). On UNIX-like systems, the file-mode (including whether or not the member is a symlink, device, FIFO, etc.) is stored in the high 16-bits of the external-file-attributes which the zip spec simply defines as "...host-system dependent" (which can in turn be interpreted according to the version-made-by field, which includes a value for UNIX (3)). Now, as you note the Python ZipFile module doesn't handle them, but we could support them if we wanted to and there's nothing in the zip spec that forbids this. Still, as noted above, I'm happy to go with the consensus and I certainly have some sympathy with the "just don't allow symlinks" position. |
Is this easy for someone to do? What is involved? |
Something like this? https://gist.github.com/waveform80/7d0a74e59ab1d073908d99b7a4523c01 I'm running that on piwheels at the moment; we've only got wheels (obviously :) so I'm guessing it'll come up empty, but I've added in a branch for checking tars too (untested) in case anybody might find it useful for checking PyPI. Update: left that running overnight on piwheels; as expected, no symlinks found (which most likely means that all the current methods of producing wheels won't currently attempt to store symlinks within them, or if they do, that they're stored as something other than a symlink, e.g. as a regular file). Anyway, I'd say with some high degree of confidence that means throwing an error in the event of finding a symlink in a wheel won't break any extant packages. However, that doesn't necessarily extend to sdists in zip format. |
If a wheel is platform-specific, and that platform supports symlinks, then I think it makes a lot of sense for pip to support symlinks as well. Not supporting symlinks can lead to huge increases in filesize when dealing with shared object libraries. Also there is the issue that you can't just duplicate the contents of a symlink if the symlink points to a file present in a dependency. This is because the dependency may not be around when constructing the wheel. |
FYI, there is some recent discussion of this here: https://discuss.python.org/t/symbolic-links-in-wheels/1945 |
I vote for supporting symlinks on Windows. |
Just to chime in here: this would be immensely useful on macOS for bundling .frameworks, since they make heavy use of symlinks and thus balloon in size when getting zipped/unzipped by pip + wheel since this replaces all symlinks with copies of the files they point to (SDL2_mixer.framework goes from 4.6 MB to 40.2 MB after being run through bdist_wheel). |
My sentiment exactly. Note that #8368 will remove calls to setup.py install and force wheel creation in pip 21.0, thus breaking any packages that might depend on symlinks. |
To support symlinks in wheels needs someone to go through the process of proposing a change to the standard. @cjerdonek linked here to the relevant discussion at the time, but no-one seems to have followed up on that, so there's been no progress since then. |
I'm gonna close this off, since this needs a change to the wheel file format and that isn't a trackable change for pip's issue tracker. For anyone interested in this, please see the discussion at https://discuss.python.org/t/symbolic-links-in-wheels/1945. Feel free to start a new thread there for pushing this forward if you're interested. If wheels get updated to support this, then there'd be a trackable task for pip here which would make sense for this issue tracker. |
Following on from #5848 I've been looking into pip's handling of symlinks in zips (and tars). I've got some experiments to PR, but I think it might be more useful to have a quick conversation about what pip actually wants to do with symlinks in archives it handles, if anything.
The current state
At present, pip handles symlinks present in tar archives (largely because Python's
TarFile
handles them). By contrast, pip doesn't handle symlinks in zips properly (again, largely because Python'sZipFile
doesn't); it extracts them as a regular file containing the name of the target. However, there is enough information in the file-attributes in a zip to accurately reconstruct symlinks (demonstrably as the infozip tools do this).Then there's the OS differences. On UNIX-like platforms, symlinks (in tars) will be re-constructed normally. However, Windows presents certain challenges:
Hence, if pip is running as Administrator on a recent (Vista onwards) version of Windows, symlinks (in tars) will be re-constructed as symlinks. Otherwise, they'll be extracted as a regular file containing the target's contents (i.e. the target file will be duplicated under the symlink's name).
Questions
untar_file
currently does viaTarFile
)Other stuff
Basically, I think
unzip_file
anduntar_file
need a bit of a re-work to make them both consistent with each other. While I'm at it there's a few other things that I'd like to fix:untar_file
handles symlinks by calling an "internal" method ofTarFile
(_extract_member
1); ideally I'd like to re-work that to avoid such calls. I admit it's unlikely to change (it's been there almost unchanged since 2.7) but still, I don't like relying on undocumented methodsZipFile.extract
2 fixes illegal characters in filenames when extracting in Windows (oddlyTarFile.extract
doesn't). Might be useful to add this functionality too - especially if the "always work" option is selected (as presumably the expectation there is that all archives should extract successfully on all platforms)ZipFile.extract
3). Currently pip doesn't guard against this, and given it's occasionally run as root on UNIX-like systems that's a bit dangerousThe text was updated successfully, but these errors were encountered: