Failed to run "pip install ." on Windows when package data contains unicode paths #5972

Closed
pombredanne opened this issue Oct 31, 2018 · 14 comments
Labels
auto-locked Outdated issues that have been locked by automation C: encoding Related to text encoding and likely, UnicodeErrors OS: windows Windows specific type: bug A confirmed bug or unintended behavior

Comments

@pombredanne
Contributor

Environment

  • pip version: pip 18.1
  • Python version: 2.7.14
  • OS: Windows 7 and 10

Description
Running pip install . on a package that contains paths with unicode file names fails.
See the attached minimal zip example.

Expected behavior
The package should be installed.

How to Reproduce

  1. Download the attached example.zip and unzip it on Windows.

  2. Create a virtualenv and activate it.

  3. In the extracted package directory, run pip install .

  4. The installation fails. For reference, pip install -e . works fine (because no file copy is involved).

Output

(scancode-toolkit) C:\dev\PortableGit-2.19.1\example3\example>pip install .
Processing c:\dev\portablegit-2.19.1\example3\example
Could not install packages due to an EnvironmentError: [
('C:\\dev\\PortableGit-2.19.1\\example3\\example\\src\\module\\unicodepath\\???', 
'c:\\users\\pombreda\\appdata\\local\\temp\\pip-req-build-wcge_1\\src\\module\\unicodepath\\???', 
"[Errno 22] invalid mode ('rb') or filename: 'C:\\\\dev\\\\PortableGit-2.19.1\\\\example3\\\\example\\\\src\\\\module\\\\unicodepath\\\\???'"), 

('C:\\dev\\PortableGit-2.19.1\\example3\\example\\src\\module\\unicodepath\\???a',
 'c:\\users\\pombreda\\appdata\\local\\temp\\pip-req-build-wcge_1\\src\\module\\unicodepath\\???a', 
"[Errno 22] invalid mode ('rb') or filename: 'C:\\\\dev\\\\PortableGit-2.19.1\\\\example3\\\\example\\\\src\\\\module\\\\unicodepath\\\\???a'")]
@pombredanne
Contributor Author

For reference, it works fine on Linux:

$ pip install .
Processing /home/pombreda/tmp/example
Building wheels for collected packages: sample
  Running setup.py bdist_wheel for sample ... done
  Stored in directory: /tmp/pip-ephem-wheel-cache-3Ui6Ih/wheels/53/51/26/6a5a10f315c3810c0e2aee8ca14064ce9dcfbe49aabceba1b2
Successfully built sample
Installing collected packages: sample
Successfully installed sample-1.0

@pombredanne
Contributor Author

For info this bug was found by @sschuberth and reported here: aboutcode-org/scancode-toolkit#755

@cjerdonek
Member

Thanks for the report. Can you include the path itself in this issue report so people can see it without having to download the zip? What encoding was used?

@pombredanne
Contributor Author

@cjerdonek the original encoding of the file paths is not known. And that is what makes these things fun.
On Linux with a UTF-8 FS encoding the paths look like this:

$ find .
.
./src
./src/module
./src/module/unicodepath
./src/module/unicodepath/Ϩὀ⌨a
./src/module/unicodepath/Ϩὀ⌨
./src/module/unicodepath/Izgradnja sufiksnog polja korištenjem Kärkkäinen – Sandersovog algoritma.pdf
./src/module/__init__.py
./setup.py

On Windows, they look different of course.
As for the FS encoding of Windows, that's whatever the default is there.

They are also here: https://github.com/nexB/scancode-toolkit/tree/1bb19b7cdccc750a488bcebdeb21d427fa61bddb/tests/scancode/data/unicodepath/uc/unicodepath

This is a rare case of course: it is a set of weird files that I have used only for testing weird unicode paths in ScanCode, and it should not be something that is common.
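For readers who would rather not download the zip, the layout from the `find` listing above can be recreated with a short script. This is only a sketch: `build_example`, `UNICODE_NAMES` and the contents of `SETUP_PY` are assumptions made for illustration (the real setup.py in example.zip is not shown in this thread; only the name `sample` and version `1.0` appear in the Linux output).

```python
# -*- coding: utf-8 -*-
import io
import os
import sys

# File names copied from the listing in this thread.
UNICODE_NAMES = [
    u"Ϩὀ⌨",
    u"Ϩὀ⌨a",
    u"Izgradnja sufiksnog polja korištenjem Kärkkäinen – Sandersovog algoritma.pdf",
]

# Assumed minimal setup.py; the original file's contents are not known.
SETUP_PY = u"""from setuptools import setup, find_packages

setup(
    name="sample",
    version="1.0",
    package_dir={"": "src"},
    packages=find_packages("src"),
)
"""


def build_example(root):
    """Create the example tree under an existing directory `root`."""
    # Work with text paths throughout so Python 2 on Windows uses the wide APIs.
    if isinstance(root, bytes):
        root = root.decode(sys.getfilesystemencoding())
    data_dir = os.path.join(root, u"src", u"module", u"unicodepath")
    os.makedirs(data_dir)
    with io.open(os.path.join(root, u"setup.py"), "w", encoding="utf-8") as f:
        f.write(SETUP_PY)
    io.open(os.path.join(root, u"src", u"module", u"__init__.py"), "wb").close()
    for name in UNICODE_NAMES:
        # Empty placeholder files; only the names matter for this issue.
        io.open(os.path.join(data_dir, name), "wb").close()
    return data_dir
```

Calling `build_example` on an empty directory and then running `pip install .` there should be roughly equivalent to steps 1 to 3 of the reproduction, assuming the setup.py contents do not matter for the failing copy step.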

And this is a FUBAR area of Python 2.7 ... there are plenty of bugs there, most of them fixed in Python 3.x.

As a general hint: on Windows with Python 2.7, if and only if you get paths from os.listdir or os.walk using a unicode path string (as opposed to bytes/str), then Windows and Python will behave correctly and give you Unicode for anything that's on the filesystem.
If you ever feed it a str, it fails. Here the copy may use shutil? That's a darker area to me.
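A minimal sketch of the hint above, not pip's code: convert the starting path to text once, then let `os.walk` hand back text names. `to_text` and `walk_text` are hypothetical helper names, and decoding with `sys.getfilesystemencoding()` is an assumption about how a byte path should be interpreted.

```python
import os
import sys


def to_text(path):
    """Return `path` as a text (unicode) string."""
    # On Python 2, `bytes` is `str`, so this catches plain byte strings.
    if isinstance(path, bytes):
        return path.decode(sys.getfilesystemencoding())
    return path


def walk_text(root):
    """Yield every file path under `root` as text."""
    # The type of the argument decides the type of the results:
    # a text root makes os.walk return text names.
    for dirpath, _dirnames, filenames in os.walk(to_text(root)):
        for name in filenames:
            yield os.path.join(dirpath, name)
```

The point of doing the conversion at the entry point is that every path derived from it stays text, so no later call receives a lossy byte string.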

@pombredanne
Contributor Author

And based on the above error messages, it looks like unicode was not used for paths on Windows. That could be a possible failure condition.
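One plausible reading of the `???` in the error messages, assuming the names passed through a legacy code page at some point: characters with no mapping are replaced by `?`, and the resulting name no longer matches any file on disk. The sketch below uses cp1252 purely for illustration; the actual code page on the affected machines is not stated in this thread, and `lossy_name` is a hypothetical helper.

```python
# -*- coding: utf-8 -*-


def lossy_name(name, codepage="cp1252"):
    """Encode a text file name to a code page, replacing unmappable characters."""
    return name.encode(codepage, "replace")


# The two short names from the listing in this thread.
print(repr(lossy_name(u"Ϩὀ⌨")))
print(repr(lossy_name(u"Ϩὀ⌨a")))
```

None of the three non-ASCII characters exists in cp1252, so the two names come out as `???` and `???a`, the same shapes that appear in the pip error output.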

@pombredanne
Contributor Author

pombredanne commented Nov 2, 2018

(And of course, if you use unicode on POSIX with Python 2.7, things will fail unless you use the backports of os.fsencode and os.fsdecode in backports.os by @pjdelport.)
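A small sketch of what those two functions are for, assuming a POSIX system: raw byte names from the filesystem are decoded to text and can be encoded back without loss, even when the bytes are not valid in the filesystem encoding. `roundtrip_name` is a hypothetical helper, and the fallback import assumes the backports.os package is installed on Python 2.7.

```python
try:
    # Python 3: available in the standard library.
    from os import fsencode, fsdecode
except ImportError:
    # Python 2.7: assumes the backports.os package mentioned above is installed.
    from backports.os import fsencode, fsdecode


def roundtrip_name(raw):
    """Decode a raw bytes file name to text, then encode it back to bytes."""
    text = fsdecode(raw)
    return text, fsencode(text)


# A name containing a byte that is not valid UTF-8.
text, raw = roundtrip_name(b"caf\xff")
print(repr(text), repr(raw))
```

On POSIX the undecodable byte is carried through the text form as an escaped surrogate, so the bytes that come back are the bytes that went in.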

@pradyunsg pradyunsg added the S: needs triage Issues/PRs that need to be triaged label Dec 14, 2018
@pombredanne
Contributor Author

@cjerdonek @pradyunsg gentle ping... you both chimed in on this issue. Any update? Or will this not be fixed for 2.7.x?

@cjerdonek
Member

@pombredanne Can you show the command output when running the command in verbose mode (passing -v or -vv)? I believe that should cause a traceback to be displayed to show what line is at fault.

@pombredanne
Contributor Author

@cjerdonek I spun up a VM and got this on 32-bit Python 2.7.16 on Windows 7 using the example.zip I attached above.


(scancode-toolkit) C:\dev\scancode-toolkit>python
Python 2.7.16 (v2.7.16:413a49145e, Mar  4 2019, 01:30:55) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> ^Z


(scancode-toolkit) C:\dev\scancode-toolkit>pip --version
pip 19.1.1 from c:\dev\scancode-toolkit\lib\site-packages\pip (python 2.7)


(scancode-toolkit) C:\dev\scancode-toolkit>pip -vvv install example/
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
Config variable 'Py_DEBUG' is unset, Python ABI tag may be incorrect
Config variable 'WITH_PYMALLOC' is unset, Python ABI tag may be incorrect
Config variable 'Py_UNICODE_SIZE' is unset, Python ABI tag may be incorrect
Created temporary directory: c:\users\pombre~1\appdata\local\temp\pip-ephem-wheel-cache-vx5t_u
Created temporary directory: c:\users\pombre~1\appdata\local\temp\pip-req-tracker-eednrr
Created requirements tracker 'c:\\users\\pombre~1\\appdata\\local\\temp\\pip-req-tracker-eednrr'
Created temporary directory: c:\users\pombre~1\appdata\local\temp\pip-install-e5k_hs
Processing c:\dev\scancode-toolkit\example
  Created temporary directory: c:\users\pombre~1\appdata\local\temp\pip-req-build-bs8obn
ERROR: Could not install packages due to an EnvironmentError.
Traceback (most recent call last):
  File "c:\dev\scancode-toolkit\lib\site-packages\pip\_internal\commands\install.py", line 352, in run
    resolver.resolve(requirement_set)
  File "c:\dev\scancode-toolkit\lib\site-packages\pip\_internal\resolve.py", line 131, in resolve
    self._resolve_one(requirement_set, req)
  File "c:\dev\scancode-toolkit\lib\site-packages\pip\_internal\resolve.py", line 294, in _resolve_one
    abstract_dist = self._get_abstract_dist_for(req_to_install)
  File "c:\dev\scancode-toolkit\lib\site-packages\pip\_internal\resolve.py", line 242, in _get_abstract_dist_for
    self.require_hashes
  File "c:\dev\scancode-toolkit\lib\site-packages\pip\_internal\operations\prepare.py", line 347, in prepare_linked_requirement
    progress_bar=self.progress_bar
  File "c:\dev\scancode-toolkit\lib\site-packages\pip\_internal\download.py", line 873, in unpack_url
    unpack_file_url(link, location, download_dir, hashes=hashes)
  File "c:\dev\scancode-toolkit\lib\site-packages\pip\_internal\download.py", line 778, in unpack_file_url
    shutil.copytree(link_path, location, symlinks=True)
  File "C:\Python27\Lib\shutil.py", line 231, in copytree
    raise Error, errors
Error: [('C:\\dev\\scancode-toolkit\\example\\src\\module\\unicodepath\\???', 'c:\\users\\pombre~1\\appdata\\local\\temp\\pip-req-build-bs8obn\\src\\module\\unicodepath\\???', "[Errno 22] invalid mode ('rb') or filename: 'C:\\\\dev\\\\scancode-toolkit\\\\example\\\\src\\\\module\\\\unicodepath\\\\???'"), ('C:\\dev\\scancode-toolkit\\example\\src\\module\\unicodepath\\???a', 'c:\\users\\pombre~1\\appdata\\local\\temp\\pip-req-build-bs8obn\\src\\module\\unicodepath\\???a', "[Errno 22] invalid mode ('rb') or filename: 'C:\\\\dev\\\\scancode-toolkit\\\\example\\\\src\\\\module\\\\unicodepath\\\\???a'")]
Cleaning up...
Removed build tracker 'c:\\users\\pombre~1\\appdata\\local\\temp\\pip-req-tracker-eednrr'

I am porting everything to Python 3, so this may be a non-problem soon?
And I have long since worked around this bug anyway. So since I am the only one who ever saw this, along with @sschuberth, feel free to close it. No need to waste any more time on it unless it also exists on Python 3.
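The traceback above ends in `shutil.copytree(link_path, location, symlinks=True)`. A minimal sketch of one possible mitigation, assuming the failure comes from byte-string paths reaching that call on Python 2.7: decode both paths to text before copying. This is not pip's code and not a tested fix for pip; `to_text` and `copytree_text` are hypothetical helpers.

```python
import os
import shutil
import sys


def to_text(path):
    """Return `path` as a text (unicode) string."""
    # On Python 2, `bytes` is `str`, so this catches plain byte strings.
    if isinstance(path, bytes):
        return path.decode(sys.getfilesystemencoding())
    return path


def copytree_text(src, dst):
    """Copy a directory tree, passing text paths to shutil.copytree."""
    # With a text `src`, copytree lists the directory with text names,
    # so every derived path stays text as well.
    shutil.copytree(to_text(src), to_text(dst), symlinks=True)
```

As with `shutil.copytree` itself, `dst` must not already exist.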

pombredanne added a commit to pombredanne/pip that referenced this issue Jun 28, 2019
With these minimal changes the example.zip from pypa#5972 installs
correctly on Windows.

Link: pypa#5972
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
@pombredanne
Contributor Author

@cjerdonek this may help: pombredanne@84a792b ... this is a minimalist, hackish POC that allows the directory in example.zip to be installed.

@cjerdonek
Member

Thanks. Is it really minimal in the sense that if you don’t include any of the four changes, it doesn’t start working?

@cjerdonek cjerdonek added C: encoding Related to text encoding and likely, UnicodeErrors OS: windows Windows specific type: bug A confirmed bug or unintended behavior labels Jun 28, 2019
@triage-new-issues triage-new-issues bot removed the S: needs triage Issues/PRs that need to be triaged label Jun 28, 2019
@AWhetter

AWhetter commented Nov 7, 2019

So that you don't have to download a random zip file, I'm seeing this with the wheel sdist (specifically wheel 0.33.6, https://pypi.org/project/wheel/0.33.6/#files, although I'm sure it exists in earlier versions as well) as they include some files with unicode filepaths as test data (https://github.com/pypa/wheel/tree/0.33.6/tests/testdata/unicode.dist/unicodedist).

@pombredanne
Contributor Author

@cjerdonek re:

Is it really minimal in the sense that if you don’t include any of the four changes, it doesn’t start working?

Sorry for the late reply, yes this was a minimal patch in that sense (but I reckon now that it would not work on Python 3).

With that said, I am inclined to let this issue go unfixed and die its slow death, as it does not exist on Python 3.

@pradyunsg
Member

As per pip's documented policy, pip's maintainers won't be fixing such issues but PRs are welcome that address this (and will be subject to our regular review process).

I'm going to go ahead and close this, since it's a Python 2-only issue that doesn't merit being tracked. If someone wants to file a PR or move this discussion forward, I suggest they file a new issue.

@lock lock bot added the auto-locked Outdated issues that have been locked by automation label Mar 10, 2020
@lock lock bot locked as resolved and limited conversation to collaborators Mar 10, 2020