import pandas error for missing compression libraries #27575

Closed
raybuhr opened this issue Jul 24, 2019 · 26 comments · Fixed by #27882
Labels: IO Data (IO issues that don't fit into a more specific label)

@raybuhr

raybuhr commented Jul 24, 2019

Code Sample

[dev]rbuhr:~% python
Python 3.7.2 (default, Jul 24 2019, 19:27:42)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/__init__.py", line 55, in <module>
    from pandas.core.api import (
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/core/api.py", line 24, in <module>
    from pandas.core.groupby import Grouper, NamedAgg
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/core/groupby/__init__.py", line 1, in <module>
    from pandas.core.groupby.generic import (  # noqa: F401
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/core/groupby/generic.py", line 44, in <module>
    from pandas.core.frame import DataFrame
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/core/frame.py", line 88, in <module>
    from pandas.core.generic import NDFrame, _shared_docs
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/core/generic.py", line 71, in <module>
    from pandas.io.formats.format import DataFrameFormatter, format_percentiles
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/io/formats/format.py", line 47, in <module>
    from pandas.io.common import _expand_user, _stringify_path
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/site-packages/pandas/io/common.py", line 9, in <module>
    import lzma
  File "/home/admin/.pyenv/versions/3.7.2/lib/python3.7/lzma.py", line 27, in <module>
    from _lzma import *
ModuleNotFoundError: No module named '_lzma'
>>>

Problem description

After installing pandas 0.25.0, I can't import the library because of missing compression libraries. First it returned the error message ModuleNotFoundError: No module named '_bz2'. I fixed that with sudo apt-get install libbz2-dev and tried again, only to get the error message from the code sample above, ModuleNotFoundError: No module named '_lzma'.

This was not an issue with the previous version of pandas: I tested by downgrading to pandas 0.24.0 and was able to import without the error messages. I feel like pandas should not prevent usage just because some optional compression libraries are not installed, which was the behavior of the previous version.
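
A quick way to see which compression modules a given interpreter was built without is to try importing each one. The sketch below is not part of pandas: the helper name `missing_stdlib_modules` is made up for illustration, and the default module list is an assumption based on the errors reported above.

```python
import importlib


def missing_stdlib_modules(names=("bz2", "lzma", "zlib", "gzip")):
    """Return the names from `names` that this interpreter cannot import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:  # ModuleNotFoundError is a subclass of ImportError
            missing.append(name)
    return missing


# An empty list means every module in the default list imported cleanly.
print("missing modules:", missing_stdlib_modules())
```

If `lzma` or `bz2` shows up in the list, the Python build itself is incomplete, and reinstalling pandas alone is unlikely to help.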

Expected Output

>>> import pandas
>>>

Output of pd.show_versions()

Unable to run because can't import pandas.

@mroeschke
Member

Please see the discussions in #27532 and #27543. You may have to reconfigure your environment.

@mroeschke added the Duplicate Report (Duplicate issue or pull request) label Jul 24, 2019
@WillAyd
Member

WillAyd commented Jul 24, 2019

I am surprised this issue keeps popping up for something in the stdlib though - conda and official distributions do always come with lzma right?

@mroeschke
Member

My stab in the dark guess is that this is a pipenv issue (based on the 3 issues) and somehow '_lzmamodule.c' isn't getting built?

https://github.com/python/cpython/blob/b9a0376b0dedf16a2f82fa43d851119d1f7a2707/setup.py#L1558

@WillAyd
Member

WillAyd commented Jul 25, 2019

Yea so here's a related discussion on bpo:

https://bugs.python.org/issue34895

So not a definitive response but I guess still implied that lzma is expected to be available as part of a standard Python distribution

@raybuhr
Author

raybuhr commented Aug 1, 2019

I feel like the closing of this issue was not appropriate. The other two issues linked also have the same problem -- that pandas 0.25 assumes you have things installed that may not actually have come with python by default. This should be made explicitly clear up front before installation completes, not as an import error after installation.

@supermitch

I second @raybuhr 's comment. Pyenv is a project with 16k stars. It's very widely used.

> I guess still implied that lzma is expected to be available as part of a standard Python distribution

I feel this is an incorrect assumption then. I've been using Pyenv successfully and never run into an issue with _lzma until this release. It's not a nice experience that people should read through Stack Overflow and 3 closed (!) Pandas issue threads to figure out how to brew install xz as a solution.

@TomAugspurger
Contributor

> This should be made explicitly clear up front before installation completes, not as an import error after installation.

That's not possible, at least not with binary distributions (wheels / conda packages). Unless you're saying that there should be an error when you're compiling Python, in which case I agree (right now it's just a warning).


On the larger issue, I'm not sure what's best. Clearly this is affecting people. But at some point we need to be able to rely on importing modules from the standard library, right?

@TomAugspurger
Contributor

And just to be clear, this isn't a pyenv issue. It's a problem on the user's machine not having the proper dependencies when Python is compiled.

@WillAyd
Member

WillAyd commented Aug 2, 2019

Yea this is certainly unfortunate but quoting what I think is the most definitive response from the Python mailing list:

> I agree that modules that are necessarily optional should be documented
> as such, and as I mentioned on https://bugs.python.org/issue34895, many
> are so documented.
> In the absence of such documentation, I would considered it to be not
> optional except as some distributor decides to omit it. But then it is
> the responsibility of the distributor to document the omission.

https://mail.python.org/pipermail/python-ideas/2018-October/054089.html

So since Python doesn't document this library as optional, it should be available, and if it isn't, it is the responsibility of the distributor to handle that expectation.

@WillAyd
Member

WillAyd commented Aug 2, 2019

FWIW pyenv also documents this as the first step in their "Common Build Problems" page:

https://github.com/pyenv/pyenv/wiki/Common-build-problems

So perhaps we could help them improve that aspect of the documentation if it isn't immediately obvious.

@raybuhr
Author

raybuhr commented Aug 2, 2019

I see the points made above about this probably being an issue with system-level dependencies. I am in fact using pyenv to install Python, and fixing it for our team isn't particularly difficult.

Since Python expects the compression libraries to be installed because the modules are part of the standard library, this probably doesn't have to be an issue for the pandas team. That said, I still feel that making the compression libraries prerequisites for using pandas is unnecessary overhead. I think a more sympathetic response would be to try importing the compression modules and return a message that they aren't installed while still allowing pandas to be imported and used, just without support for compression.

@k-dahl

k-dahl commented Aug 7, 2019

Pandas 0.25.0 is not usable with tools like kubeless, as the Debian base images for Docker no longer appear to contain the libraries needed for _lzma. You'd need to build custom images.

Pandas 0.24.2 works fine.

vmware-archive/runtimes#44

@TomAugspurger
Contributor

I suspect we would accept a PR that did the lzma import in a try / except ImportError block.

When the module is not present, we would emit a UserWarning that their Python was not compiled properly and that lzma compression is not available. And if they use lzma compression we would raise at runtime.

Is anyone interested in submitting a PR?

@TomAugspurger reopened this Aug 7, 2019
@TomAugspurger added this to the 0.25.1 milestone Aug 7, 2019
@TomAugspurger
Contributor

FYI, we'll probably want to do the 0.25.1 release in 1-2 weeks. It'd be good to include this.

@TomAugspurger added the Build (Library building on various platforms) and IO Data (IO issues that don't fit into a more specific label) labels and removed the Duplicate Report (Duplicate issue or pull request) and Build (Library building on various platforms) labels Aug 8, 2019
@TomAugspurger
Contributor

Any takers to work on this? No obligation of course. If no one else is able to, I'll put something together later in the week.

cc @islander, @selvathiruarul, @salompas, @tvanyo who reported this in other issues.

@guilherme-salome
Contributor

If I remember correctly, I was able to solve this issue by running brew install xz and then reinstalling Python with pyenv. It makes sense that this should not be a pandas issue, but it would be nice to have a warning message or something else to fall back on (while alerting the user that the underlying problem needs to be solved).
I'll give it a try (but I am a noob, so let me know if there are issues with the code).

@TomAugspurger
Contributor

Thanks @salompas. Feel free to start something if you think you have a handle on what needs to be done.

We're deciding a release date for 0.25.1 at our dev meeting on Wednesday. If necessary, one of us will step in and finish things off before we need to release.

@guilherme-salome
Contributor

@TomAugspurger I am following your suggestions and will submit a PR soon (just need a bit of time to go through the "Contributing to pandas" page).

> I suspect we would accept a PR that did the lzma import in a try / except ImportError block.
>
> When the module is not present, we would emit a UserWarning that their Python was not compiled properly and that lzma compression is not available. And if they use lzma compression we would raise at runtime.
>
> Is anyone interested in submitting a PR?

@guilherme-salome
Contributor

guilherme-salome commented Aug 12, 2019

@TomAugspurger I have been trying to modify the code, but ran into a problem. One of the first files to complain about a missing lzma module is pandas/_libs/parsers.pyx. Unfortunately, my knowledge of Cython is zero. I have tried a couple of things but without much success (cythonize fails on parsers.pyx when doing pip install -e .). If you have any ideas, please let me know!

@guilherme-salome
Contributor

> @TomAugspurger I have been trying to modify the code, but ran into a problem. One of the first files to complain about a missing lzma module is pandas/_libs/parsers.pyx. Unfortunately, my knowledge of Cython is zero. I have tried a couple of things but without much success (cythonize fails on parsers.pyx when doing pip install -e .). If you have any ideas, please let me know!

The issue with cythonize might be due to something else. I tried adding nothing but a blank line to the pyx file and reinstalling with pip, and I get the same error.

@TomAugspurger
Contributor

TomAugspurger commented Aug 12, 2019

Fortunately in this case, you can just use regular Python in parsers.pyx. I'd recommend writing a compat function like the following, and replacing import lzma with that.

diff --git a/pandas/_libs/parsers.pyx b/pandas/_libs/parsers.pyx
index cafc31dad..385349629 100644
--- a/pandas/_libs/parsers.pyx
+++ b/pandas/_libs/parsers.pyx
@@ -2,7 +2,6 @@
 # See LICENSE for the license
 import bz2
 import gzip
-import lzma
 import os
 import sys
 import time
@@ -59,9 +58,12 @@ from pandas.core.arrays import Categorical
 from pandas.core.dtypes.concat import union_categoricals
 import pandas.io.common as icom
 
+from pandas.compat import import_lzma
 from pandas.errors import (ParserError, DtypeWarning,
                            EmptyDataError, ParserWarning)
 
+lzma = import_lzma()
+
 # Import CParserError as alias of ParserError for backwards compatibility.
 # Ultimately, we want to remove this import. See gh-12665 and gh-14479.
 CParserError = ParserError
diff --git a/pandas/compat/__init__.py b/pandas/compat/__init__.py
index 5ecd641fc..04e8d44a3 100644
--- a/pandas/compat/__init__.py
+++ b/pandas/compat/__init__.py
@@ -65,3 +65,17 @@ def is_platform_mac():
 
 def is_platform_32bit():
     return struct.calcsize("P") * 8 < 64
+
+
+def import_lzma():
+    import warnings
+
+    try:
+        import lzma
+        return lzma
+    except ImportError:
+        msg = (
+            "Could not import the lzma module. Your installed Python is incomplete. "
+            "Attempting to use `lzma` compression will result in a RuntimeError."
+        )
+        warnings.warn(msg)
diff --git a/pandas/io/common.py b/pandas/io/common.py
index e01e47304..0a66c58b8 100644
--- a/pandas/io/common.py
+++ b/pandas/io/common.py
@@ -6,7 +6,6 @@ import csv
 import gzip
 from http.client import HTTPException  # noqa
 from io import BytesIO
-import lzma
 import mmap
 import os
 import pathlib
@@ -31,10 +30,12 @@ from pandas.errors import (  # noqa
     ParserWarning,
 )
 
+from pandas.compat import import_lzma
 from pandas.core.dtypes.common import is_file_like
 
 from pandas._typing import FilePathOrBuffer
 
+lzma = import_lzma()
 # gh-12665: Alias for now and remove later.
 CParserError = ParserError

Then going through and fixing up uses of lzma to check for lzma being None.
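
The "check for lzma being None" step could look roughly like the sketch below. This is an illustration, not the code that was merged: `get_lzma_file` and its error message are hypothetical, and `import_lzma` is a simplified copy of the helper from the diff above with an explicit `return None` added.

```python
import warnings


def import_lzma():
    """Return the lzma module, or None (after warning) if it is unavailable."""
    try:
        import lzma
        return lzma
    except ImportError:
        warnings.warn(
            "Could not import the lzma module. Your installed Python is incomplete."
        )
        return None


def get_lzma_file(lzma):
    """Return lzma.LZMAFile, raising only when xz support is actually requested.

    Hypothetical helper: `lzma` is whatever import_lzma() returned.
    """
    if lzma is None:
        raise RuntimeError(
            "lzma compression was requested, but the lzma module is not "
            "available in this Python installation."
        )
    return lzma.LZMAFile
```

With this shape, importing the package only warns on an incomplete Python, and the error is deferred to the code path that actually asks for xz compression.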

LMK if you want me to take over. We can always find other issues for you to work on 😄 This is getting to be a bit tricky (which is why it'd be nice to rely on lzma just being present!)

@guilherme-salome
Contributor

@TomAugspurger cool idea! I am trying that right now, thanks for the hint!

@Daletxt

Daletxt commented Aug 19, 2019

ModuleNotFoundError: No module named '_lzma':
Oh, shit! This problem killed my whole day!
0.25.0 has this error, but 0.24.2 is OK!
I rolled back to version 0.24.2.
However, the underlying problem seems to be that a file like _lzma.cpython-36m-darwin.so is missing from the lib_dynload directory.
Maybe I need to recompile.

@kailichou

kailichou commented Dec 25, 2020

> I second @raybuhr 's comment. Pyenv is a project with 16k stars. It's very widely used.
>
> > I guess still implied that lzma is expected to be available as part of a standard Python distribution
>
> I feel this is an incorrect assumption then. I've been using Pyenv successfully and never run into an issue with _lzma until this release. It's not a nice experience that people should read through Stack Overflow and 3 closed (!) Pandas issue threads to figure out how to brew install xz as a solution.

I already ran brew install xz, but the ModuleNotFoundError: No module named '_bz2' still shows up.

OS: Big Sur
python 3.8.6
pandas 1.1.5
python version manager pyenv

@corbinday

I was getting this warning:
/Users/usr/.pyenv/versions/3.9.5/lib/python3.9/site-packages/pandas/compat/__init__.py:97: UserWarning: Could not import the lzma module. Your installed Python is incomplete. Attempting to use lzma compression will result in a RuntimeError.

I was finally able to get rid of it with this command:
CPPFLAGS="-I$(brew --prefix xz)/include" pyenv install [your version]

Just thought I'd drop this here for anyone with my same problem.

OS: Big Sur
M1 chip
pyenv
python 3.9.5
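
One possible sanity check after rebuilding Python this way is to round-trip a few bytes through lzma in the new interpreter. The function name `lzma_roundtrip_ok` is made up for this sketch, and it only exercises the standard library, not pandas.

```python
def lzma_roundtrip_ok(payload=b"pandas lzma check"):
    """Return True if lzma imports and a compress/decompress round trip works."""
    try:
        import lzma
    except ImportError:
        # The interpreter was built without the _lzma extension module.
        return False
    return lzma.decompress(lzma.compress(payload)) == payload


print("lzma usable:", lzma_roundtrip_ok())
```

If this reports False right after a rebuild, the xz headers were probably still not found at compile time.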

@ChrisFetterly

Thank you @corbinday! This saved me as well on M1 with Python 3.9.0 + pyenv + Big Sur v11.6
