fix(python): ensure pyarrow.compute module is loaded #6353

Merged: 1 commit, Jan 22, 2023

@josh (Contributor) commented Jan 21, 2023

fix(python): ensure pyarrow.compute module is loaded

Stumbled across a pyarrow lazy-loading race condition where pa.compute functions may not be available yet. It's difficult to cover in the test suite, since another test may already have triggered the module to be fully loaded, hiding the bug.

I believe the pyarrow docs recommend importing and using the compute module directly rather than depending on it being loaded on the root package. This change adds an explicit lazy-load dependency for the pyarrow.compute module.
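
The behaviour described above can be reproduced without pyarrow. The following is a minimal sketch, assuming a throwaway package named `lazy_demo_pkg` (hypothetical, created in a temporary directory) whose `__init__.py` does not import its `compute` submodule. It only illustrates Python's general import semantics and is not a model of pyarrow's internals.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_submodule_loading():
    """Return (visible_before, visible_after) for a hypothetical package.

    `lazy_demo_pkg` is an illustrative stand-in, not a real library.
    """
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / "lazy_demo_pkg"
        pkg_dir.mkdir()
        # The package root deliberately does NOT import its submodule.
        (pkg_dir / "__init__.py").write_text("")
        (pkg_dir / "compute.py").write_text("def cast(x):\n    return int(x)\n")

        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            pkg = importlib.import_module("lazy_demo_pkg")
            # The submodule has not been imported, so it is not an attribute yet.
            before = hasattr(pkg, "compute")
            # An explicit import loads the submodule and binds it on the parent.
            importlib.import_module("lazy_demo_pkg.compute")
            after = hasattr(pkg, "compute")
        finally:
            # Clean up so the demo leaves no trace in the interpreter.
            sys.path.remove(tmp)
            sys.modules.pop("lazy_demo_pkg.compute", None)
            sys.modules.pop("lazy_demo_pkg", None)
    return before, after


before, after = demo_submodule_loading()
print(before, after)
```

If any other code in the same process imports the submodule first, the attribute is already present, which is consistent with the bug being hidden when another test runs earlier.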

Reproduction Steps

import pyarrow as pa
import pyarrow.feather as feather

col = pa.chunked_array([["foo"], ["bar"]], type=pa.dictionary(pa.int8(), pa.string()))
table = pa.table([col], names=["a"])
feather.write_feather(table, "example.ipc")

import polars as pl
# import pyarrow.compute  # enable workaround

pl.read_ipc("example.ipc", use_pyarrow=True)

Traceback (most recent call last):
  File "example.py", line 5, in <module>
    pl.read_ipc("example.ipc", use_pyarrow=True)
  File "polars/utils.py", line 394, in wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "polars/io.py", line 860, in read_ipc
    df = DataFrame._from_arrow(tbl, rechunk=rechunk)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "polars/internals/dataframe/frame.py", line 470, in _from_arrow
    return cls._from_pydf(arrow_to_pydf(data, columns=columns, rechunk=rechunk))
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "polars/internals/construction.py", line 936, in arrow_to_pydf
    column = coerce_arrow(column)
             ^^^^^^^^^^^^^^^^^^^^
  File "polars/internals/construction.py", line 1105, in coerce_arrow
    array = pa.compute.cast(
            ^^^^^^^^^^
  File "polars/dependencies.py", line 82, in __getattr__
    return getattr(module, attr)
           ^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/__init__.py", line 335, in __getattr__
    raise AttributeError(
AttributeError: module 'pyarrow' has no attribute 'compute'
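
The last two frames suggest that `pa` is a lazy proxy that forwards attribute access to the real `pyarrow` module. A rough sketch of that pattern follows. `LazyModuleProxy` is a hypothetical name, and this is an assumption about the general technique, not the actual code in `polars/dependencies.py`.

```python
import importlib
from types import ModuleType
from typing import Any, Optional


class LazyModuleProxy:
    """Hypothetical proxy that imports a module on first attribute access."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._module: Optional[ModuleType] = None

    def __getattr__(self, attr: str) -> Any:
        # Only called when normal lookup fails, so _name and _module
        # (stored on the instance) never reach this method.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        # Forward to the real module. If the attribute is a submodule that
        # has not been imported, the real module raises AttributeError.
        return getattr(self._module, attr)


json_proxy = LazyModuleProxy("json")  # stdlib module used as a stand-in
print(json_proxy.dumps([1, 2]))
```

A proxy like this imports only the root module, so a submodule such as `compute` is reachable through it only if something has imported that submodule explicitly.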

@github-actions bot added the labels fix (Bug fix) and python (Related to Python Polars) on Jan 21, 2023
@ritchie46 (Member) commented

@alexander-beedie could you take a look if this still makes sense regarding the lazy loading?

@alexander-beedie (Collaborator) commented

> @alexander-beedie could you take a look if this still makes sense regarding the lazy loading?

No problem; I have a block of time tomorrow afternoon 👍

@josh (Contributor, Author) commented Jan 21, 2023

> could you take a look if this still makes sense regarding the lazy loading?

I guess another option would be to put the import pyarrow.compute inline in the coerce_arrow body, since it's only ever used there.
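
A minimal sketch of the inline-import option, assuming pyarrow is installed. `cast_to_int64` is a hypothetical helper standing in for the cast performed in `coerce_arrow`; it is not the code that was merged.

```python
def cast_to_int64(values):
    """Hypothetical stand-in for the cast done inside coerce_arrow."""
    # Importing inside the function keeps pyarrow optional at module import
    # time and guarantees the compute submodule is loaded before it is used.
    import pyarrow as pa
    import pyarrow.compute as pc

    array = pa.array(values, type=pa.int32())
    return pc.cast(array, pa.int64())
```

Because the submodule is imported explicitly at the point of use, the call no longer depends on some other code path having loaded `pyarrow.compute` first.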

@ritchie46 (Member) commented

> > could you take a look if this still makes sense regarding the lazy loading?
>
> I guess another option would be to put the import pyarrow.compute inline in the coerce_arrow body, since it's only ever used there.

I like that more. Could you make this change?

@alexander-beedie (Collaborator) commented Jan 22, 2023

All looks good to me; does seem that pyarrow wants that explicitly imported, but unless we're going to have more than one such import I think it's fine to special-case it and import inline.

@ritchie46 (Member) commented

Great! Thanks @josh and @alexander-beedie

@ritchie46 ritchie46 merged commit f2e54b1 into pola-rs:master Jan 22, 2023
@josh josh deleted the fix-pyarrow-compute branch January 22, 2023 08:50