refactor(python): streamline lazy imports #5302
Looks like some typing-related lint, a complaint from the Sphinx docs build, and a "must define exec_module()" gremlin. Will dig in and take care of it; all looks solvable ;)
Won't those `isinstance` lines just import the module if it is available? The lazy isinstance approach would only import it when the data object actually is an `np.ndarray`. Maybe I don't understand the code, but it looks to me like importing polars might be fast, while calling any function that has a `_*_AVAILABLE` section will cause modules to be imported even when they are not needed (depending on the arguments): `elif _NUMPY_AVAILABLE and isinstance(data, np.ndarray):`
Ahh, I get what you mean. I'll see if it's possible to black-magic it, or maybe tweak/reinstate the lazy isinstance check.
@ghuls: solved, without having to reinstate the lazy isinstance check: `if _NUMPY_TYPE(data) and isinstance(data, np.ndarray):` Just need to address the Sphinx issue (determining …). Update: done - all lint/docs/tests happy...
Thanks @alexander-beedie. I agree it's good to separate the lazy string-based type check from `isinstance`; at least mypy likes it better. And cool that Python already has lazy imports!
Maintains the fantastic new import speedups (tuna graph below to validate that claim ;) but also preserves more natural interaction with the lazy-loaded modules: import `pandas`, `numpy`, `pyarrow`, etc. from `polars.dependencies` and then treat them like any other module. (I think the name change from "import_check" to "dependencies" makes sense?)

Comparison:
[Before / After (all lazy features preserved) screenshots, e.g. `polars/internals/dataframe/frame.py:1377`]
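A minimal sketch of how a module like `polars.dependencies` could hand out lazily-loaded modules using the standard library's `importlib.util.LazyLoader` (the built-in lazy-import mechanism mentioned in the thread). `lazy_import` is a hypothetical helper that follows the lazy-import recipe in the `importlib` docs; it is not necessarily what this PR implements. Note that `LazyLoader` raises `TypeError` when the wrapped loader does not define `exec_module()`, which is likely related to the gremlin mentioned earlier.

```python
import importlib.util
import sys
from types import ModuleType
from typing import Optional, Tuple


def lazy_import(name: str) -> Tuple[Optional[ModuleType], bool]:
    """Return (module, available) for `name` without running its code yet.

    The module object is registered in sys.modules immediately, but its
    body only executes on first attribute access.
    """
    if name in sys.modules:
        # Already imported (or explicitly blocked with a None entry).
        module = sys.modules[name]
        return module, module is not None
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        # Not installed (or a namespace package without a loader).
        return None, False
    # LazyLoader raises TypeError if the wrapped loader lacks exec_module().
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # arms deferred loading; module body not run yet
    return module, True


# Hypothetical module-level usage in something like polars.dependencies:
# np, _NUMPY_AVAILABLE = lazy_import("numpy")
```

With this design the import cost is paid once, on first attribute access; from then on the same module object sits in `sys.modules` and attribute lookups go straight to it, with no wrapper function in between.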
Performance:
Attribute access into lazy-loaded modules should be (marginally) faster: after first access they exist in `sys.modules` and aren't mediated by a function call containing the import directive. Import speeds match the previous patch.

Bonus:
Able to remove module imports from the `TYPE_CHECKING` block in quite a few modules (as this gets centralised in `polars.dependencies`).

Import times:
No regressions [tuna import-time graph]
Todo:
- `from_pandas`
- Check import without optional dependencies
- CI pass
- `lazy_isinstance` behaviour (where the module isn't imported unless the type may actually match)

Update: done...
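For the "check import without optional dependencies" item, one lightweight approach (an assumption, not necessarily how polars' CI does it) is to simulate an uninstalled package by putting `None` into `sys.modules`: the import system then raises `ModuleNotFoundError` for that name, and `importlib.util.find_spec` returns `None`. `block_modules` and `is_available` are hypothetical helpers.

```python
import contextlib
import importlib.util
import sys
from typing import Iterator

_ABSENT = object()  # sentinel: the name was not in sys.modules at all


@contextlib.contextmanager
def block_modules(*names: str) -> Iterator[None]:
    """Temporarily make `import <name>` fail as if the package were not installed."""
    saved = {name: sys.modules.get(name, _ABSENT) for name in names}
    try:
        for name in names:
            # A None entry makes the import system raise ModuleNotFoundError
            # and makes importlib.util.find_spec(name) return None.
            sys.modules[name] = None  # type: ignore[assignment]
        yield
    finally:
        # Restore exactly what was there before (or remove our None entry).
        for name, original in saved.items():
            if original is _ABSENT:
                sys.modules.pop(name, None)
            else:
                sys.modules[name] = original  # type: ignore[assignment]


def is_available(name: str) -> bool:
    """True if the top-level module `name` can be found by the import system."""
    return importlib.util.find_spec(name) is not None
```

A check could then wrap the package import, e.g. `with block_modules("numpy", "pandas", "pyarrow"): importlib.import_module("polars")`. Blocking only affects imports that happen inside the `with` block, so the package under test must not already be imported; running the check in a fresh interpreter (for example a subprocess) is the more robust option.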
Misc: docs requirements should also include modules that are referenced (pandas/numpy/pyarrow) - added.