DEPR: restrict downstream usage of core.internals #40226

Open
jbrockmendel opened this issue Mar 4, 2021 · 8 comments
Labels
Compat pandas objects compatability with Numpy or Python functions Deprecate Functionality to remove in pandas Internals Related to non-user accessible pandas implementation

Comments

@jbrockmendel
Member

There are some optimizations I'd like to make in core.internals (mostly making the signatures stricter so we can do less validation at __init__-time). But that risks breaking changes for downstream packages that use the existing non-public APIs.

I'd like to start by asking downstream projects that access internals to identify what they actually use/need.

Once we identify what is used, my thought is to 1) encourage usage of public APIs where possible and 2) provide backward-compatible pseudo-public APIs (xref #40182).

cc @jorisvandenbossche (pyarrow), @TomAugspurger (dask), @mdurant (fastparquet), @shoyer (xarray), anyone else?

@jbrockmendel jbrockmendel added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Mar 4, 2021
@jorisvandenbossche
Member

As mentioned in #40182 (comment), pyarrow makes use of the make_block() and BlockManager constructors.
Further, pyarrow accesses the CategoricalBlock, DatetimeTZBlock, etc. classes, but only as objects, e.g. for isinstance(block, ObjectBlock) (never calling the __init__ of those). I suppose we could remove this usage (instead of checking for CategoricalBlock, we could check that block.values is a Categorical, for example).
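
A minimal sketch of the alternative suggested above: inspect the type of the values instead of the Block class. The helper name `holds_categorical` is hypothetical, and the public `Series.array` accessor stands in for `block.values` here so that the example does not touch pandas internals; this is not pyarrow's actual code.

```python
import pandas as pd


def holds_categorical(values) -> bool:
    """Return True if `values` is a pandas Categorical.

    Hypothetical helper: it checks the array type directly, so it does not
    depend on a CategoricalBlock class existing in pandas.core.internals.
    """
    return isinstance(values, pd.Categorical)


# Series.array is used as a public stand-in for block.values (an assumption
# made for illustration only).
cat_values = pd.Series(["a", "b", "a"], dtype="category").array
int_values = pd.Series([1, 2, 3]).array

print(holds_categorical(cat_values))
print(holds_categorical(int_values))
```

A check of this kind keeps working if the specialized Block subclasses are removed, since it only relies on the public Categorical type.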

Pyarrow also uses the Block.values attribute.

@jorisvandenbossche
Member

As mentioned above, pyarrow uses the Block classes for isinstance checks. Removing the CategoricalBlock (#40527) broke this, so I think we need to add back the CategoricalBlock class for now (and at the same time remove its usage in pyarrow, of course; I opened https://issues.apache.org/jira/browse/ARROW-12057 to track this).

@jbrockmendel jbrockmendel added Community Community topics (meetings, etc.) Deprecate Functionality to remove in pandas Internals Related to non-user accessible pandas implementation and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Mar 24, 2021
@jorisvandenbossche
Member

jorisvandenbossche commented Apr 7, 2021

Another type of use case that both dask (partd) and pyarrow have: accessing Block.values.

For example, the change to return DatetimeArray from Datetime(like)Block instead of an ndarray[datetime64[ns]] breaks dask (because DatetimeArray is not a proper EA but some hybrid with a np.dtype)

@jbrockmendel
Member Author

breaks dask

anything in particular?

@shoyer
Member

shoyer commented Apr 7, 2021

Xarray does not rely upon pandas's private APIs.

@jorisvandenbossche
Member

@jbrockmendel because DatetimeArray is not a proper EA, is_extension_array_dtype failed to catch the case of DatetimeArray, which resulted in taking a numpy-only code path with DatetimeArray (see dask/partd#48, dask/partd#49)
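
A rough sketch of the failure mode described above, and one possible way to guard against it. The helper name `needs_extension_path` is hypothetical and this is not the fix that was applied in partd; `Series.array` is used to obtain a timezone-naive DatetimeArray through a public accessor.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray
from pandas.api.types import is_extension_array_dtype


def needs_extension_path(values) -> bool:
    """Decide whether `values` should avoid a numpy-only code path.

    Hypothetical helper: it checks the array type in addition to the dtype,
    because a timezone-naive DatetimeArray carries a plain numpy dtype.
    """
    return isinstance(values, ExtensionArray) or is_extension_array_dtype(values)


# Timezone-naive datetimes: the backing array is a DatetimeArray whose
# dtype is a numpy datetime64 dtype rather than an ExtensionDtype.
naive = pd.Series(pd.date_range("2021-01-01", periods=3)).array
plain = np.arange(3)

print(type(naive).__name__, naive.dtype)
print(is_extension_array_dtype(naive))  # the dtype-only check discussed above
print(needs_extension_path(naive))
print(needs_extension_path(plain))
```

Checking the array type as well as the dtype is one way to route such values away from code that expects a bare ndarray.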

@jbrockmendel
Member Author

@jorisvandenbossche has any of pyarrow's usage of pandas internals changed in the last 2 years? In particular, I'm wondering if the introduction of ArrowDtype might make it easier for pyarrow to use public constructors.

@jorisvandenbossche
Member

I don't think much has changed in that part of pyarrow. ArrowDtype also won't change anything in the short term, since pyarrow constructs blocks with default dtypes.

4 participants