API: pseudo-public internals API for downstream libraries #40182
can you add a test which exercises this from a downstream author POV (e.g. create and manipulate things, like what pyarrow is doing), but only do this by importing the new api you are establishing so they can't cheat
authors

1) Try to avoid using internals directly altogether, and failing that,
2) Use only functions exposed here (or in core.internals)
ideally we remove 2 as soon as possible.
@@ -0,0 +1,61 @@
"""
This is a pseudo-public API for downstream libraries. We ask that downstream
I would have these import to pandas.api.internals i think (then you can really control the exports)
i know pyarrow accesses the pd.core.internals namespace. not sure about others. we can ask them to change, but for the foreseeable future will need these in the namespace.
right what i mean is let's expose a wrapper api namespace and then we can change the downstream packages when we have released.
fair enough. keep it in this file though? im wary of adding it to the pd.api namespace lest new downstream packages adopt bad habits
ok sure
i think the way to do this is to rename the internal version, then our existing downstream/io tests will test for this
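A minimal sketch of the rename-and-wrap idea, for illustration only. `_new_block`, the dict it returns, and the `ndim` inference are hypothetical stand-ins rather than pandas code; the point is just that downstream callers keep calling `make_block` while the internal constructor can change freely.

```python
def _new_block(values, placement, ndim):
    """Hypothetical internal constructor; free to change between releases."""
    return {"values": values, "placement": placement, "ndim": ndim}


def make_block(values, placement, ndim=None):
    """Stable wrapper that downstream libraries call instead of the internal."""
    if ndim is None:
        # Assumption for this sketch: infer ndim from whether the input is nested.
        ndim = 2 if values and isinstance(values[0], list) else 1
    return _new_block(values, placement, ndim)


# Downstream code only ever touches the wrapper.
block = make_block([[1, 2, 3]], placement=[0])
```

With this layout, existing downstream tests that call `make_block` would exercise the wrapper without any change on their side.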
Thanks, it's probably a good idea to make a public-ish In pyarrow, we currently make use of
@@ -0,0 +1,61 @@
"""
can you add a test for this module that asserts the exact names that we expose, similar to https://github.com/pandas-dev/pandas/blob/master/pandas/tests/api/test_api.py
from pandas.core import internals
from pandas.core.internals import api
can you also assert the entire namespace of api itself e.g. [name for name in dir(api) if not name.startswith('_')]
and then compare to a hard coded list.
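A rough sketch of what such a namespace check could look like. `public_names`, `check_namespace`, and `fake_api` are hypothetical names made up for this example; `fake_api` stands in for `pandas.core.internals.api` so the snippet runs without pandas installed, and the expected list here is not the real one.

```python
import types


def public_names(module):
    """Return the sorted names in ``module`` that do not start with an underscore."""
    return sorted(name for name in dir(module) if not name.startswith("_"))


def check_namespace(module, expected):
    """Raise AssertionError if the module's public names differ from ``expected``."""
    result = public_names(module)
    assert result == sorted(expected), f"unexpected API surface: {result}"


# Stand-in module (assumption): one public function and one private helper.
fake_api = types.ModuleType("fake_api")
fake_api.make_block = lambda values, placement: (values, placement)
fake_api._private_helper = lambda: None

# Passes only while the public surface matches the hard-coded list exactly.
check_namespace(fake_api, ["make_block"])
```

In the real test the stand-in would be replaced by the imported `api` module, so any symbol added or removed by accident makes the comparison fail.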
i dont think we want to hard-code the existing list bc whats in there is somewhat haphazard. Once we hear back from all the relevant parties on #40226 we'll be able to make that list
i would fix it now with all of the symbols that are currently exposed
picking my battles; updated.
Downstream libraries accessing core.internals causes headaches pretty regularly (#40149 (comment), #38134). The proposal: define a public-ish, stable-ish API to expose for these libraries, leaving us freer to make changes in internals.
This exposes `make_block`. I expect we'll also need to expose one or both of `create_block_manager_from_arrays`/`create_block_manager_from_blocks` (comment in `internals.__init__` says they are there for downstream xref #33892)