-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC/ENH: Add documentation and whatsnew entry for ArrowDtype and ArrowExtensionArray #47854
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just added some minor suggestions.
doc/source/whatsnew/v1.5.0.rst
Outdated
ser_float = pd.Series([1.0, 2.0, None], dtype="float32[pyarrow]") | ||
ser_float | ||
|
||
ser_list = pd.Series([[1, 2]], dtype=pd.ArrowDtype(pa.list_(pa.int64()))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd probably add a short comment here on what we're creating, and maybe add another row to the Series. Maybe when rendered it becomes clearer, but reading just this it feels like it may take some time for readers to understand what you're doing here.
pandas/core/arrays/arrow/array.py
Outdated
|
||
Parameters | ||
---------- | ||
values: pyarrow.Array or pyarrow.ChunkedArray |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
values: pyarrow.Array or pyarrow.ChunkedArray | |
values : pyarrow.Array or pyarrow.ChunkedArray |
I think this space is needed for Sphinx to render this correctly
pandas/core/arrays/arrow/array.py
Outdated
|
||
Please install the latest version of PyArrow to enable this functionality and avoid | ||
potential bugs in prior versions of PyArrow. | ||
""" # noqa: E501 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a comment on what's E501 and why liniting fails here, so readers don't need to look it up?
:toctree: api/ | ||
:template: autosummary/class_without_autosummary.rst | ||
|
||
arrays.ArrowExtensionArray |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be core.arrays.arrow.ArrowExtensionArray
? Seems like we don't have ArrowExtensionArray
imported in arrays/__init__.py
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this PR is depends on #47818 where these will be defined.
:toctree: api/ | ||
:template: autosummary/class_without_autosummary.rst | ||
|
||
ArrowDtype |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be core.arrays.arrow.ArrowDtype
?
CI is red, I guess this is the cause: https://github.com/pandas-dev/pandas/runs/7616698322?check_suite_focus=true#step:6:153
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this PR is depends on #47818 where these will be defined.
|
||
.. ipython:: python | ||
|
||
ser_list.take([1, 0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like variables are cleaned from one block to the next, and sphinx is not happy here for ser_list
not being defined. Maybe we can merge both blocks and move the comment in between after the block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like variables are cleaned from one block to the next
FYI, that's not the case, all variables are kept for (at least) a full file, so you can define a variable in one ipython code block, and use it in a next.
(the reason it was failing was because ArrowDtype wasn't actually yet exposed in the main namespace)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up just combining both blocks anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, and that's perfect here, just wanted to clarify ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
really nice documentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some quick comments!
doc/source/reference/arrays.rst
Outdated
=================== ========================= ============================= ============================= | ||
Kind of Data pandas Data Type Scalar Array | ||
=================== ========================= ============================= ============================= | ||
PyArrow :class:`ArrowDtype` Python Scalars or :class:`NA` :ref:`api.arrays.arrow` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nit: but since this list is not alphabetically ordered, can you move this to the last entry? Since this is an experimental new feature, while the first entries here are widely used data types, it seems better to leave the others as first listed data types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point. Moved this entry to the bottom
doc/source/whatsnew/v1.5.0.rst
Outdated
Most operations are supported and have been implemented using `pyarrow compute <https://arrow.apache.org/docs/python/api/compute.html>`__ functions. | ||
We recommend installing the latest version of PyArrow to access the most recently implemented compute functions. | ||
|
||
This feature is experimental and may change in a future release. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would maybe put this in a warning box to give it a bit more visibility (like you did in the docstring)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. Changed.
doc/source/whatsnew/v1.5.0.rst
Outdated
Most operations are supported and have been implemented using `pyarrow compute <https://arrow.apache.org/docs/python/api/compute.html>`__ functions. | ||
We recommend installing the latest version of PyArrow to access the most recently implemented compute functions. | ||
|
||
This feature is experimental and may change in a future release. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feature is experimental and may change in a future release. | |
This feature is experimental and the API can change in a future release without warning. |
Will merge tomorrow if no further comments. |
…wExtensionArray (pandas-dev#47854) * Start arrow docs * Add whatsnew example * Add note about strings * Address review * modify whatsnew * Add experimental note * Address Joris' comments * Fix repr, and add missing import * Add extra sections * Add quotes and example
doc/source/whatsnew/v1.5.0.rst
file if fixing a bug or adding a new feature.