Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC/ENH: Add documentation and whatsnew entry for ArrowDtype and ArrowExtensionArray #47854

Merged
merged 18 commits into from
Aug 18, 2022

Conversation

mroeschke
Copy link
Member

@mroeschke mroeschke commented Jul 26, 2022

@mroeschke mroeschke added Docs Arrow pyarrow functionality labels Jul 26, 2022
@mroeschke mroeschke added this to the 1.5 milestone Jul 26, 2022
@mroeschke mroeschke changed the title DOC/ENH: Add documentation and whatsnew entry for ArrowDtype and ArrowExtensionAray DOC/ENH: Add documentation and whatsnew entry for ArrowDtype and ArrowExtensionArray Jul 26, 2022
Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just added some minor suggestions.

ser_float = pd.Series([1.0, 2.0, None], dtype="float32[pyarrow]")
ser_float

ser_list = pd.Series([[1, 2]], dtype=pd.ArrowDtype(pa.list_(pa.int64())))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably add a short comment here on what we're creating, and maybe add another row to the Series. Maybe when rendered it becomes clearer, but reading just this it feels like it may take some time for readers to understand what you're doing here.


Parameters
----------
values: pyarrow.Array or pyarrow.ChunkedArray
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
values: pyarrow.Array or pyarrow.ChunkedArray
values : pyarrow.Array or pyarrow.ChunkedArray

I think this space is needed for Sphinx to render this correctly


Please install the latest version of PyArrow to enable this functionality and avoid
potential bugs in prior versions of PyArrow.
""" # noqa: E501
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a comment on what's E501 and why liniting fails here, so readers don't need to look it up?

:toctree: api/
:template: autosummary/class_without_autosummary.rst

arrays.ArrowExtensionArray
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be core.arrays.arrow.ArrowExtensionArray? Seems like we don't have ArrowExtensionArray imported in arrays/__init__.py.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this PR is depends on #47818 where these will be defined.

:toctree: api/
:template: autosummary/class_without_autosummary.rst

ArrowDtype
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be core.arrays.arrow.ArrowDtype?

CI is red, I guess this is the cause: https://github.com/pandas-dev/pandas/runs/7616698322?check_suite_focus=true#step:6:153

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this PR is depends on #47818 where these will be defined.


.. ipython:: python

ser_list.take([1, 0])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like variables are cleaned from one block to the next, and sphinx is not happy here for ser_list not being defined. Maybe we can merge both blocks and move the comment in between after the block?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like variables are cleaned from one block to the next

FYI, that's not the case, all variables are kept for (at least) a full file, so you can define a variable in one ipython code block, and use it in a next.
(the reason it was failing was because ArrowDtype wasn't actually yet exposed in the main namespace)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up just combining both blocks anyways.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, and that's perfect here, just wanted to clarify ;)

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really nice documentation

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some quick comments!

=================== ========================= ============================= =============================
Kind of Data pandas Data Type Scalar Array
=================== ========================= ============================= =============================
PyArrow :class:`ArrowDtype` Python Scalars or :class:`NA` :ref:`api.arrays.arrow`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit: but since this list is not alphabetically ordered, can you move this to the last entry? Since this is an experimental new feature, while the first entries here are widely used data types, it seems better to leave the others as first listed data types.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. Moved this entry to the bottom

doc/source/reference/arrays.rst Show resolved Hide resolved
Most operations are supported and have been implemented using `pyarrow compute <https://arrow.apache.org/docs/python/api/compute.html>`__ functions.
We recommend installing the latest version of PyArrow to access the most recently implemented compute functions.

This feature is experimental and may change in a future release.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would maybe put this in a warning box to give it a bit more visibility (like you did in the docstring)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Changed.

Most operations are supported and have been implemented using `pyarrow compute <https://arrow.apache.org/docs/python/api/compute.html>`__ functions.
We recommend installing the latest version of PyArrow to access the most recently implemented compute functions.

This feature is experimental and may change in a future release.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This feature is experimental and may change in a future release.
This feature is experimental and the API can change in a future release without warning.

@mroeschke mroeschke mentioned this pull request Aug 15, 2022
@mroeschke
Copy link
Member Author

Will merge tomorrow if no further comments.

@mroeschke mroeschke merged commit 065b51c into pandas-dev:main Aug 18, 2022
@mroeschke mroeschke deleted the arrow/docs branch September 8, 2022 18:41
noatamir pushed a commit to noatamir/pandas that referenced this pull request Nov 9, 2022
…wExtensionArray (pandas-dev#47854)

* Start arrow docs

* Add whatsnew example

* Add note about strings

* Address review

* modify whatsnew

* Add experimental note

* Address Joris' comments

* Fix repr, and add missing import

* Add extra sections

* Add quotes and example
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Arrow pyarrow functionality Docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants