Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: Change boxplot return_type kwarg #12216

Merged
merged 2 commits into from
Sep 4, 2016

Conversation

TomAugspurger
Copy link
Contributor

Part of #6581
Deprecation started in #7096

Changes the default value of return_type in DataFrame.boxplot
and DataFrame.plot.box from None to 'axes'.

@jorisvandenbossche jorisvandenbossche added Visualization plotting Deprecate Functionality to remove in pandas labels Feb 3, 2016
@jorisvandenbossche jorisvandenbossche added this to the 0.18.0 milestone Feb 3, 2016
@jorisvandenbossche
Copy link
Member

Tests still need to be updated

@jreback
Copy link
Contributor

jreback commented Feb 9, 2016

@TomAugspurger pls update

@jreback
Copy link
Contributor

jreback commented Feb 12, 2016

@TomAugspurger can you update

@TomAugspurger
Copy link
Contributor Author

Still working on this. It looks I didn't properly deprecate the df.boxplot(by=) (or my docs are wrong). According to the docs by= is supposed to return an ordered dict of whatever return_type is. But it's returning a numpy array instead. I'll work on this more tonight, sorry.

@jreback
Copy link
Contributor

jreback commented Feb 15, 2016

@TomAugspurger if you can get to this would be great, otherwise push to 0.19 :<

@TomAugspurger
Copy link
Contributor Author

I’ll update tonight or tomorrow morning. Might want to push anyway since I’ve found a discrepancy between our documented behavior and actual behavior for groupby.boxplot

On Feb 15, 2016, at 2:22 PM, Jeff Reback notifications@github.com wrote:

@TomAugspurger https://github.com/TomAugspurger if you can get to this would be great, otherwise push to 0.19 :<


Reply to this email directly or view it on GitHub #12216 (comment).

@jreback
Copy link
Contributor

jreback commented Feb 15, 2016

@TomAugspurger ok, we do have a little time, and this has been around for a while. I'll keep it here and move later (or you decide its a real stumbling block)

@TomAugspurger
Copy link
Contributor Author

Well, here's a not so great thing. I think we'll have to keep the behavior of return_type=None not being the same as return_type='axes' when by is not None.

# before
a = df.groupby('d').boxplot()  # OD[dict]  warned
b = df.boxplot()               # dict, warned
c = df.boxplot(by='d')         # [axes], didn't warn
c = df.boxplot(by='d', return_type='axes') # OD[axes]

and

# after
a = df.groupby('d').boxplot()  # OD[axes]
b = df.boxplot()               # axes
c = df.boxplot(by='d')         # [axes]
c = df.boxplot(by='d', return_type='axes') # OD[axes]

I had hoped to just make the default 'axes', but since I din't warn in the 3rd example, I don't think we can change that. Are we OK with that? Here's the updated docs:

Plot functions return scalar or arrays of :class:`matplotlib Axes <matplotlib.axes.Axes>`.
In ``boxplot``, the return type can be controlled by the ``return_type``, keyword. The valid choices are ``{"axes", "dict", "both"}``.
``'axes'`` returns a single matplotlib axes.
``'dict'`` returns a dict of matplotlib artists, similar to the matplotlib boxplot function.
``'both'`` returns a named tuple of axes and dicts.
When ``by`` is not None, you get back an ``OrderedDict`` of whatever ``return_type`` is, unless ``return_type`` is just ``None``, in which case you get back an array of axes.

Finally, when calling boxplot on a :class:`Groupby` object, a dict of ``return_type``
Plot functions return scalar or arrays of :class:`matplotlib Axes <matplotlib.axes.Axes>`.
In ``boxplot``, the return type can be controlled by the ``return_type``, keyword. The valid choices are ``{"axes", "dict", "both"}``.
``'axes'`` returns a single matplotlib axes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you keep those as a list?

@jorisvandenbossche
Copy link
Member

(maybe not fully on topic, but) Is there actually a reason we are using OrderDicts here ? For example why not a series? This would give both the key access of an OrderDict as the integer indexing of a an array (and seems more natural in pandas)

@jorisvandenbossche
Copy link
Member

On returning a series: that is also what df.groupby('d').plot() does, so it seems logical that df.groupby('d').boxplot() does the same?

@jorisvandenbossche
Copy link
Member

But more on topic: wouldn't it be possible to raise that warning now for the third case? (but keeps its behaviour as you proposed) So we can maybe in a future release align the behaviour fully.

Basically, plot functions return :class:`matplotlib Axes <matplotlib.axes.Axes>` as a return value.
In ``boxplot``, the return type can be changed by argument ``return_type``, and whether the subplots is enabled (``subplots=True`` in ``plot`` or ``by`` is specified in ``boxplot``).
Plot functions return scalar or arrays of :class:`matplotlib Axes <matplotlib.axes.Axes>`.
In ``boxplot``, the return type can be controlled by the ``return_type``, keyword. The valid choices are ``{"axes", "dict", "both"}``. If the ``by`` argument is ``None``,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would add a warning box that this is changed in 0.18.0

@jreback
Copy link
Contributor

jreback commented Feb 22, 2016

@TomAugspurger lgtm. just some small doc comments. ping when updated.

@jreback
Copy link
Contributor

jreback commented Feb 23, 2016

lgtm.

@jorisvandenbossche ?

@TomAugspurger
Copy link
Contributor Author

Forgot to send the post I had started last night,

I could deprecate the third case (df.boxplot(by='d') -> [axes]), but at this point I feel a bit defeated by our boxplot implementations, plural, since we have 3(!), and none is equivalent to any other. It'd be nice to have them all aligned but I kind of feel like I've done enough API breaking changes here, especially when people should just use seaborn.

If we aren't warning on that case, this is good to go.

@jorisvandenbossche
Copy link
Member

@TomAugspurger About my comment on using Series instead of OrderedDict, wouldn't that be a possible way to just change it?

To be clear: the third case is c = df.boxplot(by='d'), it currently gives you an array of axes (if there are multiple columns) and ideally we would like that it returns an OrderedDict of axes to be consistent with a default of 'axes'.
Suppose we would return a Series of axes, wouldn't that cover most of the use cases for both cases? Or would a lot of people relied on the fact it was an actual numpy array it returned? I would think most use is accessing one of the axes with c[0], which would still work with a Series as return value.

@TomAugspurger
Copy link
Contributor Author

@jorisvandenbossche yes, that makes much more sense. It's not clear to me the best way to get to there though. Do we return a Series for c = df.boxplot(by='d') now, or have another FutureWarning?

@jorisvandenbossche
Copy link
Member

To answer that question, we should have an idea to which extent it would break people's code. And that is of course difficult to guess, but I think in the case of c = df.boxplot(by='d') return an array of axes, if you do something with that return value, it will typically be accessing one of the axes to modify the subplot further with matplotlib. That case would keep working with returning a Series.
Of course code like isintance(c, np.ndarray) would break, but would people do that?

The questions is also, do we return Series for the cases where now already OrderedDicts are returned? Again, the most logical usecase (accessing a value by its key) keeps working, but things might not.

@TomAugspurger
Copy link
Contributor Author

My thinking was similar, the most common operations upon receiving the array or dict of axes is to 1. do nothing, 2. select individual axes by position or key. Thankfully Series handles both of those.

I'll update tonight to return Series of return_type wherever we returned an OrderedDict / numpy array before. It'll be nice to have this in a somewhat consistent and stable state.

* if ``return_type`` is ``'both'`` a namedtuple containing the :class:`matplotlib Axes <matplotlib.axes.Axes>`
and :class:`matplotlib Lines <matplotlib.lines.Line2D>` is returned
Faceted boxplots, created by ``Groupby.boxplot`` or ``DataFrame.boxplot`` with the ``by``
keyword will return either a NumPy array of ``axes``, when ``return_type='axes'``,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only true in case of DataFrame.boxplot, not for Groupby.boxplot (unless you changed that)

@TomAugspurger TomAugspurger force-pushed the boxplot-depr branch 2 times, most recently from 51731a8 to 29c8efc Compare August 21, 2016 13:18
@TomAugspurger
Copy link
Contributor Author

Ok, this is probably more info than anyone cares about, but

The key one to note is the difference between return_type=None and return_type='axes' in .boxplot(by='column'). The first returns an array of axes, the second returns a Series / dict of axes.

method faceted keyword 0.18 output 0.19 output notes
.plot.box No None axes axes
.plot.box No axes axes axes
.plot.box No dict dict dict
.plot.box No both both both
.plot.box by None axes axes no facet
.plot.box by axes axes axes no facet
.plot.box by dict dict dict no facet
.plot.box by both both both no facet
--------- -------- -------- ----------- ------------ --------
.boxplot No None dict axes depr warn
.boxplot No axes axes axes
.boxplot No dict dict dict
.boxplot No both both both
.boxplot by None axes axes no depr
.boxplot by axes ODict[axes] Series[axes] !
.boxplot by dict ODict[dict] Sereis[dict]
.boxplot by both ODict[both] Series[both]
--------- -------- -------- ----------- ------------ --------
.groupby yes None ODict[dict] Series[axes] depr warn
.groupby yes axes ODict[axes] Series[axes]
.groupby yes dict ODict[dict] Series[dict]
.groupby yes both Odict[both] Series[both]
--------- -------- -------- ----------- ------------ --------

Since we didn't deprecate the difference there, I'll leave it in place. Will get this all cleaned up tonight and submit for review. Sorry this has taken so long :/

@jreback
Copy link
Contributor

jreback commented Aug 23, 2016

is there a reason we r not deprecating
.boxplot in favor of .plot.box

seems otherwise this is very confusing and not consistent

@TomAugspurger
Copy link
Contributor Author

Updated, Travis is green. To summarize the changes:

  1. Enforce the deprecation on .boxplot and .groupby.boxplot return_type. Previously it was 'dict', now it's 'axes'.
  2. Return Series instead of OrderedDicts from faceted boxplots, which is more consistent with other plotting methods. The exception is .boxplot(by='col') with the default return_type=None, which continues to return a 2-D NumPy array

seems otherwise this is very confusing and not consistent

Extremely 😞 I believe that after this deprecation we should have a clear path forward API-wise, unless I'm missing something.


Most plot functions return a scalar or an array of :class:`matplotlib Axes <matplotlib.axes.Axes>`.
In ``boxplot``, the return type can be controlled by the ``return_type``, keyword. The valid choices are ``{"axes", "dict", "both"}``. If the ``by`` argument is ``None``,

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need this with the table below. IOW would rather see the table, then comment if needed on particular ones?

@TomAugspurger
Copy link
Contributor Author

FYI the only CI failure here was codecov.

When grouping with ``by``, a dict mapping columns to ``return_type``
is returned.
When grouping with ``by``, a Series mapping columns to ``return_type``
is returned, unless ``return_type`` is None, in which case a NumPy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add a link to the docs here

@jreback
Copy link
Contributor

jreback commented Sep 2, 2016

couple comments, but lgtm. merge on green. this has been outstanding for quite a while, let's just put it in.

TomAugspurger and others added 2 commits September 3, 2016 14:30
Part of pandas-dev#6581
Deprecation started in pandas-dev#7096

Changes the default value of `return_type` in DataFrame.boxplot
and DataFrame.plot.box from None to 'axes'.
Aligns behavior of `Groupby.boxplot` and DataFrame.boxplot(by=.)
to return a Series.
@jorisvandenbossche
Copy link
Member

@TomAugspurger Thanks!

@jorisvandenbossche jorisvandenbossche modified the milestones: 0.19.0rc, 0.19.0 Sep 7, 2016
@TomAugspurger TomAugspurger deleted the boxplot-depr branch April 5, 2017 02:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas Visualization plotting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants