DataFrameGroupBy.boxplot with subplots=False fails when using column param #16748

Closed
tdpetrou opened this issue Jun 21, 2017 · 7 comments · Fixed by #28102

@tdpetrou
Contributor

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

df = pd.DataFrame({'cat': np.random.choice(list('abcde'), 100),
                   'v': np.random.rand(100),
                   'v1': np.random.rand(100)})
df.groupby('cat').boxplot(subplots=False, column='v')

outputs
KeyError: "['v'] not in index"

Problem description

The boxplot works when either subplots=False or column='v' is passed on its own, but raises the KeyError above when both are specified.

Expected Output

A single axes plot with each group having its own boxplot. The column 'cat' would label the x-axis.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.2
pytest: 3.0.7
pip: 9.0.1
setuptools: 35.0.2
Cython: 0.25.2
numpy: 1.13.0
scipy: 0.19.0
xarray: None
IPython: 6.0.0
sphinx: 1.5.5
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.0
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: 0.3.0.post

@adhaamehab

How could I start working on this issue?

@arpheno

arpheno commented Oct 18, 2017

Experiencing the same issue on pandas 0.20.3

@fuglede

fuglede commented Oct 18, 2017

I also ran into the issue on 0.20.3. Inspired by your own Stack Overflow post, I ended up going with the hacky solution of first reducing the data frame to just the grouping column and the column to plot, then plotting with:

df[['cat', 'v']].groupby('cat').boxplot(subplots=False)

@TomAugspurger
Contributor

With this diff, the example seems to work:

diff --git a/pandas/plotting/_core.py b/pandas/plotting/_core.py
index 0d77b5f41..eead95ad3 100644
--- a/pandas/plotting/_core.py
+++ b/pandas/plotting/_core.py
@@ -2373,6 +2373,8 @@ def boxplot_frame_groupby(grouped, subplots=True, column=None, fontsize=None,
                             right=0.9, wspace=0.2)
     else:
         from pandas.core.reshape.concat import concat
+        from pandas import IndexSlice
+
         keys, frames = zip(*grouped)
         if grouped.axis == 0:
             df = concat(frames, keys=keys, axis=1)
@@ -2381,7 +2383,9 @@ def boxplot_frame_groupby(grouped, subplots=True, column=None, fontsize=None,
                 df = frames[0].join(frames[1::])
             else:
                 df = frames[0]
-        ret = df.boxplot(column=column, fontsize=fontsize, rot=rot,
+        if column:
+            df = df.loc[:, IndexSlice[:, column]]
+        ret = df.boxplot(fontsize=fontsize, rot=rot,
                          grid=grid, ax=ax, figsize=figsize,
                          layout=layout, **kwds)
     return ret

[Image: screenshot of the plot produced with this diff]

Is anyone able to verify that this is correct and that the fix doesn't break anything else? @adhaamehab, are you still interested in working on this issue? (Sorry, I didn't see your comment the first time around.)
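For anyone following along, here is a minimal sketch of why the plain `column='v'` lookup fails and what the `IndexSlice` selection in the diff picks out. It only mirrors the reshaping step shown in the diff on a seeded copy of the example data; it is an illustration under that assumption, not the actual pandas code path, and the names `wide`, `selected`, and `plain_lookup_failed` are made up for the example.

```python
import numpy as np
import pandas as pd

# Seeded copy of the frame from the original report (the seed is an assumption).
rng = np.random.RandomState(0)
df = pd.DataFrame({
    "cat": rng.choice(list("abcde"), 100),
    "v": rng.rand(100),
    "v1": rng.rand(100),
})

# Same reshaping as the subplots=False branch in the diff:
# one block of columns per group, keyed by the group label.
keys, frames = zip(*df.groupby("cat"))
wide = pd.concat(frames, keys=keys, axis=1)

# The columns are now (group, original column) pairs, so a plain
# lookup of "v" has nothing to match at the top level.
try:
    wide[["v"]]
    plain_lookup_failed = False
except KeyError:
    plain_lookup_failed = True

# The selection from the diff: every group, second level equal to "v".
column = "v"
selected = wide.loc[:, pd.IndexSlice[:, column]]
print(selected.columns.tolist())
```

The selected frame keeps both column levels, which is presumably why the tick labels in the resulting plot come out as (group, column) pairs.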

@TomAugspurger
Contributor

In particular, I haven't explored how that change interacts with the other options to groupby boxplot. I'm not familiar with that section of the codebase.

TomAugspurger added this to the Next Major Release milestone on Oct 18, 2017

@tiffanygwilson

tiffanygwilson commented Jun 7, 2018

I am also experiencing this issue in pandas 0.20.3. Is anybody working on it? @TomAugspurger, the plot for the fix from 18 Oct 2017 looks good, except that I think the x-axis labels should just be (for example) a instead of (a, v), since all labels contain v. The hack mentioned by @fuglede works but now throws a numpy FutureWarning:

FutureWarning: reshape is deprecated and will raise in a subsequent release. Please use .values.reshape(...) instead
  return getattr(obj, method)(*args, **kwds)
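One possible way to get plain group labels on the x-axis, as a rough sketch: concatenate only the requested column from each group, so the resulting frame has one flat column per group instead of (group, column) pairs. The helper name `one_column_per_group` is hypothetical and not part of pandas, and the seeded data is an assumption standing in for the example from the report.

```python
import numpy as np
import pandas as pd


def one_column_per_group(df, by, column):
    """Hypothetical helper: one flat column of `column` values per group of `by`."""
    keys, frames = zip(*df.groupby(by))
    # Concatenating Series (not frames) with keys gives a flat column index
    # holding just the group labels.
    return pd.concat([frame[column] for frame in frames], keys=keys, axis=1)


rng = np.random.RandomState(0)
df = pd.DataFrame({
    "cat": rng.choice(list("abcde"), 100),
    "v": rng.rand(100),
    "v1": rng.rand(100),
})

wide = one_column_per_group(df, "cat", "v")
# wide.boxplot()  # requires matplotlib; draws one box per group column
```

Calling `wide.boxplot()` on the result should then label each box with just the group key, though I have not checked how this interacts with the other boxplot options.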

@takekazuomi

I have the same issue in pandas 0.23.4.
