Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixing pandas.DataFrame.plot(): labels do not appear in legend and label kwd #9574

Merged
merged 1 commit into from
Mar 31, 2015

Conversation

schmohlio
Copy link

Closes #9542, #8905

The following behavior has been tested:

  • df.plot(y='sin(x)') -> gives a legend with label 'None' -> this should give no legend instead (as it plots one series, and then we don't automatically add a legend, see behaviour of df['sin(x)'].plot())
  • df.plot(y='sin(x)', legend=True) -> gives a legend with label 'None' -> this should of course give a legend with label 'sin(x)' (behaviour as df['sin(x)'].plot(legend=True))
  • df.plot(y='sin(x)', label='something else', legend=True) -> gives a legend with label 'None' -> should be a legend with label 'something else', as we want that the label kwarg overwrites the column name.

based on following data:
x=np.linspace(-10,10,201)
y,z=np.sin(x),np.cos(x)
x,y,z=pd.Series(x),pd.Series(y),pd.Series(z)
df=pd.concat([x,y,z],axis=1)
df.columns=['x','sin(x)','cos(x)']
df=df.set_index('x')

@schmohlio
Copy link
Author

not sure why there would be a test_url failure in the build... also not used to Travis - did the build properly incorporate both commits?

@jorisvandenbossche
Copy link
Member

@schmohlio I think you can ignore that test_url error, it does not seem to be related to this PR.

You can indeed put the tests in test_graphics.
I did not yet look in detail, but a quick feedback: I don't think it should be necessary to change the default value for legend. Further, is it needed to change the existing tests?

cc @TomAugspurger @sinhrks

@schmohlio
Copy link
Author

Thanks for looking @jorisvandenbossche

Is there an elegant way to check for user supplied arg (legend=) when default is not None?

Didn't spend too much time on tests, but I made changes to align plot_frame with plot_series. Previously, label wasn't getting passed through at all + it was overwriting index name within plot_frame. Per #9542, it probably doesn't make sense to mutate the index name with label=, which is reflected in test changes. That previous design flaw was exacerbated when composite .plot calls are made, and the x axis only uses label= arg from the latest call.

Will await more feedback @TomAugspurger @sinhrks . Thanks again for letting me contribute!

@@ -886,10 +891,11 @@ def _iter_data(self, data=None, keep_index=False, fillna=None):

from pandas.core.frame import DataFrame
if isinstance(data, (Series, np.ndarray, Index)):
label = self.label if self.label else data.name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to be careful here. We need to treat False differently than None, in case someone wants the actual label in the legend to be the text "False". do label = self.label if self.label is not None else data.name.

@TomAugspurger
Copy link
Contributor

I'll try to look more later.

@schmohlio
Copy link
Author

Good catch.

@@ -816,7 +816,12 @@ def __init__(self, data, kind=None, by=None, subplots=False, sharex=True,
grid = False if secondary_y else True

self.grid = grid
self.legend = legend
if legend is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain this if block? I'm not sure I see the point of it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original point of this was decide if a label should be rendered based on whether there was a valid label provided or if there was more than 1 series. I can revert back the defaults, though, and make legend show default series names if no label is provided.

@TomAugspurger
Copy link
Contributor

I agree with Joris that the default for legend doesn't need to change.

@jreback jreback added this to the 0.16.1 milestone Mar 6, 2015
@@ -2518,12 +2512,12 @@ def test_df_legend_labels(self):
self._check_legend_labels(ax, labels=['a', 'b (right)', 'c', 'g', 'h', 'i'])

# scatter
ax = df.plot(kind='scatter', x='a', y='b', label='data1')
ax = df.plot(kind='scatter', x='a', y='b', label='data1', legend=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you change the default for legend back to True you can remove the legend=Trues that you added.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

@schmohlio
Copy link
Author

@jorisvandenbossche @TomAugspurger - here is a summary. Pushing the following in a few:

  1. Revert to default legend=True.
    • this means that even if no label= arg is provided by the user and only 1 series is plotted, the legend will still show something based on reindex_like function #2. This will just be a difference for Series and DataFrames.
    • Previously I had introduced logic based on the number of series.
    • will revert back legend=True on test edits
  2. Ensure the following logic for coalescing the legend labels:
    • check label= arg (and make sure it is passed through)--> check series name property (y label, column name, etc)
  3. Keep index name test removed, and keep other tests that check legend. Maybe expand upon these.

@TomAugspurger
Copy link
Contributor

Thanks for the summary. That all looks perfect.

When you push again could you squash things down to a single commit and add a note in doc/source/whatsnew/v0.16.0. Thanks.

@schmohlio
Copy link
Author

I added tests to the proper check legend labels method. Altogether, this test method validates all the changes we discussed. Now the label= arg functions intuitively. I also added a check to make sure there is no column mutation. Thanks for patience.

Pushing to branch now @TomAugspurger @jorisvandenbossche

@TomAugspurger
Copy link
Contributor

Great thanks. If you could just rebase + squash and add a release not I'll get it merged.

@schmohlio
Copy link
Author

Hey @TomAugspurger Tom - just squashed my commits, rebased, pulled, and pushed. Hope that's okay.

@jorisvandenbossche
Copy link
Member

Is it possible to keep the behaviour as before? So that when plotting one series by specifying the y kwarg no legend is shown? (although I don't know if this is the best behaviour, but when plotting one series, it looks to me as the name of the series should be the ylabel)

@schmohlio
Copy link
Author

@jorisvandenbossche - right now a legend will appear with the ylabel as the label, unless the user explicitly supplies legend=False. I agreed before but seeing it in action now it almost seems intuitive to have the legend default to true. If we are going to keep the default to True it seems difficult to cleanly override defaults in case the user wants the legend.

@schmohlio
Copy link
Author

The tests also line up, seemingly making this the smallest incremental change that fixes the labels not getting passed through. What if we added another ticket for adding a legend to series plots?

@TomAugspurger I think I see what you mean by release notes- will add those. does everything else look ok?

@schmohlio
Copy link
Author

@TomAugspurger I added release notes and squashed my commits properly. Figured we could go from there in a cleaner manner.

@@ -546,6 +546,37 @@ Bug Fixes
- Bug in ``rank`` where comparing floats with tolerance will cause inconsistent behaviour (:issue:`8365`).
- Fixed character encoding bug in ``read_stata`` and ``StataReader`` when loading data from a URL (:issue:`9231`).
- Bug in adding ``offsets.Nano`` to other offets raises ``TypeError`` (:issue:`9284`)
<<<<<<< HEAD
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for rebasing. You can remove this line, the one below it, and the one on 579. git puts these in when the same line has been edited on your branch and what you're rebasing onto.

@TomAugspurger
Copy link
Contributor

I think we're really close. Code-wise everything looks good. Just those couple things in the release notes. If you want to just rebase one more time :)

I'll be in our chat room at https://gitter.im/pydata/pandas Give me a shout if you have any trouble.

@TomAugspurger
Copy link
Contributor

And @schmohlio looks like you'll be one of the first ones in 0.16.1. Could you make a new file at doc/source/whatsnew/v0.16.1.txt And paste this in:

EDIT: the formatting was a bit odd here. Paste in the text from this gist: https://gist.github.com/TomAugspurger/7f5061219243a24ac31f

Then you can move your section in doc/source/whatsnew/v0.16.0 to that new file. Thanks.

@schmohlio
Copy link
Author

cool this looks good. will shorten and rebase with new template.

@jreback
Copy link
Contributor

jreback commented Mar 20, 2015

whats new is now available for 0.16.1 5ebf521

@schmohlio
Copy link
Author

@jreback should i remove the whatsnew file I had added?

@jreback
Copy link
Contributor

jreback commented Mar 20, 2015

yes, when you pull and rebase you will get the new one

TomAugspurger pushed a commit that referenced this pull request Mar 31, 2015
fixing pandas.DataFrame.plot(): labels do not appear in legend and label kwd
@TomAugspurger TomAugspurger merged commit a004c59 into pandas-dev:master Mar 31, 2015
@TomAugspurger
Copy link
Contributor

Thanks @schmohlio!

Looks like this was your first PR for pandas. Stick around if you find some more bugs :)

@schmohlio
Copy link
Author

@TomAugspurger thanks to you! will look out for sure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pandas.DataFrame.plot(): Labels do not appear in legend
4 participants