Violin plots? #5676

Closed
olgabot opened this issue Dec 11, 2013 · 19 comments

Comments

@olgabot

olgabot commented Dec 11, 2013

Besides boxplots, another way to look at the distribution of data is by violin plots:

http://nbviewer.ipython.org/gist/olgabot/7902901

I'd like to add this to pandas. Is there interest in these plots? It currently depends on my prettyplotlib but that can be removed. My concern is that it depends on scipy.stats.gaussian_kde - is that fine?
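For reference, the core of a violin plot is a kernel density estimate drawn mirrored around a category position. A minimal sketch, assuming only NumPy and `scipy.stats.gaussian_kde` (the helper name `violin_outline` is hypothetical; it is not the notebook's code or any pandas API):

```python
import numpy as np
from scipy.stats import gaussian_kde


def violin_outline(values, n_points=100, bw_method=None):
    """Hypothetical helper: compute the outline of one violin.

    Returns the grid of data values ``y`` and the half-width of the violin
    at each grid point, scaled so the widest point is 0.5 (half a unit slot).
    """
    values = np.asarray(values, dtype=float)
    # bw_method=None means Scott's rule; "silverman" or a scalar also work.
    kde = gaussian_kde(values, bw_method=bw_method)
    # Evaluate the density only over the observed range of the data.
    y = np.linspace(values.min(), values.max(), n_points)
    density = kde(y)
    # Normalise so every violin fits in the same horizontal slot.
    half_width = 0.5 * density / density.max()
    return y, half_width


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y, half_width = violin_outline(rng.normal(size=200))
    # Location of the widest part of the violin (near the mode).
    print(y[half_width.argmax()])
```

To draw it with matplotlib, mirror the half-width around an x position, e.g. `ax.fill_betweenx(y, pos - half_width, pos + half_width)`; the `gaussian_kde` call is the only scipy dependency involved.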

@TomAugspurger
Contributor

Statsmodels has them. I think the idea is to keep the more statistical things there/maybe move some pandas stats functionality over.

@hayd
Contributor

hayd commented Dec 11, 2013

These look fun! I think that dep is fine (similar to elsewhere in codebase: https://github.com/pydata/pandas/blob/0327addef295fef292f2fdf8c95546c4cc039abb/pandas/tools/rplot.py#L528)

Edit: I posted this before seeing @TomAugspurger's comment...

@jtratner
Contributor

Except that the rplot module isn't very well maintained and is quite slow.

@olgabot
Author

olgabot commented Dec 11, 2013

Ah yes, statsmodels. I looked at it but didn't end up using it because you couldn't specify the kernel bandwidth. Theirs is quite nice because you can plot left/right half-distributions side by side for direct comparison. Maybe I'll just open a PR there that adds a bandwidth argument.
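For context, `scipy.stats.gaussian_kde` already exposes the bandwidth through its `bw_method` argument ("scott", "silverman", a scalar, or a callable), so a violin-plot wrapper would mainly need to pass it through. A small sketch showing how the choice changes the kernel width (the sample data is made up for illustration):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
values = rng.normal(size=500)  # illustrative sample, not real data

kdes = {
    # Default (Scott's rule): factor = n ** (-1 / (d + 4)); here d = 1.
    "scott": gaussian_kde(values),
    # Silverman's rule: factor = (n * (d + 2) / 4) ** (-1 / (d + 4)).
    "silverman": gaussian_kde(values, bw_method="silverman"),
    # A scalar is used directly as the bandwidth factor.
    "narrow": gaussian_kde(values, bw_method=0.1),
}

for name, kde in kdes.items():
    # Kernel standard deviation = factor * sample standard deviation.
    kernel_std = np.sqrt(kde.covariance[0, 0])
    print(f"{name:>9}: factor={kde.factor:.3f}, kernel std={kernel_std:.3f}")
```

A smaller factor gives a narrower kernel and a bumpier violin outline; a larger one smooths it out.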

@jtratner
Contributor

If they already have some of this, it's probably better to move it there instead. Statsmodels has pandas as a dependency, so you can still use pandas stuff.

@ghost

ghost commented Dec 12, 2013

Note that rplot, in retrospect, was a controversial merge precisely for the reason @TomAugspurger mentioned. So that's indeed the guideline we've tried (or should have tried) to stick to in the past.

@jtratner, yep, it was in PR limbo for the longest while and I wasn't aware of the statsmodels issue, which is why I prodded for its merge. It was, in fairness, a sizeable piece of code to contribute. #2015, statsmodels/statsmodels#692

@olgabot
Author

olgabot commented Dec 12, 2013

Okay great! I'll stick to statsmodels for the violinplots. Should other stuff like the clustergrams/clustered heatmaps also be there?

@ghost

ghost commented Dec 12, 2013

It's a fuzzy boundary. The plots are beautiful but far more bespoke than pandas' current meat-and-potatoes stance; they do stick out a bit among the plaid and denim.

@ghost

ghost commented Dec 12, 2013

We might call them "2nd order plots" maybe? :)

@jreback
Contributor

jreback commented Dec 12, 2013

Maybe it's worthwhile to refactor these secondary plots out of both pandas and statsmodels into a common import? (So in effect they'd be available in both, without duplicating code.)

@ghost

ghost commented Dec 12, 2013

Where's the all-singing, all-dancing pydata exploratory data viz library??

@olgabot
Author

olgabot commented Dec 12, 2013

pydataviz? aka every data analyst's ipython notebook and codebase...

@dragoljub

I'm hoping https://github.com/vispy/vispy will be the beginning of such a plotting library. Plotting 100M points a second on a GPU beats the 25-plus seconds it takes matplotlib right now. 😄

@ghost

ghost commented Dec 14, 2013

@olgabot, I see ggplot2 supports these plots as a distinct geom; it would be nice to have them available in http://github.com/yhat/ggplot too. I'm rooting for them.

cc @JanSchulz

@ghost

ghost commented Dec 14, 2013

#783 still needs a champion, that plot should fit right in to pandas.

@jtratner
Contributor

@y-p @cpcloud @jreback and others: so what's the deal with this PR, the dendrogram PR, #5700, and #783? Are those things in scope for pandas? I'm not clear from the discussion where we stand on that...

@olgabot
Author

olgabot commented Dec 14, 2013

I'm working on violin plots in statsmodels, so I'll close this one.

@olgabot olgabot closed this as completed Dec 14, 2013

@ghost

ghost commented Dec 15, 2013

@jtratner, no decrees here, just opinion. I think #783 is a reasonable addition to pandas, while the dendrogram and violin plots are great but belong in a more specialized context. Also, I think it'd be a good idea to pool these efforts toward building up viz capabilities in a single project that plays nicely with pandas. Not necessarily a pandas dependency, mind you, just as long as it's easy to use them together.

Please feel free to disagree loudly.

@ghost

ghost commented Jan 29, 2014

Noting that these seem to have landed in seaborn, and they are part of ggplot, so they should eventually land in the Python port if they haven't already.

Relieved I'm not just shooting down functionality. These actually find their way to a better home... :)


6 participants