Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sunburst/treemap path #2006

Merged
merged 34 commits into from
Jan 22, 2020
Merged

Sunburst/treemap path #2006

merged 34 commits into from
Jan 22, 2020

Conversation

emmanuelle
Copy link
Contributor

@emmanuelle emmanuelle commented Dec 16, 2019

Still WIP, TODO

  • write doc
  • discuss the different cases we want to support
  • discuss case of hover columns

@@ -82,6 +82,7 @@
colref_desc,
"Values from this column or array_like are used to set ids of sectors",
],
path=[colref_type, colref_desc],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs work

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hehe yes :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops yes. The PR is not completely ready but I'll still appreciate your thoughts about the API and how it's implemented, then I will polish if this is the right direction, unless there is some refactoring to do first.

@mwouts
Copy link

mwouts commented Dec 18, 2019

Thank you @emmanuelle for working on this! This will make is way easier for everyone to produce these beautiful plots, and I will certainly be a user of this.

I have a few remarks/questions:

  • Would it make sense to could extend this to non-rectangular trees by using 'None' in the path?
  • Sometime want to compute aggregates on all nodes, not only for the 'value', but also for the hovertext, like the life expectancy in the plot below. Do you think we could cover this use case with a dataframe? Maybe with intermediate nodes? (I was using a function)
    image
  • (technical) do you want to add epsilon when you compute the sum, until branchvalues='total' too strict plotly.js#4405 gets fixed? See Sunburst plots only work when the data has type 'int' mwouts/easyplotly#3 for an example that produces an empty plot because of float comparisons issues.

@nicolaskruchten
Copy link
Contributor

Thanks for taking a look @mwouts !

Re intermediate nodes, I think we should try to accept null/None/NaN for trailing path elements somehow, yes.

Re values for intermediate nodes, let's see how far we can get by having users provide them with the approach above. We could accept an aggregating function as a parameter eventually if this is too limiting :)

Also, I don't think we should add an epsilon, as this will be released as part of 4.5.0, which will include the fix for plotly/plotly.js#4405

@emmanuelle
Copy link
Contributor Author

Thanks for taking a look @mwouts ! @nicolaskruchten you were mentioning that we could accept a dictionary of lambda functions for doing the aggregation, is this what you had in mind?

@emmanuelle
Copy link
Contributor Author

In order to address the case of lines with None, maybe we should have branchvalues='remainder' as default and when doing the groupby, put the value of the parent to 0 is it's not already specified? (instead of computing the sum as it is implemented now?).

@emmanuelle
Copy link
Contributor Author

@mwouts we were also looking for a way to address the case of https://plot.ly/python/sunburst-charts/#sunburst-chart-with-a-continuous-colorscale but it seems hard to use a generic API for this...

@emmanuelle
Copy link
Contributor Author

After all I included an EPS fix because otherwise my doc example (with tips) did not work! On second thoughts, I also kept the groupby mechanism (including computing the sum of children) since if we want to generalize to more complex aggregations we will have a similar syntax, whereas switching to a remainder mode implied a somewhat different code.

@emmanuelle
Copy link
Contributor Author

I wrote the doc for sunburst but not for treemap since it will be mostly a copy so I'll update after a round of review.

@mwouts
Copy link

mwouts commented Dec 18, 2019

Thanks! Glad that helps!

@mwouts we were also looking for a way to address the case of https://plot.ly/python/sunburst-charts/#sunburst-chart-with-a-continuous-colorscale but it seems hard to use a generic API for this...

Well don't you think the same approach, generalized to magic underscore notation, could work? In my example (scroll to the second treemap plot) I have defined marker_color as a function, but I believe a column of the dataframe (maybe extended to include the parent nodes) could work as well.

@emmanuelle emmanuelle changed the title [WIP] Sunburst/treemap path Sunburst/treemap path Dec 20, 2019
@nicolaskruchten nicolaskruchten added this to the v4.5.0 milestone Jan 9, 2020
@nicolaskruchten
Copy link
Contributor

nicolaskruchten commented Jan 17, 2020

Some feedback to start with:

  1. The path should go from parent to child rather than the other way around
  2. For the continuous-color case, it makes sense to me that for "generated" parent nodes (i.e. ones that don't have a dedicated row whose path ends in None!) for the color of the parent to be the value-weighted average of the children's color. (NB color here is a scalar!)
  3. For the discrete-color case, it's trickier: there's no sensible 'implicit' behaviour that I can think of other than imputing some dummy value to color by, or to error out

@nicolaskruchten
Copy link
Contributor

Overall this is really, really awesome though, I love the symmetry with parcoords/parcats/splom !

@nicolaskruchten
Copy link
Contributor

PS: I think we can probably use a similar API for px.sankey

@nicolaskruchten nicolaskruchten mentioned this pull request Jan 17, 2020
@nicolaskruchten
Copy link
Contributor

As a motivating example for the implicit behaviour in continuous color, think about a case like px.sunburst(gapminder2007, path=["continent", "country"], values="pop", color="lifeExp")

With my scheme, the continent sectors would be colored by the population-weighted average life expectancy, which is pretty much what you'd want. Ditto for values="gdpPercap" etc.

@emmanuelle
Copy link
Contributor Author

Question about color: if color_continuous_scale is passed but color=None, then a discrete color sequence will be used. Is it ok or should we take the values argument instead? I'd rather keep the current behaviour (not too much magics) but just checking.

@nicolaskruchten
Copy link
Contributor

Yep the current behaviour is fine when color is not passed IMO

for i, level in enumerate(path):
df_tree = pd.DataFrame(columns=df_all_trees.columns)
if not agg_f:
dfg = df.groupby(path[i:]).sum(numerical_only=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

under what circumstances do we do this, and what's the reasoning?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's if neither values nor color was passed. I think I need to use some sort of aggregation function because after I call reset_index to get the groups, but maybe it's possible to do it in a more elegant/efficient way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could get rid of this part with the count_colname logic.

@nicolaskruchten
Copy link
Contributor

This causes an error: px.sunburst(df, path=["continent", "country"], values="pop", color="continent")

Copy link
Contributor Author

@emmanuelle emmanuelle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have addressed all your comments!

@emmanuelle emmanuelle merged commit 705ec3b into master Jan 22, 2020
@emmanuelle emmanuelle deleted the sunburst-path branch February 3, 2020 20:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants