Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KeyError when running the create_dendrogram example #2627

Merged
merged 4 commits into from
Jul 13, 2020

Conversation

nicholas-esterer
Copy link
Contributor

closes #2618

With the new color names ('C0','C1', 'C2', ...) I only added 5 colours whereas with the old names 8 colours were added. Looking at matplotlib.rcParams['axes.prop_cycle'] it seems up to 'C9' is possible, so do we add mappings for colours 'C5' to 'C9'?

@nicolaskruchten
Copy link
Contributor

So with scipy 1.4.1, if I made a plotly dendrogram with 10 colors, which colours would they be?

@nicolaskruchten
Copy link
Contributor

(small request: could you please name your branches and your PRs a little more descriptively? a good branch name here would be fix2618_dendrogram and a good PR name would be "Fix ff.create_dendrogram with scipy 1.5+" or something like this :)

@jonmmease jonmmease mentioned this pull request Jul 9, 2020
2 tasks
@nicolaskruchten
Copy link
Contributor

To answer my own question, here's what the output looks like today with scipy 1.4.1:

image

@nicolaskruchten
Copy link
Contributor

This same code fails on this branch with scipy 1.5.1 with the error KeyError: 'C5', but ideally it would have the same output as above.

@nicolaskruchten
Copy link
Contributor

Final note: let's slightly improve the docstring and doc page for this thing so that it's clearer that this is a thin thin wrapper around https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.dendrogram.html :)

I might actually say this just visualizes the output of

from scipy.cluster.hierarchy import dendrogram
dendrogram(
    Z=linkagefun(distfun(X)),
    orientation=orientation,
    labels=labels,
    color_threshold=color_threshold,
)

@nicolaskruchten nicolaskruchten added this to the 4.9.0 milestone Jul 9, 2020
@nicholas-esterer
Copy link
Contributor Author

So sadly I don't think we can get exactly the same color sequence with scipy==1.5.1, because the Cn color sequence and 'r','g','b' etc. are actually not isomorphic, but it's pretty close:

image
image

(number shown is the scipy version number)

@nicholas-esterer nicholas-esterer changed the title Issue 2618 KeyError when running the create_dendrogram example Jul 9, 2020
@nicolaskruchten
Copy link
Contributor

That is indeed pretty close! I'm surprised that it's that close without being identical :)

@nicolaskruchten
Copy link
Contributor

Oh, I see, you're repeating "red, green, cyan" at the end of the cycle?

@nicholas-esterer
Copy link
Contributor Author

I don't think so, I just tried to find a mapping that made the change subtle. It was really hard to figure out exactly the relationship between the 1.4.1 colors and the 1.5.1 colors, but if you compare the sequences for a large dendrogram (in the old version a list of 'r','g', etc., in the new version a list of 'c1', 'c2', etc), there are cns that map to multiple old colors. But I think the mapping I chose keeps the color change pretty minimized between version updates. I'll submit a PR in a few minutes and you can see exactly what I did.

@nicholas-esterer
Copy link
Contributor Author

I also improved the documentation for the colorscale argument. You might think the documentation is weird, but this is literally what the argument does :O !

@@ -32,7 +32,29 @@ def create_dendrogram(
:param (ndarray) X: Matrix of observations as array of arrays
:param (str) orientation: 'top', 'right', 'bottom', or 'left'
:param (list) labels: List of axis category labels(observation labels)
:param (list) colorscale: Optional colorscale for dendrogram tree
:param (list) colorscale: Optional colorscale for dendrogram tree. To
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is probably too detailed an explanation... most people will likely not want to follow the indirection and go read up on what scipy does here. I think just saying "should be 8 colors" is sufficient :)

Copy link
Contributor

@nicolaskruchten nicolaskruchten Jul 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we should mention that no matter the version of scipy, the 7th color in the scale is ignored (note: this is because the w key isn't outputted by scipy), and in scipy 15+, colors 2, 3 and 6 are used twice as often as the others. 🤦

@nicolaskruchten
Copy link
Contributor

OK, let's just update the docstring with my terser version and merge this.

The default colorscale for _dendrogram contains color names compatible
with the default colors given by scipy===1.5.0. It is still backwards
compatible with older scipy versions.
This is done for ff.create_dendrogram. It was tried as much as possible
to preserve the old color sequence, but this was not possible.

Also improved the documentation of the colorscale argument.
@nicholas-esterer nicholas-esterer changed the base branch from nocolor to master July 10, 2020 19:31
:param (list) colorscale: Optional colorscale for the dendrogram tree. With
scipy<=1.4.1 requires 8 colors to be specified,
the 7th of which is ignored. With scipy>=1.5.0,
requires 10 colors. In this case the 8th color is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so actually we only ever want 8 here, and the 7th is always ignored.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right?

Copy link
Contributor Author

@nicholas-esterer nicholas-esterer Jul 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right what I wrote this time is actually wrong. Yes we always want 8, and the 7th is ignored. How about this:
:param (list) colorscale: Optional colorscale for the dendrogram tree.
Requires 8 colors to be specified, the 7th of which is ignored. With scipy>=1.5.0, the 2nd, 3rd and 6th are used twice as often as the others.Given a shorter list, the missing values are replaced with defaults and with a longer list the extra values are ignored.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! 💃 once done

@nicholas-esterer nicholas-esterer merged commit f37fd3c into master Jul 13, 2020
@nicholas-esterer nicholas-esterer deleted the issue-2618 branch July 13, 2020 13:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error when running the create_dendrogram example
2 participants