
Add consistent support for categorical axes in bokeh #1089

Merged
jlstevens merged 24 commits into master from factor_range_update
Jan 30, 2017


philippjfr
Member

@philippjfr philippjfr commented Jan 28, 2017

Previously, categorical axes weren't really handled in bokeh, causing issues when trying to overlay HeatMaps or other categorical Elements. We can now update categorical axes with new factors, which makes it possible to animate HeatMaps with varying density:
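Conceptually, animating a HeatMap whose coordinates change between frames means the axis factors must be recomputed for every frame instead of being fixed by the first one. A minimal plain-Python sketch, assuming each frame is a list of `(x, y, value)` tuples and that factors are kept in order of first appearance (the ordering HoloViews actually uses may differ); `axis_factors` and `frame_factors` are hypothetical helpers, not HoloViews or Bokeh API:

```python
from typing import Hashable, Iterable, List, Sequence, Tuple

def axis_factors(values: Iterable[Hashable]) -> List[str]:
    """Unique categories as strings, in order of first appearance."""
    seen = {}
    for v in values:
        # dict preserves insertion order, so it works as an ordered set
        seen.setdefault(str(v), None)
    return list(seen)

def frame_factors(frame: Sequence[Tuple[Hashable, Hashable, float]]) -> Tuple[List[str], List[str]]:
    """x and y factors needed to display a single HeatMap frame."""
    xs = axis_factors(x for x, _, _ in frame)
    ys = axis_factors(y for _, y, _ in frame)
    return xs, ys

# Two frames with different densities: the second needs more factors,
# so a range built from the first frame alone would hide the new cells.
frames = [
    [('A', 'a', 1.0), ('B', 'b', 2.0)],
    [('A', 'a', 1.0), ('B', 'b', 2.0), ('C', 'c', 3.0), ('D', 'a', 0.5)],
]
for i, frame in enumerate(frames):
    xs, ys = frame_factors(frame)
    print(i, xs, ys)
# 0 ['A', 'B'] ['a', 'b']
# 1 ['A', 'B', 'C', 'D'] ['a', 'b', 'c']
```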

[Animated GIF: heatmap_animated]

We can also specify and overlay categorical curves and points:

import holoviews as hv

hv.Curve((['A', 'B', 'C'], (1, 2, 3))) *\
hv.Curve((['A', 'B', 'C'], (3, 2, 1))) *\
hv.Points((['B', 'C', 'D'], (2.5, 2, 3)))

[Screenshot: two categorical Curves overlaid with Points]
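When several Elements share a categorical axis in an overlay, the axis has to show the union of their categories. A rough sketch, assuming an order-preserving union (first layer first) is the desired ordering; `merge_factors` is a hypothetical helper and the real backend logic may order factors differently:

```python
from typing import List, Sequence

def merge_factors(*layers: Sequence[str]) -> List[str]:
    """Order-preserving union of the categories used by each overlaid layer."""
    merged: List[str] = []
    for layer in layers:
        for factor in layer:
            # Linear membership test is fine for the handful of factors on an axis
            if factor not in merged:
                merged.append(factor)
    return merged

# x-coordinates of the two Curves and the Points in the example above
curve1 = ['A', 'B', 'C']
curve2 = ['A', 'B', 'C']
points = ['B', 'C', 'D']
print(merge_factors(curve1, curve2, points))  # ['A', 'B', 'C', 'D']
```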

And we can also overlay on top of a HeatMap:

hv.HeatMap([('A', 1, 1), ('B', 2, 2)]) * hv.Points([('A', 2), ('B', 1), ('C', 3)])

[Screenshot: Points overlaid on a categorical HeatMap]

And:

[Screenshot]

Note that WebGL does not play well with this. I've now encountered so many issues with WebGL that I would suggest disabling it by default.

Elements that can use categorical axes:

  • Bars
  • BoxWhisker
  • Points/Scatter
  • Curve
  • Text
  • ErrorBars

@philippjfr philippjfr changed the title Allow updating of Factor Ranges on HeatMap Allow consistent support for categorical axes in bokeh Jan 29, 2017
@philippjfr philippjfr changed the title Allow consistent support for categorical axes in bokeh Add consistent support for categorical axes in bokeh Jan 29, 2017
@jbednar
Member

jbednar commented Jan 29, 2017

This looks great, thanks!

I second the vote for disabling webgl by default, until it becomes something fully supported and maintained and used consistently in Bokeh.

"""
Cleans the data before instantiating the datasource to handle
categorical axes correctly.
"""
Contributor

What exactly do you mean by 'clean'?

Member Author

Not the best name, I agree; something like _categorize_data would be more appropriate. Basically, when overlaying on a categorical axis, the column source data has to be converted to categories (which are simply the pretty-printed representation of each value).
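A minimal sketch of that conversion, assuming the data is a plain dict of columns (the shape a Bokeh ColumnDataSource holds) and using `str()` as a stand-in for HoloViews' per-dimension pretty-printing; `pprint_value` and `categorize_data` are hypothetical names for illustration, not the PR's actual code:

```python
from typing import Dict, List, Sequence

def pprint_value(value) -> str:
    # Hypothetical stand-in for a dimension's pretty-printer; a real one
    # might round floats or apply a custom formatter.
    return str(value)

def categorize_data(data: Dict[str, Sequence], categorical: Sequence[str]) -> Dict[str, List]:
    """Copy `data`, converting every column drawn on a categorical axis to
    strings so its values match the factors on that axis."""
    converted = {}
    for name, values in data.items():
        if name in categorical:
            converted[name] = [pprint_value(v) for v in values]
        else:
            converted[name] = list(values)
    return converted

data = {'x': [1, 2, 3], 'y': [0.5, 1.5, 2.5]}
print(categorize_data(data, ['x']))
# {'x': ['1', '2', '3'], 'y': [0.5, 1.5, 2.5]}
```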

Contributor

I think calling it _categorize_data and explicitly mentioning the string conversion/pretty printing in the docstring would be clearer.

@jlstevens
Contributor

jlstevens commented Jan 29, 2017

> I second the vote for disabling webgl by default, until it becomes something fully supported and maintained and used consistently in Bokeh.

I also agree.

Being able to overlay points over a heatmap is something I've wanted in the past. Nice to know that it will soon be possible!

This PR looks like it touches some core methods in the Bokeh backend. From what I've seen, the code looks OK, but I would probably recommend merging this PR ASAP so we can test it properly.

It might also be nice to add some notebook examples if we can find a suitable place to demonstrate this functionality. That might be tricky, so how about adding some quick examples to contrib?

@philippjfr
Member Author

> It might be nice to add some notebook examples if we can find a suitable place to demonstrate this functionality. That might be tricky so how about adding some quick examples to contrib?

Some examples in the bokeh backend notebook might make sense. There's still work to be done to support this fully in matplotlib, so for now that's the right place.

@jlstevens
Contributor

> Some example in the bokeh backend notebook might make sense.

I'll leave it up to you whether the examples go there or in contrib. Either way, I think some notebook examples would be helpful.

@philippjfr
Member Author

philippjfr commented Jan 29, 2017

Added a section to the notebook:

Categorical axes

A number of Elements will also support categorical (i.e. string types) dimension values; these include HeatMap, Points, Scatter, Curve, ErrorBars and Text types.

Here we create a set of points indexed by ascending alphabetical x- and y-coordinates, with values given by the product of the integer indices of the two coordinates. We then overlay a HeatMap of the points with the points themselves, enabling the hover tool for both and scaling the point size by the 'z' coordinate.

%%opts Points [size_index='z' tools=['hover']] HeatMap [toolbar='above' tools=['hover']]
points = hv.Points([(chr(i+65), chr(j+65), i*j) for i in range(10) for j in range(10)], vdims=['z'])
hv.HeatMap(points) * points

[Screenshot: HeatMap overlaid with Points sized by 'z']

In the example above, both axes are categorical because a HeatMap by definition represents 2D categorical coordinates (unlike Image and Raster types). Other Element types will automatically infer a categorical dimension if the coordinates along that dimension include string types.
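To make the "by definition" point concrete: a HeatMap assigns one value to each (x, y) pair of categories, which amounts to pivoting the point tuples onto a grid indexed by the x and y factors. A rough plain-Python sketch, assuming sorted factors, missing cells left as `None` and duplicate pairs keeping the last value (HoloViews may order or aggregate differently); `to_grid` is a hypothetical helper:

```python
from typing import Dict, List, Optional, Sequence, Tuple

def to_grid(points: Sequence[Tuple[str, str, float]]):
    """Pivot (x, y, z) tuples into a y-by-x grid over categorical factors."""
    xs = sorted({x for x, _, _ in points})
    ys = sorted({y for _, y, _ in points})
    # Later duplicates of an (x, y) pair overwrite earlier ones
    lookup: Dict[Tuple[str, str], float] = {(x, y): z for x, y, z in points}
    grid: List[List[Optional[float]]] = [
        [lookup.get((x, y)) for x in xs] for y in ys
    ]
    return xs, ys, grid

# Same construction as the notebook example, shrunk to a 3x3 grid
points = [(chr(i + 65), chr(j + 65), i * j) for i in range(3) for j in range(3)]
xs, ys, grid = to_grid(points)
print(xs)       # ['A', 'B', 'C']
print(grid[2])  # [0, 2, 4]  (row y='C', i.e. j=2: values i*2 for i = 0, 1, 2)
```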

Here we generate random samples indexed by the categories 'A' to 'E' using the Scatter Element and overlay them. Next, we compute the mean and standard deviation for each category as ErrorBars, and finally we overlay these with a Curve representing the mean values and a Text element displaying the global mean. All of these Elements respect the categorical index, giving us a view of the distribution of values in each category:

%%opts Overlay [show_legend=False height=400 width=600] ErrorBars (line_width=5) Scatter (alpha=0.2 size=6)

overlay = hv.NdOverlay({group: hv.Scatter(([group]*100, np.random.randn(100)*(i+1)+i))
                        for i, group in enumerate(['A', 'B', 'C', 'D', 'E'])})

errorbars = hv.ErrorBars([(k, el.reduce(function=np.mean), el.reduce(function=np.std))
                          for k, el in overlay.items()])

global_mean = hv.Text('A', 12, 'Global mean: %.3f' % overlay.dimension_values('y').mean())

errorbars * overlay * hv.Curve(errorbars) * global_mean

[Screenshot: categorical Scatter samples with ErrorBars, a mean Curve and the global-mean Text]
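The ErrorBars in that example are just per-category summary statistics. A stdlib-only sketch of the same computation, assuming samples are stored as `(category, value)` pairs; `statistics.pstdev` is the population standard deviation, which matches NumPy's default `np.std` (ddof=0), and `summarize` is a hypothetical name:

```python
import random
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

def summarize(samples: Sequence[Tuple[str, float]]) -> List[Tuple[str, float, float]]:
    """Return (category, mean, std) rows, one per category, in first-seen order."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for category, value in samples:
        groups[category].append(value)
    return [(k, statistics.mean(v), statistics.pstdev(v)) for k, v in groups.items()]

# Mirror the example: 100 samples per group, with spread and offset growing with i
random.seed(0)
samples = [(group, random.gauss(0, 1) * (i + 1) + i)
           for i, group in enumerate('ABCDE') for _ in range(100)]
for row in summarize(samples):
    print('%s mean=%.2f std=%.2f' % row)
```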

@philippjfr
Member Author

philippjfr commented Jan 30, 2017

@jlstevens, @jbednar Ready for final review. Would be very happy to see this go in. Not being able to update heatmaps with changing coordinates has annoyed me for a while, and making categoricals work more generally wasn't a major step from there. Unfortunately, charts (Bars and BoxWhisker) still cannot be overlaid because of the way they are constructed. I could probably hack it, but really we should move to replace them at some point.

@jlstevens
Contributor

The new notebook examples look good.

The functionality in this PR is very useful and unless @jbednar has any additional comments, I am happy to see it merged.

@jbednar
Member

jbednar commented Jan 30, 2017

Happy to see it merged.

> A number of Elements will also support categorical (i.e. string types)

So in HoloViews, string == categorical? Various dataframe packages support treating other discrete types (e.g. integers) as categoricals as well, and under the hood typically use integers to represent the possible enumerated values. But from the above (particularly the 5593 commit) it looks like numbers are fine as the category labels, so I'm a bit confused about this bit of documentation.

@philippjfr
Member Author

philippjfr commented Jan 30, 2017

> But from the above it looks like numbers are fine as the category labels, so I'm a bit confused about this bit of documentation.

A HeatMap (and Bars/BoxWhisker, if they could be overlaid) will force categorical axes even if the categories are integer/float types. Other Elements only treat string types as categorical. As we discussed, in future a Dimension parameter or preference could let you declare a dimension as categorical explicitly, but that will wait on decisions/actions in #843.
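The rule described here can be sketched as a small predicate. This is an illustration under stated assumptions, not HoloViews' actual code: `FORCED_CATEGORICAL` and `is_categorical` are hypothetical names, the set of element types is taken from this comment, and "string types" is read as "any value along the dimension is a string":

```python
from typing import Iterable

# Element types that always use categorical key dimensions (per this comment)
FORCED_CATEGORICAL = {'HeatMap', 'Bars', 'BoxWhisker'}

def is_categorical(element_type: str, values: Iterable[object]) -> bool:
    """Decide whether a key dimension should be drawn on a categorical axis."""
    if element_type in FORCED_CATEGORICAL:
        # Integers and floats become categories too for these types
        return True
    # Everything else only switches to a categorical axis for string values
    return any(isinstance(v, str) for v in values)

print(is_categorical('HeatMap', [1, 2, 3]))      # True
print(is_categorical('Curve', [1, 2, 3]))        # False
print(is_categorical('Curve', ['A', 'B', 'C']))  # True
```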

@jbednar
Member

jbednar commented Jan 30, 2017

Makes sense. I'd vote for changing that bit of documentation so that it does not conflate string with categorical, if we can eventually distinguish those. I don't think we have to wait on #843, though, because whether a dimension is categorical seems very clearly semantic, to me, and not just a style option. So regardless of the outcome of #843, I would think this information belongs directly to the Dimension as a core property.

@philippjfr
Member Author

> Makes sense. I'd vote for changing that bit of documentation so that it does not conflate string with categorical, if we can eventually distinguish those

Sure, but what else should I say?

> I don't think we have to wait on #843, though, because whether a dimension is categorical seems very clearly semantic, to me, and not just a style option.

Are you arguing I should add a categorical parameter to Dimension in this PR?

> So regardless of the outcome of #843, I would think this information belongs directly to the Dimension as a core property.

I do agree, it is a semantic thing and therefore would be reasonable to have directly on Dimension.

@jbednar
Member

jbednar commented Jan 30, 2017

You're the expert on what to say, but my guess would be what you said here in the chat:

A number of Elements will also support categorical types as dimension values, including HeatMap, Points, Scatter, Curve, ErrorBar and Text types. Currently, whether a type is treated as categorical is determined partly by the Element type (e.g. all key dimensions on HeatMap and XXX types are treated as categorical), though string dimensions are considered categorical by all Element types.

I'm not arguing for a categorical parameter as part of this PR, just arguing that it doesn't need to wait on the very-contentious #843 to be resolved.

@philippjfr
Member Author

Okay, revised it.

> I'm not arguing for a categorical parameter as part of this PR, just arguing that it doesn't need to wait on the very-contentious #843 to be resolved.

Okay, and yes I agree, although @jlstevens wants to push forward on #843 soon.

@jlstevens
Contributor

jlstevens commented Jan 30, 2017

> although @jlstevens wants to push forward on #843 soon.

That's correct. I'm working on a Dimension tutorial which will discuss aliases. Once that is done I want to tackle #843 (I don't think it is contentious anymore!) at which point I will update the tutorial.

I have no opinion on a categorical parameter at this time - Philipp is right in saying that I want #843 resolved before considering it.

@philippjfr
Member Author

> I don't think it is contentious anymore!

It is a little bit. Personally, I still don't see strong arguments for it, but your strong opinion probably outweighs our relative indifference.

@jlstevens
Contributor

At any rate, I don't think #843 needs to hold up this PR from being merged.

@philippjfr
Member Author

Yes, ready now, PR build passed.

@jlstevens
Contributor

Merged!

@jlstevens jlstevens merged commit 7d4f0b7 into master Jan 30, 2017
@philippjfr philippjfr deleted the factor_range_update branch February 10, 2017 01:00

3 participants