Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements to histogram #1784

Closed
wants to merge 33 commits into from

Conversation

timothydmorton
Copy link

Two updates to the histogram operation:

@jlstevens
Copy link
Contributor

The proposed changes sound good to me!

Two quick comments: it would be good if #1783 is fixed first so we don't have to update display tests (but improve the linking behavior). Secondly, I don't think 'height' prefix conveys much inheight_normed as I don't think it makes much sense to normalize across the kdim. I would prefer a simple name such as normalize which I think is self-explanatory enough.

@philippjfr
Copy link
Member

it would be good if #1783 is fixed first so we don't have to update display tests

I think that is a much more substantial amount of work and should not hold up this PR.

@jlstevens
Copy link
Contributor

I think that is a much more substantial amount of work and should not hold up this PR.

That is a good point. I'm just not sure about changing the display and then changing it back later.

@jlstevens
Copy link
Contributor

Maybe we can add an option to the histogram operation to control labelling with a plan on how it can solve our problems now with a plan for how we can deprecate it (or maybe it will still be useful?) once #1783 is merged?

@timothydmorton
Copy link
Author

@jlstevens I don't know about just calling it "normalize" because there's already a "normed" parameter that normalizes the total integral of the histogram. That seems like it could be confusing. But I'll leave the naming up to you all.

@jlstevens
Copy link
Contributor

That is a good point.

The problem I have looking at the docstrings for normed and height_normed is that I am now confused as to the difference and what each really means. That suggests better names and docstrings should be possible.

@timothydmorton
Copy link
Author

Does this clarify the difference? Attached is a screenshot of a height_normed histogram overlay. If this were just normed, then the 'star' distribution would be way taller than the galaxy distribution; this way, we can see both of them.
screenshot 2017-08-04 11 23 27

@jlstevens
Copy link
Contributor

Thanks... that does clarify things!

I am tempted to say there should be one normalization parameter with options such as None (or False), 'integral', 'max-height' as these are all different normalization options.

@philippjfr What do you think? Could we do this without breaking backwards compatibility?

@timothydmorton
Copy link
Author

I think people are used to passing normed=True to np.histogram, so it's probably not a good idea to change that behavior; maybe we could change our parameter to not having to be a bool, to accommodate other options?

* Fixed problems in rst links

* Used pandoc to convert CHANGELOG to Markdown

* Added CHANGELOG entry for 1.8.2
@jlstevens
Copy link
Contributor

jlstevens commented Aug 4, 2017

Right. If we make normed an ObjectSelector it could be one of True, False, 'integral' and 'max-height'. We don't technically need 'integral' as an option but I do like an explicit alternative to True (which doesn't really say anything about what the normalization really is).

@timothydmorton
Copy link
Author

timothydmorton commented Aug 4, 2017

Looks like actually normed is deprecated in favor of density in numpy, so I'll correct for that too.

@timothydmorton
Copy link
Author

I think this does what we want now; I've just changed it so there's an option to pass 'height' or 'integral' to 'normed' in addition to a bool.

@jlstevens
Copy link
Contributor

Looks good! Though there seems to be failures in the unit tests. E.g:

======================================================================
ERROR: test_points_histogram_mean_weighted (tests.testoperation.OperationTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/ioam/holoviews/tests/testoperation.py", line 101, in test_points_histogram_mean_weighted
    op_hist = histogram(points, num_bins=3, weight_dimension='y', mean_weighted=True)
  File "/home/travis/miniconda/envs/test-environment/lib/python2.7/site-packages/param/parameterized.py", line 1888, in __new__
    return inst.__call__(*args,**params)
  File "/home/travis/build/ioam/holoviews/holoviews/core/operation.py", line 141, in __call__
    processed = self._process(element)
  File "/home/travis/build/ioam/holoviews/holoviews/operation/element.py", line 576, in _process
    label=view.label, **params)
TypeError: ParameterizedMetaclass object got multiple values for keyword argument 'vdims'

@philippjfr
Copy link
Member

I'd be happy to merge this as soon as that unit test has been resolved. I think it occurs when a weight_dimension is passed in which case vdims is passed twice.

@timothydmorton
Copy link
Author

OK, this should be good now... I corrected the bug and updated the tests to reflect the new dimension-naming scheme.

@philippjfr
Copy link
Member

Updated the test data and restarted the build, hopefully it will pass now and I can merge.

@jlstevens
Copy link
Contributor

Updated the test data

Are we going to keep 'X Frequency' labels then? Or go back to simply 'Frequency' in 2.0 once axis linking can be done properly? I'm happy with the new normed option but I think that maybe the label change could be optional and not on by default...

* Fixed installation page link in introduction

* Added link to reference gallery in the introduction

* Fixed remaining links in introduction guide

* Fixed typo in tabular datasets getting started guide
@timothydmorton
Copy link
Author

I'd be happy with an option to pass a custom name for the frequency dimension. And then passing custom names would be the solution to the linking issue until the underlying issue is fixed.

@philippjfr
Copy link
Member

Are we going to keep 'X Frequency' labels then?

I'd say so yes, the histogram operation is most frequently used for adjoined histograms where the y-label is hidden anyway. I think in most other cases it's still a sensible label and we can decide to revert once the normalization system supports independent normalization of x- and y-axes.

I'd also like to remove a workaround for the gridmatrix operation I had to introduce due to this issue. If you don't mind could you remove the .opts declaration on line 775 of holoviews/operation/element.py, i.e.:

opts = dict(axiswise=True, framewise=True)
el = p.diagonal_operation(element, dimension=d1.name,
                   bin_range=bin_range).opts(norm=opts)

should just become:

el = p.diagonal_operation(element, dimension=d1.name, bin_range=bin_range)

Still working on the fix for the other issue I mentioned.

@jlstevens
Copy link
Contributor

How about a label_format parameter that could be {dim} Frequency by default but could be simply 'Frequency' to go back to the old behavior?

I imagine people might have other words they might want to use for the value dimension other than 'frequency' so such a formatter might be useful regardless (instead of hard coding 'Frequency' into the dimension name).

@philippjfr
Copy link
Member

Okay, I'm going to make a branch of this PR and make some of the fixes to get the tests passing.

@timothydmorton
Copy link
Author

timothydmorton commented Aug 18, 2017 via email

@philippjfr
Copy link
Member

No problem at all, really it's not your mess to clean up, since it's a pre-existing bug (that's revealed by your changes) causing the test failures.

@philippjfr
Copy link
Member

philippjfr commented Aug 31, 2017

Okay, now that JupyterCon is over I can finally look at this. I'll make a copy of this branch and then reopen a PR.

@philippjfr
Copy link
Member

Going to close this PR in favor of #1836, your commits will be preserved though @timothydmorton.

@philippjfr philippjfr closed this Aug 31, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants