Histogram type and hist methods #61
👍 |
alternatively, instead of |
I like |
Okay, I've changed the usage to Unless there are any objections, I'll merge this in tonight. |
I've been thinking about this, and while I'm sympathetic to the idea of standardizing on |
It seems to me that there are two related ideas – a histogram is just counting the items in bins, whereas you can also estimate what portion of a distribution falls into each bin. For the latter, |
Binning and histograms aren't really the same thing: a histogram decides the height of a box based on both the box's width and the probability mass in the region defined by the box's width, whereas counting items in bins ignores the width of the bin. In most conventional histograms, this isn't important because all of the bins are chosen to have the same width, but in general the two concepts are distinct. True binning is much closer to what the As for the use of |
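To make the counting-versus-density distinction concrete, here is a minimal sketch in current Julia syntax. The helper names `bin_counts` and `density_heights` are hypothetical and are not taken from this PR or from any package; bins are assumed to be left-closed and right-open.

```julia
# Hypothetical helpers for illustration only (not part of the PR).

# Plain binning: count how many items fall in each bin, ignoring bin width.
function bin_counts(x::AbstractVector{<:Real}, edges::AbstractVector{<:Real})
    counts = zeros(Int, length(edges) - 1)
    for v in x
        # Bins are assumed left-closed, right-open: [edges[i], edges[i+1])
        i = searchsortedlast(edges, v)
        if 1 <= i < length(edges)
            counts[i] += 1
        end
    end
    return counts
end

# Histogram heights as a density estimate: probability mass divided by bin width.
density_heights(counts, edges) = counts ./ (sum(counts) .* diff(edges))

x = [0.1, 0.2, 0.3, 0.4, 1.5, 2.5, 3.5, 3.9]
edges = [0.0, 1.0, 4.0]                    # unequal widths: 1 and 3

counts = bin_counts(x, edges)              # both bins hold 4 items
heights = density_heights(counts, edges)   # the wide bin is 3x lower
```

Both bins contain the same number of items, yet the density heights differ by the ratio of the widths, and the heights times the widths sum to one. With equal-width bins the two views differ only by a constant factor.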
If it makes more sense to treat histogram construction as a form of non-parametric model fitting – albeit a very simple one – then I think that using |
Well, I'm not totally sure we're making the right decision. But if we end up using |
I think we should add the |
I'm not 100% sure on this either, but I think it's worth trying to see how it goes. My rough idea is that I originally did plan to define I'm not really sure this is a pattern we should follow, as it doesn't really sit well with the rest of julia. One option worth considering is keeping
Yeah, let's not do that. It's really absurd this is a random side-effect of computing a histogram, especially since we don't have just one standard graphics package. Each package can have plot methods that apply to Histogram objects, however. Asking the programmer to write |
Still think that I agree with @StefanKarpinski that functions for computation should not be entangled with those that do plotting. |
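As a rough sketch of the separation argued for above: the computation returns a plain object, and any display code lives in a separate method that a graphics package could define on that type. The names `SimpleHist`, `simple_hist`, and `textplot` are hypothetical stand-ins, not the `Histogram` type from this PR, and the syntax is current Julia rather than what was available when the thread was written.

```julia
# Hypothetical type for illustration; the Histogram type in the PR may differ.
struct SimpleHist
    edges::Vector{Float64}
    counts::Vector{Int}
end

# Pure computation: returns a SimpleHist and has no plotting side effect.
function simple_hist(x, edges)
    e = collect(Float64, edges)
    counts = zeros(Int, length(e) - 1)
    for v in x
        i = searchsortedlast(e, v)     # bin [e[i], e[i+1])
        if 1 <= i < length(e)
            counts[i] += 1
        end
    end
    return SimpleHist(e, counts)
end

# Display is a separate method; a graphics package could add its own
# method for the same type without touching the computation.
function textplot(io::IO, h::SimpleHist)
    for i in eachindex(h.counts)
        println(io, "[", h.edges[i], ", ", h.edges[i + 1], ") ",
                repeat("#", h.counts[i]))
    end
end

h = simple_hist([0.5, 1.5, 1.7, 2.2], 0:1:3)
textplot(stdout, h)
```

Under this layout the caller decides whether anything is drawn at all, and the same `SimpleHist` value can be handed to whichever plotting backend happens to be loaded.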
If |
Recently I had an argument with myself in a github thread on this in the context of |
Histogram, in essence, is statistics of some sort (not a model). Whereas I am fine with the |
To follow up on this, I've also thought of an alternative approach which combines histograms and contingency tables: see #32 |
New histogram functionality: it creates a new type `Histogram`, and works for arbitrary dimensions. It has been proposed to move this here, and deprecate the current `hist` function in base (see JuliaLang/julia#6601).

Some decisions:
- `hist(x::Matrix)`? Should this be a `size(x,2)`-dimensional histogram? If so, should we cap it at some dimension (say 5), so people don't accidentally call it on a 100x100 matrix?
- Should the `Histogram` type be mutable? The one advantage of this is that it would allow adaptive resizing when appending additional elements (in particular for streaming data).
- It would be nice to incorporate weighted vectors.
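To illustrate what a mutable type would buy for streaming data, here is a minimal one-dimensional sketch with optional weights. `GrowingHist` is a hypothetical name and design, not the `Histogram` type proposed here; it assumes a fixed bin width and only grows to the right.

```julia
# Hypothetical sketch: a 1-D histogram with a fixed bin width that appends
# bins on the right when a pushed value falls beyond the current last bin.
mutable struct GrowingHist
    lo::Float64               # left edge of the first bin
    width::Float64            # common bin width
    weights::Vector{Float64}  # accumulated weight per bin
end

# Start with no bins; they are created on demand.
GrowingHist(lo, width) = GrowingHist(lo, width, Float64[])

# Add value v with weight w (default 1.0, i.e. plain counting).
function Base.push!(h::GrowingHist, v::Real, w::Real = 1.0)
    v >= h.lo || throw(ArgumentError("value below the lower edge"))
    i = floor(Int, (v - h.lo) / h.width) + 1
    if i > length(h.weights)
        # Adaptive resizing: append empty bins up to index i.
        append!(h.weights, zeros(i - length(h.weights)))
    end
    h.weights[i] += w
    return h
end

h = GrowingHist(0.0, 1.0)
for (v, w) in zip([0.2, 0.7, 3.4], [1.0, 2.0, 0.5])
    push!(h, v, w)
end
h.weights   # four bins; the last was created when 3.4 arrived
```

An immutable type could not grow in place like this; each append would need to build a new object. Whether that convenience outweighs the simplicity of an immutable `Histogram` is the open question above.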