Commit

Fix #34 - don’t coerce to numbers.
Histograms do not require numeric input in general; they merely require ordinal
input and ordinal thresholds for binning. Thus, we were being overly-strict by
coercing everything to numbers.

This also fixes a bug where the Freedman–Diaconis threshold sorted the input
array of values, rather than sorting a copy of the input.
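
The in-place sort problem described above can be illustrated with a minimal sketch. The helper names `sortInPlace` and `sortedCopy` are hypothetical and exist only for this illustration, and the local `ascending` comparator is a stand-in written for the example, not the library's own module.

```javascript
// Stand-in comparator for this sketch (an assumption, not imported from the library).
function ascending(a, b) {
  return a < b ? -1 : a > b ? 1 : a >= b ? 0 : NaN;
}

// Hypothetical helper: Array.prototype.sort reorders the caller's array.
function sortInPlace(values) {
  return values.sort(ascending);
}

// Hypothetical helper: map() builds a new array, and only that copy is sorted.
function sortedCopy(values) {
  return Array.prototype.map.call(values, Number).sort(ascending);
}

const input = [3, 1, 2];
const sorted = sortedCopy(input);
// `input` keeps its original order; `sorted` is a separate, ordered array.
```

Sorting a copy matters here because the threshold function receives the same materialized values that the histogram goes on to bin, so reordering them is a visible side effect.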
mbostock committed Jun 8, 2016
1 parent ebe6992 commit 96a1280
Showing 4 changed files with 22 additions and 25 deletions.
12 changes: 6 additions & 6 deletions README.md
Original file line number Diff line number Diff line change
@@ -300,9 +300,9 @@ Computes the histogram for the given array of *data* samples. Returns an array o

<a name="histogram_value" href="#histogram_value">#</a> <i>histogram</i>.<b>value</b>([<i>value</i>])

-If *value* is specified, sets the value accessor to the specified function or number and returns this histogram generator. If *value* is not specified, returns the current value accessor, which defaults to the identity function.
+If *value* is specified, sets the value accessor to the specified function or constant and returns this histogram generator. If *value* is not specified, returns the current value accessor, which defaults to the identity function.

-When a histogram is [generated](#_histogram), the value accessor will be invoked for each element in the input data array, being passed the element `d`, the index `i`, and the array `data` as three arguments. The default value accessor assumes that the input data are numbers, or that they are coercible to numbers using [valueOf](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object/valueOf). If your data are not simply numbers, then you should specify an accessor that returns the corresponding numeric value for a given datum.
+When a histogram is [generated](#_histogram), the value accessor will be invoked for each element in the input data array, being passed the element `d`, the index `i`, and the array `data` as three arguments. The default value accessor assumes that the input data are orderable (comparable), such as numbers or dates. If your data are not, then you should specify an accessor that returns the corresponding orderable value for a given datum.

This is similar to mapping your data to values before invoking the histogram generator, but has the benefit that the input data remains associated with the returned bins, thereby making it easier to access other fields of the data.

@@ -329,7 +329,7 @@ Note that the domain accessor is invoked on the materialized array of [values](#
<a name="histogram_thresholds" href="#histogram_thresholds">#</a> <i>histogram</i>.<b>thresholds</b>([<i>count</i>])
<br><a name="histogram_thresholds" href="#histogram_thresholds">#</a> <i>histogram</i>.<b>thresholds</b>([<i>thresholds</i>])

-If *thresholds* is specified, sets the [threshold generator](#histogram-thresholds) to the specified function or array and returns this histogram generator. If *thresholds* is not specified, returns the current threshold generator, which by default implements [Sturges’ formula](#thresholdSturges). Thresholds are defined as an array of numbers [*x0*, *x1*, …]. Any value less than *x0* will be placed in the first bin; any value greater than or equal to *x0* but less than *x1* will be placed in the second bin; and so on. Thus, the [generated histogram](#_histogram) will have *thresholds*.length + 1 bins. See [histogram thresholds](#histogram-thresholds) for more information.
+If *thresholds* is specified, sets the [threshold generator](#histogram-thresholds) to the specified function or array and returns this histogram generator. If *thresholds* is not specified, returns the current threshold generator, which by default implements [Sturges’ formula](#thresholdSturges). (Thus by default, the histogram values must be numbers!) Thresholds are defined as an array of values [*x0*, *x1*, …]. Any value less than *x0* will be placed in the first bin; any value greater than or equal to *x0* but less than *x1* will be placed in the second bin; and so on. Thus, the [generated histogram](#_histogram) will have *thresholds*.length + 1 bins. See [histogram thresholds](#histogram-thresholds) for more information.

If a *count* is specified instead of an array of *thresholds*, then the [domain](#histogram_domain) will be uniformly divided into approximately *count* bins; see [ticks](#ticks).

@@ -339,12 +339,12 @@ These functions are typically not used directly; instead, pass them to [*histogr

<a name="thresholdFreedmanDiaconis" href="#thresholdFreedmanDiaconis">#</a> d3.<b>thresholdFreedmanDiaconis</b>(<i>values</i>, <i>min</i>, <i>max</i>)

-Returns the number of bins according to the [Freedman–Diaconis rule](https://en.wikipedia.org/wiki/Histogram#Mathematical_definition).
+Returns the number of bins according to the [Freedman–Diaconis rule](https://en.wikipedia.org/wiki/Histogram#Mathematical_definition); the input *values* must be numbers.

<a name="thresholdScott" href="#thresholdScott">#</a> d3.<b>thresholdScott</b>(<i>values</i>, <i>min</i>, <i>max</i>)

-Returns the number of bins according to [Scott’s normal reference rule](https://en.wikipedia.org/wiki/Histogram#Mathematical_definition).
+Returns the number of bins according to [Scott’s normal reference rule](https://en.wikipedia.org/wiki/Histogram#Mathematical_definition); the input *values* must be numbers.

<a name="thresholdSturges" href="#thresholdSturges">#</a> d3.<b>thresholdSturges</b>(<i>values</i>)

-Returns the number of bins according to [Sturges’ formula](https://en.wikipedia.org/wiki/Histogram#Mathematical_definition).
+Returns the number of bins according to [Sturges’ formula](https://en.wikipedia.org/wiki/Histogram#Mathematical_definition); the input *values* must be numbers.
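
The bin-placement rule in the README text above relies only on comparison, which is why orderable values such as dates can work when explicit thresholds are supplied. The following is a minimal standalone sketch of that rule, not the library's implementation: `bisectRight` and `binBy` are hypothetical helpers written for this example, and the sample records are made up.

```javascript
// Hypothetical helper: index of the first threshold greater than x.
// Only the < operator is used, so x may be a number, a Date, or a string.
function bisectRight(thresholds, x) {
  let lo = 0, hi = thresholds.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (x < thresholds[mid]) hi = mid;
    else lo = mid + 1;
  }
  return lo;
}

// Hypothetical helper: groups data into thresholds.length + 1 bins.
// The accessor receives (d, i, data), and each bin keeps the original datum.
function binBy(data, value, thresholds) {
  const bins = Array.from({length: thresholds.length + 1}, () => []);
  data.forEach((d, i) => {
    bins[bisectRight(thresholds, value(d, i, data))].push(d);
  });
  return bins;
}

// Made-up sample data with Date values (months are zero-based).
const events = [
  {name: "a", date: new Date(2016, 0, 15)},
  {name: "b", date: new Date(2016, 2, 1)},
  {name: "c", date: new Date(2016, 5, 8)},
  {name: "d", date: new Date(2016, 1, 1)}
];
const dateThresholds = [new Date(2016, 1, 1), new Date(2016, 3, 1)];
const dateBins = binBy(events, d => d.date, dateThresholds);
```

Because each bin stores the original record rather than the extracted value, other fields such as `name` stay reachable from the bins, which is the benefit the README attributes to using an accessor.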
4 changes: 4 additions & 0 deletions src/array.js
@@ -0,0 +1,4 @@
+var array = Array.prototype;
+
+export var slice = array.slice;
+export var map = array.map;
27 changes: 9 additions & 18 deletions src/histogram.js
@@ -1,14 +1,11 @@
+import {slice} from "./array";
 import bisect from "./bisect";
 import constant from "./constant";
 import extent from "./extent";
 import identity from "./identity";
 import ticks from "./ticks";
 import sturges from "./threshold/sturges";

-function number(x) {
-  return +x;
-}
-
 export default function() {
   var value = identity,
       domain = extent,
@@ -20,22 +17,20 @@ export default function() {
         x,
         values = new Array(n);

-    // Coerce values to numbers.
     for (i = 0; i < n; ++i) {
-      values[i] = +value(data[i], i, data);
+      values[i] = value(data[i], i, data);
     }

     var xz = domain(values),
-        x0 = +xz[0],
-        x1 = +xz[1],
+        x0 = xz[0],
+        x1 = xz[1],
         tz = threshold(values, x0, x1);

     // Convert number of thresholds into uniform thresholds.
-    if (!Array.isArray(tz)) tz = ticks(x0, x1, +tz);
+    if (!Array.isArray(tz)) tz = ticks(x0, x1, tz);

-    // Coerce thresholds to numbers, ignoring any outside the domain.
+    // Remove any thresholds outside the domain.
     var m = tz.length;
-    for (i = 0; i < m; ++i) tz[i] = +tz[i];
     while (tz[0] <= x0) tz.shift(), --m;
     while (tz[m - 1] >= x1) tz.pop(), --m;

@@ -61,19 +56,15 @@ export default function() {
   }

   histogram.value = function(_) {
-    return arguments.length ? (value = typeof _ === "function" ? _ : constant(+_), histogram) : value;
+    return arguments.length ? (value = typeof _ === "function" ? _ : constant(_), histogram) : value;
   };

   histogram.domain = function(_) {
-    return arguments.length ? (domain = typeof _ === "function" ? _ : constant([+_[0], +_[1]]), histogram) : domain;
+    return arguments.length ? (domain = typeof _ === "function" ? _ : constant([_[0], _[1]]), histogram) : domain;
   };

   histogram.thresholds = function(_) {
-    if (!arguments.length) return threshold;
-    threshold = typeof _ === "function" ? _
-        : Array.isArray(_) ? constant(Array.prototype.map.call(_, number))
-        : constant(+_);
-    return histogram;
+    return arguments.length ? (threshold = typeof _ === "function" ? _ : Array.isArray(_) ? constant(slice.call(_)) : constant(_), histogram) : threshold;
   };

   return histogram;
4 changes: 3 additions & 1 deletion src/threshold/freedmanDiaconis.js
@@ -1,7 +1,9 @@
+import {map} from "../array";
 import ascending from "../ascending";
+import number from "../number";
 import quantile from "../quantile";

 export default function(values, min, max) {
-  values.sort(ascending);
+  values = map.call(values, number).sort(ascending);
   return Math.ceil((max - min) / (2 * (quantile(values, 0.75) - quantile(values, 0.25)) * Math.pow(values.length, -1 / 3)));
 }
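
As a rough worked illustration of the formula in this file, the bin count is the domain width divided by a bin width of 2 · IQR · n^(−1/3), rounded up. The sketch below is standalone and makes assumptions: `fdAscending`, `fdQuantile`, and `freedmanDiaconisCount` are names invented for this example, and the quantile helper uses plain linear interpolation between order statistics, which may or may not match the library's `quantile` module in every detail.

```javascript
// Stand-in comparator for this sketch (an assumption, not the library module).
function fdAscending(a, b) {
  return a < b ? -1 : a > b ? 1 : a >= b ? 0 : NaN;
}

// Hypothetical quantile over an already sorted numeric array,
// interpolating linearly between neighboring order statistics.
function fdQuantile(sorted, p) {
  const n = sorted.length;
  const h = (n - 1) * p;
  const i = Math.floor(h);
  const a = sorted[i];
  const b = sorted[Math.min(i + 1, n - 1)];
  return a + (b - a) * (h - i);
}

// Hypothetical counterpart of the patched function: coerce to numbers into
// a new array, sort that copy, then apply the Freedman-Diaconis rule.
function freedmanDiaconisCount(values, min, max) {
  const sorted = Array.from(values, Number).sort(fdAscending);
  const iqr = fdQuantile(sorted, 0.75) - fdQuantile(sorted, 0.25);
  return Math.ceil((max - min) / (2 * iqr * Math.pow(sorted.length, -1 / 3)));
}

// Made-up sample: the integers 0 through 10 in shuffled order.
const samples = [10, 0, 5, 2, 7, 3, 6, 4, 9, 1, 8];
const binCount = freedmanDiaconisCount(samples, 0, 10);
```

For this sample the interquartile range is 5 and n is 11, so the bin width is about 4.5 and a domain of width 10 rounds up to 3 bins; `samples` itself is left in its original order.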
