Commit
fix(types): fix histogram bin allocation (#9711)
## Description of changes

```
import ibis

ibis.options.interactive = True
ibis.options.repr.interactive.max_rows = 20

t = ibis.range(1000).unnest().name("index").as_table()
t.select(hist=t["index"].histogram(nbins=10)).value_counts()
```

```
┏━━━━━━━┳━━━━━━━━━━━━┓
┃ hist  ┃ hist_count ┃
┡━━━━━━━╇━━━━━━━━━━━━┩
│ int64 │ int64      │
├───────┼────────────┤
│     5 │        100 │
│     9 │        100 │
│     0 │        100 │
│     3 │        100 │
│     6 │        100 │
│     2 │        100 │
│     7 │        100 │
│     8 │        100 │
│     1 │        100 │
│     4 │        100 │
└───────┴────────────┘
```

## Issues closed

* Resolves #9687.

I had to make a slight change to ``histogram`` to account for an edge case that was tested for Impala: it would fail if ``nbins`` was not passed. This is a rather niche use case because ``np.histogram``, for example, requires the number of bins to be passed either explicitly or implicitly.

While fixing this, I also found a slight quirk in the current design: if a ``base`` is passed that is not the minimum value, out-of-bound values smaller than the base are assigned a negative bin index. Those values are now clipped to a bin index of -1 so that they are grouped together, rather than potentially being spread across bin indices such as -1 and -2. This aligns with how ``np.histogram`` assigns a bin index of 0 for out-of-bound values.
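For illustration, the binning and clipping behaviour described here can be sketched in plain Python. This is a minimal sketch and not the ibis implementation: ``bin_index`` is a hypothetical helper, and clipping the upper end to ``nbins - 1`` (so that the maximum value lands in the last bin) is an assumption inferred from the even counts in the output above.

```python
import math
from collections import Counter


def bin_index(value, base, binwidth, nbins):
    """Hypothetical helper (not part of ibis): equal-width bin index."""
    raw = math.floor((value - base) / binwidth)
    # Values below `base` all collapse to -1. Clipping the top end to
    # nbins - 1 is an assumption made for this sketch.
    return max(-1, min(raw, nbins - 1))


values = range(1000)
nbins = 10

# Default case: base is the minimum value, so no value is out of bounds.
base = min(values)
binwidth = (max(values) - base) / nbins
counts = Counter(bin_index(v, base, binwidth, nbins) for v in values)

# Custom base above the minimum: everything below 500 shares bin -1.
shifted_base = 500
shifted_width = (max(values) - shifted_base) / nbins
shifted = Counter(
    bin_index(v, shifted_base, shifted_width, nbins) for v in values
)

print(sorted(counts.items()))
print(sorted(shifted.items()))
```

With the default base, each of the 10 bins receives 100 values, matching the table in the description. With a base of 500, the 500 smaller values are grouped under a single index of -1 instead of being spread over several negative indices.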