Improve automatic bin determination for histograms #217

Closed
dorisjlee opened this issue Jan 11, 2021 · 1 comment

dorisjlee commented Jan 11, 2021

Currently, the formula for histogram binning sometimes produces bins that are very "skinny" and sometimes bins that are very "wide". We need to improve how the bin width and number of bins are determined so that the plotted histograms represent the data more accurately. The problem is especially noticeable in the "Filter" action.

Example:

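import pandas as pd
import lux  # registers the Lux extension on pandas DataFrames

df = pd.read_csv("https://github.com/lux-org/lux-datasets/blob/master/data/olympic.csv?raw=True")
df.intent = ["Height"]  # request visualizations centered on the Height column
df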


This needs to be customized for matplotlib and Altair.

dorisjlee added the enhancement (New feature or request) and easy (Easy to fix; Good issues for newcomers) labels Jan 11, 2021
micahtyong added a commit to micahtyong/lux that referenced this issue Feb 22, 2021
micahtyong self-assigned this Feb 22, 2021
dorisjlee pushed a commit that referenced this issue Mar 3, 2021
…d step attributes (#285)

* Merge upstream

* Sync with master

* Fix bin size variance from #217

* Format and test

* Change labelOverlap to True

Co-authored-by: Dominik Moritz <domoritz@gmail.com>

* Modify markbar; currently questioning whether or not it's needed

* Remove markbar entirely, rely on Altair automatic bin detection https://altair-viz.github.io/user_guide/generated/core/altair.BinParams.html

* Modify code snippet

* Revert "Remove markbar entirely, rely on Altair automatic bin detection https://altair-viz.github.io/user_guide/generated/core/altair.BinParams.html"

This reverts commit 9cb9418.

* Implement bin size estimation via Freedman Diaconis's Rule

* Use numpy to compute IQR for better performance (pandas too slow)

* Add tests

* Add test cases for histogram binning

* Address changes from @domoritz review (small optimizations)

* Black and format

* Move histogram bin width computation to pandas executor (execute_binning)

* Center bars between ticks in distribution setting

* Renaming in execute_binning

* Bin width computed accurately in execute_binning; no need for get_bin_size()

* Revert to Freedman rule; maintain correct ticks

Co-authored-by: Micah Yong <micahyong@Micahs-MacBook-Pro.local>
Co-authored-by: Dominik Moritz <domoritz@gmail.com>
@micahtyong

Closed via #285.
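
For readers unfamiliar with the rule named in the commit notes above, here is a minimal sketch of Freedman-Diaconis bin sizing, with the IQR computed via numpy as the notes describe. It is an illustration only and not the code merged in #285: the function names `fd_bin_width` and `fd_bin_count` and the `max_bins` cap are hypothetical, chosen for this example.

```python
import numpy as np


def fd_bin_width(values):
    """Freedman-Diaconis bin width: 2 * IQR / n^(1/3). (Hypothetical helper.)"""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]  # ignore missing values
    if x.size < 2:
        return 0.0
    q75, q25 = np.percentile(x, [75, 25])
    return 2.0 * (q75 - q25) / np.cbrt(x.size)


def fd_bin_count(values, max_bins=50):
    """Number of bins implied by the width above. max_bins is an assumed cap."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]
    width = fd_bin_width(x)
    if width <= 0:
        # Empty, single-valued, or zero-IQR data: fall back to a single bin.
        return 1
    data_range = x.max() - x.min()
    return int(min(max_bins, max(1, np.ceil(data_range / width))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    heights = rng.normal(loc=175, scale=10, size=5000)  # synthetic data
    print(fd_bin_width(heights), fd_bin_count(heights))
```

Because the width scales with the spread of the middle half of the data and shrinks slowly as the sample grows, a few extreme values stretch the bins far less than they would under a rule based on the full range. numpy also ships this estimator directly as `np.histogram_bin_edges(values, bins="fd")`.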

dorisjlee pushed a commit that referenced this issue Mar 15, 2021
* Sync exported code with new histogram bin determination rules

* Modify histogram code test case

Co-authored-by: Micah Yong <micahyong@Micahs-MacBook-Pro.local>
Co-authored-by: Dominik Moritz <domoritz@gmail.com>