Improve binning of histograms #91

Open
adamperer opened this issue Nov 28, 2022 · 2 comments
Labels
feature (new feature)

Comments

@adamperer
Member

Consider re-implementing Vega's binning strategy: https://github.com/vega/vega/blob/72b9b3bbf912212e7879b6acaccc84aff969ef1c/packages/vega-statistics/src/bin.js#L23
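For reference, here is a rough Python port of what the default branch of that bin.js appears to do. It is based on a reading of the linked file, it skips the explicit step/steps options, and the function name vega_style_bins is hypothetical (not part of Vega or AutoProfiler). The function picks a step that is a power of 10, optionally divided by 5 or 2, so that the bin count stays at or below maxbins, and then snaps start and stop outward to multiples of that step.

```python
import math
from typing import Dict, Sequence


def vega_style_bins(
    lo: float,
    hi: float,
    maxbins: int = 20,
    base: int = 10,
    divide: Sequence[float] = (5, 2),
    minstep: float = 0.0,
    nice: bool = True,
) -> Dict[str, float]:
    """Pick a 'nice' bin step for the extent [lo, hi].

    Hypothetical port of the default (no explicit step/steps) branch of
    vega-statistics bin.js, as we read it; not an official API.
    """
    logb = math.log(base)
    span = (hi - lo) or abs(lo) or 1.0

    # Start from a power of `base` a couple of orders of magnitude below the span.
    level = math.ceil(math.log(maxbins) / logb)
    # JS Math.round rounds halves up, so use floor(x + 0.5) instead of round().
    exponent = math.floor(math.log(span) / logb + 0.5) - level
    step = max(minstep, float(base) ** exponent)

    # Too many bins? Grow the step by whole powers of the base.
    while math.ceil(span / step) > maxbins:
        step *= base

    # Then try finer steps (step/5, then step/2) if they still fit in maxbins.
    for d in divide:
        v = step / d
        if v >= minstep and span / v <= maxbins:
            step = v

    # Snap start/stop outward to multiples of the step ("nice" bounds).
    v = math.log(step)
    precision = 0 if v >= 0 else int(-v / logb) + 1
    eps = float(base) ** (-precision - 1)
    start, stop = lo, hi
    if nice:
        v = math.floor(lo / step + eps) * step
        start = v - step if lo < v else v
        stop = math.ceil(hi / step) * step

    return {"start": start, "stop": start + step if stop == start else stop, "step": step}
```

For an extent of 0 to 100 with maxbins=20, this sketch yields step 5 with start 0 and stop 100, i.e. 20 bins whose edges fall on round numbers.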

@adamperer added the feature (new feature) label Nov 28, 2022
@paddymul

Is there a write-up of that binning style? I found this: https://vega.github.io/vega-lite/docs/bin.html

When I implemented binning for my project, I added an extra bin at each end: one for values below the 1st percentile and one for values above the 99th percentile. These capture most outliers and leave a wider, higher-resolution middle. Ideally I would fold those first and last percentiles into the first and last regular bins and communicate the changed bin sizes via a mouseover. The two tail bins together only ever hold about 2% of the values, so they will never have a tall bar.
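A minimal sketch of the percentile-tail approach described above. It assumes linear-interpolated percentiles (like NumPy's default method) and 18 regular bins so there are 20 in total; the names percentile, tail_aware_histogram and fold_tails are hypothetical, and this is not the commenter's actual code. fold_tails shows the "ideal" variant that merges the tails into the first and last regular bins.

```python
import math
from typing import List, Sequence, Tuple


def percentile(sorted_vals: Sequence[float], q: float) -> float:
    """Linear-interpolated percentile (q in 0..100) of already-sorted data."""
    if not sorted_vals:
        raise ValueError("need at least one value")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def tail_aware_histogram(
    values: Sequence[float],
    n_middle: int = 18,
    lo_pct: float = 1.0,
    hi_pct: float = 99.0,
) -> Tuple[List[float], List[int]]:
    """Equal-width bins between two percentiles, plus one bin per tail.

    Returns (edges, counts): edges has n_middle + 1 entries covering
    [p_lo, p_hi]; counts has n_middle + 2 entries, where counts[0] holds
    values below p_lo and counts[-1] holds values above p_hi.
    """
    vals = sorted(values)
    p_lo, p_hi = percentile(vals, lo_pct), percentile(vals, hi_pct)
    # Guard against a zero-width middle (e.g. mostly constant data).
    width = (p_hi - p_lo) / n_middle if p_hi > p_lo else 1.0
    edges = [p_lo + i * width for i in range(n_middle + 1)]

    counts = [0] * (n_middle + 2)
    for v in vals:
        if v < p_lo:
            counts[0] += 1
        elif v > p_hi:
            counts[-1] += 1
        else:
            # Clamp so that v == p_hi lands in the last regular bin.
            idx = min(int((v - p_lo) / width), n_middle - 1)
            counts[1 + idx] += 1
    return edges, counts


def fold_tails(counts: List[int]) -> List[int]:
    """Merge the two tail bins into the first/last regular bins."""
    middle = counts[1:-1]
    middle[0] += counts[0]
    middle[-1] += counts[-1]
    return middle
```

With 1,000 evenly spread values, each tail bin holds 10 values (1% each), so the tails can never dominate the chart, while the 18 middle bins cover the remaining 98% at finer resolution.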

@willeppy
Member

So I don't know the name of the algorithm Vega uses. Right now AutoProfiler uses equal-width bins. This issue is about doing something smarter to pick the number of bins. Currently we use bins = min(unique values, 20), if I recall correctly, which is very simple; Vega seemingly has a better approach that depends on the cardinality or range of the data.
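A sketch of the current rule as described in this comment: equal-width bins, with the bin count set to min(unique values, 20). This is a hypothetical reimplementation for discussion, not AutoProfiler's actual code, and equal_width_histogram is an illustrative name.

```python
from typing import List, Sequence, Tuple


def equal_width_histogram(
    values: Sequence[float], max_bins: int = 20
) -> Tuple[List[float], List[int]]:
    """Equal-width histogram with bins = min(number of unique values, max_bins)."""
    if not values:
        return [], []
    lo, hi = min(values), max(values)
    n_bins = min(len(set(values)), max_bins)
    if lo == hi:
        # All values identical: one bin holding everything.
        return [lo, hi], [len(values)]
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for v in values:
        # The maximum would index one past the end; clamp it into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return edges, counts
```

With this rule the edges land wherever (max - min) / n happens to fall rather than on round numbers, and the bin count ignores the range of the data, which is the part a smarter strategy could improve.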

3 participants