Add custom kernel density estimation #239
Open
This gets rid of the KernelDensity dependency. The main annoyance with KernelDensity was the FFTW dependency, which in turn pulled in MKL.
The method used to select the bandwidth for the KDE is taken from S. J. Sheather and M. C. Jones, ‘A Reliable Data-Based Bandwidth Selection Method for Kernel Density Estimation’, Journal of the Royal Statistical Society: Series B (Methodological), vol. 53, no. 3, pp. 683–690, 1991, doi: 10.1111/j.2517-6161.1991.tb01857.x.
Along with the PDF I also changed the CDF and inverse CDF. Before, we used the native Julia `ecdf` function for the CDF and interpolated the `quantile` from the available data and `ecdf` values, but I'm not a fan because it's not a smooth function. Now I've implemented numerical integration of the actual underlying PDF using `QuadGK` for the CDF, and the inverse through root finding via `Roots`. Two new dependencies, but they are pretty lightweight so I think it's fine. The biggest benefit here is that the CDF and quantile are now actual inverses, so `quantile(cdf(x)) == x`, which didn't really hold before.
We'll have to see if we need more efficient approaches in the future, but the accuracy is off the charts in comparison to the previous implementation. I've also added docs, docstrings and more stable tests.
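
For readers who want to see the idea without the Julia code, here is a minimal, language-agnostic sketch in Python of the CDF and quantile construction described above. It is not the code in this PR. It assumes a Gaussian kernel with a fixed, hand-picked bandwidth `h` (not the Sheather-Jones selector), uses a composite Simpson rule as a stand-in for adaptive quadrature, and uses bisection as a stand-in for a root-finding library. All function names and the sample data are hypothetical.

```python
import math


def gaussian_kde_pdf(data, h):
    """Return a Gaussian KDE density for `data` with a fixed bandwidth `h` (assumed, not selected)."""
    norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))

    def pdf(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return pdf


def integrate(f, a, b, steps=2000):
    """Composite Simpson rule on [a, b]; a simple stand-in for adaptive quadrature."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    width = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * width)
    return total * width / 3.0


def make_cdf(pdf, lower):
    """CDF obtained by integrating the PDF from `lower`, a point with negligible mass below it."""

    def cdf(x):
        return integrate(pdf, lower, x)

    return cdf


def quantile(cdf, p, lo, hi, tol=1e-10):
    """Invert the CDF by bisection: find x in [lo, hi] with cdf(x) == p."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical sample and bandwidth, for illustration only.
data = [1.0, 1.5, 2.0, 2.5, 4.0]
h = 0.5
lower = min(data) - 8 * h  # the Gaussian tails beyond 8 bandwidths are negligible
upper = max(data) + 8 * h

pdf = gaussian_kde_pdf(data, h)
cdf = make_cdf(pdf, lower)

x0 = 2.2
x_back = quantile(cdf, cdf(x0), lower, upper)
print(abs(x_back - x0))  # round-trip error, small compared to the data scale
```

Because the quantile is defined as the root of `cdf(x) - p` on the same smooth CDF, the round trip recovers `x0` up to the quadrature and root-finding tolerances. The cost is that each quantile evaluation triggers many integrations, which is the efficiency concern mentioned above.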