Add custom kernel density estimation #239

Open: wants to merge 6 commits into master
FriesischScott (Owner) commented Dec 17, 2024

This gets rid of the KernelDensity dependency. The main annoyance with KernelDensity was the FFTW dependency, which in turn pulled in MKL.

The method used to select the bandwidth for the KDE is taken from S. J. Sheather and M. C. Jones, ‘A Reliable Data-Based Bandwidth Selection Method for Kernel Density Estimation’, Journal of the Royal Statistical Society: Series B (Methodological), vol. 53, no. 3, pp. 683–690, 1991, doi: 10.1111/j.2517-6161.1991.tb01857.x.
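For readers unfamiliar with the cited rule, here is a minimal Python sketch of the Sheather–Jones "solve-the-equation" bandwidth for a Gaussian kernel. It is not the code from this PR (the package itself is Julia); it assumes a Gaussian kernel, non-degenerate data (positive interquartile range), double sums over all pairs with the diagonal terms included, and plain bisection for the root. The function names, the starting bracket, and the bisection loop are hypothetical illustration choices. The resulting density is the usual f(x) = 1/(n h) Σ φ((x − X_i)/h).

```python
import numpy as np

SQRT_2PI = np.sqrt(2.0 * np.pi)


def _pair_sum(x, g, hermite):
    """Sum over ALL pairs (i, j), diagonal included, of phi^(r)((x_i - x_j) / g).

    For even r, phi^(r)(u) = He_r(u) * phi(u), where He_r is the probabilists'
    Hermite polynomial passed in as `hermite`.
    """
    u = (x[:, None] - x[None, :]) / g
    return np.sum(hermite(u) * np.exp(-0.5 * u**2)) / SQRT_2PI


def psi4_hat(x, g):
    """Kernel estimate S_D(g) of psi_4 = integral of f'''' f (positive)."""
    n = x.size
    return _pair_sum(x, g, lambda u: u**4 - 6 * u**2 + 3) / (n * (n - 1) * g**5)


def psi6_hat(x, g):
    """Kernel estimate of psi_6 = integral of f^(6) f (negative)."""
    n = x.size
    s = _pair_sum(x, g, lambda u: u**6 - 15 * u**4 + 45 * u**2 - 15)
    return s / (n * (n - 1) * g**7)


def sheather_jones_bandwidth(x, tol=1e-8):
    """Solve h = [R(K) / (n * S_D(alpha_2(h)))]^(1/5) for a Gaussian kernel."""
    x = np.asarray(x, dtype=float)
    n = x.size
    q25, q75 = np.percentile(x, [25, 75])
    lam = q75 - q25                       # sample interquartile range
    a = 0.920 * lam * n ** (-1 / 7)       # pilot bandwidth for S_D
    b = 0.912 * lam * n ** (-1 / 9)       # pilot bandwidth for T_D
    t_d = -psi6_hat(x, b)                 # T_D(b) > 0
    c = 1.357 * (psi4_hat(x, a) / t_d) ** (1 / 7)   # alpha_2(h) = c * h^(5/7)
    r_k = 1.0 / (2.0 * np.sqrt(np.pi))    # R(K) for the Gaussian kernel

    def f(h):
        # Zero exactly when h satisfies the bandwidth equation above.
        return (r_k / (n * psi4_hat(x, c * h ** (5 / 7)))) ** 0.2 - h

    # Heuristic starting bracket (an assumption), widened until f changes sign.
    scale = min(np.std(x, ddof=1), lam / 1.349)
    hi = 1.144 * scale * n ** (-0.2)
    lo = 0.1 * hi
    for _ in range(50):
        if f(lo) * f(hi) <= 0:
            break
        lo, hi = lo / 1.2, hi * 1.2
    else:
        raise RuntimeError("could not bracket the bandwidth")

    f_lo = f(lo)
    while hi - lo > tol * hi:             # plain bisection on the bracket
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def kde_pdf(points, data, h):
    """Gaussian KDE: f(t) = 1/(n h) * sum_i phi((t - X_i) / h)."""
    data = np.asarray(data, dtype=float)
    u = (np.atleast_1d(points)[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (data.size * h * SQRT_2PI)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.normal(size=500)
    h = sheather_jones_bandwidth(sample)
    print(h, kde_pdf([0.0], sample, h))
```

The constants 0.920, 0.912 and 1.357 are the Gaussian-kernel values from the cited paper. The pairwise sums cost O(n²) per evaluation, which is fine for a few thousand points; larger samples would likely need binning or another approximation.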

Along with the PDF I also changed the CDF and inverse CDF. Before, we used the native Julia ecdf function for the CDF and interpolated the quantile from the available data and ecdf values, but I'm not a fan because the result is not a smooth function. Now I've implemented the CDF as a numerical integration of the actual underlying PDF using QuadGK, and the inverse through root finding via Roots. Two new dependencies, but they are pretty lightweight, so I think it's fine. The biggest benefit is that the CDF and quantile are now actual inverses, so quantile(cdf(x)) == x, which didn't really hold before.
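The CDF-by-integration and quantile-by-root-finding approach can be sketched in Python as follows. This is an illustration under stated assumptions, not the PR's Julia code: it assumes a Gaussian kernel with a given bandwidth `h`, a fixed composite Simpson rule stands in for QuadGK's adaptive Gauss–Kronrod quadrature, plain bisection stands in for Roots, and the truncated lower integration limit `min(data) - 10h` is my own choice. All function names are hypothetical.

```python
import math

import numpy as np


def kde_pdf(points, data, h):
    """Gaussian KDE evaluated at every entry of `points`."""
    u = (np.atleast_1d(np.asarray(points, dtype=float))[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (data.size * h * math.sqrt(2.0 * math.pi))


def kde_cdf(x, data, h, step_frac=0.05):
    """CDF at x: integrate the PDF from a truncated lower limit up to x.

    Composite Simpson's rule with panels of roughly step_frac * h. The lower
    limit min(data) - 10h is an assumption; the Gaussian tail mass below it
    is negligible.
    """
    lower = data.min() - 10.0 * h
    if x <= lower:
        return 0.0
    n = 2 * max(1, math.ceil((x - lower) / (2.0 * step_frac * h)))  # even panel count
    t = np.linspace(lower, x, n + 1)
    w = np.ones(n + 1)
    w[1:-1:2] = 4.0                       # odd nodes
    w[2:-1:2] = 2.0                       # interior even nodes
    return float(w @ kde_pdf(t, data, h)) * (x - lower) / (3.0 * n)


def kde_quantile(p, data, h, tol=1e-12):
    """Inverse CDF: bisection for the x with kde_cdf(x) = p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    lo, hi = data.min() - 10.0 * h, data.max() + 10.0 * h
    while hi - lo > tol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if kde_cdf(mid, data, h) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=200)
    h = 0.3                               # hypothetical bandwidth
    x = 0.5
    print(kde_quantile(kde_cdf(x, data, h), data, h))
```

Because the quantile inverts the very same `kde_cdf`, the round trip recovers `x` up to the quadrature and bisection tolerances. For a Gaussian kernel the CDF also has the closed form (1/n) Σ Φ((x − X_i)/h), which makes a handy check on the quadrature.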

We'll have to see if we need more efficient approaches in the future, but the accuracy is off the charts in comparison to the previous implementation. I've also added docs, docstrings, and more stable tests.

This gets rid of the KernelDensity dependency. It works, but can probably be a bit more efficient.

codecov bot commented Dec 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.42%. Comparing base (fa6c842) to head (0ca4b24).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #239      +/-   ##
==========================================
+ Coverage   94.32%   94.42%   +0.09%     
==========================================
  Files          35       36       +1     
  Lines        1568     1596      +28     
==========================================
+ Hits         1479     1507      +28     
  Misses         89       89              


In comparison to the previous approach, this gives completely smooth pdf, cdf, and quantile curves. It is a bit inefficient and might be improved with interpolation in the future.
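One possible shape for the interpolation idea, sketched in Python with hypothetical names (not part of this PR): pay for the integration once by tabulating the CDF on a fixed grid, after which both the CDF and the quantile become cheap `np.interp` lookups. It assumes a Gaussian kernel with a given bandwidth `h`; the grid limits and the cumulative trapezoid rule are my own choices.

```python
import math

import numpy as np


def kde_pdf(points, data, h):
    """Gaussian KDE evaluated at every entry of `points`."""
    u = (np.asarray(points, dtype=float)[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (data.size * h * math.sqrt(2.0 * math.pi))


def tabulate_cdf(data, h, num=4001):
    """Tabulate the CDF once with the cumulative trapezoid rule.

    The grid spans min(data) - 5h to max(data) + 5h, where the PDF stays
    strictly positive, so the table is strictly increasing. Dividing by the
    last entry absorbs the small tail mass cut off at both ends.
    """
    grid = np.linspace(data.min() - 5.0 * h, data.max() + 5.0 * h, num)
    pdf = kde_pdf(grid, data, h)
    increments = 0.5 * (pdf[1:] + pdf[:-1]) * np.diff(grid)
    cdf = np.concatenate(([0.0], np.cumsum(increments)))
    return grid, cdf / cdf[-1]


def cdf_interp(x, grid, cdf):
    """Piecewise-linear CDF; clamps to 0 and 1 outside the grid."""
    return np.interp(x, grid, cdf)


def quantile_interp(p, grid, cdf):
    """Swapping the two arrays inverts the same piecewise-linear map."""
    return np.interp(p, cdf, grid)
```

Since both lookups use the same piecewise-linear table, they remain exact inverses of each other; the price is that accuracy is now bounded by the grid spacing rather than by a quadrature tolerance.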
FriesischScott marked this pull request as ready for review January 2, 2025 07:58