Add custom kernel density estimation #239

FriesischScott · 2024-12-17T13:01:44Z

This gets rid of the KernelDensity dependency. The main annoyance with KernelDensity was the FFTW dependency, which in turn pulled in MKL.

The method used to select the bandwidth for the KDE is taken from S. J. Sheather and M. C. Jones, ‘A Reliable Data-Based Bandwidth Selection Method for Kernel Density Estimation’, Journal of the Royal Statistical Society: Series B (Methodological), vol. 53, no. 3, pp. 683–690, 1991, doi: 10.1111/j.2517-6161.1991.tb01857.x.
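For readers unfamiliar with the role of the bandwidth, the standard kernel density estimator can be written as follows. This is the textbook definition, not something taken from the PR's code: the kernel $K$ and the symbols $n$, $x_i$ and $h$ are generic notation, and the kernel actually used in this PR is not stated here.

$$
\hat{f}_h(x) = \frac{1}{n h} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)
$$

Here $x_1, \dots, x_n$ are the samples and $h > 0$ is the bandwidth that the Sheather and Jones method selects from the data. A small $h$ follows the samples closely, a large $h$ smooths them out.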

Along with the PDF I also changed the CDF and inverse CDF. Before, we used the native Julia ecdf function for the CDF and interpolated the quantile from the available data and ecdf values, but I'm not a fan because the result is not a smooth function. Now the CDF is computed by numerically integrating the actual underlying PDF using QuadGK, and the inverse is obtained through root finding via Roots. That's two new dependencies, but they are pretty lightweight so I think it's fine. The biggest benefit is that the CDF and quantile are now actual inverses, so quantile(cdf(x)) == x, which didn't really hold before.
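As a rough illustration of the idea (not the code in this PR), the sketch below builds a CDF by integrating a KDE density with `quadgk` and inverts it with `find_zero`. Everything named here is hypothetical: `SketchKDE`, `kde_pdf`, `kde_cdf`, `kde_quantile` and `kde_support` are made up for the example, a Gaussian kernel is assumed, and the bandwidth comes from Silverman's rule of thumb as a simple stand-in for the Sheather-Jones selector.

```julia
using QuadGK            # quadgk: adaptive Gauss-Kronrod quadrature
using Roots             # find_zero, Bisection
using Statistics: std   # only std is needed

# Hypothetical container: the samples plus a fixed bandwidth h.
struct SketchKDE
    data::Vector{Float64}
    h::Float64
end

# Silverman's rule of thumb. ASSUMPTION: stand-in for Sheather-Jones.
silverman(x) = 1.06 * std(x) * length(x)^(-1 / 5)

SketchKDE(x::AbstractVector{<:Real}) = SketchKDE(collect(Float64, x), silverman(x))

# Standard normal kernel. ASSUMPTION: the PR does not state its kernel here.
gauss(u) = exp(-u^2 / 2) / sqrt(2π)

# Kernel density estimate at x.
function kde_pdf(d::SketchKDE, x::Real)
    return sum(gauss((x - xi) / d.h) for xi in d.data) / (length(d.data) * d.h)
end

# Truncated support: beyond 8 bandwidths the Gaussian tails are negligible.
kde_support(d::SketchKDE) = (minimum(d.data) - 8 * d.h, maximum(d.data) + 8 * d.h)

# CDF by numerical integration of the density.
function kde_cdf(d::SketchKDE, x::Real)
    lo, hi = kde_support(d)
    x <= lo && return 0.0
    x >= hi && return 1.0
    integral, _ = quadgk(t -> kde_pdf(d, t), lo, x)
    return integral
end

# Quantile by root finding on cdf(x) - p over the bracketing support.
function kde_quantile(d::SketchKDE, p::Real)
    0 < p < 1 || throw(DomainError(p, "p must lie in (0, 1)"))
    lo, hi = kde_support(d)
    return find_zero(x -> kde_cdf(d, x) - p, (lo, hi), Bisection())
end

# Round trip: the quantile of cdf(x) should recover x up to solver tolerance.
d = SketchKDE([0.3, 1.1, 1.7, 2.0, 2.4, 2.9, 3.5, 4.2, 5.0, 6.3])
x = 2.5
x_back = kde_quantile(d, kde_cdf(d, x))
println(isapprox(x_back, x; atol=1e-6))
```

Because the quantile is defined as the root of the same numerically integrated CDF, the two functions are inverses of each other up to quadrature and root-finding tolerance, so the round trip should be checked with `isapprox` rather than `==`. Every quantile evaluation triggers many integrations, which is where the cost comes from.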

We'll have to see if we need more efficient approaches in the future, but the accuracy is off the charts in comparison to the previous implementation. I've also added docs, docstrings and more stable tests.


codecov · 2024-12-27T11:14:33Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.42%. Comparing base (fa6c842) to head (0ca4b24).

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #239      +/-   ##
==========================================
+ Coverage   94.32%   94.42%   +0.09%     
==========================================
  Files          35       36       +1     
  Lines        1568     1596      +28     
==========================================
+ Hits         1479     1507      +28     
  Misses         89       89



FriesischScott added 2 commits December 17, 2024 13:58

Initial commit of custom KDE

05f7fc1

This gets rid of the KernelDensity dependency. It works, but can probably be a bit more efficient.

Update test for new empirical distribution

df5cbba

FriesischScott added 4 commits December 27, 2024 17:13

Empirical distribution with maximum accuracy

c944e46

In comparison to the previous approach this gives completely smooth pdf, cdf, and quantile curves. It is a bit inefficient and might be improved with interpolation in the future.

Add more tests

a10fe60

Remove unused arguments

583f1ed

Add documentation

0ca4b24

FriesischScott marked this pull request as ready for review January 2, 2025 07:58

FriesischScott requested a review from AnderGray January 2, 2025 08:05
