Planned Modules
Will Badart edited this page Sep 1, 2018 · 6 revisions
Below is a list of planned modules, in no particular order, with some general notes for each. A checked box indicates that the first draft of the module has been committed (not that it's fully complete!).
- Classes
  - Unified predictor interface (a la `sklearn` fit/predict/score), see docs
- Metrics
  - general measures of model quality for classifiers and regressors
  - include cluster quality measures?
  - WIP: needs to be cleaned up
  - parallel confusion matrix?
  - k-fold cross-validation
- Naive Bayes
  - still needs some work, but initial draft is committed
- Clustering
  - encoding for categorical variables?
- Decision Tree
  - information gain/entropy measures
  - selection of pruning techniques
  - make it modular, like `kmeans`, to mix and match techniques - implement pruning methods
- Ensemble Methods
  - AdaBoost
  - random forest (better in tree module?)
- SVM
  - kernel tricks
- KNN
- Neural Net
- Linear Regression
  - gradient descent
- Frequent Pattern Mining
  - FP-growth
  - frequent sequence
- Utilities
  - data loading and IO, data generation
  - parallel strategies & pipelining
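To make the "unified predictor interface" item under Classes more concrete, here is a minimal sketch of what a fit/predict/score contract could look like. It is written in Python purely for illustration, since this page does not show the project's own code; the names `Predictor` and `MajorityClassifier` are hypothetical and not taken from the project or from `sklearn`.

```python
from abc import ABC, abstractmethod
from collections import Counter


class Predictor(ABC):
    """Hypothetical base class: every model exposes fit, predict, and score."""

    @abstractmethod
    def fit(self, X, y):
        """Learn from rows X and labels y; return self to allow chaining."""

    @abstractmethod
    def predict(self, X):
        """Return one prediction per row of X."""

    def score(self, X, y):
        """Default quality measure: fraction of exact matches (accuracy)."""
        predictions = self.predict(X)
        return sum(p == t for p, t in zip(predictions, y)) / len(y)


class MajorityClassifier(Predictor):
    """Toy model that always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]
```

With a shared base like this, the planned modules (Naive Bayes, KNN, decision trees, and so on) would only need to supply `fit` and `predict`, and utilities such as k-fold cross-validation could accept any of them interchangeably.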
In addition to implementing these modules, I also plan to create a more portable benchmark suite that encapsulates the generation or loading of data. The benchmarks will be organized so that comparisons with analogous methods in other languages (e.g. Python) are easy to make.
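One possible shape for such a benchmark case is sketched below, again in Python for illustration only. The idea is that each case bundles its own data source with the routine being timed, so the same case can be re-implemented in another language and compared directly. Every name here (`Benchmark`, `synthetic_blobs`, `column_means`) is a hypothetical placeholder, not part of the planned suite.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple

Dataset = Tuple[List[List[float]], List[int]]


@dataclass
class Benchmark:
    """Hypothetical benchmark case: a data source plus the routine to time."""

    name: str
    make_data: Callable[[], Dataset]  # generates or loads the data
    run: Callable[[Dataset], object]  # the method under test
    repeats: int = 3

    def measure(self) -> dict:
        data = self.make_data()  # data setup is excluded from the timing
        timings = []
        for _ in range(self.repeats):
            start = time.perf_counter()
            self.run(data)
            timings.append(time.perf_counter() - start)
        return {"name": self.name, "best_seconds": min(timings)}


def synthetic_blobs(n_rows: int = 1000, seed: int = 0) -> Dataset:
    """Generate two noisy 2-D clusters with a fixed seed for repeatability."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_rows):
        label = i % 2
        X.append([rng.gauss(label * 3.0, 1.0), rng.gauss(label * 3.0, 1.0)])
        y.append(label)
    return X, y


def column_means(data: Dataset) -> List[float]:
    """Stand-in workload; a real benchmark would call a model's fit here."""
    X, _ = data
    return [sum(col) / len(X) for col in zip(*X)]
```

A case would then be run as `Benchmark("means", synthetic_blobs, column_means).measure()`. Returning a plain dictionary keeps the results easy to serialize and line up against timings collected from other languages.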