
Planned Modules

Will Badart edited this page Sep 1, 2018 · 6 revisions

Below is a list of planned modules, in no particular order, with some general notes for each. A checked box indicates that the first draft of the module has been committed (not that it's fully complete!).

  • Classes

    • Unified predictor interface (à la sklearn's fit/predict/score), see docs
  • Metrics

    • general measures of model quality for classifiers and regression models
    • include cluster quality measures?
    • WIP: needs to be cleaned up
    • parallel confusion matrix?
    • k-fold cv
  • Naive Bayes

    • still needs some work, but initial draft is committed
  • Clustering

    • encoding for categorical variables?
  • Decision Tree

    • information gain/entropy measures
    • selection of pruning techniques
    • make it modular, like kmeans, to mix and match techniques
    • implement pruning methods
  • Ensemble Methods

    • AdaBoost
    • random forest (better in tree module?)
  • SVM

    • kernel tricks
  • KNN

  • Neural Net

  • Linear Regression

    • gradient descent
  • Frequent Pattern Mining

    • FP-growth
    • frequent sequence
  • Utilities

    • data loading and IO, data generation
    • parallel strategies & pipelining
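
The unified predictor interface listed under Classes could look roughly like the sketch below. It is written in Python purely for illustration, since the interface is modeled on sklearn; the names `Predictor` and `MajorityClass` are hypothetical and are not taken from the project, and the default `score` (plain accuracy) is an assumption rather than the project's actual metric.

```python
from abc import ABC, abstractmethod
from collections import Counter
from typing import Sequence


class Predictor(ABC):
    """Hypothetical base class exposing fit / predict / score."""

    @abstractmethod
    def fit(self, X: Sequence, y: Sequence) -> "Predictor":
        """Learn from the samples X and targets y; return self for chaining."""

    @abstractmethod
    def predict(self, X: Sequence) -> list:
        """Return one prediction per sample in X."""

    def score(self, X: Sequence, y: Sequence) -> float:
        # Assumed default: accuracy, i.e. the fraction of exact matches.
        predictions = self.predict(X)
        return sum(p == t for p, t in zip(predictions, y)) / len(y)


class MajorityClass(Predictor):
    """Toy model: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


if __name__ == "__main__":
    model = MajorityClass().fit([[0], [1], [2]], ["a", "a", "b"])
    print(model.predict([[5]]))
    print(model.score([[0], [1], [2]], ["a", "a", "b"]))
```

Because every model shares the same three methods, the metrics and k-fold cross-validation items above could be written once against `Predictor` and reused for each module.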

In addition to implementing these modules, I also plan to create a more portable benchmark suite that encapsulates the generation or loading of data. The benchmarks will be organized so that comparison with analogous methods in other languages (e.g. Python) is easy.
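
One way a benchmark could encapsulate its own data is sketched below, again in Python for illustration only. Everything here is hypothetical: the `Benchmark` record, the `run_benchmark` helper, and the `generate_points` data generator are assumptions made for the example, not part of the planned suite.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Benchmark:
    """Hypothetical record pairing a data source with the code under test."""
    name: str
    make_data: Callable[[], tuple]  # generates or loads the input data
    run: Callable[..., object]      # the method being timed


def generate_points(n: int = 1000, seed: int = 0) -> tuple:
    # Seeded so that every run (and every language port) sees the same data.
    rng = random.Random(seed)
    return ([rng.random() for _ in range(n)],)


def run_benchmark(bench: Benchmark, repeats: int = 3) -> dict:
    data = bench.make_data()  # data preparation stays outside the timed region
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        bench.run(*data)
        timings.append(time.perf_counter() - start)
    return {"name": bench.name, "repeats": repeats, "best_seconds": min(timings)}


if __name__ == "__main__":
    print(run_benchmark(Benchmark("sort", generate_points, sorted)))
```

Keeping data generation behind `make_data` means a port in another language only needs to reproduce the same generator or load the same file to produce comparable timings.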
