Add examples of using other classifiers #1

ceholden · 2015-12-03T19:37:38Z

The tutorial uses RandomForest because it's the method I know best and use most frequently among the wide variety of machine learning classification methods, but there are probably situations where other methods are preferable or more accurate.

For example, the classification could be performed using Support Vector Machines (SVM) or Artificial Neural Networks (ANN). I personally don't have the time to update the lessons to include other classification methods right now, but I'll use this ticket to include some preliminary work for Python and R below:

Python

The tutorial currently uses the RandomForest implementation in the scikit-learn Python package. Thanks to scikit-learn's excellent unified interface for statistical estimators, the process of running something like SVM or ANN is rather straightforward.

All scikit-learn supervised classification estimators implement a few common methods, including fit and predict. Thus, one could easily replace the code for RandomForest

from sklearn.ensemble import RandomForestClassifier
# Initialize our model with 500 trees
rf = RandomForestClassifier(n_estimators=500, oob_score=True)
# Fit our model to training data
rf = rf.fit(X, y)
# Now predict for each pixel
class_prediction = rf.predict(img_as_array)

with the same code for scikit-learn's Support Vector Classifier.

SVM

Here is an example of how easy it is to run scikit-learn's Support Vector Classifier (see this excellent user guide for SVM):

from sklearn import svm
# Initialize our model with the default parameters
clf = svm.SVC()
# Fit our model to training data
clf = clf.fit(X, y)
# Now predict for each pixel
class_prediction = clf.predict(img_as_array)

Only the code that initializes the estimator changes! This clear and consistent API makes scikit-learn easily one of the best machine learning toolkits around.
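To make the point about the shared interface concrete, here is a minimal sketch that runs the same fit and predict steps for both estimators. The synthetic arrays and the `classify` helper are hypothetical stand-ins for the tutorial's `X`, `y`, and `img_as_array`, not code from the lessons:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Synthetic stand-ins for the tutorial's arrays (made-up shapes and labels)
rng = np.random.RandomState(0)
X = rng.rand(200, 4)                        # training samples x bands
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)   # two made-up classes
img_as_array = rng.rand(50 * 50, 4)         # flattened image: pixels x bands


def classify(estimator, X, y, pixels):
    """Fit any scikit-learn classifier and predict one label per pixel."""
    return estimator.fit(X, y).predict(pixels)


# Only the estimator passed in differs between the two runs
predictions = {
    'RandomForest': classify(
        RandomForestClassifier(n_estimators=100, random_state=0),
        X, y, img_as_array),
    'SVC': classify(SVC(), X, y, img_as_array),
}

for name, pred in predictions.items():
    print(name, pred.shape, np.unique(pred))
```

Any other estimator that implements fit and predict could be passed to the helper in the same way.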

It is worth noting, however, that the preprocessing (rescaling, standardization, etc.) required by one machine learning method may differ from what another estimator needs. SVMs, for example, are not scale invariant, so standardizing your input data is highly recommended. See the Tips for practical use for SVM for more suggestions.
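One way to handle the standardization step is to chain a scaler and the classifier in a scikit-learn pipeline, so the scaling learned from the training data is also applied to every pixel at prediction time. This is only a sketch under assumptions: the arrays are synthetic, with deliberately mismatched band ranges, and the tutorial itself does not use a pipeline:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data whose bands span very different ranges (made-up values)
rng = np.random.RandomState(0)
band_scale = np.array([1.0, 10.0, 100.0, 10000.0])
X = rng.rand(200, 4) * band_scale
y = (X[:, 0] > 0.5).astype(int)
img_as_array = rng.rand(2500, 4) * band_scale

# StandardScaler removes each band's mean and scales it to unit variance;
# the pipeline fits the scaler on X and reuses it inside predict()
clf = make_pipeline(StandardScaler(), SVC())
clf = clf.fit(X, y)
class_prediction = clf.predict(img_as_array)
```

Because the pipeline exposes the same fit and predict methods, it drops into the workflow exactly where the bare estimator would.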

ANN

As of December 3rd, 2015, the stable release of scikit-learn does not have a supervised ANN implementation. It does, however, have an implementation of the Bernoulli Restricted Boltzmann Machine (RBM). See the Unsupervised ANN User Guide page for more information.

The development version of scikit-learn (on track for version 0.18) added a Multi-Layer Perceptron (MLP) supervised classifier that could be utilized in the same supervised classification workflow as RandomForest or SVC. Please see the Supervised ANN User Guide page for more information.
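For anyone with a scikit-learn version that ships `sklearn.neural_network.MLPClassifier` (0.18 or later), a rough sketch of the same workflow might look like the following. The data is synthetic, and the layer size and iteration limit are arbitrary assumptions rather than tuned values:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Synthetic stand-ins for the tutorial's arrays (made-up shapes and labels)
rng = np.random.RandomState(0)
X = rng.rand(200, 4)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)
img_as_array = rng.rand(2500, 4)

# One hidden layer of 20 units; both values below are arbitrary choices
mlp = MLPClassifier(hidden_layer_sizes=(20,), max_iter=500, random_state=0)
# Fit our model to training data
mlp = mlp.fit(X, y)
# Now predict for each pixel
class_prediction = mlp.predict(img_as_array)
```

Like SVMs, multi-layer perceptrons are sensitive to feature scaling, so real imagery would likely need to be standardized before fitting.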

R

Being more of a Pythonista, I'm not quite sure what packages are useful for SVM or ANN in R!

I welcome any code contributions or tips for where to look in the comments.

ceholden added the help wanted label Dec 3, 2015