Add examples of using other classifiers #1

Open
ceholden opened this issue Dec 3, 2015 · 0 comments

The tutorial uses RandomForest because it is the machine learning classification method I know best and use most frequently, but there are probably situations where other methods are preferable or more accurate.

For example, the classification could be performed using Support Vector Machines (SVM) or Artificial Neural Networks (ANN). I personally don't have the time to update the lessons to include other classification methods right now, but I'll use this ticket to include some preliminary work for Python and R below:

Python

The tutorial currently uses the RandomForest implementation in the scikit-learn Python package. Thanks to scikit-learn's excellent unified interface for statistical estimators, the process of running something like SVM or ANN is rather straightforward.

All scikit-learn supervised classification estimators implement a few common methods, including fit and predict. Thus, one could easily replace the code for RandomForest

from sklearn.ensemble import RandomForestClassifier
# Initialize our model with 500 trees
rf = RandomForestClassifier(n_estimators=500, oob_score=True)
# Fit our model to training data
rf = rf.fit(X, y)
# Now predict for each pixel
class_prediction = rf.predict(img_as_array)

with the same code for scikit-learn's Support Vector Classifier.

SVM

Here is an example of how easy it is to run scikit-learn's Support Vector Classifier (see the excellent scikit-learn user guide for SVM):

from sklearn import svm
# Initialize our model with the default settings
clf = svm.SVC()
# Fit our model to training data
clf = clf.fit(X, y)
# Now predict for each pixel
class_prediction = clf.predict(img_as_array)

Only the code that initializes the estimator changes! This clear and consistent API makes scikit-learn easily one of the best machine learning toolkits around.
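To illustrate the shared interface, here is a minimal sketch that runs both estimators through the same loop. The arrays are synthetic stand-ins I made up for the tutorial's `X`, `y`, and `img_as_array` (the shapes and values are assumptions), and the smaller `n_estimators` is only there to keep the example quick.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Synthetic stand-ins for the tutorial's arrays (assumed shapes):
#   X            -> (n_samples, n_bands) training spectra
#   y            -> (n_samples,) class labels
#   img_as_array -> (n_pixels, n_bands) image reshaped to one row per pixel
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 4)), rng.normal(5.0, 1.0, (50, 4))])
y = np.array([1] * 50 + [2] * 50)
img_as_array = rng.normal(2.5, 3.0, (200, 4))

# Only the initialization differs between estimators
estimators = {
    "RandomForest": RandomForestClassifier(n_estimators=50, random_state=0),
    "SVC": SVC(),
}

predictions = {}
for name, estimator in estimators.items():
    # Every supervised scikit-learn classifier exposes fit and predict
    estimator.fit(X, y)
    predictions[name] = estimator.predict(img_as_array)
    print(name, predictions[name].shape)
```

Each entry in `predictions` is a 1D array with one label per pixel, so it can be reshaped back to the image dimensions the same way the tutorial does for RandomForest.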

It is worth noting, however, that the preprocessing (rescaling, standardization, etc.) one estimator requires may differ from what another needs. SVMs, for example, are not scale invariant, so standardizing your input data is highly recommended. See the "Tips on Practical Use" section of the scikit-learn SVM user guide for more suggestions.
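One way to handle the standardization, sketched here under the assumption that the bands have very different value ranges, is to chain `StandardScaler` and `SVC` with `make_pipeline` so the scaling learned from the training data is also applied at prediction time. The data below is synthetic and only stands in for the tutorial's arrays.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-ins for the tutorial's arrays; each "band" has a very
# different value range (an assumption made for illustration)
rng = np.random.RandomState(0)
band_scales = np.array([1.0, 100.0, 10000.0])
X = np.vstack([rng.normal(0.0, 1.0, (50, 3)),
               rng.normal(3.0, 1.0, (50, 3))]) * band_scales
y = np.array([1] * 50 + [2] * 50)
img_as_array = rng.normal(1.5, 2.0, (200, 3)) * band_scales

# The pipeline standardizes each band (zero mean, unit variance) using the
# training data, then fits the SVC on the standardized values
clf = make_pipeline(StandardScaler(), SVC())
clf.fit(X, y)

# predict() applies the same scaling to the image before classifying
class_prediction = clf.predict(img_as_array)
```

Because the pipeline itself implements `fit` and `predict`, it drops into the workflow in place of `rf` without any other changes.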

ANN

As of December 3rd, 2015, the stable release of scikit-learn does not have a supervised ANN implementation. It does, however, include an implementation of the Bernoulli Restricted Boltzmann Machine (RBM). See the Unsupervised ANN User Guide page for more information.
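Since the RBM is unsupervised, it cannot replace RandomForest on its own. One possible approach, and this is only a rough sketch rather than something I have tested on real imagery, is to use `BernoulliRBM` as a feature extractor in front of a supervised classifier such as `LogisticRegression`. The RBM expects inputs in the range 0 to 1, so the sketch rescales with `MinMaxScaler` first; the data and the hyperparameter values are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-ins for the tutorial's arrays (assumed shapes and values)
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 4)), rng.normal(5.0, 1.0, (50, 4))])
y = np.array([1] * 50 + [2] * 50)
img_as_array = rng.normal(2.5, 3.0, (200, 4))

# Rescale to [0, 1], learn RBM features, then classify those features.
# clip=True keeps image values outside the training range within [0, 1].
# n_components, learning_rate, and n_iter are illustrative guesses.
clf = make_pipeline(
    MinMaxScaler(clip=True),
    BernoulliRBM(n_components=16, learning_rate=0.05, n_iter=20, random_state=0),
    LogisticRegression(),
)
clf.fit(X, y)
class_prediction = clf.predict(img_as_array)
```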

The development version of scikit-learn (on track for version 0.18) added a Multi-Layer Perceptron (MLP) supervised classifier that could be utilized in the same supervised classification workflow as RandomForest or SVC. Please see the Supervised ANN User Guide page for more information.
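Assuming scikit-learn 0.18 or later is installed, the MLP classifier is available as `sklearn.neural_network.MLPClassifier` and should slot into the same workflow. Below is a minimal sketch on synthetic stand-in data; the hidden layer size and `max_iter` are arbitrary choices for illustration, and the standardization step is there because MLPs, like SVMs, are sensitive to feature scaling.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for the tutorial's arrays (assumed shapes and values)
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 4)), rng.normal(5.0, 1.0, (50, 4))])
y = np.array([1] * 50 + [2] * 50)
img_as_array = rng.normal(2.5, 3.0, (200, 4))

# One hidden layer of 16 units; both values below are illustrative guesses
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0),
)

# Same fit / predict calls as for RandomForest or SVC
mlp.fit(X, y)
class_prediction = mlp.predict(img_as_array)
```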

R

Being more of a Pythonista, I'm not quite sure what packages are useful for SVM or ANN in R!

I welcome any code contributions or tips for where to look in the comments.
