Merge pull request #1074 from JDRomano2/nn
Update TPOT-NN documentation
weixuanfu authored May 29, 2020
2 parents 64fa0c5 + 0ee3eaf commit a9f4b21
Showing 3 changed files with 42 additions and 8 deletions.
20 changes: 20 additions & 0 deletions docs_sources/examples.md
@@ -174,3 +174,23 @@ The corresponding Jupyter notebook, containing the associated data preprocessing

## MAGIC Gamma Telescope
The corresponding Jupyter notebook, containing the associated data preprocessing and analysis, can be found [here](https://github.com/EpistasisLab/tpot/blob/master/tutorials/MAGIC%20Gamma%20Telescope/MAGIC%20Gamma%20Telescope.ipynb).

## Neural network classifier using TPOT-NN
Loading the <a href="https://github.com/EpistasisLab/tpot/blob/master/tpot/config/classifier_nn.py">TPOT-NN configuration dictionary</a> includes PyTorch estimators for classification. Users can also create their own NN configuration dictionary that includes `tpot.builtins.PytorchLRClassifier` and/or `tpot.builtins.PytorchMLPClassifier`, or they can specify them using a template string, as shown in the following example:

```Python
from tpot import TPOTClassifier
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

X, y = make_blobs(n_samples=100, centers=2, n_features=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75, test_size=0.25)

clf = TPOTClassifier(config_dict='TPOT NN', template='Selector-Transformer-PytorchLRClassifier',
verbosity=2, population_size=10, generations=10)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
clf.export('tpot_nn_demo_pipeline.py')
```

This example is somewhat trivial, but it should result in nearly 100% classification accuracy.
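
As mentioned above, the same estimators can also be enabled through a custom configuration dictionary rather than the built-in `'TPOT NN'` string. A minimal sketch follows; the hyperparameter names and value ranges here are illustrative and should be checked against <a href="https://github.com/EpistasisLab/tpot/blob/master/tpot/config/classifier_nn.py">classifier_nn.py</a> for the actual search space:

```Python
from tpot import TPOTClassifier

# Sketch of a custom NN configuration dictionary. The hyperparameter
# names and candidate values below are illustrative, not TPOT's shipped
# defaults -- see classifier_nn.py for the real ones.
tpot_config = {
    'tpot.builtins.PytorchLRClassifier': {
        'learning_rate': [1e-3, 1e-2, 1e-1],
        'num_epochs': [5, 10, 15],
        'batch_size': [8, 16, 32]
    }
}

clf = TPOTClassifier(config_dict=tpot_config, verbosity=2,
                     population_size=10, generations=10)
```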
10 changes: 6 additions & 4 deletions docs_sources/installing.md
@@ -18,16 +18,18 @@ TPOT is built on top of several existing Python libraries, including:

* [joblib](https://joblib.readthedocs.io/en/latest/)

* [PyTorch](https://pytorch.org/)

Most of the necessary Python packages can be installed via the [Anaconda Python distribution](https://www.continuum.io/downloads), which we strongly recommend that you use. We also strongly recommend using Python 3 over Python 2 if you're given the choice.

-You can install TPOP using `pip` or `conda-forge`.
+You can install TPOT using `pip` or `conda-forge`.

## pip

-NumPy, SciPy, scikit-learn, pandas and joblib can be installed in Anaconda via the command:
+NumPy, SciPy, scikit-learn, pandas, joblib, and PyTorch can be installed in Anaconda via the command:

```Shell
-conda install numpy scipy scikit-learn pandas joblib
+conda install numpy scipy scikit-learn pandas joblib pytorch
```

DEAP, update_checker, tqdm and stopit can be installed with `pip` via the command:
@@ -73,7 +75,7 @@ conda install -c conda-forge tpot
To install additional dependencies you can use:

```Shell
-conda install -c conda-forge tpot xgboost dask dask-ml scikit-mdr skrebate
+conda install -c conda-forge tpot xgboost dask dask-ml scikit-mdr skrebate pytorch
```

## Installation problems
20 changes: 16 additions & 4 deletions docs_sources/using.md
@@ -27,7 +27,7 @@ which means that roughly 100,000 models are fit and evaluated on the training data.
That's a time-consuming procedure, even for simpler models like decision trees.

Typical TPOT runs will take hours to days to finish (unless it's a small dataset), but you can always interrupt
-the run partway through and see the best results so far. TPOT also [provides](/api/) a `warm_start` parameter that
+the run partway through and see the best results so far. TPOT also [provides](/tpot/api/) a `warm_start` parameter that
lets you restart a TPOT run from where it left off.
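
As a hedged sketch of that interrupt-and-resume workflow (using a small synthetic dataset; the parameter values are arbitrary):

```Python
from tpot import TPOTClassifier
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

X, y = make_blobs(n_samples=100, centers=2, n_features=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75, test_size=0.25)

# warm_start=True keeps the evolved population between calls to fit(),
# so a second fit() resumes the search instead of starting from scratch
pipeline_optimizer = TPOTClassifier(generations=2, population_size=10,
                                    warm_start=True, verbosity=2)
pipeline_optimizer.fit(X_train, y_train)
pipeline_optimizer.fit(X_train, y_train)  # continues the previous run
```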

<h5>AutoML algorithms can recommend different solutions for the same dataset</h5>
@@ -61,7 +61,7 @@ pipeline_optimizer = TPOTClassifier()
```

It's also possible to use TPOT for regression problems with the `TPOTRegressor` class. Other than the class name,
-a `TPOTRegressor` is used the same way as a `TPOTClassifier`. You can read more about the `TPOTClassifier` and `TPOTRegressor` classes in the [API documentation](/api/).
+a `TPOTRegressor` is used the same way as a `TPOTClassifier`. You can read more about the `TPOTClassifier` and `TPOTRegressor` classes in the [API documentation](/tpot/api/).
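
As a hedged sketch of a regression run (synthetic data, arbitrary parameter values):

```Python
from tpot import TPOTRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75, test_size=0.25)

# Other than the class name, the workflow mirrors TPOTClassifier
pipeline_optimizer = TPOTRegressor(generations=5, population_size=20,
                                   random_state=42, verbosity=2)
pipeline_optimizer.fit(X_train, y_train)
print(pipeline_optimizer.score(X_test, y_test))
```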

Some example code with custom TPOT parameters might look like:

@@ -111,7 +111,7 @@ print(pipeline_optimizer.score(X_test, y_test))
pipeline_optimizer.export('tpot_exported_pipeline.py')
```

-Check our [examples](examples/) to see TPOT applied to some specific data sets.
+Check our [examples](/tpot/examples/) to see TPOT applied to some specific data sets.

# TPOT on the command line

@@ -447,6 +447,14 @@ This configuration works for both the TPOTClassifier and TPOTRegressor.</td>
<a href="https://github.com/EpistasisLab/tpot/blob/master/tpot/config/regressor_sparse.py">Regression</a></td>
</tr>

<tr>
<td>TPOT-NN</td>
<td>TPOT uses the same configuration as "Default TPOT" plus additional neural network estimators written in PyTorch (currently only `tpot.builtins.PytorchLRClassifier` and `tpot.builtins.PytorchMLPClassifier`).
<br /><br />
Currently only classification is supported, but future releases will include regression estimators.</td>
<td align="center"><a href="https://github.com/EpistasisLab/tpot/blob/master/tpot/config/classifier_nn.py">Classification</a></td>
</tr>

</table>

To use any of these configurations, simply pass the string name of the configuration to the `config_dict` parameter (or `-config` on the command line). For example, to use the "TPOT light" configuration:
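
A minimal sketch of that call (passing `'TPOT NN'` instead selects the neural network configuration the same way):

```Python
from tpot import TPOTClassifier

pipeline_optimizer = TPOTClassifier(config_dict='TPOT light')
```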
@@ -673,7 +681,7 @@ See [dask's distributed joblib integration](https://distributed.readthedocs.io/e

Support for neural network models and deep learning is an experimental feature newly added to TPOT. Available neural network architectures are provided by the `tpot.nn` module. Unlike regular `sklearn` estimators, these models need to be written by hand, and must also inherit the appropriate base classes provided by `sklearn` for all of their built-in modules. In other words, they need to implement methods like `fit()`, `fit_transform()`, `get_params()`, etc., as described in detail in [Developing scikit-learn estimators](https://scikit-learn.org/stable/developers/develop.html).
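
As a rough illustration of that contract, a hand-written estimator skeleton might look like the following; the class name and internals are hypothetical, and the real implementations live in `tpot.builtins`:

```Python
from sklearn.base import BaseEstimator, ClassifierMixin

class PytorchDemoClassifier(BaseEstimator, ClassifierMixin):
    """Hypothetical skeleton of a TPOT-compatible neural network estimator."""

    def __init__(self, num_epochs=10, learning_rate=1e-2):
        # Hyperparameters must be stored verbatim so that sklearn's
        # get_params()/set_params() machinery (inherited from
        # BaseEstimator) works with TPOT's optimizer.
        self.num_epochs = num_epochs
        self.learning_rate = learning_rate

    def fit(self, X, y):
        # Build and train the underlying PyTorch network here.
        # fit() must return self, per the sklearn estimator contract.
        return self

    def predict(self, X):
        # Run a forward pass and return predicted class labels.
        raise NotImplementedError
```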

-## Telling TPOT to use `tpot.nn`
+## Telling TPOT to use built-in PyTorch neural network models

Mainly due to the issues described below, TPOT won't use its neural network models unless you explicitly tell it to do so. This is done as follows:

@@ -689,8 +697,12 @@ tpot_config = {
}
```

- Alternatively, use a template string including `PytorchLRClassifier` or `PytorchMLPClassifier` while loading the TPOT-NN configuration dictionary.

Neural network models are notorious for being extremely sensitive to their initialization parameters, so you may need to heavily adjust `tpot.nn` configuration dictionaries in order to attain good performance on your dataset.

A simple example of using TPOT-NN is shown in [examples](/tpot/examples/).

## Important caveats

- Neural network models (especially when they reach moderately large sizes) take a notoriously large amount of time and computing power to train. You should expect `tpot.nn` neural networks to train several orders of magnitude slower than their `sklearn` alternatives. This can be alleviated somewhat by training the models on computers with CUDA-enabled GPUs.
