Merge pull request #1074 from JDRomano2/nn
Update TPOT-NN documentation
weixuanfu authored May 29, 2020
2 parents 64fa0c5 + 0ee3eaf commit a9f4b21
Showing 3 changed files with 42 additions and 8 deletions.
20 changes: 20 additions & 0 deletions docs_sources/examples.md
@@ -174,3 +174,23 @@ The corresponding Jupyter notebook, containing the associated data preprocessing

## MAGIC Gamma Telescope
The corresponding Jupyter notebook, containing the associated data preprocessing and analysis, can be found [here](https://github.com/EpistasisLab/tpot/blob/master/tutorials/MAGIC%20Gamma%20Telescope/MAGIC%20Gamma%20Telescope.ipynb).

## Neural network classifier using TPOT-NN
Loading the <a href="https://github.com/EpistasisLab/tpot/blob/master/tpot/config/classifier_nn.py">TPOT-NN configuration dictionary</a> includes PyTorch estimators for classification. Users can also create their own NN configuration dictionary that includes `tpot.builtins.PytorchLRClassifier` and/or `tpot.builtins.PytorchMLPClassifier`, or they can specify them using a template string, as shown in the following example:

```Python
from tpot import TPOTClassifier
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

X, y = make_blobs(n_samples=100, centers=2, n_features=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75, test_size=0.25)

clf = TPOTClassifier(config_dict='TPOT NN', template='Selector-Transformer-PytorchLRClassifier',
verbosity=2, population_size=10, generations=10)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
clf.export('tpot_nn_demo_pipeline.py')
```

This example is somewhat trivial, but it should result in nearly 100% classification accuracy.
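
As mentioned above, the same estimators can also be enabled through a custom configuration dictionary rather than the built-in `'TPOT NN'` string. A minimal sketch follows; the hyperparameter names and value ranges here are illustrative and should be checked against <a href="https://github.com/EpistasisLab/tpot/blob/master/tpot/config/classifier_nn.py">classifier_nn.py</a> for the actual search space:

```Python
from tpot import TPOTClassifier

# Sketch of a custom NN configuration dictionary. The hyperparameter
# names and candidate values below are illustrative, not TPOT's shipped
# defaults -- see classifier_nn.py for the real ones.
tpot_config = {
    'tpot.builtins.PytorchLRClassifier': {
        'learning_rate': [1e-3, 1e-2, 1e-1],
        'num_epochs': [5, 10, 15],
        'batch_size': [8, 16, 32]
    }
}

clf = TPOTClassifier(config_dict=tpot_config, verbosity=2,
                     population_size=10, generations=10)
```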
10 changes: 6 additions & 4 deletions docs_sources/installing.md
@@ -18,16 +18,18 @@ TPOT is built on top of several existing Python libraries, including:

* [joblib](https://joblib.readthedocs.io/en/latest/)

* [PyTorch](https://pytorch.org/)

Most of the necessary Python packages can be installed via the [Anaconda Python distribution](https://www.continuum.io/downloads), which we strongly recommend that you use. We also strongly recommend using Python 3 over Python 2 if you're given the choice.

-You can install TPOP using `pip` or `conda-forge`.
+You can install TPOT using `pip` or `conda-forge`.

## pip

-NumPy, SciPy, scikit-learn, pandas and joblib can be installed in Anaconda via the command:
+NumPy, SciPy, scikit-learn, pandas, joblib, and PyTorch can be installed in Anaconda via the command:

```Shell
-conda install numpy scipy scikit-learn pandas joblib
+conda install numpy scipy scikit-learn pandas joblib pytorch
```

DEAP, update_checker, tqdm and stopit can be installed with `pip` via the command:
@@ -73,7 +75,7 @@ conda install -c conda-forge tpot
To install additional dependencies you can use:

```Shell
-conda install -c conda-forge tpot xgboost dask dask-ml scikit-mdr skrebate
+conda install -c conda-forge tpot xgboost dask dask-ml scikit-mdr skrebate pytorch
```

## Installation problems
20 changes: 16 additions & 4 deletions docs_sources/using.md
@@ -27,7 +27,7 @@ which means that roughly 100,000 models are fit and evaluated on the training data.
That's a time-consuming procedure, even for simpler models like decision trees.

Typical TPOT runs will take hours to days to finish (unless it's a small dataset), but you can always interrupt
-the run partway through and see the best results so far. TPOT also [provides](/api/) a `warm_start` parameter that
+the run partway through and see the best results so far. TPOT also [provides](/tpot/api/) a `warm_start` parameter that
lets you restart a TPOT run from where it left off.
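
As a hedged sketch of that interrupt-and-resume workflow (using a small synthetic dataset; the parameter values are arbitrary):

```Python
from tpot import TPOTClassifier
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

X, y = make_blobs(n_samples=100, centers=2, n_features=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75, test_size=0.25)

# warm_start=True keeps the evolved population between calls to fit(),
# so a second fit() resumes the search instead of starting from scratch
pipeline_optimizer = TPOTClassifier(generations=2, population_size=10,
                                    warm_start=True, verbosity=2)
pipeline_optimizer.fit(X_train, y_train)
pipeline_optimizer.fit(X_train, y_train)  # continues the previous run
```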

<h5>AutoML algorithms can recommend different solutions for the same dataset</h5>
@@ -61,7 +61,7 @@ pipeline_optimizer = TPOTClassifier()
```

It's also possible to use TPOT for regression problems with the `TPOTRegressor` class. Other than the class name,
-a `TPOTRegressor` is used the same way as a `TPOTClassifier`. You can read more about the `TPOTClassifier` and `TPOTRegressor` classes in the [API documentation](/api/).
+a `TPOTRegressor` is used the same way as a `TPOTClassifier`. You can read more about the `TPOTClassifier` and `TPOTRegressor` classes in the [API documentation](/tpot/api/).
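
As a hedged sketch of a regression run (synthetic data, arbitrary parameter values):

```Python
from tpot import TPOTRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75, test_size=0.25)

# Other than the class name, the workflow mirrors TPOTClassifier
pipeline_optimizer = TPOTRegressor(generations=5, population_size=20,
                                   random_state=42, verbosity=2)
pipeline_optimizer.fit(X_train, y_train)
print(pipeline_optimizer.score(X_test, y_test))
```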

Some example code with custom TPOT parameters might look like:

@@ -111,7 +111,7 @@ print(pipeline_optimizer.score(X_test, y_test))
pipeline_optimizer.export('tpot_exported_pipeline.py')
```

-Check our [examples](examples/) to see TPOT applied to some specific data sets.
+Check our [examples](/tpot/examples/) to see TPOT applied to some specific data sets.

# TPOT on the command line

@@ -447,6 +447,14 @@ This configuration works for both the TPOTClassifier and TPOTRegressor.</td>
<a href="https://github.com/EpistasisLab/tpot/blob/master/tpot/config/regressor_sparse.py">Regression</a></td>
</tr>

<tr>
<td>TPOT-NN</td>
<td>TPOT uses the same configuration as "Default TPOT" plus additional neural network estimators written in PyTorch (currently only `tpot.builtins.PytorchLRClassifier` and `tpot.builtins.PytorchMLPClassifier`).
<br /><br />
Currently only classification is supported, but future releases will include regression estimators.</td>
<td align="center"><a href="https://github.com/EpistasisLab/tpot/blob/master/tpot/config/classifier_nn.py">Classification</a></td>
</tr>

</table>

To use any of these configurations, simply pass the string name of the configuration to the `config_dict` parameter (or `-config` on the command line). For example, to use the "TPOT light" configuration:
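
A minimal sketch of that call (passing `'TPOT NN'` instead selects the neural network configuration the same way):

```Python
from tpot import TPOTClassifier

pipeline_optimizer = TPOTClassifier(config_dict='TPOT light')
```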
@@ -673,7 +681,7 @@ See [dask's distributed joblib integration](https://distributed.readthedocs.io/e

Support for neural network models and deep learning is an experimental feature newly added to TPOT. Available neural network architectures are provided by the `tpot.nn` module. Unlike regular `sklearn` estimators, these models need to be written by hand, and must also inherit the appropriate base classes provided by `sklearn` for all of their built-in modules. In other words, they need to implement methods like `fit()`, `fit_transform()`, `get_params()`, etc., as described in detail in [Developing scikit-learn estimators](https://scikit-learn.org/stable/developers/develop.html).
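
As a rough illustration of that contract, a hand-written estimator skeleton might look like the following; the class name and internals are hypothetical, and the real implementations live in `tpot.builtins`:

```Python
from sklearn.base import BaseEstimator, ClassifierMixin

class PytorchDemoClassifier(BaseEstimator, ClassifierMixin):
    """Hypothetical skeleton of a TPOT-compatible neural network estimator."""

    def __init__(self, num_epochs=10, learning_rate=1e-2):
        # Hyperparameters must be stored verbatim so that sklearn's
        # get_params()/set_params() machinery (inherited from
        # BaseEstimator) works with TPOT's optimizer.
        self.num_epochs = num_epochs
        self.learning_rate = learning_rate

    def fit(self, X, y):
        # Build and train the underlying PyTorch network here.
        # fit() must return self, per the sklearn estimator contract.
        return self

    def predict(self, X):
        # Run a forward pass and return predicted class labels.
        raise NotImplementedError
```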

-## Telling TPOT to use `tpot.nn`
+## Telling TPOT to use built-in PyTorch neural network models

Mainly due to the issues described below, TPOT won't use its neural network models unless you explicitly tell it to do so. This is done as follows:

@@ -689,8 +697,12 @@ tpot_config = {
}
```

- Alternatively, use a template string including `PytorchLRClassifier` or `PytorchMLPClassifier` while loading the TPOT-NN configuration dictionary.

Neural network models are notorious for being extremely sensitive to their initialization parameters, so you may need to heavily adjust `tpot.nn` configuration dictionaries in order to attain good performance on your dataset.

A simple example of using TPOT-NN is shown in [examples](/tpot/examples/).

## Important caveats

- Neural network models (especially when they reach moderately large sizes) take a notoriously large amount of time and computing power to train. You should expect `tpot.nn` neural networks to train several orders of magnitude slower than their `sklearn` alternatives. This can be alleviated somewhat by training the models on computers with CUDA-enabled GPUs.
