Commit

new docs
Weixuan committed May 29, 2020
1 parent a9f4b21 commit 74cc3f1
Showing 7 changed files with 97 additions and 19 deletions.
20 changes: 20 additions & 0 deletions docs/examples/index.html
@@ -94,6 +94,9 @@
<li class="toctree-l2"><a href="#magic-gamma-telescope">MAGIC Gamma Telescope</a></li>


<li class="toctree-l2"><a href="#neural-network-classifier-using-tpot-nn">Neural network classifier using TPOT-NN</a></li>


</ul>
</li>

@@ -357,6 +360,23 @@ <h2 id="portuguese-bank-marketing">Portuguese Bank Marketing</h2>
<p>The corresponding Jupyter notebook, containing the associated data preprocessing and analysis, can be found <a href="https://github.com/EpistasisLab/tpot/blob/master/tutorials/Portuguese%20Bank%20Marketing/Portuguese%20Bank%20Marketing%20Stratergy.ipynb">here</a>.</p>
<h2 id="magic-gamma-telescope">MAGIC Gamma Telescope</h2>
<p>The corresponding Jupyter notebook, containing the associated data preprocessing and analysis, can be found <a href="https://github.com/EpistasisLab/tpot/blob/master/tutorials/MAGIC%20Gamma%20Telescope/MAGIC%20Gamma%20Telescope.ipynb">here</a>.</p>
<h2 id="neural-network-classifier-using-tpot-nn">Neural network classifier using TPOT-NN</h2>
<p>Loading the <a href="https://github.com/EpistasisLab/tpot/blob/master/tpot/config/classifier_nn.py">TPOT-NN configuration dictionary</a> makes PyTorch estimators available for classification. Users can also create their own NN configuration dictionary that includes <code>tpot.builtins.PytorchLRClassifier</code> and/or <code>tpot.builtins.PytorchMLPClassifier</code>, or they can specify them using a template string, as shown in the following example:</p>
<pre><code class="Python">from tpot import TPOTClassifier
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

X, y = make_blobs(n_samples=100, centers=2, n_features=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75, test_size=0.25)

clf = TPOTClassifier(config_dict='TPOT NN', template='Selector-Transformer-PytorchLRClassifier',
                     verbosity=2, population_size=10, generations=10)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
clf.export('tpot_nn_demo_pipeline.py')
</code></pre>

<p>This example is somewhat trivial, but it should result in nearly 100% classification accuracy.</p>
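<p>Alternatively, the same estimator can be exposed through a user-supplied configuration dictionary rather than a template string. A minimal sketch of that route follows; the <code>learning_rate</code> grid is illustrative, not a tuned search space:</p>
<pre><code class="Python">from tpot import TPOTClassifier
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=100, centers=2, n_features=3, random_state=42)

# Restrict the search to the PyTorch logistic regression classifier,
# tuning only its learning rate over the listed values.
nn_config = {
    'tpot.builtins.PytorchLRClassifier': {
        'learning_rate': [1e-3, 1e-2, 1e-1]
    }
}

clf = TPOTClassifier(config_dict=nn_config, verbosity=2,
                     population_size=10, generations=10)
clf.fit(X, y)
</code></pre>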

</div>
</div>
2 changes: 1 addition & 1 deletion docs/index.html
@@ -213,5 +213,5 @@

<!--
MkDocs version : 1.0
Build Date UTC : 2020-05-29 15:57:34
-->
11 changes: 7 additions & 4 deletions docs/installing/index.html
@@ -174,12 +174,15 @@
<li>
<p><a href="https://joblib.readthedocs.io/en/latest/">joblib</a></p>
</li>
<li>
<p><a href="https://pytorch.org/">PyTorch</a></p>
</li>
</ul>
<p>Most of the necessary Python packages can be installed via the <a href="https://www.continuum.io/downloads">Anaconda Python distribution</a>, which we strongly recommend that you use. We also strongly recommend that you use Python 3 over Python 2 if you're given the choice.</p>
<p>You can install TPOT using <code>pip</code> or <code>conda-forge</code>.</p>
<h2 id="pip">pip</h2>
<p>NumPy, SciPy, scikit-learn, pandas, joblib, and PyTorch can be installed in Anaconda via the command:</p>
<pre><code class="Shell">conda install numpy scipy scikit-learn pandas joblib pytorch
</code></pre>

<p>DEAP, update_checker, tqdm and stopit can be installed with <code>pip</code> via the command:</p>
@@ -209,7 +212,7 @@ <h2 id="conda-forge">conda-forge</h2>
</code></pre>

<p>To install additional dependencies you can use:</p>
<pre><code class="Shell">conda install -c conda-forge tpot xgboost dask dask-ml scikit-mdr skrebate
<pre><code class="Shell">conda install -c conda-forge tpot xgboost dask dask-ml scikit-mdr skrebate pytorch
</code></pre>

<h2 id="installation-problems">Installation problems</h2>
2 changes: 1 addition & 1 deletion docs/search/search_index.json

Large diffs are not rendered by default.

20 changes: 10 additions & 10 deletions docs/sitemap.xml
@@ -2,52 +2,52 @@
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://epistasislab.github.io/tpot/</loc>
<lastmod>2020-05-29</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>http://epistasislab.github.io/tpot/installing/</loc>
<lastmod>2020-05-29</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>http://epistasislab.github.io/tpot/using/</loc>
<lastmod>2020-05-29</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>http://epistasislab.github.io/tpot/api/</loc>
<lastmod>2020-05-29</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>http://epistasislab.github.io/tpot/examples/</loc>
<lastmod>2020-05-29</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>http://epistasislab.github.io/tpot/contributing/</loc>
<lastmod>2020-05-29</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>http://epistasislab.github.io/tpot/releases/</loc>
<lastmod>2020-05-29</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>http://epistasislab.github.io/tpot/citing/</loc>
<lastmod>2020-05-29</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>http://epistasislab.github.io/tpot/support/</loc>
<lastmod>2020-05-29</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>http://epistasislab.github.io/tpot/related/</loc>
<lastmod>2020-05-29</lastmod>
<changefreq>daily</changefreq>
</url>
</urlset>
Binary file modified docs/sitemap.xml.gz
61 changes: 58 additions & 3 deletions docs/using/index.html
@@ -96,6 +96,17 @@
<li class="toctree-l2"><a href="#parallel-training-with-dask">Parallel Training with Dask</a></li>


<li class="toctree-l2"><a href="#neural-networks-in-tpot-tpotnn">Neural Networks in TPOT (tpot.nn)</a></li>

<ul>

<li><a class="toctree-l3" href="#telling-tpot-to-use-built-in-pytorch-neural-network-models">Telling TPOT to use built-in PyTorch neural network models</a></li>

<li><a class="toctree-l3" href="#important-caveats">Important caveats</a></li>

</ul>


</ul>
</li>

@@ -193,7 +204,7 @@ <h5>AutoML algorithms can take a long time to finish their search</h5>
which means that roughly 100,000 models are fit and evaluated on the training data in one grid search.
That's a time-consuming procedure, even for simpler models like decision trees.</p>
<p>Typical TPOT runs will take hours to days to finish (unless it's a small dataset), but you can always interrupt
the run partway through and see the best results so far. TPOT also <a href="/tpot/api/">provides</a> a <code>warm_start</code> parameter that
lets you restart a TPOT run from where it left off.</p>
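<p>A minimal sketch of resuming a run with <code>warm_start</code>; the dataset and parameter values here are illustrative:</p>
<pre><code class="Python">from tpot import TPOTClassifier
from sklearn.datasets import make_blobs

X_train, y_train = make_blobs(n_samples=100, centers=2, random_state=42)

pipeline_optimizer = TPOTClassifier(generations=5, population_size=20,
                                    warm_start=True, verbosity=2)
pipeline_optimizer.fit(X_train, y_train)  # initial run (can be interrupted)
pipeline_optimizer.fit(X_train, y_train)  # resumes from the evolved population
</code></pre>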
<h5>AutoML algorithms can recommend different solutions for the same dataset</h5>

@@ -217,7 +228,7 @@ <h1 id="tpot-with-code">TPOT with code</h1>
</code></pre>

<p>It's also possible to use TPOT for regression problems with the <code>TPOTRegressor</code> class. Other than the class name,
a <code>TPOTRegressor</code> is used the same way as a <code>TPOTClassifier</code>. You can read more about the <code>TPOTClassifier</code> and <code>TPOTRegressor</code> classes in the <a href="/tpot/api/">API documentation</a>.</p>
<p>Some example code with custom TPOT parameters might look like:</p>
<pre><code class="Python">pipeline_optimizer = TPOTClassifier(generations=5, population_size=20, cv=5,
random_state=42, verbosity=2)
@@ -254,7 +265,7 @@ <h1 id="tpot-with-code">TPOT with code</h1>
pipeline_optimizer.export('tpot_exported_pipeline.py')
</code></pre>

<p>Check our <a href="/tpot/examples/">examples</a> to see TPOT applied to some specific data sets.</p>
<h1 id="tpot-on-the-command-line">TPOT on the command line</h1>
<p>To use TPOT via the command line, enter the following command with a path to the data file:</p>
<pre><code class="Shell">tpot /path_to/data_file.csv
@@ -581,6 +592,14 @@ <h1 id="built-in-tpot-configurations">Built-in TPOT configurations</h1>
<a href="https://github.com/EpistasisLab/tpot/blob/master/tpot/config/regressor_sparse.py">Regression</a></td>
</tr>

<tr>
<td>TPOT-NN</td>
<td>TPOT uses the same configuration as "Default TPOT" plus additional neural network estimators written in PyTorch (currently only <code>tpot.builtins.PytorchLRClassifier</code> and <code>tpot.builtins.PytorchMLPClassifier</code>).
<br /><br />
Currently only classification is supported, but future releases will include regression estimators.</td>
<td align="center"><a href="https://github.com/EpistasisLab/tpot/blob/master/tpot/config/classifier_nn.py">Classification</a></td>
</tr>

</table>

<p>To use any of these configurations, simply pass the string name of the configuration to the <code>config_dict</code> parameter (or <code>-config</code> on the command line). For example, to use the "TPOT light" configuration:</p>
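<p>A minimal sketch of selecting a built-in configuration by name; the other constructor arguments are illustrative:</p>
<pre><code class="Python">from tpot import TPOTClassifier

# The string name selects the corresponding built-in configuration.
tpot = TPOTClassifier(config_dict='TPOT light', generations=5,
                      population_size=20, verbosity=2)
</code></pre>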
@@ -760,6 +779,42 @@ <h1 id="parallel-training-with-dask">Parallel Training with Dask</h1>
</code></pre>

<p>See <a href="https://distributed.readthedocs.io/en/latest/joblib.html">dask's distributed joblib integration</a> for more.</p>
<h1 id="neural-networks-in-tpot-tpotnn">Neural Networks in TPOT (<code>tpot.nn</code>)</h1>
<p>Support for neural network models and deep learning is an experimental feature newly added to TPOT. Available neural network architectures are provided by the <code>tpot.nn</code> module. Unlike regular <code>sklearn</code> estimators, these models need to be written by hand, and must inherit the appropriate base classes provided by <code>sklearn</code>. In other words, they need to implement methods like <code>fit()</code>, <code>fit_transform()</code>, and <code>get_params()</code>, as described in detail in <a href="https://scikit-learn.org/stable/developers/develop.html">Developing scikit-learn estimators</a>.</p>
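<p>A minimal sketch of that scaffolding; the class name and hyperparameter are hypothetical, and a real model would run a PyTorch training loop inside <code>fit()</code>:</p>
<pre><code class="Python">import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin

class MyNNClassifier(BaseEstimator, ClassifierMixin):
    def __init__(self, learning_rate=1e-2):
        # Constructor arguments must be stored unchanged so that the
        # get_params()/set_params() machinery from BaseEstimator works.
        self.learning_rate = learning_rate

    def fit(self, X, y):
        # A real implementation would train its network here.
        self.classes_ = np.unique(y)
        return self

    def predict(self, X):
        # Stub inference: always predict the first class seen during fit().
        return np.full(X.shape[0], self.classes_[0])
</code></pre>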
<h2 id="telling-tpot-to-use-built-in-pytorch-neural-network-models">Telling TPOT to use built-in PyTorch neural network models</h2>
<p>Mainly due to the issues described below, TPOT won't use its neural network models unless you explicitly tell it to do so. This is done as follows:</p>
<ul>
<li>
<p>Use <code>import tpot.nn</code> before instantiating any TPOT estimators.</p>
</li>
<li>
<p>Use a configuration dictionary that includes one or more <code>tpot.nn</code> estimators, either by writing one manually, loading one from a file, or importing the configuration in <code>tpot/config/classifier_nn.py</code>. A very simple example that forces TPOT to use only a PyTorch-based logistic regression classifier as its main estimator is as follows:</p>
</li>
</ul>
<pre><code class="python">tpot_config = {
'tpot.nn.PytorchLRClassifier': {
'learning_rate': [1e-3, 1e-2, 1e-1, 0.5, 1.]
}
}
</code></pre>

<ul>
<li>Alternatively, use a template string including <code>PytorchLRClassifier</code> or <code>PytorchMLPClassifier</code> while loading the TPOT-NN configuration dictionary, as in the sketch below.</li>
</ul>
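<p>A minimal sketch of the template-string route; the template stages other than the classifier are illustrative:</p>
<pre><code class="python">from tpot import TPOTClassifier

clf = TPOTClassifier(config_dict='TPOT NN',
                     template='Selector-Transformer-PytorchLRClassifier')
</code></pre>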
<p>Neural network models are notorious for being extremely sensitive to their initialization parameters, so you may need to heavily adjust <code>tpot.nn</code> configuration dictionaries in order to attain good performance on your dataset.</p>
<p>A simple example of using TPOT-NN is shown in <a href="/tpot/examples/">examples</a>.</p>
<h2 id="important-caveats">Important caveats</h2>
<ul>
<li>
<p>Neural network models (especially when they reach moderately large sizes) take a notoriously large amount of time and computing power to train. You should expect <code>tpot.nn</code> neural networks to train several orders of magnitude slower than their <code>sklearn</code> alternatives. This can be alleviated somewhat by training the models on computers with CUDA-enabled GPUs.</p>
</li>
<li>
<p>TPOT will occasionally learn pipelines that stack several <code>sklearn</code> estimators. Mathematically, these can be nearly identical to some deep learning models. For example, stacking several <code>sklearn.linear_model.LogisticRegression</code> instances yields a very close approximation of a multilayer perceptron, one of the simplest and most well-known deep learning architectures. TPOT's genetic programming algorithms generally optimize these 'networks' much faster than PyTorch, which typically uses a more brute-force convex optimization approach.</p>
</li>
<li>
<p>The problem of 'black box' model introspection is one of the most substantial criticisms and challenges of deep learning. This problem persists in <code>tpot.nn</code>, whereas TPOT's default estimators are often far easier to introspect.</p>
</li>
</ul>

</div>
</div>
