Commit

Merge pull request #953 from EpistasisLab/development
Version 0.11.0 release
weixuanfu authored Nov 5, 2019
2 parents 815b0e2 + 8b71687 commit e473d73
Showing 63 changed files with 2,033 additions and 3,947 deletions.
.appveyor.yml (10 changes: 3 additions & 7 deletions)
@@ -4,10 +4,7 @@ environment:
matrix:
- PYTHON_VERSION: 3.7
MINICONDA: C:/Miniconda36-x64
DASK_ML_VERSION: 0.13.0
- PYTHON_VERSION: 2.7
MINICONDA: C:/Miniconda-x64
DASK_ML_VERSION: 0.12.0
DASK_ML_VERSION: 1.0.0

platform:
- x64
@@ -21,10 +18,9 @@ install:
- conda config --set always_yes yes --set changeps1 no
- conda update -q conda
- conda info -a
- conda create -q -n test-environment python=%PYTHON_VERSION% numpy scipy scikit-learn nose cython pandas pywin32 joblib
- conda create -q -n test-environment python=%PYTHON_VERSION% numpy scipy scikit-learn nose cython pandas joblib
- activate test-environment
- pip install deap tqdm update_checker stopit dask[delayed] cloudpickle==0.5.6
- pip install dask_ml==%DASK_ML_VERSION%
- pip install deap tqdm update_checker stopit xgboost dask[delayed] dask[dataframe] cloudpickle==0.5.6 fsspec>=0.3.3 dask_ml==%DASK_ML_VERSION%


test_script:
.travis.yml (11 changes: 3 additions & 8 deletions)
@@ -4,24 +4,19 @@ matrix:
include:
- name: "Python 3.7 on Xenial Linux"
dist: xenial # required for Python >= 3.7
env: PYTHON_VERSION="3.7" DASK_ML_VERSION="0.13.0"
env: PYTHON_VERSION="3.7" DASK_ML_VERSION="1.0.0"
before_install:
- wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
- name: "Python 3.7 on Xenial Linux with coverage"
dist: xenial # required for Python >= 3.7
env: PYTHON_VERSION="3.7" COVERAGE="true" DASK_ML_VERSION="0.13.0"
before_install:
- wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
- name: "Python 2.7 on Xenial Linux"
dist: xenial
env: PYTHON_VERSION="2.7" DASK_ML_VERSION="0.12.0"
env: PYTHON_VERSION="3.7" COVERAGE="true" DASK_ML_VERSION="1.0.0"
before_install:
- wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
- name: "Python 3.7 on macOS"
os: osx
osx_image: xcode10.2 # Python 3.7.2 running on macOS 10.14.3
language: shell # 'language: python' is an error on Travis CI macOS
env: PYTHON_VERSION="3.7" DASK_ML_VERSION="0.13.0"
env: PYTHON_VERSION="3.7" DASK_ML_VERSION="1.0.0"
before_install:
- wget https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -O miniconda.sh
install: source ./ci/.travis_install.sh
README.md (85 changes: 58 additions & 27 deletions)
@@ -6,16 +6,15 @@ Development status: [![Development Build Status - Mac/Linux](https://travis-ci.o
[![Development Build Status - Windows](https://ci.appveyor.com/api/projects/status/b7bmpwpkjhifrm7v/branch/development?svg=true)](https://ci.appveyor.com/project/weixuanfu/tpot?branch=development)
[![Development Coverage Status](https://coveralls.io/repos/github/EpistasisLab/tpot/badge.svg?branch=development)](https://coveralls.io/github/EpistasisLab/tpot?branch=development)

Package information: [![Python 2.7](https://img.shields.io/badge/python-2.7-blue.svg)](https://www.python.org/download/releases/2.7/)
[![Python 3.7](https://img.shields.io/badge/python-3.7-blue.svg)](https://www.python.org/downloads/release/python-370/)
Package information: [![Python 3.7](https://img.shields.io/badge/python-3.7-blue.svg)](https://www.python.org/downloads/release/python-370/)
[![License: LGPL v3](https://img.shields.io/badge/license-LGPL%20v3-blue.svg)](http://www.gnu.org/licenses/lgpl-3.0)
[![PyPI version](https://badge.fury.io/py/TPOT.svg)](https://badge.fury.io/py/TPOT)

<p align="center">
<img src="https://raw.githubusercontent.com/EpistasisLab/tpot/master/images/tpot-logo.jpg" width=300 />
</p>

Consider TPOT your **Data Science Assistant**. TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
**TPOT** stands for **T**ree-based **P**ipeline **O**ptimization **T**ool. Consider TPOT your **Data Science Assistant**. TPOT is a Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

![TPOT Demo](https://github.com/EpistasisLab/tpot/blob/master/images/tpot-demo.gif "TPOT Demo")

@@ -55,7 +54,7 @@ Click on the corresponding links to find more information on TPOT usage in the d

### Classification

Below is a minimal working example with the practice MNIST data set.
Below is a minimal working example with the optical recognition of handwritten digits dataset.

```python
from tpot import TPOTClassifier
@@ -64,32 +63,43 @@ from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target,
train_size=0.75, test_size=0.25)
train_size=0.75, test_size=0.25, random_state=42)

tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2)
tpot = TPOTClassifier(generations=5, population_size=50, verbosity=2, random_state=42)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export('tpot_mnist_pipeline.py')
tpot.export('tpot_digits_pipeline.py')
```

Running this code should discover a pipeline that achieves about 98% testing accuracy, and the corresponding Python code should be exported to the `tpot_mnist_pipeline.py` file and look similar to the following:
Running this code should discover a pipeline that achieves about 98% testing accuracy, and the corresponding Python code should be exported to the `tpot_digits_pipeline.py` file and look similar to the following:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import PolynomialFeatures
from tpot.builtins import StackingEstimator
from tpot.export_utils import set_param_recursive

# NOTE: Make sure that the class is labeled 'target' in the data file
# NOTE: Make sure that the outcome column is labeled 'target' in the data file
tpot_data = pd.read_csv('PATH/TO/DATA/FILE', sep='COLUMN_SEPARATOR', dtype=np.float64)
features = tpot_data.drop('target', axis=1).values
features = tpot_data.drop('target', axis=1)
training_features, testing_features, training_target, testing_target = \
train_test_split(features, tpot_data['target'].values, random_state=None)


exported_pipeline = KNeighborsClassifier(n_neighbors=6, weights="distance")

exported_pipeline.fit(training_features, training_classes)
train_test_split(features, tpot_data['target'], random_state=42)

# Average CV score on the training set was: 0.9799428471757372
exported_pipeline = make_pipeline(
PolynomialFeatures(degree=2, include_bias=False, interaction_only=False),
StackingEstimator(estimator=LogisticRegression(C=0.1, dual=False, penalty="l1")),
RandomForestClassifier(bootstrap=True, criterion="entropy", max_features=0.35000000000000003, min_samples_leaf=20, min_samples_split=19, n_estimators=100)
)
# Fix random state for all the steps in exported pipeline
set_param_recursive(exported_pipeline.steps, 'random_state', 42)

exported_pipeline.fit(training_features, training_target)
results = exported_pipeline.predict(testing_features)
```
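
The exported script expects a CSV file whose outcome column is named `target`. As a minimal sketch, the digits data used above could be written out in that format so the placeholder path has something to point at; the file name `digits.csv` is hypothetical and this helper is not part of TPOT's exported code:

```python
# Sketch: save the digits data to a CSV whose outcome column is named 'target',
# so the exported script's 'PATH/TO/DATA/FILE' placeholder can point at a real
# file. The file name 'digits.csv' is an assumption, not something TPOT writes.
import pandas as pd
from sklearn.datasets import load_digits

digits = load_digits()
data = pd.DataFrame(digits.data)
data['target'] = digits.target
data.to_csv('digits.csv', index=False)
```

The exported script's `read_csv` call could then read `digits.csv` with `sep=','`.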

@@ -104,9 +114,9 @@ from sklearn.model_selection import train_test_split

housing = load_boston()
X_train, X_test, y_train, y_test = train_test_split(housing.data, housing.target,
train_size=0.75, test_size=0.25)
train_size=0.75, test_size=0.25, random_state=42)

tpot = TPOTRegressor(generations=5, population_size=20, verbosity=2)
tpot = TPOTRegressor(generations=5, population_size=50, verbosity=2, random_state=42)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export('tpot_boston_pipeline.py')
@@ -117,20 +127,27 @@ which should result in a pipeline that achieves about 12.77 mean squared error (
```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from tpot.export_utils import set_param_recursive

# NOTE: Make sure that the class is labeled 'target' in the data file
# NOTE: Make sure that the outcome column is labeled 'target' in the data file
tpot_data = pd.read_csv('PATH/TO/DATA/FILE', sep='COLUMN_SEPARATOR', dtype=np.float64)
features = tpot_data.drop('target', axis=1).values
features = tpot_data.drop('target', axis=1)
training_features, testing_features, training_target, testing_target = \
train_test_split(features, tpot_data['target'].values, random_state=None)
train_test_split(features, tpot_data['target'], random_state=42)

exported_pipeline = GradientBoostingRegressor(alpha=0.85, learning_rate=0.1, loss="ls",
max_features=0.9, min_samples_leaf=5,
min_samples_split=6)
# Average CV score on the training set was: -10.812040755234403
exported_pipeline = make_pipeline(
PolynomialFeatures(degree=2, include_bias=False, interaction_only=False),
ExtraTreesRegressor(bootstrap=False, max_features=0.5, min_samples_leaf=2, min_samples_split=3, n_estimators=100)
)
# Fix random state for all the steps in exported pipeline
set_param_recursive(exported_pipeline.steps, 'random_state', 42)

exported_pipeline.fit(training_features, training_classes)
exported_pipeline.fit(training_features, training_target)
results = exported_pipeline.predict(testing_features)
```
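
The exported regression script stops at `predict`. A short sketch of checking the held-out mean squared error mentioned above, assuming it is appended to the end of that script after the placeholder path is filled in (these lines are not part of TPOT's exported code):

```python
# Sketch: compare the predictions with the held-out targets to get an MSE
# figure comparable to the one quoted above. Assumes these lines are appended
# to the exported script, which defines `testing_target` and `results`.
from sklearn.metrics import mean_squared_error

print(mean_squared_error(testing_target, results))
```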

@@ -150,6 +167,20 @@ Please [check the existing open and closed issues](https://github.com/EpistasisL

If you use TPOT in a scientific publication, please consider citing at least one of the following papers:

Trang T. Le, Weixuan Fu and Jason H. Moore (2019). [Scaling tree-based automated machine learning to biomedical big data with a feature set selector](https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz470/5511404). *Bioinformatics*. 2019 Jun 4.

BibTeX entry:

```bibtex
@article{le2019scaling,
title={Scaling tree-based automated machine learning to biomedical big data with a feature set selector.},
author={Le, TT and Fu, W and Moore, JH},
journal={Bioinformatics (Oxford, England)},
year={2019}
}
```


Randal S. Olson, Ryan J. Urbanowicz, Peter C. Andrews, Nicole A. Lavender, La Creis Kidd, and Jason H. Moore (2016). [Automating biomedical data science through tree-based pipeline optimization](http://link.springer.com/chapter/10.1007/978-3-319-31204-0_9). *Applications of Evolutionary Computation*, pages 123-137.

BibTeX entry:
ci/.travis_install.sh (3 changes: 1 addition & 2 deletions)
@@ -38,8 +38,7 @@ conda create -n testenv --yes python=$PYTHON_VERSION pip nose \
source activate testenv

pip install deap tqdm update_checker stopit \
dask[delayed] xgboost cloudpickle==0.5.6
pip install dask_ml==$DASK_ML_VERSION
dask[delayed] dask[dataframe] xgboost cloudpickle==0.5.6 dask_ml==$DASK_ML_VERSION fsspec>=0.3.3

if [[ "$COVERAGE" == "true" ]]; then
pip install coverage coveralls
docs/404.html (24 changes: 12 additions & 12 deletions)
@@ -13,12 +13,11 @@

<link rel="stylesheet" href="/tpot/css/theme.css" type="text/css" />
<link rel="stylesheet" href="/tpot/css/theme_extra.css" type="text/css" />
<link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/github.min.css">
<link rel="stylesheet" href="/tpot/css/highlight.css">

<script src="/tpot/js/jquery-2.1.1.min.js" defer></script>
<script src="/tpot/js/modernizr-2.8.3.min.js" defer></script>
<script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/highlight.min.js"></script>
<script>hljs.initHighlightingOnLoad();</script>
<script src="/tpot/js/jquery-2.1.1.min.js"></script>
<script src="/tpot/js/modernizr-2.8.3.min.js"></script>
<script type="text/javascript" src="/tpot/js/highlight.pack.js"></script>

</head>

@@ -29,10 +28,10 @@

<nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
<div class="wy-side-nav-search">
<a href="/tpot/." class="icon icon-home"> TPOT</a>
<a href="/tpot/" class="icon icon-home"> TPOT</a>
<div role="search">
<form id ="rtd-search-form" class="wy-form" action="/tpot/search.html" method="get">
<input type="text" name="q" placeholder="Search docs" title="Type search term here" />
<input type="text" name="q" placeholder="Search docs" />
</form>
</div>
</div>
@@ -43,7 +42,7 @@

<li class="toctree-l1">

<a class="" href="/tpot/.">Home</a>
<a class="" href="/tpot/">Home</a>
</li>

<li class="toctree-l1">
@@ -101,15 +100,15 @@

<nav class="wy-nav-top" role="navigation" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="/tpot/.">TPOT</a>
<a href="/tpot/">TPOT</a>
</nav>


<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li><a href="/tpot/.">Docs</a> &raquo;</li>
<li><a href="/tpot/">Docs</a> &raquo;</li>


<li class="wy-breadcrumbs-aside">
@@ -161,8 +160,9 @@ <h1 id="404-page-not-found">404</h1>
</span>
</div>
<script>var base_url = '/tpot';</script>
<script src="/tpot/js/theme.js" defer></script>
<script src="/tpot/search/main.js" defer></script>
<script src="/tpot/js/theme.js"></script>
<script src="/tpot/search/require.js"></script>
<script src="/tpot/search/search.js"></script>

</body>
</html>