Merge pull request #258 from rhiever/development
0.6 release
rhiever authored Sep 2, 2016
2 parents 589a020 + 78e7197 commit 322bee5
Showing 58 changed files with 2,226 additions and 1,011 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -73,3 +73,5 @@ docs/sources/examples/.Rhistory
.idea

analyze-oj2-tpot-mdr.ipynb

tpot-mdr-demo.ipynb
4 changes: 2 additions & 2 deletions README.md
@@ -53,15 +53,15 @@ Click on the corresponding links to find more information on TPOT usage in the d
Below is a minimal working example with the practice MNIST data set.

```python
from tpot import TPOT
from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.cross_validation import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target,
train_size=0.75, test_size=0.25)

tpot = TPOT(generations=5, population_size=20, verbosity=2)
tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export('tpot_mnist_pipeline.py')
5 changes: 4 additions & 1 deletion ci/.travis_install.sh
@@ -38,15 +38,17 @@ else
conda create -n testenv --yes python=$PYTHON_VERSION pip nose \
numpy=$NUMPY_VERSION scipy=$SCIPY_VERSION \
scikit-learn=$SKLEARN_VERSION \
cython
cython
fi

source activate testenv

if [[ "$LATEST" == "true" ]]; then
pip install deap
pip install xgboost
else
pip install deap==$DEAP_VERSION
pip install xgboost==$XGBOOST_VERSION
fi

pip install update_checker
@@ -62,6 +64,7 @@ python -c "import numpy; print('numpy %s' % numpy.__version__)"
python -c "import scipy; print('scipy %s' % scipy.__version__)"
python -c "import sklearn; print('sklearn %s' % sklearn.__version__)"
python -c "import deap; print('deap %s' % deap.__version__)"
python -c "import xgboost; print('xgboost %s ' % xgboost.__version__)"
python -c "import update_checker; print('update_checker %s' % update_checker.__version__)"
python -c "import tqdm; print('tqdm %s' % tqdm.__version__)"
python setup.py build_ext --inplace
1 change: 1 addition & 0 deletions ci/.travis_test.sh
@@ -14,6 +14,7 @@ python -c "import numpy; print('numpy %s' % numpy.__version__)"
python -c "import scipy; print('scipy %s' % scipy.__version__)"
python -c "import sklearn; print('sklearn %s' % sklearn.__version__)"
python -c "import deap; print('deap %s' % deap.__version__)"
python -c "import xgboost; print('xgboost %s ' % xgboost.__version__)"
python -c "import update_checker; print('update_checker %s ' % update_checker.__version__)"
python -c "import tqdm; print('tqdm %s' % tqdm.__version__)"

1 change: 1 addition & 0 deletions docs/mkdocs.yml
@@ -23,5 +23,6 @@ pages:
- examples/IRIS_Example.md
- examples/Titanic_Kaggle_Example.md
- Contributing: contributing.md
- Release Notes: releases.md
- Citing: citing.md
- Support: support.md
18 changes: 16 additions & 2 deletions docs/sources/contributing.md
@@ -1,5 +1,19 @@
We welcome you to [check the existing issues](https://github.com/rhiever/tpot/issues/) for bugs or enhancements to work on. If you have an idea for an extension to TPOT, please [file a new issue](https://github.com/rhiever/tpot/issues/new) so we can discuss it.

## Project layout

The latest stable release of TPOT is on the [master branch](https://github.com/rhiever/tpot/tree/master), whereas the latest version of TPOT in development is on the [development branch](https://github.com/rhiever/tpot/tree/development). Make sure you are looking at and working on the correct branch if you're looking to contribute code.

In terms of directory structure:

* All of TPOT's code sources are in the `tpot` directory
* The documentation sources are in the `docs` directory
* Images in the documentation are in the `images` directory
* Tutorials for TPOT are in the `tutorials` directory
* Unit tests for TPOT are in the `tests.py` file

Make sure to familiarize yourself with the project layout before making any major contributions, and especially make sure to send all code changes to the `development` branch.

## How to contribute

The preferred way to contribute to TPOT is to fork the
@@ -27,9 +41,9 @@ GitHub:

6. Once some changes are saved locally, you can use your tweaked version of TPOT by navigating to the project's base directory and running TPOT directly from the command line:

$ python -m tpot.tpot
$ python -m tpot.driver

or by running script that imports and uses the TPOT module with code similar to `from tpot import TPOT`
or by running script that imports and uses the TPOT module with code similar to `from tpot import TPOTClassifier`

7. To check your changes haven't broken any existing tests and to check new tests you've added pass run the following (note, you must have the `nose` package installed within your dev environment for this to work):

5 changes: 2 additions & 3 deletions docs/sources/examples/IRIS_Example.md
@@ -1,7 +1,7 @@
The following code illustrates the usage of TPOT with the IRIS data set.

```python
from tpot import TPOT
from tpot import TPOTClassifier
from sklearn.datasets import load_iris
from sklearn.cross_validation import train_test_split
import numpy as np
@@ -10,7 +10,7 @@ iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data.astype(np.float64),
iris.target.astype(np.float64), train_size=0.75, test_size=0.25)

tpot = TPOT(generations=5, population_size=20, verbosity=2)
tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export('tpot_iris_pipeline.py')
@@ -44,5 +44,4 @@ exported_pipeline = make_pipeline(

exported_pipeline.fit(training_features, training_classes)
results = exported_pipeline.predict(testing_features)

```
4 changes: 2 additions & 2 deletions docs/sources/examples/MNIST_Example.md
@@ -1,15 +1,15 @@
Below is a minimal working example with the practice MNIST data set.

```python
from tpot import TPOT
from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.cross_validation import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target,
train_size=0.75, test_size=0.25)

tpot = TPOT(generations=5, population_size=20, verbosity=2)
tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export('tpot_mnist_pipeline.py')
8 changes: 8 additions & 0 deletions docs/sources/installing.md
@@ -26,6 +26,14 @@ DEAP, update_checker, and tqdm (used for verbose TPOT runs) can be installed wit
pip install deap update_checker tqdm
```

Optionally, install XGBoost if you would like TPOT to use XGBoost. XGBoost is entirely optional, and TPOT will still function normally without XGBoost if you do not have it installed.

```Shell
pip install xgboost
```

If you have issues installing XGBoost, check the [XGBoost installation documentation](http://xgboost.readthedocs.io/en/latest/build.html).

Finally to install TPOT itself, run the following command:

```Shell
86 changes: 86 additions & 0 deletions docs/sources/releases.md
@@ -0,0 +1,86 @@
# Version 0.6

* **TPOT now supports regression problems!** We have created two separate `TPOTClassifier` and `TPOTRegressor` classes to support classification and regression problems, respectively. The [command-line interface](/using/#tpot-on-the-command-line) also supports this feature through the `-mode` parameter.

* TPOT now allows you to **specify a time limit** for the optimization process with the `max_time_mins` parameter, so you don't need to guess how long TPOT will take any more to recommend a pipeline to you.

* Added a new operator that performs feature selection using [ExtraTrees](http://scikit-learn.org/stable/modules/ensemble.html#extremely-randomized-trees) feature importance scores.

* **[XGBoost](https://github.com/dmlc/xgboost) has been added as an optional dependency to TPOT.** If you have XGBoost installed, TPOT will automatically detect your installation and use the `XGBoostClassifier` and `XGBoostRegressor` in its pipelines.

* TPOT now offers a verbosity level of 3 ("science mode"), which outputs the entire Pareto front instead of only the current best score. This feature may be useful for users looking to make a trade-off between pipeline complexity and score.
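
The optional XGBoost dependency mentioned in the notes above could be detected along the following lines. This is only a sketch of the general pattern, not TPOT's actual detection code: the `is_installed` helper and the `available_operators` list are hypothetical names chosen for illustration.

```python
from importlib.util import find_spec


def is_installed(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return find_spec(module_name) is not None


# Hypothetical operator registry; the names are illustrative, not TPOT internals.
available_operators = ["DecisionTree", "RandomForest"]

# Only offer the XGBoost-based operator when the package is present.
if is_installed("xgboost"):
    available_operators.append("XGBoost")

print(available_operators)
```

Checking with `find_spec` avoids importing the package just to see whether it exists, so a missing installation simply leaves the operator out of the list.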

# Version 0.5

* Major refactor: Each operator is defined in a separate class file. Hooray for easier-to-maintain code!
* TPOT now **exports directly to scikit-learn Pipelines** instead of hacky code.
* Internal representation of individuals now uses scikit-learn pipelines.
* Parameters for each operator have been optimized so TPOT spends less time exploring useless parameters.
* We have removed pandas as a dependency and instead use numpy matrices to store the data.
* TPOT now uses **k-fold cross-validation** when evaluating pipelines, with a default k = 3. This k parameter can be tuned when creating a new TPOT instance.
* Improved **scoring function support**: Even though TPOT uses balanced accuracy by default, you can now have TPOT use [any of the scoring functions](http://scikit-learn.org/stable/modules/model_evaluation.html#common-cases-predefined-values) that `cross_val_score` supports.
* Added the scikit-learn [Normalizer](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Normalizer.html) preprocessor.
* [Minor text fixes.](http://knowyourmeme.com/memes/pokemon-go-updates-controversy)
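
The k-fold evaluation and balanced-accuracy scoring mentioned in the notes above can be illustrated with a small pure-Python sketch. It assumes contiguous folds and uses one common definition of balanced accuracy (the mean of per-class recall); TPOT's own splitting strategy and scoring function may differ, and both function names here are hypothetical.

```python
def k_fold_indices(n_samples, k=3):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for label in sorted(set(y_true)):
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        total = sum(1 for t in y_true if t == label)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


folds = list(k_fold_indices(10, k=3))
print([len(test) for _, test in folds])

# A classifier that always predicts class 0 on an imbalanced sample:
# plain accuracy would be 0.75, but balanced accuracy penalizes the missed class.
print(balanced_accuracy([0, 0, 0, 1], [0, 0, 0, 0]))
```

Averaging a score like this over the k test folds gives a single number per pipeline, which is less sensitive to one lucky split than a single train/test evaluation.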

# Version 0.4

In TPOT 0.4, we've made some major changes to the internals of TPOT and added some convenience functions. We've summarized the changes below.

<ul>
<li>Added new sklearn models and preprocessors

<ul>
<li>AdaBoostClassifier</li>
<li>BernoulliNB</li>
<li>ExtraTreesClassifier</li>
<li>GaussianNB</li>
<li>MultinomialNB</li>
<li>LinearSVC</li>
<li>PassiveAggressiveClassifier</li>
<li>GradientBoostingClassifier</li>
<li>RBFSampler</li>
<li>FastICA</li>
<li>FeatureAgglomeration</li>
<li>Nystroem</li>
</ul></li>
<li>Added operator that inserts virtual features for the count of features with values of zero</li>
<li>Reworked parameterization of TPOT operators
<ul>
<li>Reduced parameter search space with information from a scikit-learn benchmark</li>
<li>TPOT no longer generates arbitrary parameter values, but uses a fixed parameter set instead</li>
</ul></li>
<li>Removed XGBoost as a dependency
<ul>
<li>Too many users were having install issues with XGBoost</li>
<li>Replaced with scikit-learn's GradientBoostingClassifier</li>
</ul></li>
<li>Improved descriptiveness of TPOT command line parameter documentation</li>
<li>Removed min/max/avg details during fit() when verbosity &gt; 1

<ul>
<li>Replaced with tqdm progress bar</li>
<li>Added tqdm as a dependency</li>
</ul></li>
<li>Added <code>fit_predict()</code> convenience function</li>
<li>Added <code>get_params()</code> function so TPOT can operate in scikit-learn's <code>cross_val_score</code> & related functions</li>
</ul>
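
The zero-count operator listed above can be pictured with a short sketch. It assumes the data is a list of numeric rows and appends a single column holding the count of zero-valued features; the function name `add_zero_count` is hypothetical and this is not TPOT's implementation.

```python
def add_zero_count(rows):
    """Return new rows with one extra column: how many features in the row equal zero."""
    return [list(row) + [sum(1 for value in row if value == 0)] for row in rows]


data = [[0.0, 1.5, 0.0], [2.0, 3.0, 4.0]]
augmented = add_zero_count(data)
print(augmented)
```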

# Version 0.3

* We revised the internal optimization process of TPOT to make it more efficient, in particular in regards to the model parameters that TPOT optimizes over.

# Version 0.2

* TPOT now has the ability to export the optimized pipelines to sklearn code.

* Logistic regression, SVM, and k-nearest neighbors classifiers were added as pipeline operators. Previously, TPOT only included decision tree and random forest classifiers.

* TPOT can now use arbitrary scoring functions for the optimization process.

* TPOT now performs multi-objective Pareto optimization to balance model complexity (i.e., # of pipeline operators) and the score of the pipeline.
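
The complexity-versus-score trade-off described above can be made concrete with a small Pareto-front sketch. It assumes each candidate pipeline is summarized as a `(n_operators, score)` pair, where fewer operators and a higher score are both better; the `pareto_front` function and the sample values are illustrative and not taken from TPOT.

```python
def pareto_front(candidates):
    """Return the candidates that no other candidate dominates.

    Each candidate is (n_operators, score). One candidate dominates another
    if it is no worse on both objectives and strictly better on at least one.
    """
    front = []
    for i, (ops_i, score_i) in enumerate(candidates):
        dominated = False
        for j, (ops_j, score_j) in enumerate(candidates):
            if i == j:
                continue
            no_worse = ops_j <= ops_i and score_j >= score_i
            strictly_better = ops_j < ops_i or score_j > score_i
            if no_worse and strictly_better:
                dominated = True
                break
        if not dominated:
            front.append((ops_i, score_i))
    return front


# Made-up example values: (number of pipeline operators, score).
pipelines = [(1, 0.80), (2, 0.85), (2, 0.83), (3, 0.85), (4, 0.90)]
front = pareto_front(pipelines)
print(front)
```

Every pipeline left on the front is a defensible choice: moving along it trades extra operators for a higher score, which is the decision the user is left to make.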

# Version 0.1

* First public release of TPOT.

* Optimizes pipelines with decision trees and random forest classifiers as the model, and uses a handful of feature preprocessors.