Merge pull request #258 from rhiever/development

0.6 release
EpistasisLab · Sep 2, 2016 · 322bee5
2 parents 589a020 + 78e7197
commit 322bee5

Showing 58 changed files with 2,226 additions and 1,011 deletions.
diff --git a/.gitignore b/.gitignore
@@ -73,3 +73,5 @@ docs/sources/examples/.Rhistory
 .idea
 
 analyze-oj2-tpot-mdr.ipynb
+
+tpot-mdr-demo.ipynb
diff --git a/README.md b/README.md
@@ -53,15 +53,15 @@ Click on the corresponding links to find more information on TPOT usage in the d
 Below is a minimal working example with the practice MNIST data set.
 
 ```python
-from tpot import TPOT
+from tpot import TPOTClassifier
 from sklearn.datasets import load_digits
 from sklearn.cross_validation import train_test_split
 
 digits = load_digits()
 X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target,
                                                     train_size=0.75, test_size=0.25)
 
-tpot = TPOT(generations=5, population_size=20, verbosity=2)
+tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2)
 tpot.fit(X_train, y_train)
 print(tpot.score(X_test, y_test))
 tpot.export('tpot_mnist_pipeline.py')

diff --git a/ci/.travis_install.sh b/ci/.travis_install.sh
@@ -38,15 +38,17 @@ else
     conda create -n testenv --yes python=$PYTHON_VERSION pip nose \
         numpy=$NUMPY_VERSION scipy=$SCIPY_VERSION \
         scikit-learn=$SKLEARN_VERSION \
-	cython
+    cython
 fi
 
 source activate testenv
 
 if [[ "$LATEST" == "true" ]]; then
     pip install deap
+    pip install xgboost
 else
     pip install deap==$DEAP_VERSION
+    pip install xgboost==$XGBOOST_VERSION
 fi
 
 pip install update_checker
@@ -62,6 +64,7 @@ python -c "import numpy; print('numpy %s' % numpy.__version__)"
 python -c "import scipy; print('scipy %s' % scipy.__version__)"
 python -c "import sklearn; print('sklearn %s' % sklearn.__version__)"
 python -c "import deap; print('deap %s' % deap.__version__)"
+python -c "import xgboost; print('xgboost %s ' % xgboost.__version__)"
 python -c "import update_checker; print('update_checker %s' % update_checker.__version__)"
 python -c "import tqdm; print('tqdm %s' % tqdm.__version__)"
 python setup.py build_ext --inplace
diff --git a/ci/.travis_test.sh b/ci/.travis_test.sh
@@ -14,6 +14,7 @@ python -c "import numpy; print('numpy %s' % numpy.__version__)"
 python -c "import scipy; print('scipy %s' % scipy.__version__)"
 python -c "import sklearn; print('sklearn %s' % sklearn.__version__)"
 python -c "import deap; print('deap %s' % deap.__version__)"
+python -c "import xgboost; print('xgboost %s ' % xgboost.__version__)"
 python -c "import update_checker; print('update_checker %s ' % update_checker.__version__)"
 python -c "import tqdm; print('tqdm %s' % tqdm.__version__)"
 

diff --git a/docs/mkdocs.yml b/docs/mkdocs.yml
@@ -23,5 +23,6 @@ pages:
   - examples/IRIS_Example.md
   - examples/Titanic_Kaggle_Example.md
 - Contributing: contributing.md
+- Release Notes: releases.md
 - Citing: citing.md
 - Support: support.md
diff --git a/docs/sources/contributing.md b/docs/sources/contributing.md
@@ -1,5 +1,19 @@
 We welcome you to [check the existing issues](https://github.com/rhiever/tpot/issues/) for bugs or enhancements to work on. If you have an idea for an extension to TPOT, please [file a new issue](https://github.com/rhiever/tpot/issues/new) so we can discuss it.
 
+## Project layout
+
+The latest stable release of TPOT is on the [master branch](https://github.com/rhiever/tpot/tree/master), whereas the latest version of TPOT in development is on the [development branch](https://github.com/rhiever/tpot/tree/development). Make sure you are looking at and working on the correct branch if you're looking to contribute code.
+
+In terms of directory structure:
+
+* All of TPOT's code sources are in the `tpot` directory
+* The documentation sources are in the `docs` directory
+* Images in the documentation are in the `images` directory
+* Tutorials for TPOT are in the `tutorials` directory
+* Unit tests for TPOT are in the `tests.py` file
+
+Make sure to familiarize yourself with the project layout before making any major contributions, and especially make sure to send all code changes to the `development` branch.
+
 ## How to contribute
 
 The preferred way to contribute to TPOT is to fork the 
@@ -27,9 +41,9 @@ GitHub:
 
 6. Once some changes are saved locally, you can use your tweaked version of TPOT by navigating to the project's base directory and running TPOT directly from the command line:
 
-          $ python -m tpot.tpot
+          $ python -m tpot.driver
 
-    or by running script that imports and uses the TPOT module with code similar to `from tpot import TPOT`
+    or by running script that imports and uses the TPOT module with code similar to `from tpot import TPOTClassifier`
 
 7. To check your changes haven't broken any existing tests and to check new tests you've added pass run the following (note, you must have the `nose` package installed within your dev environment for this to work):
 

diff --git a/docs/sources/examples/IRIS_Example.md b/docs/sources/examples/IRIS_Example.md
@@ -1,7 +1,7 @@
 The following code illustrates the usage of TPOT with the IRIS data set.
 
 ```python
-from tpot import TPOT
+from tpot import TPOTClassifier
 from sklearn.datasets import load_iris
 from sklearn.cross_validation import train_test_split
 import numpy as np
@@ -10,7 +10,7 @@ iris = load_iris()
 X_train, X_test, y_train, y_test = train_test_split(iris.data.astype(np.float64),
     iris.target.astype(np.float64), train_size=0.75, test_size=0.25)
 
-tpot = TPOT(generations=5, population_size=20, verbosity=2)
+tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2)
 tpot.fit(X_train, y_train)
 print(tpot.score(X_test, y_test))
 tpot.export('tpot_iris_pipeline.py')
@@ -44,5 +44,4 @@ exported_pipeline = make_pipeline(
 
 exported_pipeline.fit(training_features, training_classes)
 results = exported_pipeline.predict(testing_features)
-
 ```
diff --git a/docs/sources/examples/MNIST_Example.md b/docs/sources/examples/MNIST_Example.md
@@ -1,15 +1,15 @@
 Below is a minimal working example with the practice MNIST data set.
 
 ```python
-from tpot import TPOT
+from tpot import TPOTClassifier
 from sklearn.datasets import load_digits
 from sklearn.cross_validation import train_test_split
 
 digits = load_digits()
 X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target,
                                                     train_size=0.75, test_size=0.25)
 
-tpot = TPOT(generations=5, population_size=20, verbosity=2)
+tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2)
 tpot.fit(X_train, y_train)
 print(tpot.score(X_test, y_test))
 tpot.export('tpot_mnist_pipeline.py')

diff --git a/docs/sources/installing.md b/docs/sources/installing.md
@@ -26,6 +26,14 @@ DEAP, update_checker, and tqdm (used for verbose TPOT runs) can be installed wit
 pip install deap update_checker tqdm
 ```
 
+Optionally, install XGBoost if you would like TPOT to use XGBoost. XGBoost is entirely optional, and TPOT will still function normally without XGBoost if you do not have it installed.
+
+```Shell
+pip install xgboost
+```
+
+If you have issues installing XGBoost, check the [XGBoost installation documentation](http://xgboost.readthedocs.io/en/latest/build.html).
+
 Finally to install TPOT itself, run the following command:
 
 ```Shell

diff --git a/docs/sources/releases.md b/docs/sources/releases.md
@@ -0,0 +1,86 @@
+# Version 0.6
+
+* **TPOT now supports regression problems!** We have created two separate `TPOTClassifier` and `TPOTRegressor` classes to support classification and regression problems, respectively. The [command-line interface](/using/#tpot-on-the-command-line) also supports this feature through the `-mode` parameter.
+
+* TPOT now allows you to **specify a time limit** for the optimization process with the `max_time_mins` parameter, so you don't need to guess how long TPOT will take any more to recommend a pipeline to you.
+
+* Added a new operator that performs feature selection using [ExtraTrees](http://scikit-learn.org/stable/modules/ensemble.html#extremely-randomized-trees) feature importance scores.
+
+* **[XGBoost](https://github.com/dmlc/xgboost) has been added as an optional dependency to TPOT.** If you have XGBoost installed, TPOT will automatically detect your installation and use the `XGBoostClassifier` and `XGBoostRegressor` in its pipelines.
+
+* TPOT now offers a verbosity level of 3 ("science mode"), which outputs the entire Pareto front instead of only the current best score. This feature may be useful for users looking to make a trade-off between pipeline complexity and score.
+
+# Version 0.5
+
+* Major refactor: Each operator is defined in a separate class file. Hooray for easier-to-maintain code!
+* TPOT now **exports directly to scikit-learn Pipelines** instead of hacky code.
+* Internal representation of individuals now uses scikit-learn pipelines.
+* Parameters for each operator have been optimized so TPOT spends less time exploring useless parameters.
+* We have removed pandas as a dependency and instead use numpy matrices to store the data.
+* TPOT now uses **k-fold cross-validation** when evaluating pipelines, with a default k = 3. This k parameter can be tuned when creating a new TPOT instance.
+* Improved **scoring function support**: Even though TPOT uses balanced accuracy by default, you can now have TPOT use [any of the scoring functions](http://scikit-learn.org/stable/modules/model_evaluation.html#common-cases-predefined-values) that `cross_val_score` supports.
+* Added the scikit-learn [Normalizer](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Normalizer.html) preprocessor.
+* [Minor text fixes.](http://knowyourmeme.com/memes/pokemon-go-updates-controversy)
+
+# Version 0.4
+
+In TPOT 0.4, we've made some major changes to the internals of TPOT and added some convenience functions. We've summarized the changes below.
+
+<ul>
+<li>Added new sklearn models and preprocessors
+
+<ul>
+<li>AdaBoostClassifier</li>
+<li>BernoulliNB</li>
+<li>ExtraTreesClassifier</li>
+<li>GaussianNB</li>
+<li>MultinomialNB</li>
+<li>LinearSVC</li>
+<li>PassiveAggressiveClassifier</li>
+<li>GradientBoostingClassifier</li>
+<li>RBFSampler</li>
+<li>FastICA</li>
+<li>FeatureAgglomeration</li>
+<li>Nystroem</li>
+</ul></li>
+<li>Added operator that inserts virtual features for the count of features with values of zero</li>
+<li>Reworked parameterization of TPOT operators
+<ul>
+<li>Reduced parameter search space with information from a scikit-learn benchmark</li>
+<li>TPOT no longer generates arbitrary parameter values, but uses a fixed parameter set instead</li>
+</ul></li>
+<li>Removed XGBoost as a dependency
+<ul>
+<li>Too many users were having install issues with XGBoost</li>
+<li>Replaced with scikit-learn's GradientBoostingClassifier</li>
+</ul></li>
+<li>Improved descriptiveness of TPOT command line parameter documentation</li>
+<li>Removed min/max/avg details during fit() when verbosity &gt; 1
+
+<ul>
+<li>Replaced with tqdm progress bar</li>
+<li>Added tqdm as a dependency</li>
+</ul></li>
+<li>Added <code>fit_predict()</code> convenience function</li>
+<li>Added <code>get_params()</code> function so TPOT can operate in scikit-learn's <code>cross_val_score</code> & related functions</li>
+</ul>
+
+# Version 0.3
+
+* We revised the internal optimization process of TPOT to make it more efficient, in particular in regards to the model parameters that TPOT optimizes over.
+
+# Version 0.2
+
+* TPOT now has the ability to export the optimized pipelines to sklearn code.
+
+* Logistic regression, SVM, and k-nearest neighbors classifiers were added as pipeline operators. Previously, TPOT only included decision tree and random forest classifiers.
+
+* TPOT can now use arbitrary scoring functions for the optimization process.
+
+* TPOT now performs multi-objective Pareto optimization to balance model complexity (i.e., # of pipeline operators) and the score of the pipeline.
+
+# Version 0.1
+
+* First public release of TPOT.
+
+* Optimizes pipelines with decision trees and random forest classifiers as the model, and uses a handful of feature preprocessors.
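
Note on the optional XGBoost dependency: the 0.6 release notes above say TPOT "will automatically detect your installation" of XGBoost, but the detection code itself is not part of the hunks shown here. The following is only a minimal sketch of one common way to detect an optional package in Python, using the standard library's `importlib.util.find_spec`. The helper names `is_installed` and `available_model_families` are hypothetical and are not taken from TPOT's source; TPOT's actual mechanism may differ.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def available_model_families() -> list:
    """Hypothetical registry: list which model families could be offered.

    The family names are illustrative only, not TPOT internals.
    """
    families = ["scikit-learn"]
    if is_installed("xgboost"):
        # Only advertise XGBoost-based operators when the package is present.
        families.append("xgboost")
    return families


if __name__ == "__main__":
    print(available_model_families())
```

Checking with `find_spec` avoids importing a heavy package just to learn whether it exists. A `try: import xgboost` / `except ImportError` block is an equally common alternative.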