1 parent c5379af, commit 615071d. Showing 1 changed file with 33 additions and 28 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,12 +1,13 @@ | ||
# `forest-confidence-interval`: Confidence intervals for Forest algorithms | ||
# `forestci`: confidence intervals for Forest algorithms | ||
|
||
[](https://travis-ci.org/scikit-learn-contrib/forest-confidence-interval) | ||
[](https://coveralls.io/r/scikit-learn-contrib/forest-confidence-interval) | ||
[](https://circleci.com/gh/scikit-learn-contrib/forest-confidence-interval/tree/master) | ||
[](http://joss.theoj.org/papers/b40f03cc069b43b341a92bd26b660f35) | ||
|
||
Forest algorithms are powerful | ||
[ensemble methods](http://scikit-learn.org/stable/modules/classes.html#module-sklearn.ensemble) for classification and regression. However, predictions from these algorithms do contain some amount of error. Prediction variability can illustrate how influential | ||
Forest algorithms are powerful [ensemble methods](http://scikit-learn.org/stable/modules/classes.html#module-sklearn.ensemble) for classification and regression. | ||
However, predictions from these algorithms do contain some amount of error. | ||
Prediction variability can illustrate how influential | ||
the training set is for producing the observed random forest predictions. | ||
|
||
`forest-confidence-interval` is a Python module that adds a calculation of | ||
|
@@ -15,45 +16,47 @@ implemented in scikit-learn random forest regression or classification objects. | |
The core functions calculate an in-bag matrix and error bars for random forest | ||
objects. | ||
|
||
This module is based on R code from Stefan Wager (see important links below) | ||
and is licensed under the MIT open source license (see [LICENSE](LICENSE)) | ||
This module is based on R code from Stefan Wager | ||
([`randomForestCI`](https://github.com/swager/randomForestCI) deprecated in favor of [`grf`](https://github.com/swager/grf)) | ||
and is licensed under the MIT open source license (see [LICENSE](LICENSE)). | ||
The present project makes the algorithm compatible with [`scikit-learn`](https://scikit-learn.org/stable/). | ||
|
||
## Important Links | ||
`scikit-learn` - http://scikit-learn.org/ | ||
|
||
Stefan Wager's `randomForestCI` - https://github.com/swager/randomForestCI (deprecated in favor of `grf`: https://github.com/swager/grf) | ||
To get a proper confidence interval, you need to use a large number of trees (estimators). | ||
The [calibration routine](https://github.com/scikit-learn-contrib/forest-confidence-interval/pull/114) | ||
(which can be included or excluded on top of the algorithm) tries to extrapolate | ||
the results to an infinite number of trees, but it is unstable and can cause numerical errors. | ||
If this happens, disable it with `calibrate=False` | ||
and try increasing the number of trees in the model until the results converge. | ||
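
The advice about increasing the number of trees until the estimates settle can be sketched as a simple loop. This is a minimal illustration, not part of `forestci`: `max_relative_change`, `first_converged_size`, and `toy_estimate` are hypothetical names, and the toy estimator only simulates noise that shrinks as the forest grows. In a real run, the estimator would fit a forest with `n_estimators=n_trees` and call `fci.random_forest_error(..., calibrate=False)`.

```python
from typing import Callable, Optional, Sequence


def max_relative_change(previous: Sequence[float], current: Sequence[float]) -> float:
    """Largest relative change between two equally long lists of estimates."""
    return max(
        abs(c - p) / max(abs(p), 1e-12)  # guard against division by zero
        for p, c in zip(previous, current)
    )


def first_converged_size(
    estimate_variance: Callable[[int], Sequence[float]],
    tree_counts: Sequence[int],
    rtol: float = 0.05,
) -> Optional[int]:
    """Return the first tree count whose estimates moved by less than `rtol`
    relative to the previous (smaller) forest, or None if none did."""
    previous = None
    for n_trees in tree_counts:
        current = estimate_variance(n_trees)
        if previous is not None and max_relative_change(previous, current) < rtol:
            return n_trees
        previous = current
    return None


def toy_estimate(n_trees: int) -> list:
    """Stand-in for a real run (assumption): a fixed 'true' variance per sample
    plus an error term that shrinks like 1 / n_trees."""
    true_variance = [1.0, 2.0, 0.5]
    return [v + 50.0 / n_trees for v in true_variance]


# Double the forest size until the estimates change by less than 5%.
converged_at = first_converged_size(toy_estimate, [100, 200, 400, 800, 1600, 3200])
print(converged_at)
```

The tolerance and the doubling schedule are arbitrary choices for the sketch; a stricter `rtol` simply pushes the answer toward larger forests.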
|
||
## Installation and Usage | ||
Before installing the module you will need `numpy`, `scipy` and `scikit-learn`. | ||
Dependencies associated with the previous modules may need root privileges to install. | ||
Consult the [API Reference](http://contrib.scikit-learn.org/forest-confidence-interval/reference/index.html) for documentation on core functionality. | ||
|
||
``` | ||
pip install numpy scipy scikit-learn | ||
``` | ||
You can also install dependencies with: | ||
|
||
``` | ||
pip install -r requirements.txt | ||
``` | ||
Before installing the module you will need `numpy`, `scipy` and `scikit-learn`. | ||
|
||
To install `forest-confidence-interval` execute: | ||
``` | ||
pip install forestci | ||
``` | ||
|
||
or, if you are installing from the source code: | ||
```shell | ||
python setup.py install | ||
``` | ||
|
||
If you would like to install the development version of the software, use: | ||
|
||
```shell | ||
pip install git+https://github.com/scikit-learn-contrib/forest-confidence-interval.git | ||
``` | ||
## Why use `forest-confidence-interval`? | ||
Our software is designed for people who use `scikit-learn` random forest objects and want to add estimates of uncertainty to random forest predictors. Prediction variability demonstrates how much the training set influences results and is important for estimating standard errors. `forest-confidence-interval` is a Python module for calculating variance and adding confidence intervals to the popular Python library `scikit-learn`. The software is compatible with both `scikit-learn` random forest regression and classification objects. | ||
|
||
Usage: | ||
|
||
```python | ||
import forestci as fci | ||
ci = fci.random_forest_error( | ||
    forest=model,  # scikit-learn forest model fitted on X_train | ||
    X_train_shape=X_train.shape, | ||
    X_test=X,  # the samples for which you want to compute the CI | ||
    inbag=None, | ||
    calibrate=True, | ||
    memory_constrained=False, | ||
    memory_limit=None, | ||
    y_output=0  # for a multi-output model, consider target 0 | ||
) | ||
``` | ||
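
The call above yields one value per test sample. The following is a minimal sketch of turning such values into approximate 95% intervals, under two assumptions: that the returned values are variance estimates (as the project's API reference describes for `random_forest_error`), and that a normal approximation is acceptable. `normal_interval` is a hypothetical helper, not part of `forestci`.

```python
import math
from typing import List, Sequence, Tuple


def normal_interval(
    predictions: Sequence[float],
    variances: Sequence[float],
    z: float = 1.96,
) -> List[Tuple[float, float]]:
    """Return (lower, upper) bounds as prediction +/- z * sqrt(variance).

    Negative variance values are clipped to zero before taking the square
    root (an assumption made here to keep the sketch numerically safe).
    """
    intervals = []
    for pred, var in zip(predictions, variances):
        half_width = z * math.sqrt(max(var, 0.0))
        intervals.append((pred - half_width, pred + half_width))
    return intervals


# Illustrative numbers only: two predictions with made-up variance values.
example = normal_interval([10.0, 20.0], [4.0, -0.5])
print(example)
```

With real data, `predictions` would come from `model.predict(X)` and `variances` from the `ci` array computed above.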
|
||
## Examples | ||
|
||
|
@@ -81,10 +84,12 @@ Please write code that complies with the Python style guide, | |
E-mail [Ariel Rokem](mailto:[email protected]), [Kivan Polimis](mailto:[email protected]), or [Bryna Hazelton](mailto:[email protected] ) if you have any questions, suggestions or feedback. | ||
|
||
## Testing | ||
|
||
Testing requires the `nose` package. Tests are located in the `forestci/tests` folder | ||
and can be run with the `nosetests` command in the main directory. | ||
|
||
## Citation | ||
|
||
Click on the JOSS status badge for the Journal of Open Source Software article on this project. | ||
The BibTeX citation for the JOSS article is below: | ||
|
||
|