We are happy to accept contributions of methods, as well as updates to the benchmarking framework. Below we specify minimal requirements for contributing a method to this benchmark.
- In general you should submit pull requests to the dev branch.
- Make the PR detailed and reference specific issues if the PR is meant to address any.
- Please be kind and please be patient. We will be, too.
To contribute a symbolic regression method for benchmarking, fork the repo, make the changes listed below, and submit a pull request to the dev branch. Once your method passes the basic tests and we've reviewed it, congrats! We will plan to benchmark your method on hundreds of regression problems.
Please note that the schedule for updating benchmarks depends on many factors, including the availability of computing resources and of our contributors. If you are on a tight schedule, plan to benchmark your method yourself; you can leverage this code base and previous experimental results to do so.
- An open-source method with a scikit-learn compatible API.
- Your method should be compatible with Python 3.7 or higher to ensure compatibility with conda-forge.
- If your method uses a random seed, it should have a `random_state` attribute that can be set.
- Methods must have their own folders in the `algorithms` directory (e.g., `algorithms/feat`). This folder should contain:
  - `metadata.yml` (required): a file describing your submission, following the descriptions in [submission/feat-example/metadata.yml][metadata].
  - `regressor.py` (required): a Python file that defines your method, named appropriately. See [submission/feat-example/regressor.py][regressor] for complete documentation. It should contain:
    - `est`: a sklearn-compatible `Regressor` object.
    - `model(est, X=None)`: a function that returns a sympy-compatible string specifying the final model. It can optionally take the training data as an input argument. See guidance below.
    - `eval_kwargs` (optional): a dictionary that can specify method-specific arguments to `evaluate_model.py`.
  - `LICENSE` (optional): a license file.
  - `environment.yml` (optional): a conda environment file that specifies dependencies for your submission. It will be used to update the baseline environment (`environment.yml` in the root directory). To the extent possible, conda should be used to specify the dependencies you need. If your method is part of conda, great! You can just put that in here and leave `install.sh` blank.
  - `requirements.txt` (optional): a PyPI requirements file. The script will run `pip install -r requirements.txt` if this file is found, before proceeding.
  - `install.sh` (optional): a bash script that installs your method. **Note:** scripts should not require sudo permissions. The library and include paths should be directed to the conda environment; the environment variable `$CONDA_PREFIX` specifies the path to the environment.
- Do not include your source code. Use `install.sh` to pull it from a stable source repository.
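
As a rough illustration of the three names that `regressor.py` is expected to define, here is a minimal sketch. It is not the official template: `MeanRegressor` is a hypothetical toy estimator invented for this example, and it hand-writes `fit`, `predict`, `get_params`, and `set_params` so that it runs without any dependencies. A real submission would more likely wrap an existing scikit-learn compatible estimator. `eval_kwargs` is left empty because its valid keys depend on `evaluate_model.py` and are not assumed here.

```python
"""Sketch of a regressor.py file (illustrative only, not an official template)."""


class MeanRegressor:
    """Hypothetical toy estimator: always predicts the mean of the training targets."""

    def __init__(self, random_state=None):
        # Exposed as a settable attribute; unused here because the method is deterministic.
        self.random_state = random_state

    def get_params(self, deep=True):
        return {"random_state": self.random_state}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        targets = list(y)
        self.mean_ = sum(targets) / len(targets)
        # Keep a string form of the fitted model for model() to return.
        self.model_ = repr(self.mean_)
        return self

    def predict(self, X):
        return [self.mean_] * len(X)


# est: the estimator instance that the benchmark fits and evaluates.
est = MeanRegressor(random_state=0)


def model(est, X=None):
    """Return the fitted model as a sympy-compatible string."""
    return est.model_


# Optional method-specific arguments; no keys are assumed in this sketch.
eval_kwargs = {}
```

Because `random_state` is a plain constructor argument stored under the same name, it can be set either at construction time or later through `set_params`.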
To check for exact solutions to problems with known, ground-truth models, each SR method must return a model string that can be manipulated in sympy. Ensure the returned model meets these requirements:
- The variable names appearing in the model are identical to those in the training data, `X`, which is a `pd.DataFrame`. If your method names variables some other way, e.g. `[x_0 ... x_m]`, you can specify a mapping in the `model` function such as:

      def model(est, X):
          # Map internal names x_0, x_1, ... to the column names of X.
          mapping = {'x_' + str(i): k for i, k in enumerate(X.columns)}
          new_model = est.model_
          # Substitute in reverse order so that x_10 is replaced before x_1.
          for k, v in reversed(list(mapping.items())):
              new_model = new_model.replace(k, v)
          return new_model

- The operators/functions in the model are available in sympy's function set.
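
One way to sanity-check a model string against the two requirements above is to rename its variables and then parse it with sympy. The sketch below is an illustration under stated assumptions, not part of the benchmark code: `rename_variables` and `check_model_string` are hypothetical helper names, the `x_` prefix is only an example, and feature names are assumed to be valid Python identifiers. The renaming step uses a regular expression with word boundaries as an alternative to ordered string replacement.

```python
import re

from sympy import Symbol
from sympy.core.function import AppliedUndef
from sympy.parsing.sympy_parser import parse_expr


def rename_variables(model_str, feature_names, prefix="x_"):
    """Replace internal names such as x_0, x_1, ... with the given feature names."""
    mapping = {f"{prefix}{i}": name for i, name in enumerate(feature_names)}
    # Word boundaries keep x_1 from matching inside x_10 or inside names like max_1.
    pattern = re.compile(r"\b" + re.escape(prefix) + r"\d+\b")
    # Names without a mapping entry are left unchanged.
    return pattern.sub(lambda m: mapping.get(m.group(0), m.group(0)), model_str)


def check_model_string(model_str, feature_names):
    """Parse model_str and report names that are neither features nor sympy functions."""
    # Declaring the features as symbols stops sympy from resolving a column
    # name to one of its own objects.
    local_dict = {name: Symbol(name) for name in feature_names}
    expr = parse_expr(model_str, local_dict=local_dict)
    unknown_variables = sorted(
        str(s) for s in expr.free_symbols if str(s) not in feature_names
    )
    # Function calls that sympy does not recognise are parsed as undefined functions.
    unknown_functions = sorted({str(f.func) for f in expr.atoms(AppliedUndef)})
    return expr, unknown_variables, unknown_functions


features = ["age", "bmi"]
model_str = rename_variables("2.5*sin(x_0) + x_1**2", features)
expr, unknown_variables, unknown_functions = check_model_string(model_str, features)
```

If both returned lists are empty, every variable in the string matches a column of `X` and every function call was recognised by sympy. A non-empty list points to names that would need to be mapped or rewritten before the model is returned.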