This project is no longer maintained. However, please feel free to contact me if you have questions about the associated paper or believe parts of the source code may be of use to you.
This project incorporates the auto-sklearn toolkit (`asl`) into a solver runtime prediction framework. The predictions are then passed to a second classification model which yields a solution to the algorithm selection problem (`as`).
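As a minimal sketch of this two-stage idea (illustrative only: the data, model choices, and names below are hypothetical, and the real pipeline builds its internal models with auto-sklearn):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Hypothetical data: X holds instance features; runtimes holds the observed
# runtime of each of 3 candidate solvers on each training instance.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
runtimes = rng.exponential(size=(200, 3))

# Stage 1: one runtime-prediction model per solver.
runtime_models = [
    RandomForestRegressor(n_estimators=10, random_state=0).fit(X, runtimes[:, s])
    for s in range(runtimes.shape[1])
]
predicted = np.column_stack([m.predict(X) for m in runtime_models])

# Stage 2: a classifier maps the predicted runtimes to a solver choice,
# trained on the truly fastest solver per instance.
selector = RandomForestClassifier(n_estimators=10, random_state=0)
selector.fit(predicted, runtimes.argmin(axis=1))
print(selector.predict(predicted[:5]))
```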
This project is written in `python3` and can be installed with `pip`.
The project indirectly depends on the `pyrfr` package and thus also requires SWIG-3.
```
pip3 install -r requirements.txt
pip3 install -r requirements.2.txt
```
N.B. If installing under anaconda, use `pip` rather than `pip3`.
## Example usage for OASC
All usage requires a YAML configuration file. This needs to include a `base_path` key, and all results will be written to that location.
An example (complete) config file:

```yaml
base_path: /prj/oasc2017
```
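For illustration, a script consuming this configuration might read it along these lines (a sketch only; the actual parsing is internal to the project):

```python
import yaml

# Load the configuration and pull out the required base_path key.
with open("oasc.yaml") as f:
    config = yaml.safe_load(f)

base_path = config["base_path"]  # all results are written under this directory
```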
### Short test run
```
process-oasc-scenario oasc.yaml /mldb/oasc_scenarios/train/Bado/ /mldb/oasc_scenarios/test/Bado/ --total-training-time 30 --num-cpus 3 --logging-level INFO --max-feature-steps 1
```
### Command for OASC-like run
```
process-oasc-scenario oasc.yaml /path/to/oasc_scenarios/train/Bado/ /path/to/oasc_scenarios/test/Bado/ --total-training-time 600 --num-cpus 8 --logging-level INFO
```
This command uses the training set to learn an algorithm selection scheduler. It then uses that scheduler to create solving schedules for the test set. The `--total-training-time` parameter gives the approximate amount of time (in seconds) allocated to `auto-sklearn` for each internal model.
This script also performs feature set selection and determines a presolving schedule. The `--max-feature-steps` flag can be given to limit the number of feature steps considered in the search.
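As a rough, hypothetical sketch of what a bounded greedy search over feature steps could look like (the project's actual selection logic differs and lives in the source):

```python
def select_feature_steps(steps, evaluate, max_feature_steps=None):
    """Greedy forward selection over feature steps (illustrative only).

    `steps` is a list of feature-step names and `evaluate` scores a
    candidate subset (higher is better); both are hypothetical inputs.
    """
    limit = max_feature_steps if max_feature_steps is not None else len(steps)
    selected, best = [], evaluate([])
    while len(selected) < limit:
        scored = [(evaluate(selected + [s]), s) for s in steps if s not in selected]
        if not scored:
            break
        score, step = max(scored)
        if score <= best:  # stop when no remaining step improves the score
            break
        best, selected = score, selected + [step]
    return selected

# Toy usage: an evaluator that only rewards subsets containing "step1".
print(select_feature_steps(
    ["step0", "step1", "step2"],
    evaluate=lambda subset: sum(1 for s in subset if s == "step1"),
    max_feature_steps=1,
))  # -> ['step1']
```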
Typical problem sets for OASC take about an hour or two with 8 CPUs when 10 minutes are used for each model.
This script also accepts other optional parameters controlling the behavior of `auto-sklearn`, BLAS, logging, etc. These were all kept at default values for submissions.
The `--help` flag can be given to see all options and their default values.
The schedule for the test instances is written to: `<base_path>/schedule.asl.<scenario_name>.json`.
The learned scheduler is written to: `<base_path>/model.asl.scheduler.<scenario_name>.pkl.gz`.
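Both outputs can be inspected with standard-library tools, e.g. as in this sketch (the paths assume the example `base_path` above and the `Bado` scenario, and that the scheduler file is a gzipped pickle, as its extension suggests):

```python
import gzip
import json
import pickle

# The schedule for the test instances is plain JSON.
with open("/prj/oasc2017/schedule.asl.Bado.json") as f:
    schedule = json.load(f)

# The learned scheduler is a gzipped pickle.
with gzip.open("/prj/oasc2017/model.asl.scheduler.Bado.pkl.gz", "rb") as f:
    scheduler = pickle.load(f)
```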
```
process-oasc-scenario oasc.yaml /path/to/oasc_scenarios/train/Bado/ /path/to/oasc_scenarios/test/Bado/ --use-random-forests --num-cpus 8 --logging-level INFO
```
This command differs from the command above because it includes the `--use-random-forests` flag. Rather than learning the internal models for the scheduler with `auto-sklearn`, it instead uses standard `sklearn` random forests (with 100 trees).
It ignores the `auto-sklearn` parameters (such as `--total-training-time`). It tends to be somewhat faster since it avoids the Bayesian optimization. Still, many of the steps (such as feature step selection) are the same for both, so the typical time on OASC is similar to that mentioned above.
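For reference, the internal models in this mode amount to ordinary 100-tree `sklearn` random forests, roughly along these lines (illustrative dummy data; a regressor is shown here since the internal models predict runtimes):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

X = np.random.rand(50, 5)  # dummy instance features
y = np.random.rand(50)     # dummy solver runtimes

# Each internal runtime model is a standard 100-tree random forest
# rather than an auto-sklearn ensemble.
model = RandomForestRegressor(n_estimators=100).fit(X, y)
```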
The schedule for the test instances is written to: `<base_path>/schedule.rf.<scenario_name>.json`.
The learned scheduler is written to: `<base_path>/model.rf.scheduler.<scenario_name>.pkl.gz`.
This project relies heavily on the `automl_utils` module. Indeed, almost all of the logic for interacting with `auto-sklearn` is wrapped in that module.