safe-control-gym

For the IROS 2022 Safe Robot Learning Competition, check out branch beta-iros-competition

Physics-based CartPole and Quadrotor Gym environments (using PyBullet) with symbolic a priori dynamics (using CasADi) for learning-based control, and model-free and model-based reinforcement learning (RL).

These environments include (and evaluate) symbolic safety constraints and implement input, parameter, and dynamics disturbances to test the robustness and generalizability of control approaches. [PDF]
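As a rough illustration of what "constraints" and "disturbances" mean here, the following is a minimal, library-independent sketch. It uses plain Python rather than CasADi, and the names `BoundedConstraint` and `apply_input_disturbance` are hypothetical: they are not part of safe-control-gym's API. It assumes a box constraint on the input written as g(u) <= 0 and an additive Gaussian input disturbance.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class BoundedConstraint:
    """Hypothetical box constraint lower <= u <= upper (not the library's class)."""
    lower: Sequence[float]
    upper: Sequence[float]

    def values(self, u: Sequence[float]) -> List[float]:
        # Stack both sides in the form g(u) <= 0: (lower - u) and (u - upper).
        below = [lo - ui for lo, ui in zip(self.lower, u)]
        above = [ui - up for ui, up in zip(u, self.upper)]
        return below + above

    def is_violated(self, u: Sequence[float]) -> bool:
        # The constraint is violated if any component of g(u) is positive.
        return any(g > 0.0 for g in self.values(u))


def apply_input_disturbance(u: Sequence[float], std: float, rng: random.Random) -> List[float]:
    # Additive zero-mean Gaussian noise on each input channel (an assumed model).
    return [ui + rng.gauss(0.0, std) for ui in u]


constraint = BoundedConstraint(lower=[-1.0], upper=[1.0])
rng = random.Random(0)
nominal = [0.9]
disturbed = apply_input_disturbance(nominal, std=0.5, rng=rng)
print(constraint.is_violated(nominal), constraint.is_violated(disturbed))
```

Evaluating the constraint on both the nominal and the disturbed input shows how a disturbance can push an otherwise admissible input outside its bounds.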

@article{brunke2021safe,
         title={Safe Learning in Robotics: From Learning-Based Control to Safe Reinforcement Learning}, 
         author={Lukas Brunke and Melissa Greeff and Adam W. Hall and Zhaocong Yuan and Siqi Zhou and Jacopo Panerati and Angela P. Schoellig},
         journal = {Annual Review of Control, Robotics, and Autonomous Systems},
         year={2021},
         url = {https://arxiv.org/abs/2108.06266}}

Install on Ubuntu/macOS

Clone repo

git clone https://github.com/utiasDSL/safe-control-gym.git
cd safe-control-gym

Option A (recommended): using conda

Create and access a Python 3.8 environment using conda

conda create -n safe python=3.8.10
conda activate safe

Install the safe-control-gym repository

pip install --upgrade pip
pip install -e .

Option B: using venv and poetry

Create and access a Python 3.8 virtual environment using pyenv and venv

pyenv install 3.8.10
pyenv local 3.8.10
python3 -m venv safe
source safe/bin/activate
pip install --upgrade pip
pip install poetry
poetry install

Note:

You may need to separately install gmp, a dependency of pycddlib:

conda install -c anaconda gmp

or

sudo apt-get install libgmp-dev

Option C: using Colab

See this notebook where safe-control-gym is pre-installed

Architecture

Overview of safe-control-gym's API:

@misc{yuan2021safecontrolgym,
      title={safe-control-gym: a Unified Benchmark Suite for Safe Learning-based Control and Reinforcement Learning}, 
      author={Zhaocong Yuan and Adam W. Hall and Siqi Zhou and Lukas Brunke and Melissa Greeff and Jacopo Panerati and Angela P. Schoellig},
      year={2021},
      eprint={2109.06325},
      archivePrefix={arXiv},
      primaryClass={cs.RO}}

Configuration

Performance

We compare the sample efficiency of safe-control-gym with the original OpenAI Cartpole and PyBullet Gym's Inverted Pendulum, as well as gym-pybullet-drones. We use the default physics simulation integration step of each project and report performance results for open-loop, random action inputs. Note that the Bullet engine frequency reported for safe-control-gym is typically much finer-grained for improved fidelity. The safe-control-gym quadrotor environment is not as lightweight as gym-pybullet-drones but provides the same order of magnitude of speed-up and several more safety features and symbolic models.

Environment	GUI	Control Freq.	PyBullet Freq.	Constraints & Disturbances^	Speed-Up^^
Gym cartpole	True	50Hz	N/A	No	1.16x
InvPenPyBulletEnv	False	60Hz	60Hz	No	158.29x
cartpole	True	50Hz	50Hz	No	0.85x
cartpole	False	50Hz	1000Hz	No	24.73x
cartpole	False	50Hz	1000Hz	Yes	22.39x

gym-pyb-drones	True	48Hz	240Hz	No	2.43x
gym-pyb-drones	False	50Hz	1000Hz	No	21.50x
quadrotor	True	60Hz	240Hz	No	0.74x
quadrotor	False	50Hz	1000Hz	No	9.28x
quadrotor	False	50Hz	1000Hz	Yes	7.62x

^ Whether the environment includes a default set of constraints and disturbances

^^ Speed-up = Elapsed Simulation Time / Elapsed Wall Clock Time; on a 2.30GHz Quad-Core i7-1068NG7 with 32GB 3733MHz LPDDR4X; no GPU
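The speed-up definition above can be sketched in a few lines of Python. This is only an illustration of the formula with open-loop random actions: `toy_step` is a hypothetical stand-in for a physics step, not a safe-control-gym or PyBullet call, so the number it prints says nothing about the environments in the table.

```python
import random
import time


def speed_up(sim_time_s: float, wall_time_s: float) -> float:
    """Speed-up = elapsed simulation time / elapsed wall-clock time."""
    if wall_time_s <= 0.0:
        raise ValueError("wall-clock time must be positive")
    return sim_time_s / wall_time_s


def toy_step(state: float, action: float, dt: float) -> float:
    # Hypothetical stand-in for one physics step (a simple integrator).
    return state + action * dt


def benchmark(num_steps: int, ctrl_freq_hz: float, seed: int = 0) -> float:
    """Time an open-loop rollout with random actions and return its speed-up."""
    rng = random.Random(seed)
    dt = 1.0 / ctrl_freq_hz
    state = 0.0
    start = time.perf_counter()
    for _ in range(num_steps):
        state = toy_step(state, rng.uniform(-1.0, 1.0), dt)
    wall = time.perf_counter() - start
    # Simulated time is the number of control steps times the control period.
    return speed_up(num_steps * dt, wall)


print(f"toy speed-up: {benchmark(10_000, 50.0):.1f}x")
```

A value above 1x means the simulation runs faster than real time, and a value below 1x (as in the GUI rows of the table) means it runs slower.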

Getting Started

Familiarize yourself with the APIs and environments using the scripts in examples/

$ cd ./examples/                                                                    # Navigate to the examples folder
$ python3 tracking.py --overrides ./tracking.yaml                                   # PID trajectory tracking with the 2D quadcopter
$ python3 verbose_api.py --task cartpole --overrides verbose_api.yaml             #  Printout of the extended safe-control-gym APIs
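Both commands pass a YAML file through `--overrides`. The README does not describe how these files are applied, so the sketch below only illustrates one plausible reading: override values are merged recursively on top of a default configuration. The helper `merge_overrides` and the keys `task_config`, `ctrl_freq`, and `gui` are hypothetical and chosen for illustration; check the files in examples/ for the actual keys.

```python
from copy import deepcopy
from typing import Any, Dict


def merge_overrides(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `base` with `overrides` applied recursively."""
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Descend into nested sections so untouched keys are preserved.
            merged[key] = merge_overrides(merged[key], value)
        else:
            # Scalars, lists, and new keys simply replace the default.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical defaults and overrides (not taken from the repository).
defaults = {"task": "cartpole", "task_config": {"ctrl_freq": 50, "gui": False}}
overrides = {"task_config": {"gui": True}}
config = merge_overrides(defaults, overrides)
print(config)
```

With this kind of merge, an overrides file only needs to list the values that differ from the defaults.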

Systems Variables and 2D Quadrotor Lemniscate Trajectory Tracking
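For readers unfamiliar with the term, a lemniscate reference is a figure-eight curve. The sketch below samples one period of a lemniscate of Gerono in the x-z plane. The parametrization, the function name `lemniscate_reference`, and its parameters are assumptions made for illustration; the trajectory generated by tracking.py may be defined differently.

```python
import math
from typing import List, Tuple


def lemniscate_reference(
    scale: float, period_s: float, ctrl_freq_hz: float
) -> List[Tuple[float, float]]:
    """Sample one period of a figure-eight (lemniscate of Gerono) in the x-z plane."""
    num_samples = int(round(period_s * ctrl_freq_hz))
    omega = 2.0 * math.pi / period_s
    points = []
    for k in range(num_samples):
        t = k / ctrl_freq_hz
        # x sweeps left-right once per period; z crosses zero twice as often.
        x = scale * math.sin(omega * t)
        z = scale * math.sin(omega * t) * math.cos(omega * t)
        points.append((x, z))
    return points


reference = lemniscate_reference(scale=1.0, period_s=5.0, ctrl_freq_hz=50.0)
print(len(reference), reference[0])
```

A tracking controller would be given one such reference point per control step and asked to minimize the position error against it.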

Verbose API Example

List of Implemented Controllers

Re-create the Results in "Safe Learning in Robotics" [arXiv link]

To stay in touch, get involved or ask questions, please open an issue on GitHub or contact us via e-mail ({jacopo.panerati, zhaocong.yuan, adam.hall, siqi.zhou, lukas.brunke, melissa.greeff}@robotics.utias.utoronto.ca).

Figure 6—Robust GP-MPC [1]

$ cd ../experiments/annual_reviews/figure6/                        # Navigate to the experiment folder
$ chmod +x create_fig6.sh                                          # Make the script executable, if needed
$ ./create_fig6.sh                                                 # Run the script (ca. 2')

This will use the models in safe-control-gym/experiments/figure6/trained_gp_model/ to generate the figure.

To also re-train the GP models from scratch (ca. 30' on a laptop)

$ chmod +x create_trained_gp_model.sh                              # Make the script executable, if needed
$ ./create_trained_gp_model.sh                                     # Run the script (ca. 30')

Note: this will back up and overwrite safe-control-gym/experiments/figure6/trained_gp_model/

Figure 7—Safe RL Exploration [2]

$ cd ../figure7/                                                   # Navigate to the experiment folder
$ chmod +x create_fig7.sh                                          # Make the script executable, if needed
$ ./create_fig7.sh                                                 # Run the script (ca. 5'')

This will use the data in safe-control-gym/experiments/figure7/safe_exp_results.zip/ to generate the figure.

To also re-train all the controllers/agents, run the script below. Warning: this takes >24hrs on a laptop; if necessary, run each of the loops in the Bash script (PPO, PPO with reward shaping, and the Safe Explorer) separately.

$ chmod +x create_safe_exp_results.sh                              # Make the script executable, if needed
$ ./create_safe_exp_results.sh                                     # Run the script (>24hrs)

Note: this script will (over)write the results in safe-control-gym/experiments/figure7/safe_exp_results/. If you do not run the re-training to completion, delete the partial results with rm -r -f ./safe_exp_results/ before running ./create_fig7.sh again.

Figure 8—Model Predictive Safety Certification [3]

(required) Obtain MOSEK's license (free for academia). Once you have received (via e-mail) and downloaded the license to your own ~/Downloads folder, install it by executing

$ mkdir ~/mosek                                                    # Create MOSEK license folder in your home '~'
$ mv ~/Downloads/mosek.lic ~/mosek/                                # Move the downloaded MOSEK license to '~/mosek/'
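Before running the experiment, it can help to confirm that the license file ended up where the commands above put it. The following is a small sketch using only the Python standard library; the helper names `mosek_license_path` and `has_mosek_license` are hypothetical, and the check only tests that the file exists, not that the license is valid.

```python
from pathlib import Path
from typing import Optional


def mosek_license_path(home: Optional[Path] = None) -> Path:
    """Return the license location used in the steps above: ~/mosek/mosek.lic."""
    return (home or Path.home()) / "mosek" / "mosek.lic"


def has_mosek_license(home: Optional[Path] = None) -> bool:
    # Only checks that the file exists; it does not validate the license.
    return mosek_license_path(home).is_file()


status = "found" if has_mosek_license() else "missing"
print(f"MOSEK license {status}: {mosek_license_path()}")
```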

Then run

$ cd ../figure8/                                                   # Navigate to the experiment folder
$ chmod +x create_fig8.sh                                          # Make the script executable, if needed
$ ./create_fig8.sh                                                 # Run the script (ca. 1')

This will use the unsafe (pre-trained) PPO controller/agent in folder safe-control-gym/experiments/figure8/unsafe_ppo_model/ to generate the figure.

To also re-train the unsafe PPO controller/agent (ca. 2' on a laptop)

$ chmod +x create_unsafe_ppo_model.sh                              # Make the script executable, if needed
$ ./create_unsafe_ppo_model.sh                                     # Run the script (ca. 2')

Note: this script will (over)write the model in safe-control-gym/experiments/figure8/unsafe_ppo_model/

References

[1] Hewing L, Kabzan J, Zeilinger MN. 2020. Cautious model predictive control using Gaussian process regression. IEEE Transactions on Control Systems Technology 28:2736–2743
[2] Dalal G, Dvijotham K, Vecerik M, Hester T, Paduraru C, Tassa Y. 2018. Safe exploration in continuous action spaces. arXiv:1801.08757 [cs.AI]
[3] Wabersich KP, Zeilinger MN. 2018. Linear Model Predictive Safety Certification for Learning-Based Control. In 2018 IEEE Conference on Decision and Control (CDC), pp. 7130–7135

Related Open-source Projects

gym-pybullet-drones: single and multi-quadrotor environments
gym-marl-reconnaissance: multi-agent heterogeneous (UAV/UGV) environments
stable-baselines3: PyTorch reinforcement learning algorithms
bullet3: multi-physics simulation engine
gym: OpenAI reinforcement learning toolkit
safety-gym: environments for safe exploration in RL
realworldrl_suite: real-world RL challenge framework
casadi: symbolic framework for numeric optimization

TODOs (August 2022)

Publish to PyPI
Create resource list with papers, projects, blog posts (Cat's, etc.) using safe-control-gym

University of Toronto's Dynamic Systems Lab / Vector Institute for Artificial Intelligence
