This repository is the implementation of the paper "Leveraging Symmetries in Gaits for Reinforcement Learning: A Case Study on Quadrupedal Gaits", built on Isaac Gym and the Isaac Gym Benchmark Environments.
Features:
- Symmetry-based Reward Design for RL: Incorporate three symmetries (temporal symmetry, time-reversal symmetry, and morphological symmetry) into the reward function, and train four gaits for the quadrupedal robot Bittle (an illustrative sketch of one such reward term follows below).
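As an illustration of the general idea (not the exact reward formulation used in the paper), a morphological-symmetry term can reward the agent for matching each joint trajectory with its mirrored counterpart. In the sketch below, the mirror permutation, the kernel width, and the function name are hypothetical.

```python
import torch

def morphological_symmetry_reward(dof_pos, mirror_idx, sigma=0.25):
    """Illustrative symmetry reward term (not the paper's exact formulation).

    dof_pos    : (num_envs, num_dof) tensor of joint positions
    mirror_idx : index permutation mapping each joint to its morphological mirror
    sigma      : hypothetical kernel width controlling how sharply asymmetry is penalized
    """
    # Squared mismatch between each joint and its mirrored counterpart
    asymmetry = torch.sum(torch.square(dof_pos - dof_pos[:, mirror_idx]), dim=-1)
    # Perfectly symmetric configurations receive reward 1; asymmetric ones decay toward 0
    return torch.exp(-asymmetry / sigma)
```

Temporal- and time-reversal-symmetry terms can be built analogously by comparing the current state against a time-shifted or time-reversed reference within one gait cycle.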
Authors: Jiayu Ding ([email protected]), Xulin Chen ([email protected])
Affiliation: DLAR Lab
This project was initially developed at Syracuse University (Dynamic Locomotion and Robotics Lab).
This work has been submitted to IROS 2024. If you use this work in an academic context, please cite the following publication: https://arxiv.org/submit/5474477.
Download Isaac Gym from the NVIDIA website and follow its installation instructions. We recommend using a dedicated conda environment.
Once Isaac Gym is properly installed, download this repository and run the following commands
cd Bittle_Leveraging_Symmetries_in_RL/
pip install -e .
pip install -r requirements.txt
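To confirm the installation worked, the packages can be imported from Python; note that Isaac Gym generally has to be imported before torch. The package name isaacgymenvs below assumes this repository installs under the same name as the upstream Isaac Gym Benchmark Environments.

```python
# Minimal installation sanity check (run from inside the activated conda environment).
# Isaac Gym should be imported before torch, otherwise the import may fail.
from isaacgym import gymapi   # core Isaac Gym API
import torch                  # PyTorch backend used for training
import isaacgymenvs           # this repository's package, installed via `pip install -e .`

print("Isaac Gym imported, CUDA available:", torch.cuda.is_available())
```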
Key files and directories:
- cfg/task/DLARBittle_PRD_v2.yaml: Parameters for creating the Bittle environment.
- cfg/train/DLARBittlePPO_LSTM.yaml: Configuration for RL policy training (PPO algorithm with an LSTM network).
- tasks/dlar_bittle_PRD_v2.py: Python definition of the Bittle environment.
- runs/: Directory where trained policies are saved.
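For orientation only: tasks in Isaac Gym Benchmark Environments are typically Python classes that derive from VecTask and implement pre- and post-physics hooks. The skeleton below is a hypothetical sketch of how tasks/dlar_bittle_PRD_v2.py is likely structured, not its actual contents.

```python
from isaacgymenvs.tasks.base.vec_task import VecTask  # base class used by IsaacGymEnvs tasks

class DLARBittlePRD(VecTask):
    """Hypothetical skeleton of the Bittle task class (names are illustrative)."""

    def __init__(self, cfg, **kwargs):
        self.cfg = cfg
        super().__init__(config=cfg, **kwargs)

    def pre_physics_step(self, actions):
        # Apply the policy's joint targets before stepping the simulation
        ...

    def post_physics_step(self):
        # Refresh simulation state tensors, then compute observations and (symmetry) rewards
        self.compute_observations()
        self.compute_reward()
```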
Before running any code, change into the working directory and activate your conda environment:
cd isaacgymenvs/
conda activate your_conda_env_name
To train policies, run
./train.sh
The trained policies are saved under runs/DLARBittle_ww-xx-yy-zz/. In this directory:
- nn/DLARBittle.pth stores the policy parameters that achieved the best performance.
- nn/last_DLARBittle_ep_x_rew_y.pth files store the policy parameters at training epoch x, which achieved reward y.
- summaries/ contains the training logs. To visualize them, install TensorBoard and run tensorboard --logdir=/path/to/log/file
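Assuming the .pth files are ordinary PyTorch checkpoints (as produced by rl_games-style training), they can be inspected directly; the run directory below is just an example.

```python
import torch

# Load a saved policy checkpoint on the CPU and list what it contains.
# Replace the path with your own run directory.
ckpt = torch.load("runs/DLARBittle_B2_0.1-0.8/nn/DLARBittle.pth", map_location="cpu")
if isinstance(ckpt, dict):
    # Such checkpoints typically bundle network weights with optimizer state and metadata.
    print(list(ckpt.keys()))
else:
    print(type(ckpt))
```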
To visualize policies, set checkpoint=/path/to/your/policy in visualize.sh, then run
./visualize.sh
We provide four pretrained policies:
- Bounding: runs/DLARBittle_B2_0.1-0.8/
- Galloping: runs/DLARBittle_GP_0.1-0.8/
- Half-bounding: runs/DLARBittle_HB_H2_0.3-0.6/
- Pronking: runs/DLARBittle_PK_0.1-0.8/
For example, setting checkpoint=runs/DLARBittle_B2_0.1-0.8/nn/DLARBittle.pth in visualize.sh plays back the bounding policy.
To record a video of a policy, set checkpoint=/path/to/your/policy in record_video.sh, then run
./record_video.sh
A demo video of the learned gaits is included in the repository: Sym_Guided_RL_Video_v2.mp4