Symm-PPO

Proximal Policy Optimization with Symmetric Entropy

Motivation

Reinforcement learning is very sensitive to hyperparameters. In my final year project, a PPO agent stopped making progress and got stuck in a sub-optimal policy, which is when I increased the entropy coefficient in the vanilla PPO framework. Choosing such coefficients is hard and daunting in general. Since PPO is built on the idea that the new policy should not be too different from the old one, I apply the same idea to the entropy, with the difference that the entropy term is symmetric.

Algorithm

Symm-PPO algorithm (see the figure in the repository).

I also decay the entropy coefficient over the course of training.
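
The exact update rule is the one given in the algorithm figure above. As a rough, non-authoritative illustration of the idea from the Motivation section (keep the new policy's entropy close to the old policy's with a symmetric penalty, and decay the coefficient over training), here is a minimal PyTorch sketch. The function names, the absolute-difference penalty, and the linear decay schedule are assumptions for illustration only, not the repository's actual loss.

    import torch

    def ppo_loss_with_symm_entropy(new_dist, old_dist, actions, advantages,
                                   clip_param=0.1, entropy_coef=0.05):
        # new_dist / old_dist: torch.distributions over actions for the current
        # and old policies at the sampled states (e.g. Categorical for Atari).
        log_ratio = new_dist.log_prob(actions) - old_dist.log_prob(actions)
        ratio = log_ratio.exp()

        # Standard clipped PPO surrogate: keep the new policy close to the old one.
        surr1 = ratio * advantages
        surr2 = torch.clamp(ratio, 1.0 - clip_param, 1.0 + clip_param) * advantages
        policy_loss = -torch.min(surr1, surr2).mean()

        # Symmetric entropy term (illustrative): instead of simply maximizing the
        # new policy's entropy, penalize how far it has moved away from the old
        # policy's entropy. The penalty is symmetric in the two policies.
        entropy_gap = (new_dist.entropy().mean() - old_dist.entropy().mean()).abs()

        return policy_loss + entropy_coef * entropy_gap

    def decayed_entropy_coef(initial_coef, update, num_updates):
        # Linear decay of the entropy coefficient over training, as noted above.
        return initial_coef * (1.0 - update / num_updates)

In this reading, the --entropy-coef flag in the training command below would presumably set the initial value of the coefficient before decay.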

Prerequisites

  • Python 3
  • PyTorch (Tested on 1.12.1)
  • gym==0.17.3
  • pybullet==3.1.6
  • stable-baselines3==1.0
  • matplotlib
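
If you prefer to install the pinned versions directly rather than through requirements.txt, a notebook cell along these lines should work (the pins simply mirror the list above; requirements.txt may include additional packages, such as atari_py, that are not listed here):

    !pip install torch==1.12.1 gym==0.17.3 pybullet==3.1.6 stable-baselines3==1.0 matplotlib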

Installation

Please clone this repository to your local machine:

    git clone https://github.com/AnirudhMaiya/Symm-PPO
    

After cloning, cd into the src folder of the repository and install the dependencies:

    # install the Python dependencies (run from the src folder)
    !pip install -r requirements.txt

    # download the Atari ROMs, extract them, and register them with atari_py
    import urllib.request
    urllib.request.urlretrieve('http://www.atarimania.com/roms/Roms.rar', 'Roms.rar')
    !pip install unrar
    !unrar x Roms.rar
    !mkdir rars
    !mv "HC ROMS" rars
    !mv "ROMS" rars
    !python -m atari_py.import_roms rars

    # train on Pong (--algo ppo runs Symm-PPO in this repository)
    !python main.py --env-name "PongNoFrameskip-v4" --algo ppo --use-gae --lr 2.5e-4 --clip-param 0.1 --value-loss-coef 0.5 --num-processes 8 --num-steps 128 --num-mini-batch 1 --log-interval 1 --use-linear-lr-decay --entropy-coef 0.05

The above commands can also be executed through the Jupyter notebook Run_Symm-PPO.ipynb.

Results

PongNoFrameskip-v4

Median rewards and difference in median rewards on PongNoFrameskip-v4 (plots in the repository).

Seaquest-v0

Median rewards and difference in median rewards on Seaquest-v0 (plots in the repository).

Afterthought

The added symmetric entropy term acts as a regularizer, so the median rewards are initially lower than those of vanilla PPO.

Special thanks to the pytorch-a2c-ppo-acktr-gail repository.
