Skip to content
/ MAX Public

Code for reproducing experiments in Model-Based Active Exploration, ICML 2019

Notifications You must be signed in to change notification settings

nnaisense/MAX

Folders and files

NameName
Last commit message
Last commit date
May 13, 2019
May 13, 2019
Jul 18, 2019
May 13, 2019
Jul 18, 2019
Jul 18, 2019
May 13, 2019
Jul 23, 2019
May 13, 2019
May 13, 2019
May 13, 2019
Jul 23, 2019
May 13, 2019
May 13, 2019
May 13, 2019
May 13, 2019
May 13, 2019

Repository files navigation

Model-Based Active Exploration (MAX)

Code for reproducing experiments in Model-Based Active Exploration, ICML 2019

Written in PyTorch v1.0.

Code relies on sacred for managing experiments and hyper-parameters.

Overview:

  • envs/: contains the environments used.
  • main.py: contains the main algorithm and baselines through modes.
  • models.py: a fast parallel implementation of an ensemble of models which can are trained with negative log-likelihood loss.
  • utilities.py: contains the all the utilities (exploration objectives) used in the paper.
  • imagination.py: contains code that constructs a virtual MDP using the model ensemble.
  • sac.py: contains a simple Soft Actor-Critic implementation.
  • sacred_fetcher.py: script to download experiment artifacts stored in MongoDB.

Installation

  • Install required dependencies:

    sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3 patchelf
  • Create conda environment with required dependencies:

    conda env create -f conda_env.yml
  • Download and setup MuJoCo binaries. The project uses mujoco and mujoco_py version 1.50.

    mkdir ~/.mujoco/
    cd .mujoco/
    wget -c https://www.roboti.us/download/mjpro150_linux.zip
    unzip mjpro150_linux.zip
    rm mjpro150_linux.zip

    Obtain MuJoCo license key and place it .mujoco/ directory created above with filename mjkey.txt.

  • Append the following to ~/.bashrc:

    # MuJoCo
    export LD_LIBRARY_PATH=:/home/<USER>/.mujoco/mjpro150/bin
    
    if [ -f /usr/lib/x86_64-linux-gnu/libGLEW.so ]; then    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/<USER>/.mujoco/mjpro150/bin:/usr/lib/nvidia-390
        export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libGLEW.so
        export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia-375
    fi
    
  • Quick test of MuJoCo installation

    >>> import gym
    >>> gym.make('HalfCheetah-v2')

Commands

Execute the commands listed below from the code directory to reproduce the results.

Half Cheetah

  • MAX:
python main.py with max_explore env_noise_stdev=0.02
  • Trajectory Variance Active Exploration:
python main.py with max_explore utility_measure=traj_stdev policy_explore_alpha=0.2 env_noise_stdev=0.02
  • Renyi Divergence Reactive Exploration:
python main.py with max_explore exploration_mode=reactive env_noise_stdev=0.02
  • Prediction Error Reactive Exploration:
python main.py with max_explore exploration_mode=reactive utility_measure=pred_err policy_explore_alpha=0.2 env_noise_stdev=0.02
  • Random Exploration:
python main.py with random_explore env_noise_stdev=0.02

Ant

  • MAX:
python main.py with max_explore env_name=MagellanAnt-v2 env_noise_stdev=0.02 eval_freq=1500 checkpoint_frequency=1500 ant_coverage=True
  • Trajectory Variance Active Exploration:
python main.py with max_explore env_name=MagellanAnt-v2 utility_measure=traj_stdev policy_explore_alpha=0.2 env_noise_stdev=0.02 eval_freq=1500 checkpoint_frequency=1500 ant_coverage=True
  • Renyi Divergence Reactive Exploration:
python main.py with max_explore env_name=MagellanAnt-v2 exploration_mode=reactive env_noise_stdev=0.02 eval_freq=1500 checkpoint_frequency=1500 ant_coverage=True
  • Prediction Error Reactive Exploration:
python main.py with max_explore env_name=MagellanAnt-v2 exploration_mode=reactive utility_measure=pred_err policy_explore_alpha=0.2 env_noise_stdev=0.02 eval_freq=1500 checkpoint_frequency=1500 ant_coverage=True
  • Random Exploration:
python main.py with random_explore env_name=MagellanAnt-v2 env_noise_stdev=0.02 eval_freq=1500 checkpoint_frequency=1500 ant_coverage=True

Magellan

Magellan is the internal code name of the project inspired by life of Ferdinand Magellan.