A PyTorch implementation for the experiments in "Rejecting Hallucinated State Targets during Planning", authored by Mingde "Harry" Zhao, Tristan Sylvain, Romain Laroche, Doina Precup, and Yoshua Bengio.

This repo was implemented by Harry Zhao (@PwnerHarry), mostly adapted from Skipper.

This work was done during Harry's Mitacs internship at RBC Borealis (formerly Borealis AI), under the mentorship of Tristan Sylvain (@TiSU32).
To set up the Python environment:

- Create a virtual environment with conda or venv (we used Python 3.10)
- Install PyTorch according to the official guidelines, and make sure it recognizes your accelerators
- Run `pip install -r requirements.txt` to install the dependencies
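For example, a minimal setup sketch (assuming conda and a CUDA build of PyTorch; the environment name `rhst` is made up here, and you should pick the install command matching your accelerator from pytorch.org):

```sh
# Hypothetical setup sketch: the environment name and CUDA version are assumptions.
conda create -n rhst python=3.10 -y
conda activate rhst
# Install PyTorch per the official guide (https://pytorch.org/get-started/locally/);
# this line assumes a CUDA 12.1 wheel.
pip install torch --index-url https://download.pytorch.org/whl/cu121
# Project dependencies
pip install -r requirements.txt
# Sanity check: the accelerator should be visible to PyTorch.
python -c "import torch; print(torch.cuda.is_available())"
```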
Use `tensorboard --logdir=tb_records` to monitor the training progress.
The entry points:

- `run_minigrid_mp.py`: a multi-processed experiment initializer for Skipper variants
- `run_minigrid.py`: a single-processed experiment initializer for Skipper variants
- `run_leap_pretrain_vae.py`: a single-processed experiment initializer for pretraining the generator of the LEAP agent
- `run_leap_pretrain_rl.py`: a single-processed experiment initializer for pretraining the distance estimator (policy) of the LEAP agent
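For example (an illustrative sketch, not a prescribed recipe; the exact flags each script accepts are defined in `runtime.py`, and the ordering of the two LEAP pretraining stages below is an assumption based on the script descriptions):

```sh
# Launch a single-process Skipper run (flag values are illustrative):
python run_minigrid.py --game SwordShieldMonster --size_world 12 --num_envs_train 50

# For LEAP, presumably pretrain the generator first, then the distance estimator:
python run_leap_pretrain_vae.py
python run_leap_pretrain_rl.py
```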
Please read the argument definitions in `runtime.py` carefully and pass the desired arguments.
Use `--hindsight_strategy` to specify the hindsight relabeling strategy. The options are:

- `future`: same as the "future" variant in the paper
- `episode`: same as the "episode" variant in the paper
- `pertask`: same as the "pertask" variant in the paper
- `future+episode`: corresponds to the "E" variant in the paper
- `future+pertask`: corresponds to the "P" variant in the paper
- `[email protected]`: corresponds to the "(E+P)" variant in the paper, where `0.5` controls the mixture ratio of `pertask`
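For instance (an illustrative sketch; every flag not shown falls back to its default in `runtime.py`):

```sh
# Train with the "(E+P)" relabeling mixture:
python run_minigrid.py --hindsight_strategy [email protected]
```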
To use the "generate" strategy for estimator training, use `--prob_relabel_generateJIT` to specify the probability of replacing the relabeled target:

- `--hindsight_strategy future+episode --prob_relabel_generateJIT 1.0`: corresponds to the "G" variant in the paper
- `--hindsight_strategy future+episode --prob_relabel_generateJIT 0.5`: corresponds to the "(E+G)" variant in the paper
- `--hindsight_strategy [email protected] --prob_relabel_generateJIT 0.25`: corresponds to the "(E+P+G)" variant in the paper
Use `--game SwordShieldMonster --size_world 12 --num_envs_train 50` to specify the environment setting: `game` can be switched to `RandDistShift` (RDS), and `size_world` should be >= 8.
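Putting the pieces together, a full launch might look like this (a hypothetical combination of the flags documented above; everything unspecified takes its default from `runtime.py`):

```sh
# Multi-processed "(E+G)"-style run on RandDistShift (illustrative values):
python run_minigrid_mp.py \
    --game RandDistShift --size_world 8 --num_envs_train 50 \
    --hindsight_strategy future+episode --prob_relabel_generateJIT 0.5
```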
Known issues:

- There is a potential `CUDA_INDEX_ASSERTION` error that can cause hanging at the beginning of Skipper runs. We do not yet know how to fix it.
- The dynamic programming solutions for the environment ground truth are only compatible with deterministic experiments.