Commit 46bbdb6: init

gregSchwartz18 committed Jul 2, 2019 (0 parents)

Showing 44 changed files with 5,554 additions and 0 deletions.
2 changes: 2 additions & 0 deletions .flake8
@@ -0,0 +1,2 @@
[flake8]
max-line-length = 101
120 changes: 120 additions & 0 deletions .gitignore
@@ -0,0 +1,120 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
.idea/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# project specific
/videos
/videos/*
42 changes: 42 additions & 0 deletions .travis.yml
@@ -0,0 +1,42 @@
language: python

cache: pip

python:
- "3.5"

os: linux

dist: trusty

sudo: required

before_install:
- sudo apt-get update
# Setup conda (needed for opencv, ray dependency)
# WARNING: enforces py3.5
- wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh;
- bash miniconda.sh -b -p $HOME/miniconda
- export PATH="$HOME/miniconda/bin:$PATH"
- hash -r
- conda config --set always_yes yes --set changeps1 no
- conda update -q conda
- conda info -a
- python -V

# Set up requirements for running tests
- conda env create -f environment.yml
- source activate causal

install:
- pip install flake8 .
- pip install pytest

before_script:
- flake8 --version
- flake8 --show-source

script:
- python setup.py install
- python -m pytest

64 changes: 64 additions & 0 deletions README.md
@@ -0,0 +1,64 @@
[![Build Status](https://travis-ci.com/eugenevinitsky/sequential_social_dilemma_games.svg?branch=master)](https://travis-ci.com/eugenevinitsky/sequential_social_dilemma_games)

# Sequential Social Dilemma Games
This repo is an open-source implementation of DeepMind's Sequential Social Dilemma (SSD) multi-agent game-theoretic environments [[1]](https://arxiv.org/abs/1702.03037). SSDs can be thought of as spatially and temporally extended Prisoner's-Dilemma-like games. The reward structure poses a dilemma because individually optimal short-term strategies lead to poor long-term outcomes for the group.

The implemented environments are structured to be compatible with OpenAI's Gym environments (https://github.com/openai/gym) as well as RLlib's multi-agent environment interface (https://github.com/ray-project/ray/blob/master/python/ray/rllib/env/multi_agent_env.py).
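
For illustration, a minimal interaction loop against that multi-agent interface might look like the sketch below. The import path, constructor arguments, and agent-id keys are assumptions about this repo's layout rather than guarantees; observations, rewards, and dones follow the dict-per-agent convention of RLlib's MultiAgentEnv.

```
# Hedged sketch: import path, constructor signature, and agent ids are assumptions.
from social_dilemmas.envs.harvest import HarvestEnv

env = HarvestEnv(num_agents=2)
obs = env.reset()  # dict keyed by agent id, e.g. {"agent-0": obs0, "agent-1": obs1}

for _ in range(100):
    # Each agent supplies an action; here we just sample uniformly at random.
    actions = {agent_id: env.action_space.sample() for agent_id in obs.keys()}
    obs, rewards, dones, infos = env.step(actions)  # all dicts keyed by agent id
    if dones.get("__all__", False):
        obs = env.reset()
```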

## Implemented Games

* **Cleanup**: A public goods dilemma in which agents get a reward for consuming apples, but must use a cleaning beam to clean a river in order for apples to grow. While an agent is cleaning the river, other agents can exploit it by consuming the apples that appear.

<img src="images/cleanup.png" alt="Image of the cleanup game" width="170" height="246"/>

* **Harvest**: A tragedy-of-the-commons dilemma in which apples regrow at a rate that depends on the amount of nearby apples. If individual agents employ an exploitative strategy by greedily consuming too many apples, the collective reward of all agents is reduced.

<img src="images/harvest.png" alt="Image of the Harvest game" width="483" height="187"/>

<img src="images/schelling.png" alt="Schelling diagrams for Harvest and Cleanup" width="953" height="352"/>

The above plot shows the empirical Schelling diagrams for both Cleanup (A) and Harvest (B) (from [[2]](https://arxiv.org/abs/1803.08884)). These diagrams show the payoff that an individual agent can expect if it follows a defecting/exploitative strategy (red) vs. a cooperative strategy (blue), given the number of other agents that are cooperating. We can see that an individual agent can almost always benefit greedily from defecting, but the more agents that defect, the worse the outcome for all agents.

## Relevant papers

1. Leibo, J. Z., Zambaldi, V., Lanctot, M., Marecki, J., & Graepel, T. (2017). [Multi-agent reinforcement learning in sequential social dilemmas](https://arxiv.org/abs/1702.03037). In Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems (pp. 464-473).

2. Hughes, E., Leibo, J. Z., Phillips, M., Tuyls, K., Dueñez-Guzman, E., Castañeda, A. G., Dunning, I., Zhu, T., McKee, K., Koster, R., Roff, H., & Graepel, T. (2018). [Inequity aversion improves cooperation in intertemporal social dilemmas](https://arxiv.org/abs/1803.08884). In Advances in Neural Information Processing Systems (pp. 3330-3340).

3. Jaques, N., Lazaridou, A., Hughes, E., Gulcehre, C., Ortega, P. A., Strouse, D. J., Leibo, J. Z. & de Freitas, N. (2018). [Intrinsic Social Motivation via Causal Influence in Multi-Agent RL](https://arxiv.org/abs/1810.08647). arXiv preprint arXiv:1810.08647.


# Setup instructions
Run `python setup.py develop` to install the package.
Then, activate the conda environment (created from `environment.yml`) by running `source activate causal`.

To then set up the branch of Ray on which we have built the causal influence code, clone the repo to your desired folder:
`git clone https://github.com/natashamjaques/ray.git`.

Next, go to the RLlib folder (`cd ray/python/ray/rllib`) and run `python setup-rllib-dev.py`. This replaces the `rllib` folder inside the pip-installed Ray with a softlink to your local clone, so the version of RLlib in your local folder is the one that gets used.

# Tests
Tests are located in the test folder and can be run individually or all at once with `python -m pytest`. Many of the less obvious rules of the games can be understood by reading the tests, each of which outlines some aspect of a game.

# Constructing new environments
Every environment that subclasses `MapEnv` will likely need to implement the following methods (a skeletal example follows these stubs):

```
def custom_reset(self):
    """Reset custom elements of the map. For example, spawn apples."""
    pass

def custom_action(self, agent, action):
    """Execute any custom, non-move actions that may be defined, like fire or clean."""
    pass

def custom_map_update(self):
    """Custom map updates that don't have to do with agent actions."""
    pass

def setup_agents(self):
    """Construct all the agents for the environment."""
    raise NotImplementedError
```
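
For concreteness, a skeletal subclass might look like the sketch below. The import paths, the agent class, and the map symbols are illustrative assumptions, not definitions taken from this commit.

```
# Illustrative sketch only: import paths, HarvestAgent, and the map symbol 'A'
# are assumptions about this repo's layout.
from social_dilemmas.envs.map_env import MapEnv
from social_dilemmas.envs.agent import HarvestAgent


class SimpleAppleEnv(MapEnv):
    def custom_reset(self):
        """Respawn apples at a few fixed (hypothetical) spawn points."""
        for row, col in [(2, 3), (4, 5)]:
            self.world_map[row, col] = 'A'

    def custom_action(self, agent, action):
        """This toy environment has no non-move actions (no fire or clean beams)."""
        return []

    def custom_map_update(self):
        """Nothing besides agent actions changes the map in this sketch."""
        pass

    def setup_agents(self):
        """Construct one agent per slot; the agent constructor signature is assumed."""
        for i in range(self.num_agents):
            agent_id = 'agent-' + str(i)
            self.agents[agent_id] = HarvestAgent(agent_id, start_pos=(1, 1),
                                                 start_orientation='UP',
                                                 grid=self.world_map)
```
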
Empty file added __init__.py
Empty file.
42 changes: 42 additions & 0 deletions docker/Dockerfile
@@ -0,0 +1,42 @@
FROM nvidia/cuda:9.0-cudnn7-runtime-ubuntu16.04

# Apt updates
RUN apt-get update \
&& apt-get install -y \
build-essential \
zlib1g-dev \
git \
libreadline-gplv2-dev \
libncursesw5-dev \
libssl-dev \
libsqlite3-dev \
tk-dev \
libgdbm-dev \
libc6-dev \
libbz2-dev \
tmux \
wget \
python3-tk

# Install python 3.6
RUN wget https://www.python.org/ftp/python/3.6.7/Python-3.6.7.tar.xz \
&& tar xvf Python-3.6.7.tar.xz \
&& cd Python-3.6.7 \
&& ./configure --with-zlib=/usr/include \
&& make \
&& make install \
&& ln -s /usr/local/bin/python3 /usr/local/bin/python \
&& ln -s /usr/local/bin/pip3 /usr/local/bin/pip

# Install project-specific python libraries
RUN pip install tensorflow-gpu==1.12.0 gym matplotlib opencv-python lz4 psutil flake8 ray

# Symlinking for making ray work
RUN rm -r /usr/local/lib/python3.6/site-packages/ray/rllib \
&& ln -s /ray/python/ray/rllib /usr/local/lib/python3.6/site-packages/ray/rllib \
&& rm -r /usr/local/lib/python3.6/site-packages/ray/tune \
&& ln -s /ray/python/ray/tune /usr/local/lib/python3.6/site-packages/ray/tune



WORKDIR /project
1 change: 1 addition & 0 deletions docker/build.sh
@@ -0,0 +1 @@
sudo docker build . -t multi-agent-empathy
34 changes: 34 additions & 0 deletions docker/readme.md
@@ -0,0 +1,34 @@
# Docker environment

## Prerequisites
- docker-ce
- nvidia-docker

## Run from local build

### Building
```
sudo sh ./build.sh
```

Optional: push to Docker Hub
```
sudo docker tag multi-agent-empathy natashajaques/multi-agent-empathy
sudo docker push natashajaques/multi-agent-empathy
```

### Run

```
SEQ_SOC_PATH=/home/natasha/Developer/sequential_social_dilemma_games
RAY_PATH=/home/natasha/Developer/ray
RAY_RESULTS_PATH=/home/natasha/ray_results
sudo docker run --runtime=nvidia -v $SEQ_SOC_PATH:/project -v $RAY_PATH:/ray --rm multi-agent-empathy /bin/bash -c "python setup.py develop && python run_scripts/train_baseline_dqn_actions.py --use_gpu_for_driver --num_gpus=1"
```

## Run from Docker Hub
```
SEQ_SOC_PATH=/home/natasha/Developer/sequential_social_dilemma_games
RAY_PATH=/home/natasha/Developer/ray
sudo docker run --runtime=nvidia -v $SEQ_SOC_PATH:/project -v $RAY_PATH:/ray --rm natashajaques/multi-agent-empathy /bin/bash -c "python setup.py develop && python run_scripts/train_baseline_dqn_actions.py --use_gpu_for_driver --num_gpus=1"
```
16 changes: 16 additions & 0 deletions environment.yml
@@ -0,0 +1,16 @@
name: causal

dependencies:
- python==3.6.8
- pip:
- numpy==1.16.0
- gym==0.10.9
- matplotlib==3.0.2
- opencv-python
- ray==0.6.4
- tensorflow==1.12.0
- scipy==1.2.0
- setproctitle
- psutil
- lz4
- boto3
Binary file added images/cleanup.png
Binary file added images/harvest.png
Binary file added images/schelling.png
Empty file added models/__init__.py
Empty file.
47 changes: 47 additions & 0 deletions models/conv_to_fc_net.py
@@ -0,0 +1,47 @@
# Model taken from https://arxiv.org/pdf/1810.08647.pdf,
# "Intrinsic Social Motivation via Causal Influence in Multi-Agent RL".


# The model is a single convolutional layer with a 3x3 kernel, stride 1, and 6 output
# channels, connected to two fully connected layers of size 32 each.

import tensorflow as tf

from ray.rllib.models.misc import normc_initializer, flatten
from ray.rllib.models.model import Model
import tensorflow.contrib.slim as slim


class ConvToFCNet(Model):
    def _build_layers_v2(self, input_dict, num_outputs, options):

        inputs = input_dict["obs"]

        hiddens = [32, 32]
        with tf.name_scope("custom_net"):
            # Single 3x3 convolution with stride 1 and 6 output channels.
            inputs = slim.conv2d(
                inputs,
                6,
                [3, 3],
                1,
                activation_fn=tf.nn.relu,
                scope="conv")
            last_layer = flatten(inputs)
            i = 1
            # Two fully connected layers of size 32 each.
            for size in hiddens:
                label = "fc{}".format(i)
                last_layer = slim.fully_connected(
                    last_layer,
                    size,
                    weights_initializer=normc_initializer(1.0),
                    activation_fn=tf.nn.relu,
                    scope=label)
                i += 1
            output = slim.fully_connected(
                last_layer,
                num_outputs,
                weights_initializer=normc_initializer(0.01),
                activation_fn=None,
                scope="fc_out")
        return output, last_layer
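
For context, a custom RLlib model like this one would typically be registered with the `ModelCatalog` and then referenced by name in the trainer's model config. A minimal usage sketch, assuming the ray 0.6.x API pinned in `environment.yml`:

```
# Usage sketch only: assumes the ModelCatalog API from the ray version in environment.yml.
from ray.rllib.models import ModelCatalog

from models.conv_to_fc_net import ConvToFCNet

# Register the model under a name that trainer configs can refer to.
ModelCatalog.register_custom_model("conv_to_fc_net", ConvToFCNet)

# The registered name is then passed through the model options, e.g.:
config = {"model": {"custom_model": "conv_to_fc_net"}}
```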