
BreakoutNoFrameskip-v4 does not advance to 3rd level, capping score at 864 #1618

Closed

dniku opened this issue Jul 25, 2019 · 6 comments

dniku (Contributor) commented Jul 25, 2019

This is a reopening of #309, as requested in that issue.

BreakoutNoFrameskip-v4 does not start a new level after all bricks have been cleared twice. I was able to reproduce this with a well-trained CNN PPO2 Baselines model, although it seems that any model that can reach a score of 864 will do (I have never seen that score exceeded).

Links:

  • reproducing script
  • model (download it as reproduce_gym_309.pkl and place next to the script)
  • gameplay video (the score of 864 is achieved at 03:49, and then it's just the paddle moving around trying to prevent the ball from falling down on an empty screen)
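For reference, 864 is exactly two screens' worth of bricks under the commonly cited Atari 2600 Breakout scoring (18 bricks per row; the two bottom rows are worth 1 point per brick, the middle two rows 4, the top two rows 7, and the game provides exactly two screens). A quick sanity check:

```python
# Commonly cited Atari 2600 Breakout scoring: 6 rows of 18 bricks,
# with per-brick values 1, 1, 4, 4, 7, 7 from bottom to top.
BRICKS_PER_ROW = 18
ROW_VALUES = [1, 1, 4, 4, 7, 7]

points_per_screen = BRICKS_PER_ROW * sum(ROW_VALUES)
max_score = 2 * points_per_screen  # two screens, no more are provided

print(points_per_screen, max_score)  # 432 864
```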
$ uname -srv
Linux 4.19.59-1-MANJARO #1 SMP PREEMPT Mon Jul 15 18:23:58 UTC 2019
$ python --version
Python 3.7.3

I ran all experiments in a virtualenv. Here are the commands that I executed to reproduce the issue:

virtualenv .env
source .env/bin/activate
pip install tensorflow-gpu gym[atari]
pip install git+https://github.com/openai/baselines.git
python reproduce_gym_309.py reproduce_gym_309.pkl

The script that I am providing simply loads the model and runs it, collecting gameplay frames, until the episode ends with a score of 864. Then it dumps the frames to a video file.

The output for me is (omitting log messages from Tensorflow and tqdm progress bar):

finished episode with reward=436.0, length=5799, elapsed_time=17.346772
finished episode with reward=735.0, length=4392, elapsed_time=30.163985
finished episode with reward=864.0, length=9447, elapsed_time=57.152439

pip list from virtualenv:

$ pip list 
Package              Version 
-------------------- --------
absl-py              0.7.1   
astor                0.8.0   
atari-py             0.2.6   
baselines            0.1.6   
Click                7.0     
cloudpickle          1.2.1   
future               0.17.1  
gast                 0.2.2   
google-pasta         0.1.7   
grpcio               1.22.0  
gym                  0.13.1  
h5py                 2.9.0   
joblib               0.13.2  
Keras-Applications   1.0.8   
Keras-Preprocessing  1.1.0   
Markdown             3.1.1   
numpy                1.16.4  
opencv-python        4.1.0.25
Pillow               6.1.0   
pip                  19.2.1  
protobuf             3.9.0   
pyglet               1.3.2   
scipy                1.3.0   
setuptools           41.0.1  
six                  1.12.0  
tensorboard          1.14.0  
tensorflow-estimator 1.14.0  
tensorflow-gpu       1.14.0  
termcolor            1.1.0   
tqdm                 4.32.2  
Werkzeug             0.15.5  
wheel                0.33.4  
wrapt                1.11.2
dniku (Contributor, Author) commented Jul 26, 2019

@ludwigschubert could you take a look maybe?

dniku (Contributor, Author) commented Jul 26, 2019

For the record, here is the list of actions taken by the model (one action per line). Each action must be passed to the envs in a single-element list, because the envs are wrapped in DummyVecEnv:

with args.load_path.open('r') as fp:
    for action in tqdm(fp, postfix='playing'):
        # each line holds one integer action; convert it before stepping
        obs, reward, done, infos = eval_envs.step([int(action)])
        # ...

christopherhesse (Contributor) commented
Thanks for reopening this issue and providing more details! From looking at the video, I believe this is a bug in the game itself. These sorts of bugs are being tracked in Farama-Foundation/Arcade-Learning-Environment#262. Could you post a comment there linking to this one? I will likely close this issue later, since it is a bug in ALE, not in gym itself.

christopherhesse (Contributor) commented
Closing this issue as mentioned earlier. Thanks for all the information, but it looks like it is up to ALE to decide what to do here.

dniku (Contributor, Author) commented Jul 30, 2019

According to Wikipedia, this is by design:

Once the second screen of bricks is destroyed, the ball in play harmlessly bounces off empty walls until the player restarts the game, as no additional screens are provided.

A score of 864 being reached on real Atari 2600 hardware can be seen here.

This post also provides a disassembly of the original game, showing the code that switches to the next level. It is essentially if score == 432: refill_blocks().
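In Python pseudocode (the function and variable names are my own; the original is 6502 assembly), that check amounts to something like:

```python
SCREEN_SCORE = 432  # points in one full wall of bricks
BRICKS_PER_SCREEN = 6 * 18  # 6 rows of 18 bricks


def refill_blocks():
    """Restore the full wall of bricks for the second screen."""
    return BRICKS_PER_SCREEN


def on_score_change(score, bricks_left):
    """Toy model of the cartridge logic: the wall is refilled exactly
    once, when the score hits 432. At 864 nothing happens, so the ball
    just bounces around an empty screen."""
    if score == SCREEN_SCORE:
        return refill_blocks()
    return bricks_left
```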

dniku (Contributor, Author) commented Aug 2, 2019

Here is a super dirty proof of concept of how to make Breakout infinite: dniku/atari-py@fc1dc14
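An alternative that avoids patching atari-py would be a gym-side wrapper that ends the episode as soon as the 864-point cap is reached, instead of letting the paddle chase the ball on an empty screen. A minimal duck-typed sketch (the EndAtMaxScoreWrapper name and the bare step/reset interface are my own, standing in for a proper gym.Wrapper subclass):

```python
class EndAtMaxScoreWrapper:
    """Terminate the episode once the cumulative reward reaches
    max_score. A workaround sketch, not the atari-py patch above."""

    def __init__(self, env, max_score=864):
        self.env = env
        self.max_score = max_score
        self._total = 0

    def reset(self, **kwargs):
        self._total = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._total += reward
        if self._total >= self.max_score:
            done = True  # force episode end at the score cap
        return obs, reward, done, info
```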
