
List of bugs in different games #262

Open

marintoro opened this issue May 4, 2019 · 4 comments

Labels: bug, game (Issues regarding specific games)

marintoro commented May 4, 2019

Hi, I have worked a lot with atari-py recently (I first tried to report this issue there, openai/atari-py#41, but was told it likely originates here) and discovered some bugs which I think can be harmful to research. Here is the list I found:

  • Asterix: When my agent reaches 999,500 and takes a last bonus that should bring it to 1,000,000 (the maximum score), it receives a reward of -999,500 and the game continues as usual (but the agent's total score is now 0...). I think this issue is visible in the scores reported in the Rainbow and Ape-X papers (the score climbs to 1M and then varies randomly around 500k). A wrapper-level workaround is sketched after this list.

  • Defender: Rewards in this game are really weird. I get a reward of 10 on the first time step no matter what, and then all rewards are multiplied by 100 (so I get a reward of 15,000 when the score rendered on screen only increases by 150). Moreover, as in Asterix, my agent gets a reward of about -999,000 when it passes a score of 1M.

  • VideoPinball: Same as Asterix and Defender; the agent receives a reward of about -999,000 when it reaches a score of 1M.

  • BattleZone: My agent often gets stuck forever for no apparent reason. By stuck I mean that once such a state occurs, even playing random actions for 20 hours doesn't finish the game, and the agent never receives a reward different from 0. This is particularly an issue for algorithms relying on replay memory: when it happens, the replay memory gets filled with tons of useless transitions. I could report the random seed and the list of actions leading to one of those states if needed (I am using sticky actions with probability 0.25).

  • Yars' Revenge: Same as BattleZone; sometimes the game gets stuck forever, though this happens far less often than in BattleZone.
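For reference, here is a minimal Gym-style wrapper sketching one possible workaround for the score rollover (the wrapper name and the 1,000,000 modulus are my assumptions, inferred from the six-digit score counter and the -999,500 rewards above; this is not an existing ALE or Gym feature):

```python
import gym

# Sketch only: assume the score counter wraps modulo 1,000,000.
ROLLOVER_MODULUS = 1_000_000

class ScoreRolloverWrapper(gym.Wrapper):
    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if reward < -ROLLOVER_MODULUS // 2:
            reward += ROLLOVER_MODULUS  # e.g. -999,500 becomes +500
            done = True                 # counter wrapped: declare victory
        return obs, reward, done, info
```

Something like `env = ScoreRolloverWrapper(gym.make("AsterixNoFrameskip-v4"))` would then shield the learner from the wraparound reward.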

mgbellemare (Contributor) commented

Normally the standard is to stop training episodes after about 30 minutes of real-time play (i.e., 108,000 frames). This would deal with some of this, in particular the stuck-forever question. I'm not sure about BattleZone though -- what you're reporting shouldn't be happening under random play.
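For example, a sketch using the standard Gym TimeLimit wrapper (with a NoFrameskip env one agent step is one emulator frame; with the usual frameskip of 4 the cap would be 27,000 agent steps instead):

```python
import gym
from gym.wrappers import TimeLimit

# Cap episodes at 30 minutes of emulator time: 108,000 frames at 60 fps.
env = gym.make("BattleZoneNoFrameskip-v4")
env = TimeLimit(env, max_episode_steps=108_000)
```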

Re: agents "rolling" the score, I agree this is problematic. When the ALE was designed, we didn't foresee agents playing forever and this issue coming up. A fine solution is to stop the episode when the maximum score is reached; effectively, when that happens we might as well declare victory.
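A sketch of what "declaring victory" could look like as a wrapper (`MaxScoreWrapper` and the per-game `max_score` constant are illustrative assumptions, not an existing ALE option):

```python
import gym

class MaxScoreWrapper(gym.Wrapper):
    """End the episode once the accumulated score reaches the game's cap."""
    def __init__(self, env, max_score):
        super().__init__(env)
        self.max_score = max_score
        self.score = 0

    def reset(self, **kwargs):
        self.score = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.score += reward
        if self.score >= self.max_score:
            done = True  # treat hitting the cap as a win
        return obs, reward, done, info
```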

However, there's an issue (as you point out): if we make that fix, we also need to re-run published results and publicize the change. In particular, someone might think their agent performs better, when in fact they're only using "improved" results. Something like this happened with the 2017 distributional RL paper (see the erratum).

A better solution might be to flag games where this happens, and report the outcome in published papers -- for example, report the mean score with "solved" games removed from the equation.

nczempin (Contributor) commented

I don't think "never fix anything" is a long-term viable solution to the issue of reproducible results.

Published results need to specify the exact version used to obtain them. New publications can then use that exact same version to recreate the results (if they want direct comparability), and optionally do another run with a newer, better version of the ALE (checking whether any significant changes in results are due solely to the change in ALE version).
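A small sketch of what that could look like in practice: logging the exact package versions alongside any reported numbers (Python 3.8+; the package names are the usual PyPI ones and may differ in a given setup):

```python
import importlib.metadata as metadata

# Record environment versions next to published results.
for pkg in ("gym", "atari-py"):
    print(pkg, metadata.version(pkg))
```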

artofbeinghuman commented

I opened an issue in OpenAI Gym which actually stems from the ALE.
Basically, in Frostbite, when the character dies from freezing on his last life, that life is not deducted and the game passes into a "Demo Play" mode, where the computer just plays by itself, ignoring external input and never losing any lives. The game is then stuck in this mode indefinitely.

Another user commented:

It looks like gym just calls game_over, which calls isTerminal on the environment here. This certainly looks like a bug -- just not a bug in gym, but in the ALE.
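A wrapper-level mitigation could look like the following sketch (`StallTimeout` and the 10,000-step threshold are assumptions, not an ALE or Gym feature): force termination after a long stretch with no reward, which would cover both this Frostbite demo mode and the stuck BattleZone states above.

```python
import gym

class StallTimeout(gym.Wrapper):
    """End the episode after `stall_limit` consecutive zero-reward steps."""
    def __init__(self, env, stall_limit=10_000):
        super().__init__(env)
        self.stall_limit = stall_limit
        self.steps_since_reward = 0

    def reset(self, **kwargs):
        self.steps_since_reward = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.steps_since_reward = 0 if reward else self.steps_since_reward + 1
        if self.steps_since_reward >= self.stall_limit:
            done = True  # assume the emulator is stuck; give up
        return obs, reward, done, info
```

Note that some games legitimately go long stretches without reward, so the threshold would need per-game tuning.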

dniku commented Jul 26, 2019

As described in detail in the issue referenced above, there is a bug in Breakout: it is impossible to score more than 864, because after the field is cleared of bricks twice, the bricks do not respawn a third time.

For the record, here is a scatterplot of results achieved by a Baselines CNN PPO2 model trained for 50M steps with an evaluation step limit of 50k (color indicates elapsed time):

[image: scatterplot of per-episode evaluation scores]

13.44% of rewards are exactly 864. This suggests that all Breakout results in the literature are strongly underreported.
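For illustration, the fraction above can be computed directly from the per-episode returns (the `episode_returns` values here are placeholders, not the actual evaluation data):

```python
import numpy as np

episode_returns = np.array([864, 820, 864, 431, 864])  # placeholder data
capped_fraction = np.mean(episode_returns == 864)
print(f"{capped_fraction:.2%} of episodes hit the 864 cap")
```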

EDIT: according to Wikipedia, this is by design:

Once the second screen of bricks is destroyed, the ball in play harmlessly bounces off empty walls until the player restarts the game, as no additional screens are provided.

The score of 864 can be seen being achieved here on real Atari 2600 hardware.

This post also provides a disassembly of the original game, showing the code that switches to the next level. It is basically `if score == 432: refill_blocks()`.

Perhaps a modification to the original game could be discussed, so that strong models could be more directly compared to each other.

EDIT: here is a very dirty proof of concept for such a modification: dniku/atari-py@fc1dc14. The idea is to reset the score from 864 to 432. This is what gameplay looks like; the agent ends up scoring 1669 in that video.

A similar scatterplot for the patched version (step limit is 30k here):

[image: scatterplot of per-episode evaluation scores, patched version]
