List of bugs in different games #262
Normally the standard is to stop an episode after about 30 minutes of real-time play (i.e. 108,000 frames). This would deal with some of this, in particular the stuck-forever question. I'm not sure about BattleZone though -- what you're reporting shouldn't be happening when playing at random. Re: agents rolling over the score, I agree this is problematic. When the ALE was designed we didn't foresee agents playing forever and this issue coming up. A fine solution is to stop the episode when the maximum score is reached; effectively, when that happens we might as well declare victory. However, there's an issue (as you point out): if we make that fix, we also need to re-run published results and publicize the change. In particular, someone might think their agent performs better, when in fact they're only benefiting from the "improved" results. Something like this happened in the 2017 distributional RL paper (see the erratum). A better solution might be to flag games where this happens, and report the outcome in published papers. For example, report the mean score with 'solved' games removed from the equation.
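For illustration, here is a minimal sketch of what the two ideas above (a 108,000-frame episode budget and "declare victory" at a game-specific maximum score) could look like as an environment wrapper. It assumes the classic Gym step API returning (obs, reward, done, info); the wrapper name, the `max_score` parameter, and the `info["solved"]` flag are my own, not anything the ALE provides.

```python
import gym


class EpisodeCapWrapper(gym.Wrapper):
    """Hypothetical wrapper: ends an episode after a frame budget, or once a
    game-specific maximum score is reached ("declare victory")."""

    def __init__(self, env, max_frames=108_000, max_score=None):
        super().__init__(env)
        self.max_frames = max_frames   # ~30 min of real-time play at 60 Hz
        self.max_score = max_score     # e.g. 864 for Breakout; None disables the cap
        self._frames = 0
        self._score = 0

    def reset(self, **kwargs):
        self._frames = 0
        self._score = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Counts steps as seen by this wrapper; scale by the frame-skip if the
        # underlying env already repeats actions.
        self._frames += 1
        self._score += reward
        if self._frames >= self.max_frames:
            done = True                # bounds stuck-forever episodes
        if self.max_score is not None and self._score >= self.max_score:
            done = True                # maximum score reached: declare victory
            info["solved"] = True
        return obs, reward, done, info
```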
I don't think "never fix anything" is a long-term viable solution to the issue of reproducible results. Published results need to specify the exact version that was used to obtain them; new publications can then use that exact same version to recreate the results (if they want direct comparability), and optionally do another run with a newer, better version of the ALE (checking whether any significant changes in results are due solely to the change in ALE version).
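Recording the exact versions next to the experiment output is cheap. A minimal sketch, assuming a typical setup where the environment stack is installed as the "gym" and "atari-py" packages (those names are an assumption about the setup, not something this thread specifies):

```python
import json
import pkg_resources

# Record the exact environment-stack versions alongside the results, so the
# published numbers can later be reproduced against the same ALE behaviour.
versions = {
    pkg: pkg_resources.get_distribution(pkg).version
    for pkg in ("gym", "atari-py")   # assumed package names for this setup
}
with open("env_versions.json", "w") as f:
    json.dump(versions, f, indent=2)
```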
I opened an Issue in the OpenAI Gym, which actually stems from ALE. Another user commented:
As described in detail in the issue referenced above, there is a bug in Breakout: it is impossible to gain a score of more than 864, because after the field is cleared of bricks twice, they do not respawn a third time. For the record, here is a scatterplot of results achieved by a Baselines ppo2 model (cnn policy) trained for 50M steps with an evaluation step limit of 50k (color indicates elapsed time): 13.44% of rewards are exactly 864. This suggests that all Breakout results in the literature are strongly underreported. EDIT: according to Wikipedia, this is by design:
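A quick way to check a set of evaluation returns for this ceiling, along the lines of the 13.44% figure above (the array name and file path are hypothetical):

```python
import numpy as np

# One entry per evaluation episode, e.g. collected from an evaluation run.
episode_returns = np.load("breakout_eval_returns.npy")

# Fraction of episodes that hit the 864 score cap exactly.
capped = (episode_returns == 864).mean()
print(f"{100 * capped:.2f}% of episodes hit the 864 score cap")
```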
The score of 864 can be seen achieved here on a hardware Atari 2600. This post also provides a disassembly of the original game, showing the code which switches to the next level. Perhaps a modification to the original game could be discussed, so that strong models could be compared to each other more directly. EDIT: here is a very dirty proof of concept for such a modification: dniku/atari-py@fc1dc14. The idea is to reset the score from 864 to 432. This is what gameplay looks like; the agent ends up scoring 1669 in that video. A similar scatterplot for the patched version (the step limit is 30k here):
Hi, I have worked a lot with atari-py recently (I first tried to report this issue there, openai/atari-py#41, but they told me it was likely to be coming from the ALE) and discovered some bugs which I think can be damaging to research. Here is the list I found:
Asterix: When my agent reaches 999,500 and collects a last bonus which should bring it to 1,000,000 (the maximum score), it obtains a reward of -999,500 and the game continues as usual (but the agent now has a total score of 0...). I think this issue can be seen in the scores reported in the Rainbow and Ape-X papers (the score goes up to 1M and then varies randomly around 500k). A heuristic workaround for this wrap-around is sketched after this list.
Defender: On this game rewards are really weird. I get a reward of 10 on the first time step no matter what, and then all rewards are multiplied by 100 (so I get a reward of 15,000 when the score rendered on the screen only increases by 150). Moreover, like in Asterix, my agent gets a reward of roughly -999,000 when it passes a score of 1M.
VideoPinball: Same as Asterix and Defender: the agent receives a reward of roughly -999,000 when it reaches a score of 1M.
BattleZone: My agent often gets stuck forever for no apparent reason. By stuck I mean that once this kind of state happens, even playing random actions for 20 hours does not finish the game, and the agent never receives a reward different from 0. For me this is an issue particularly with algorithms relying on a replay memory: when this happens, the replay memory gets filled with tons of useless transitions. I could report the random seed and the list of actions leading to one of those states if needed (I am using sticky actions with probability 0.25).
Yars' Revenge: Same as BattleZone, sometimes the game gets stuck forever, though this happens much less often than in BattleZone.
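As mentioned above, here is a stopgap sketch for the score wrap-around seen in Asterix, Defender and VideoPinball: treat any absurdly large negative single-step reward as the score rolling over 1,000,000 and add the rollover back. This only masks the symptom (the real fix belongs in the per-game reward extraction), and it assumes the classic Gym step API plus a wrap of exactly 1,000,000, which I have not verified against the ALE internals.

```python
import gym


class ScoreWrapCorrection(gym.RewardWrapper):
    """Heuristic workaround (sketch): if a single step yields a huge negative
    reward, assume the score wrapped past 1,000,000 and add the rollover back."""

    ROLLOVER = 1_000_000
    THRESHOLD = -900_000   # no legitimate single-step reward should be this negative

    def reward(self, reward):
        if reward <= self.THRESHOLD:
            return reward + self.ROLLOVER
        return reward
```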