List of bugs in different games #262
Normally the standard is to stop an episode after about 30 minutes of real-time play (i.e. 108,000 frames). This would deal with some of this, in particular the stuck-forever question. I'm not sure about BattleZone though -- what you're reporting shouldn't be happening when playing at random. Re: agents rolling over the score, I agree this is problematic. When the ALE was designed we didn't foresee agents playing forever and this issue coming up. A fine solution is to stop the episode when the maximum score is reached; effectively, when that happens we might as well declare victory. However, there's an issue (as you point out): if we make that fix, we also need to re-run published results and publicize the change. In particular, someone might think their agent performs better, when in fact they're only benefiting from the "improved" results. Something like this happened in the 2017 distributional RL paper (see the erratum). A better solution might be to flag games where this happens, and report the outcome in published papers. For example, report the mean score with 'solved' games removed from the equation.
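For illustration, here is a minimal sketch of what the two ideas above (a 108,000-frame episode budget and "declare victory" at a game-specific maximum score) could look like as an environment wrapper. It assumes the classic Gym step API returning (obs, reward, done, info); the wrapper name, the `max_score` parameter, and the `info["solved"]` flag are my own, not anything the ALE provides.

```python
import gym


class EpisodeCapWrapper(gym.Wrapper):
    """Hypothetical wrapper: ends an episode after a frame budget, or once a
    game-specific maximum score is reached ("declare victory")."""

    def __init__(self, env, max_frames=108_000, max_score=None):
        super().__init__(env)
        self.max_frames = max_frames   # ~30 min of real-time play at 60 Hz
        self.max_score = max_score     # e.g. 864 for Breakout; None disables the cap
        self._frames = 0
        self._score = 0

    def reset(self, **kwargs):
        self._frames = 0
        self._score = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Counts steps as seen by this wrapper; scale by the frame-skip if the
        # underlying env already repeats actions.
        self._frames += 1
        self._score += reward
        if self._frames >= self.max_frames:
            done = True                # bounds stuck-forever episodes
        if self.max_score is not None and self._score >= self.max_score:
            done = True                # maximum score reached: declare victory
            info["solved"] = True
        return obs, reward, done, info
```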
I don't think "never fix anything" is a long-term viable solution to the issue of reproducible results. Published results need to specify the exact version that was used to obtain them; new publications can then use that exact same version to recreate the results (if they want direct comparability), and optionally do another run with a newer, better version of the ALE (checking whether any significant changes in results are due solely to the change in ALE version).
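Recording the exact versions next to the experiment output is cheap. A minimal sketch, assuming a typical setup where the environment stack is installed as the "gym" and "atari-py" packages (those names are an assumption about the setup, not something this thread specifies):

```python
import json
import pkg_resources

# Record the exact environment-stack versions alongside the results, so the
# published numbers can later be reproduced against the same ALE behaviour.
versions = {
    pkg: pkg_resources.get_distribution(pkg).version
    for pkg in ("gym", "atari-py")   # assumed package names for this setup
}
with open("env_versions.json", "w") as f:
    json.dump(versions, f, indent=2)
```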
I opened an Issue in the OpenAI Gym, which actually stems from ALE. Another user commented:
As described in detail in the issue referenced above, there is a bug in Breakout: it is impossible to gain a score of more than 864, because after the field is cleared of bricks twice, they do not respawn a third time. For the record, here is a scatterplot of results achieved by a Baselines ppo2 model (cnn policy) trained for 50M steps with an evaluation step limit of 50k (color indicates elapsed time): 13.44% of rewards are exactly 864. This suggests that all Breakout results in the literature are strongly underreported. EDIT: according to Wikipedia, this is by design:
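A quick way to check a set of evaluation returns for this ceiling, along the lines of the 13.44% figure above (the array name and file path are hypothetical):

```python
import numpy as np

# One entry per evaluation episode, e.g. collected from an evaluation run.
episode_returns = np.load("breakout_eval_returns.npy")

# Fraction of episodes that hit the 864 score cap exactly.
capped = (episode_returns == 864).mean()
print(f"{100 * capped:.2f}% of episodes hit the 864 score cap")
```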
The score of 864 can be seen achieved here on a hardware Atari 2600. This post also provides a disassembly of the original game, showing the code which switches to the next level. Perhaps a modification to the original game could be discussed, so that strong models could be compared to each other more directly. EDIT: here is a very dirty proof of concept for such a modification: dniku/atari-py@fc1dc14. The idea is to reset the score from 864 to 432. This is what gameplay looks like; the agent ends up scoring 1669 in that video. A similar scatterplot for the patched version (the step limit is 30k here):
Hi, I have worked a lot with atari-py recently (I first tried to report this issue there, openai/atari-py#41, but they told me it was likely to be coming from the ALE) and discovered some bugs which I think can be damaging to research. Here is the list I found:
Asterix: When my agent reaches 999,500 and collects a last bonus which should bring it to 1,000,000 (the maximum score), it obtains a reward of -999,500 and the game continues as usual (but the agent now has a total score of 0...). I think this issue can be seen in the scores reported in the Rainbow and Ape-X papers (the score goes up to 1M and then varies randomly around 500k). A heuristic workaround for this wrap-around is sketched after this list.
Defender: On this game rewards are really weird. I get a reward of 10 on the first time step no matter what, and then all rewards are multiplied by 100 (so I get a reward of 15,000 when the score rendered on the screen only increases by 150). Moreover, like in Asterix, my agent gets a reward of roughly -999,000 when it passes a score of 1M.
VideoPinball: Same as Asterix and Defender: the agent receives a reward of roughly -999,000 when it reaches a score of 1M.
BattleZone: My agent often gets stuck forever for no apparent reason. By stuck I mean that once this kind of state happens, even playing random actions for 20 hours does not finish the game, and the agent never receives a reward different from 0. For me this is an issue particularly with algorithms relying on a replay memory: when this happens, the replay memory gets filled with tons of useless transitions. I could report the random seed and the list of actions leading to one of those states if needed (I am using sticky actions with probability 0.25).
Yars' Revenge: Same as BattleZone, sometimes the game gets stuck forever, though this happens much less often than in BattleZone.
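As mentioned above, here is a stopgap sketch for the score wrap-around seen in Asterix, Defender and VideoPinball: treat any absurdly large negative single-step reward as the score rolling over 1,000,000 and add the rollover back. This only masks the symptom (the real fix belongs in the per-game reward extraction), and it assumes the classic Gym step API plus a wrap of exactly 1,000,000, which I have not verified against the ALE internals.

```python
import gym


class ScoreWrapCorrection(gym.RewardWrapper):
    """Heuristic workaround (sketch): if a single step yields a huge negative
    reward, assume the score wrapped past 1,000,000 and add the rollover back."""

    ROLLOVER = 1_000_000
    THRESHOLD = -900_000   # no legitimate single-step reward should be this negative

    def reward(self, reward):
        if reward <= self.THRESHOLD:
            return reward + self.ROLLOVER
        return reward
```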