initial bias towards action=1? #7

Why does the network start with such a strong bias towards trying action=1 at every timestep? I only occasionally see action=0. It looks like it would be difficult to break out of this pattern, since the agent receives reward = 0.1 for it before encountering the first pipe gap.
Comments
During the initial stage of training, the agent simply performs random exploration... The network should be able to learn "don't flap too much" after training for a while.
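For reference, here is a minimal sketch of the ε-greedy action selection that DQN agents like this one typically use; the names (`q_values_fn`, `select_action`) and the ε handling are illustrative, not the repo's exact identifiers:

```python
import random
import numpy as np

ACTIONS = 2  # 0 = do nothing, 1 = flap

def select_action(q_values_fn, state, epsilon):
    """Epsilon-greedy: explore with probability epsilon, else act greedily."""
    if random.random() <= epsilon:
        # pure exploration: both actions are equally likely on these steps
        return random.randrange(ACTIONS)
    # exploitation: follow the current Q-estimates for this state
    return int(np.argmax(q_values_fn(state)))
```

On the exploration branch both actions are drawn uniformly, so random-exploration steps alone should produce roughly a 50/50 action split.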
@yanpanlau If the initial exploration is random, I would expect both actions to be equally likely initially? That isn't the case.
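One plausible mechanism for the skew, separate from the bug found later in this thread: only the ε fraction of steps is uniformly random, while the rest follow argmax over an untrained network. Because consecutive game frames are nearly identical, an untrained network tends to rank the same action highest across all of them, so the greedy steps can pick one action almost exclusively. A small self-contained illustration, using a random linear map as a stand-in for the untrained Q-network:

```python
import numpy as np

rng = np.random.default_rng(0)
N_PIXELS, N_ACTIONS = 80 * 80, 2

# stand-in for an untrained Q-network: fixed random weights
W = rng.normal(size=(N_PIXELS, N_ACTIONS))

# consecutive frames look alike, so model them as small
# perturbations of a single base state
base = rng.normal(size=N_PIXELS)
states = base + 0.01 * rng.normal(size=(1000, N_PIXELS))

greedy = np.argmax(states @ W, axis=1)
print(np.bincount(greedy, minlength=N_ACTIONS) / len(greedy))
# typically prints ~1.0 for one action and ~0.0 for the other
```

Which action dominates is arbitrary per initialization, but within a single run the greedy choice is nearly constant.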
How long is "a while"?
I just re-tested it and it should converge after around 100,000 steps. Can you try with the latest code?
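A quick way to check whether training is converging toward "don't flap too much" is to log the flap fraction over a sliding window; `recent_actions` here is a hypothetical addition to the training loop, not an existing variable in the repo:

```python
from collections import deque

WINDOW = 10_000
recent_actions = deque(maxlen=WINDOW)

def flap_fraction():
    """Fraction of recent steps on which the agent chose action 1 (flap)."""
    if not recent_actions:
        return 0.0
    return sum(a == 1 for a in recent_actions) / len(recent_actions)

# in the training loop:
#   recent_actions.append(action)
#   if step % WINDOW == 0:
#       print(f"step {step}: flap fraction = {flap_fraction():.3f}")
```

If learning is working, this fraction should drift well below its early value by the time the run approaches the 100,000-step mark mentioned above.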
Same issue here, as @wobeert described. It didn't change even after 622,000 steps. See below.
I fixed it! I introduced a bug by mistake into the original code when I was creating the multi-GPU version for GPU Keras. Thanks!