
Question about Issue 1 #2

Open
RyanRizzo96 opened this issue Feb 23, 2020 · 1 comment

RyanRizzo96 commented Feb 23, 2020

I am struggling to understand your reasoning here:

> Issue - The paper states that the number of sequences of actions should be 2^N. But I could only find the one sequence of right actions and N other sequences that terminate with the wrong action, and I found the number of transitions in the replay memory to be N(N+1)/2 + N.

Can you show how this holds for a simple case such as N = 3?

Here is mine:

[attached image: hand-worked list of the N = 3 episodes and transitions]

This will form our replay memory. In total, there will be (N*(N+1)/2 + N) transitions in the list.
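
For concreteness, here is a minimal Python sketch (my own, not taken from this repo) that enumerates the episodes under the assumption that the first wrong action terminates the episode; for N = 3 it reproduces the N + 1 sequences and the N(N+1)/2 + N transitions described above:

```python
# Minimal sketch, assuming the first wrong action ends the episode and
# that N consecutive right actions form the single successful episode.
N = 3

# Episodes ending with a wrong action after k - 1 right actions (k = 1..N):
episodes = [["right"] * (k - 1) + ["wrong"] for k in range(1, N + 1)]
# The one fully successful episode of N right actions:
episodes.append(["right"] * N)

total_transitions = sum(len(ep) for ep in episodes)
print(len(episodes))       # 4 -> N + 1 sequences
print(total_transitions)   # 9 -> N*(N+1)//2 + N
```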

This also doesn't match what the paper reports. According to the paper:

> The replay memory contains all the relevant experience (the total number of transitions is 2^(n+1) - 2).

In the paper they show that returning from state N to state 1 can either give a reward of 1 (green arrow) or 0 (dashed red arrow). How did you decide to implement this?

[attached image: the paper's figure showing the green (reward 1) and dashed red (reward 0) return arrows]
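
For illustration, here is one way the two arrows could be encoded. This is purely a sketch of my reading of the figure, not necessarily what this repo does: reward 1 is paid only on the transition triggered by the right action in state N, and every wrong action terminates with reward 0.

```python
class ChainEnv:
    """Hypothetical sketch: a chain of n states where the wrong action
    always terminates with reward 0 (dashed red arrow) and only the
    right action taken in state n pays reward 1 (green arrow)."""

    def __init__(self, n):
        self.n = n
        self.state = 1

    def step(self, action):
        if action == "wrong":        # dashed red arrow: terminate, reward 0
            self.state = 1
            return self.state, 0, True
        if self.state == self.n:     # green arrow: terminate, reward 1
            self.state = 1
            return self.state, 1, True
        self.state += 1              # intermediate right action: no reward
        return self.state, 0, False
```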

gxywy commented Jul 4, 2020

[attached image: my result]
My result also doesn't match the paper.
I think there should be N+1 sequences and N(N+1)/2 transitions in the replay memory.
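To put numbers on the disagreement: for N = 3 the three formulas in this thread give different totals, namely 2^(3+1) - 2 = 14, N(N+1)/2 + N = 9, and N(N+1)/2 = 6. It may help that 2^(n+1) - 2 = 2 + 4 + ... + 2^n, the number of edges in a full binary tree of depth n, so the paper seems to be counting every action sequence of length 1 to n, not just the sequences that remain possible once the first wrong action terminates the episode.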
