Question about Issue 1 #2

RyanRizzo96 · 2020-02-23T17:32:41Z

I am struggling to understand your reasoning here:

Issue - The paper states that the number of sequences of actions should be 2^N. But I could only find the one sequence of right actions and N other sequences that terminate with the wrong action, and the number of transitions in the replay memory to be N(N+1)/2 + N.

Can you show how this holds for a simple case such as N = 3?

Here is mine:

This will form our replay memory. In total, there will be (N*(N+1)/2 + N) transitions in the list.
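To make the N = 3 case concrete, here is a minimal sketch of that count. It assumes an episode ends at the first wrong action or after N right actions, and that identical episodes are stored only once. The function name `distinct_episodes` and the action labels are made up for illustration and are not taken from the repository.

```python
from itertools import product

def distinct_episodes(n):
    """Enumerate distinct episodes, assuming an episode ends at the
    first wrong action or after n consecutive right actions."""
    episodes = set()
    for actions in product(("right", "wrong"), repeat=n):
        episode = []
        for action in actions:
            episode.append(action)
            if action == "wrong":  # assumed: a wrong action terminates the episode
                break
        episodes.add(tuple(episode))
    # Sort by length, then lexicographically, so the order is deterministic.
    return sorted(episodes, key=lambda e: (len(e), e))

n = 3
episodes = distinct_episodes(n)
transitions = sum(len(e) for e in episodes)

for e in episodes:
    print(e)
print("distinct sequences:", len(episodes))            # N + 1
print("transitions:", transitions)                     # N(N+1)/2 + N
print("formula N(N+1)/2 + N:", n * (n + 1) // 2 + n)
```

Under these assumptions, N = 3 gives 4 distinct sequences and 9 transitions, which agrees with N(N+1)/2 + N but not with the paper's numbers.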

This also doesn't match what the paper reports. According to the paper:

The replay memory contains all the relevant experience (the total number of transitions is 2^(n+1) - 2)
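One reading that does reproduce the paper's figure, offered only as a guess: if all 2^n action sequences are executed and every resulting episode is stored, duplicates included, with each episode cut off at its first wrong action, the transition count comes out to 2^(n+1) - 2. The sketch below checks that arithmetic. The helper name `transitions_with_duplicates` is hypothetical, and nothing here confirms that the paper actually builds its replay memory this way.

```python
from itertools import product

def transitions_with_duplicates(n):
    """Count stored transitions when every one of the 2**n action
    sequences is run, assuming each episode stops at its first wrong
    action and duplicate episodes are kept."""
    total = 0
    for actions in product((True, False), repeat=n):  # True = right action
        steps = 0
        for is_right in actions:
            steps += 1
            if not is_right:  # assumed: a wrong action terminates the episode
                break
        total += steps
    return total

# Columns: n, count with duplicates, 2^(n+1) - 2, N(N+1)/2 + N
for n in range(1, 6):
    print(n, transitions_with_duplicates(n), 2 ** (n + 1) - 2, n * (n + 1) // 2 + n)
```

For n = 3 this counts 4 + 4 + 3 + 3 = 14 transitions: the one-step episode is stored four times and the two-step episode twice, while the two three-step episodes are stored once each.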

In the paper they show that returning from state N to state 1 can give a reward of either 1 (green arrow) or 0 (dashed red arrow). How did you decide to implement this?

gxywy · 2020-07-04T10:52:38Z

My result also doesn't match the paper.
I think there should be N+1 sequences and N(N+1)/2 transitions in the replay memory.
