Why does the program only use two states? #4

Open
guotong1988 opened this issue Mar 7, 2017 · 4 comments

Comments

@guotong1988

guotong1988 commented Mar 7, 2017

I read from here.
Why does the program only use the current state and the next state?
Why does using only these two states work?
Thank you @songrotek

@guotong1988
Author

Thinking about it the other way around: why not use just 1 state, rather than 2 states?

@guotong1988
Author

The key point is that these two states are directly adjacent,
yet what happens in the second state is determined by several preceding steps.

@saselovejulie

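The frame before the action, the action taken, the reward, the frame after the action, and terminal: these 5 elements make up one training sample.
http://blog.csdn.net/songrotek/article/details/50580904 describes the elements of this algorithm. I'm not entirely clear on it either, so we can work through it together.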
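
A minimal sketch of how these five elements might be stored and used, assuming standard one-step Q-learning. The names `Transition`, `ReplayMemory`, and `td_target`, and the discount `gamma=0.99`, are hypothetical and not taken from the repository. Under that assumption, the training target needs only the reward and the Q-values of the next state, which may be why one adjacent pair of states per sample is enough:

```python
import random
from collections import deque, namedtuple

# Hypothetical container for the five elements listed above.
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "terminal"]
)


class ReplayMemory:
    """Fixed-size buffer of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, terminal):
        self.buffer.append(Transition(state, action, reward, next_state, terminal))

    def sample(self, batch_size):
        # Uniform random minibatch, without replacement.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_target(reward, next_q_values, terminal, gamma=0.99):
    """One-step Q-learning target: r if terminal, else r + gamma * max Q(s', a')."""
    if terminal:
        return reward
    return reward + gamma * max(next_q_values)
```

Here `next_q_values` stands for the network's output on `next_state`; how the repository actually computes it is not shown in this thread.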

@saselovejulie

saselovejulie commented Sep 7, 2017

@guotong1988 From what I can see in the code, each executed action yields one frame.
currentState = [frame 1, frame 2, frame 3, frame 4]
newState = np.append(self.currentState[:,:,1:],nextObservation,axis = 2)
newState after the action = [frame 2, frame 3, frame 4, frame 5]
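
The sliding window above can be checked with a small sketch. The 80x80 frame size, the constant-valued dummy frames, and the `push_frame` helper are assumptions for illustration; only the `np.append` call comes from the line quoted above:

```python
import numpy as np

# Five dummy frames; frame i is filled with the value i so it is easy to track.
# The 80x80 size is an assumption, not confirmed by this thread.
frames = [np.full((80, 80), i, dtype=np.float32) for i in range(1, 6)]

# Stack the first four frames along the last axis: shape (80, 80, 4).
current_state = np.stack(frames[:4], axis=2)


def push_frame(state, next_observation):
    """Drop the oldest frame and append the newest along the last axis."""
    if next_observation.ndim == 2:
        # np.append with an axis needs matching dimensions: (H, W) -> (H, W, 1).
        next_observation = next_observation[:, :, np.newaxis]
    return np.append(state[:, :, 1:], next_observation, axis=2)


new_state = push_frame(current_state, frames[4])
print(current_state[0, 0, :])  # values of frames 1 to 4 at pixel (0, 0)
print(new_state[0, 0, :])      # values of frames 2 to 5 at pixel (0, 0)
```

Since each state already holds the last 4 frames, the pair (currentState, newState) together spans 5 consecutive frames rather than 2.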
