
Using MANN in A3C (reinforcement learning model) #21

Open
TheMnBN opened this issue Jan 16, 2019 · 3 comments


@TheMnBN

TheMnBN commented Jan 16, 2019

Hi there,

I'm trying to integrate a memory network into an A3C agent. For reference, I closely followed this implementation of A3C: https://github.com/awjuliani/DeepRL-Agents/blob/master/A3C-Doom.ipynb

My aim is to replace the LSTM layer with a MANN module. This might be a far-fetched question, but do you have any advice on refactoring your MANN implementation for this purpose?
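For context on the swap I have in mind: both an LSTM cell and a MANN module can expose the same step interface, taking an input and a recurrent state and returning an output and a new state, with the MANN state additionally carrying an external memory matrix. A minimal NumPy sketch of that idea (all names, sizes, and the write rule here are made up for illustration; this is not the code from either repo):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, state, params):
    """One LSTM step; state is (h, c). Shapes are hypothetical."""
    h, c = state
    W, U, b = params
    z = x @ W + h @ U + b
    i, f, o, g = np.split(z, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, (h, c)

def mann_step(x, state, params):
    """One MANN step; state also carries an external memory matrix."""
    h, c, memory, read = state
    # controller sees the input concatenated with the last read vector
    ctrl, (h, c) = lstm_step(np.concatenate([x, read]), (h, c), params)
    # content-based addressing: similarity against memory rows -> softmax weights
    key = ctrl[:memory.shape[1]]
    sim = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    w = np.exp(sim - sim.max())
    w /= w.sum()
    read = w @ memory                   # new read vector
    memory = memory + np.outer(w, key)  # toy write rule (not the paper's LRUA scheme)
    return np.concatenate([ctrl, read]), (h, c, memory, read)

# tiny demo with made-up sizes: input 6, hidden 8, 4 memory slots of width 8
rng = np.random.default_rng(0)
d, H, N, M = 6, 8, 4, 8
params = (0.1 * rng.standard_normal((d + M, 4 * H)),
          0.1 * rng.standard_normal((H, 4 * H)),
          np.zeros(4 * H))
state = (np.zeros(H), np.zeros(H), np.zeros((N, M)), np.zeros(M))
out, state = mann_step(rng.standard_normal(d), state, params)
```

If the A3C graph only interacts with the recurrent layer through this step interface, the LSTM and the MANN become interchangeable modules, which is presumably what the refactoring would aim for.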

@snowkylin
Owner

Generally speaking, MANN is not as easy to get to converge as other RNN models, and blindly combining it with another model can result in severe training instability. It took me a lot of time to finally get it to converge on the Omniglot dataset demonstrated in the original paper. So please prepare enough time and patience, and you may need to adjust the model to fit your task. Good luck!

@TheMnBN
Author

TheMnBN commented Jan 16, 2019

Thanks so much for replying!
You're absolutely correct. RL by itself can already go horribly wrong under various (and usually unknown) circumstances. I couldn't find any working implementation of memory-augmented RL models (open-source or from the authors of the original papers), so I have to do it myself. Naively bolting a memory network onto RL is not a well-motivated approach technically, but I'm still implementing it as a baseline for my research.

If you don't mind keeping this issue thread open, I would like to continue this discussion here.

@TheMnBN
Author

TheMnBN commented Jan 19, 2019

I have one tf.nn.dynamic_rnn op in my computation graph. I'm thinking of replacing it with a tf.while_loop whose body is the MANN operations. Do you think this approach makes sense?
I'm aware that you used a Python for loop in your model, so I will try both and see which works. Either way, I need a way to terminate the loop, i.e. define a condition for tf.while_loop or a sequence length for the for loop.
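For whatever it's worth, the loop structure being described mirrors what dynamic_rnn does internally: a step counter, a termination condition on the sequence length, and a body that applies one recurrent step and records the output. A plain-Python sketch of that control flow (names are hypothetical; in TF1 the cond and body would operate on tensors and the outputs would go into a TensorArray rather than a Python list):

```python
import numpy as np

def while_loop(cond, body, loop_vars):
    """Minimal stand-in for tf.while_loop semantics: iterate body while cond holds."""
    while cond(*loop_vars):
        loop_vars = body(*loop_vars)
    return loop_vars

def run_rnn(step_fn, inputs, init_state, seq_len):
    """Unroll step_fn over `inputs` for seq_len steps, like dynamic_rnn would."""
    outputs = []

    def cond(t, state):
        return t < seq_len          # termination: fixed sequence length

    def body(t, state):
        out, new_state = step_fn(inputs[t], state)
        outputs.append(out)         # in TF1 this would be a TensorArray.write
        return t + 1, new_state

    _, final_state = while_loop(cond, body, (0, init_state))
    return np.stack(outputs), final_state

# toy step function (a running sum) standing in for the MANN step
step = lambda x, s: (s + x, s + x)
outs, final = run_rnn(step, np.array([1.0, 2.0, 3.0]), 0.0, seq_len=3)
# outs -> [1., 3., 6.], final -> 6.0
```

Under this framing, the two options in the comment above differ only in where the loop runs: tf.while_loop keeps the iteration inside the graph with a symbolic condition, while a Python for loop over a known sequence length unrolls the same body into a fixed number of graph ops at build time.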
