At the moment the location tensor l_t is never detached from the computational graph, despite being both produced and 'consumed' by trainable modules. As far as I understand the code, this lets gradients 'backpropagate through time' in a way the authors of RAM did not intend: gradients originating in the action_network that reach the fc2 layer inside the glimpse network travel back to the previous timestep's location_network, altering its weights, and only stop once they reach the detached RNN memory vector h_t. As far as I understand, the authors intended the location_network to be trained only via reinforcement learning.
This could be a bug, or it could be an accidental improvement to the network; either way, please let me know if my understanding is correct, as I am still learning PyTorch and my project is heavily reliant on your code :)
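For reference, here is a minimal sketch of the behaviour being described. The two Linear layers are hypothetical, simplified stand-ins for the repo's glimpse and location networks (the real modules are larger); the point is only that detaching l_t blocks the supervised gradient from reaching location_network, leaving it to the REINFORCE term:

```python
import torch

# Hypothetical, simplified stand-ins for the modules discussed above; the
# real glimpse/location networks in the repo are larger than this.
glimpse_fc2 = torch.nn.Linear(2, 8)         # plays the role of fc2 in the glimpse network
location_network = torch.nn.Linear(8, 2)    # emits the next location l_t

h_prev = torch.randn(1, 8)                  # stands in for the (detached) hidden state h_t
l_t = torch.tanh(location_network(h_prev))  # l_t produced by a trainable module

# Current behaviour: l_t stays attached, so a loss computed downstream of the
# glimpse network sends gradients back into location_network.
glimpse_fc2(l_t).sum().backward()
print(location_network.weight.grad is not None)  # True: supervised gradient reached it

# Suggested behaviour: detach l_t so the supervised gradient stops here and
# location_network is trained only by the REINFORCE term.
location_network.weight.grad = None
glimpse_fc2(l_t.detach()).sum().backward()
print(location_network.weight.grad)  # None: no supervised gradient reached it
```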
Note that, aside from stopping at h_t, the gradient originating from action_network also continues recursively through g_t in core_network, modifying the l_t of every previous timestep. Meanwhile, I wonder why location_network and baseline_network have to be detached from h_t. Does anywhere in the paper suggest that core_network should be trained via the classification loss only? @Pozimek @yxiao54
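One quick way to check claims like this empirically (a generic PyTorch diagnostic, not code from this repo; the function name is made up) is to backprop a single loss term on its own and inspect which parameters actually received gradients:

```python
import torch

def params_touched_by_loss(model: torch.nn.Module, loss: torch.Tensor) -> list[str]:
    """Return names of parameters that receive a nonzero gradient from `loss` alone."""
    model.zero_grad()
    # retain_graph so the caller can still run the full combined backward afterwards
    loss.backward(retain_graph=True)
    return [
        name for name, p in model.named_parameters()
        if p.grad is not None and p.grad.abs().sum().item() > 0
    ]
```

Calling this with only the classification loss (before adding the REINFORCE and baseline terms) should reveal whether the location_network parameters are being updated by the supervised signal, with and without the detach.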