Synopsis: The paper proposes a sequence-to-sequence model with an additional decoder. The '2nd pass' decoder is conditioned both on the information from the source and on the global information from the '1st pass' decoder, attending over the source and over the 1st-pass decoder.
Paper contains some hand-waving mumbo-jumbo about the processes by which humans understand/generate language - through polishing / deliberation.
The paper uses a concatenation of the hidden states of the 1st-pass decoder along with the produced output target words as the information for the second-pass decoder.
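A minimal sketch of how one step of such a 2nd-pass decoder could look (my reconstruction, not the paper's code: the bilinear scoring, the GRU cell, and all names here are assumptions; the paper uses its own alignment model):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BilinearAttention(nn.Module):
    """Scores each memory slot against the decoder state and returns a
    context vector. A stand-in for the paper's alignment model."""
    def __init__(self, query_dim, mem_dim):
        super().__init__()
        self.proj = nn.Linear(query_dim, mem_dim, bias=False)

    def forward(self, query, memory):
        # query: (d_q,), memory: (T, d_m) -> context: (d_m,)
        scores = memory @ self.proj(query)      # (T,)
        weights = F.softmax(scores, dim=0)
        return weights @ memory                 # (d_m,)

class SecondPassCell(nn.Module):
    """One (unbatched) step of the 2nd-pass decoder: attends over the source
    states and over the 1st-pass memory [hidden state; output word embedding]."""
    def __init__(self, d):
        super().__init__()
        self.src_attn = BilinearAttention(d, d)
        self.dec1_attn = BilinearAttention(d, 2 * d)  # 1st-pass memory is 2d wide
        self.rnn = nn.GRUCell(input_size=4 * d, hidden_size=d)

    def forward(self, y_prev_emb, s2, src_states, dec1_states, dec1_embs):
        # src_states: (T_src, d); dec1_states, dec1_embs: (T_1, d)
        dec1_mem = torch.cat([dec1_states, dec1_embs], dim=-1)  # (T_1, 2d)
        ctx_src = self.src_attn(s2, src_states)                 # (d,)
        ctx_dec1 = self.dec1_attn(s2, dec1_mem)                 # (2d,)
        x = torch.cat([y_prev_emb, ctx_src, ctx_dec1], dim=-1)  # (4d,)
        return self.rnn(x, s2)                                  # next 2nd-pass state
```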
Getting both the decoder hidden states and decoder outputs requires sampling an output sequence - marginalizing over all possible 1st-pass sequences is intractable, so they approximate the expectation with unbiased sample(s) and propose optimizing a 'generation lower bound'.
They state that they use Monte Carlo sampling (and in the experiments they mention sampling via beam search). It is not clear what happens when there are n samples (where n > 1).
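For reference, the bound in question follows from Jensen's inequality: log P(y|x) = log E_{y' ~ P1(·|x)}[P2(y|y', x)] >= E_{y' ~ P1(·|x)}[log P2(y|y', x)]. A sketch of how n > 1 samples would presumably be combined (the plain average and the model interfaces below are my assumptions, not the paper's code):

```python
import torch

def deliberation_loss(first_pass, second_pass, x, y, n_samples=1):
    # Hypothetical interfaces (not from the paper): `first_pass.sample(x)`
    # draws a draft y' from P1(.|x); `second_pass.log_prob(y, y_draft, x)`
    # scores the gold target under P2(y | y', x).
    draft_log_probs = []
    for _ in range(n_samples):
        y_draft = first_pass.sample(x)          # y' ~ P1(.|x)
        draft_log_probs.append(second_pass.log_prob(y, y_draft, x))
    # Monte Carlo estimate of E_{y'}[log P2(y | y', x)]; with n > 1 the
    # natural reading is a plain average over the samples.
    return -torch.stack(draft_log_probs).mean()
```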
The paper seemingly has a bug -> if the 2nd-pass decoder is initialized with the 1st-pass decoder's parameters then most probably it has a similar architecture. But the input to the 2nd pass contains context vectors with 2x the original dimensionality!
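To make the mismatch concrete, the shape arithmetic under the assumptions of the sketch above (all sizes hypothetical):

```python
d = 512                      # hypothetical hidden size
emb = d                      # word-embedding size, assumed equal to d
# 1st-pass RNN input: [prev word embedding; source context]
in_first = emb + d           # = 2d
# 2nd-pass RNN input: [prev word embedding; source context;
#   context over the 1st-pass memory, which is [state; embedding] = 2d wide]
in_second = emb + d + 2 * d  # = 4d
assert in_second != in_first  # input-to-hidden weights can't be copied verbatim
```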
Paper