Synopsis: The paper proposes a sequence-to-sequence model with an additional decoder. The '2nd pass' decoder is conditioned both on the information from the source and on the global information from the '1st pass' decoder, attending over the source and over the 1st-pass decoder.
Paper contains some hand-waving mumbo-jumbo about the processes by which humans understand/generate language - through polishing / deliberation.
The paper uses a concatenation of the hidden states of the 1st-pass decoder along with the produced output target words as the information for the second-pass decoder.
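A minimal sketch of how one step of such a 2nd-pass decoder could look (my reconstruction, not the paper's code: the bilinear scoring, the GRU cell, and all names here are assumptions; the paper uses its own alignment model):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BilinearAttention(nn.Module):
    """Scores each memory slot against the decoder state and returns a
    context vector. A stand-in for the paper's alignment model."""
    def __init__(self, query_dim, mem_dim):
        super().__init__()
        self.proj = nn.Linear(query_dim, mem_dim, bias=False)

    def forward(self, query, memory):
        # query: (d_q,), memory: (T, d_m) -> context: (d_m,)
        scores = memory @ self.proj(query)      # (T,)
        weights = F.softmax(scores, dim=0)
        return weights @ memory                 # (d_m,)

class SecondPassCell(nn.Module):
    """One (unbatched) step of the 2nd-pass decoder: attends over the source
    states and over the 1st-pass memory [hidden state; output word embedding]."""
    def __init__(self, d):
        super().__init__()
        self.src_attn = BilinearAttention(d, d)
        self.dec1_attn = BilinearAttention(d, 2 * d)  # 1st-pass memory is 2d wide
        self.rnn = nn.GRUCell(input_size=4 * d, hidden_size=d)

    def forward(self, y_prev_emb, s2, src_states, dec1_states, dec1_embs):
        # src_states: (T_src, d); dec1_states, dec1_embs: (T_1, d)
        dec1_mem = torch.cat([dec1_states, dec1_embs], dim=-1)  # (T_1, 2d)
        ctx_src = self.src_attn(s2, src_states)                 # (d,)
        ctx_dec1 = self.dec1_attn(s2, dec1_mem)                 # (2d,)
        x = torch.cat([y_prev_emb, ctx_src, ctx_dec1], dim=-1)  # (4d,)
        return self.rnn(x, s2)                                  # next 2nd-pass state
```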
Getting both the decoder hidden states and decoder outputs requires sampling an output sequence - marginalizing over all possible 1st-pass sequences is intractable, so they approximate the expectation with unbiased sample(s) and propose optimizing a 'generation lower bound'.
They state that they use Monte Carlo sampling (and in the experiments they mention sampling via beam search). It is not clear what happens when there are n samples (where n > 1).
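For reference, the bound in question follows from Jensen's inequality: log P(y|x) = log E_{y' ~ P1(·|x)}[P2(y|y', x)] >= E_{y' ~ P1(·|x)}[log P2(y|y', x)]. A sketch of how n > 1 samples would presumably be combined (the plain average and the model interfaces below are my assumptions, not the paper's code):

```python
import torch

def deliberation_loss(first_pass, second_pass, x, y, n_samples=1):
    # Hypothetical interfaces (not from the paper): `first_pass.sample(x)`
    # draws a draft y' from P1(.|x); `second_pass.log_prob(y, y_draft, x)`
    # scores the gold target under P2(y | y', x).
    draft_log_probs = []
    for _ in range(n_samples):
        y_draft = first_pass.sample(x)          # y' ~ P1(.|x)
        draft_log_probs.append(second_pass.log_prob(y, y_draft, x))
    # Monte Carlo estimate of E_{y'}[log P2(y | y', x)]; with n > 1 the
    # natural reading is a plain average over the samples.
    return -torch.stack(draft_log_probs).mean()
```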
The paper seemingly has a bug -> if the 2nd-pass decoder is initialized with the 1st-pass decoder's parameters then most probably it has a similar architecture. But the input to the 2nd pass contains context vectors with 2x the original dimensionality!
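To make the mismatch concrete, the shape arithmetic under the assumptions of the sketch above (all sizes hypothetical):

```python
d = 512                      # hypothetical hidden size
emb = d                      # word-embedding size, assumed equal to d
# 1st-pass RNN input: [prev word embedding; source context]
in_first = emb + d           # = 2d
# 2nd-pass RNN input: [prev word embedding; source context;
#   context over the 1st-pass memory, which is [state; embedding] = 2d wide]
in_second = emb + d + 2 * d  # = 4d
assert in_second != in_first  # input-to-hidden weights can't be copied verbatim
```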
Paper