Scheduled sampling for sequence prediction with recurrent neural networks
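- A traditional seq2seq model is trained with teacher forcing: at each step the decoder receives the ground-truth previous token. At inference, however, the model must consume its own previous predictions. Training and test conditions therefore differ, and early mistakes can compound along the generated sequence.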
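- To improve the test metric (e.g., BLEU or word error rate), the authors propose scheduled sampling. During training, when predicting $y_t$, the decoder input is chosen at random. It is either the ground-truth $y_{t-1}$ (with probability $\epsilon_i$) or a token $\hat{y}_{t-1}$ sampled from the model's own output distribution. This narrows the gap between training and inference.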
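- The probability $\epsilon_i$ of feeding the ground truth decreases with the training iteration $i$, so training moves gradually from teacher forcing toward free running. The proposed schedules are all fixed decreasing functions of $i$: linear $\epsilon_i = \max(\epsilon, k - c\,i)$, exponential $\epsilon_i = k^i$ with $k < 1$, and inverse sigmoid $\epsilon_i = k / (k + \exp(i/k))$ with $k \ge 1$. Sampling strategies other than such iteration-based decay schedules are not tried. (important!)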
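A minimal Python sketch of the idea, assuming a toy setting. It shows the linear, exponential, and inverse-sigmoid decay schedules for $\epsilon_i$ proposed in the paper, plus the per-step coin flip that picks the next decoder input. `next_token_dist`, `toy_model`, and all default constants are hypothetical. A real RNN decoder would condition on its recurrent state rather than only on the previous token, and the loss computation is omitted.

```python
import math
import random
from typing import Callable, Dict, List

# epsilon_i = probability of feeding the ground-truth token at training iteration i.
def linear_decay(i: int, k: float = 1.0, c: float = 1e-4, eps_min: float = 0.0) -> float:
    return max(eps_min, k - c * i)

def exponential_decay(i: int, k: float = 0.9999) -> float:
    # Requires k < 1.
    return k ** i

def inverse_sigmoid_decay(i: int, k: float = 1000.0) -> float:
    # Requires k >= 1; guard against math.exp overflow for very large i / k.
    if i / k > 700:
        return 0.0
    return k / (k + math.exp(i / k))

def scheduled_sampling_inputs(
    targets: List[str],
    next_token_dist: Callable[[str], Dict[str, float]],
    epsilon: float,
    rng: random.Random,
    bos: str = "<s>",
) -> List[str]:
    """Return the decoder input used at each step for one training sequence.

    Step 1 always receives `bos`. After predicting step t, the next input is
    the ground truth y_t with probability epsilon, otherwise a token sampled
    from the model's output distribution.
    """
    inputs: List[str] = []
    prev = bos
    for y_t in targets:
        inputs.append(prev)
        # Hypothetical model: distribution over y_t given only the previous input.
        dist = next_token_dist(prev)
        tokens, probs = zip(*dist.items())
        sampled = rng.choices(tokens, weights=probs, k=1)[0]
        # Coin flip: teacher forcing with prob. epsilon, else feed the model's own sample.
        prev = y_t if rng.random() < epsilon else sampled
    return inputs

def toy_model(prev: str) -> Dict[str, float]:
    # Hypothetical stand-in for an RNN decoder's softmax output.
    return {"a": 0.5, "b": 0.5}

if __name__ == "__main__":
    targets = ["a", "b", "a"]
    for i in (0, 5000, 20000):
        eps = inverse_sigmoid_decay(i)
        print(i, round(eps, 3), scheduled_sampling_inputs(targets, toy_model, eps, random.Random(0)))
```

With `epsilon=1.0` the function reduces to teacher forcing (`['<s>', 'a', 'b']` for the targets above). With `epsilon=0.0` every input after `<s>` is a model sample, as at inference. Sampling from the output distribution instead of taking the argmax matches the description above.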
References
- Bengio, S., Vinyals, O., Jaitly, N., & Shazeer, N. (2015). Scheduled sampling for sequence prediction with recurrent neural networks. NeurIPS 2015.