The model
- The log-bilinear (LBL) model is perhaps the simplest neural language model.
- Given the previous context words \(\{w_1,\cdots,w_{n-1}\}\), the LBL model first predicts a representation for the next word \(w_n\) by linearly combining the representations of the context words:
\[
\tilde r = \sum_{i=1}^{n-1} C_i \, r_{w_i},
\]
where \(r_w\) is the real-valued vector representing word \(w\), and \(C_i\) is the weight matrix associated with context position \(i\).
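As a sketch, the context-combination step can be written in NumPy. The table `R`, the per-position matrices `C`, and the context indices below are random placeholders for illustration, not trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

V, d, n = 10, 4, 3             # vocab size, embedding dim, context length
R = rng.normal(size=(V, d))    # word-embedding table, one row r_w per word
C = rng.normal(size=(n, d, d)) # one combination matrix C_i per context position

context = [2, 7, 5]            # indices of the previous context words

# Predicted representation: r_tilde = sum_i C_i r_{w_i}
r_tilde = sum(C[i] @ R[w] for i, w in enumerate(context))
print(r_tilde.shape)           # (4,)
```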
- Then the distribution for the next word is computed from the similarity between the predicted representation and the representations of all words in the vocabulary:
\[
P(w_n = w \mid w_1,\cdots,w_{n-1}) = \frac{\exp(\tilde r^\top r_w + b_w)}{\sum_{w'} \exp(\tilde r^\top r_{w'} + b_{w'})},
\]
where \(b_w\) is the bias of word \(w\).
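The scoring step can be sketched the same way, assuming a predicted vector `r_tilde`, the embedding table `R`, and per-word biases `b` (all random placeholders here):

```python
import numpy as np

rng = np.random.default_rng(1)

V, d = 10, 4
R = rng.normal(size=(V, d))   # embedding table, one row per vocabulary word
b = rng.normal(size=V)        # per-word biases
r_tilde = rng.normal(size=d)  # predicted representation from the context

# One bilinear similarity score per vocabulary word
scores = R @ r_tilde + b

# Softmax over the vocabulary (shift by the max for numerical stability)
probs = np.exp(scores - scores.max())
probs /= probs.sum()

print(probs.shape)            # (10,)
```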
Why is it called bilinear?
- \(\tilde r\) is the combination of the previous word embeddings
- \(r_w\) is the embedding of word \(w\).
- Their inner product \(\tilde r^\top r_w\) is a bilinear map: it is linear in \(\tilde r\) for fixed \(r_w\), and linear in \(r_w\) for fixed \(\tilde r\). Note that a single embedding look-up table is shared between the context words and the predicted word.
Bilinear map
In mathematics, a bilinear map is a function combining elements of two vector spaces to yield an element of a third vector space that is linear in each of its arguments. Matrix multiplication is an example.
A bilinear function (see https://en.wikipedia.org/wiki/Bilinear_map ) is a function which is linear in each variable when all other variables are fixed. For instance, \(f(x, y) = x^\top A y\) is linear in \(x\) for fixed \(y\), and linear in \(y\) for fixed \(x\).
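A quick numerical check of this property for the bilinear form \(f(x, y) = x^\top A y\) (random \(A\), \(x\), \(y\), \(z\) as placeholders): linearity in the first argument means \(f(\alpha x + \beta z,\, y) = \alpha f(x, y) + \beta f(z, y)\), and symmetrically for the second argument.

```python
import numpy as np

rng = np.random.default_rng(2)

d = 5
A = rng.normal(size=(d, d))
x, z, y = rng.normal(size=(3, d))
alpha, beta = 2.0, -3.0

def f(u, v):
    """Bilinear form f(u, v) = u^T A v."""
    return u @ A @ v

# Linear in the first argument (y held fixed)
assert np.isclose(f(alpha * x + beta * z, y), alpha * f(x, y) + beta * f(z, y))
# Linear in the second argument (x held fixed)
assert np.isclose(f(x, alpha * y + beta * z), alpha * f(x, y) + beta * f(x, z))
```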