one sample:
\[x_i \to [y_i^0,\cdots,y_i^{k}]\] where \(y_i^0\) is the true (labeled) word and \(y_i^1,\cdots,y_i^{k}\) are the indices of \(k\) noise words, sampled from the unigram distribution \(q(w)\) of the dataset.
- the probability of true data:
- the noise sample probability:
- the cost function of this sample:
- the overall cost function of the dataset:
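
As a minimal sketch of the four quantities above, assume the standard NCE setup: the model gives an unnormalized score \(s_\theta(x, y)\), and a binary label \(D\) marks whether a word came from the data (\(D=1\)) or from the noise distribution (\(D=0\)). The symbols \(s_\theta\), \(D\), \(J_i\) and \(J\) are introduced here for illustration and are not taken from the text above. Under that assumption:

\[P(D=1 \mid x_i, y) = \frac{\exp(s_\theta(x_i, y))}{\exp(s_\theta(x_i, y)) + k\,q(y)}, \qquad P(D=0 \mid x_i, y) = \frac{k\,q(y)}{\exp(s_\theta(x_i, y)) + k\,q(y)}\]

\[J_i = -\Big[\log P(D=1 \mid x_i, y_i^0) + \sum_{j=1}^{k} \log P(D=0 \mid x_i, y_i^j)\Big], \qquad J = \sum_i J_i\]

The same computation in plain Python, with hypothetical function and argument names:

```python
import math

def p_data(score, q, k):
    """P(D=1 | x, y): probability that word y came from the data.

    score is the assumed unnormalized log-score s_theta(x, y),
    q is the unigram noise probability q(y), k the number of noise words.
    """
    u = math.exp(score)
    return u / (u + k * q)

def nce_sample_loss(true_score, q_true, noise_scores, q_noise):
    """Cost J_i of one sample: one true word plus k noise words."""
    k = len(noise_scores)
    # The true word should be classified as data (D=1).
    loss = -math.log(p_data(true_score, q_true, k))
    # Each noise word should be classified as noise (D=0).
    for s, q in zip(noise_scores, q_noise):
        loss -= math.log(1.0 - p_data(s, q, k))
    return loss

def nce_dataset_loss(samples):
    """Overall cost J: sum of per-sample costs.

    samples is an iterable of (true_score, q_true, noise_scores, q_noise).
    """
    return sum(nce_sample_loss(*sample) for sample in samples)
```

Some implementations average over samples instead of summing. That only rescales the objective.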