It makes that short-term memory last for a long time.
1. Three Types of Gates
- Input Gate: Controls how much of the current input \(x^t\) and the previous output \(h^{t-1}\) enters the new cell state.
- Forget Gate: Decides, for each component of the memory, whether to erase it (set it to zero) or keep it.
- Cell Update: Transforms the current input and the previous output into a candidate update for the cell state.
- Output Gate: Scales the output from the cell.
- Internal State Update: Computes the current timestep's state from the gated previous state and the gated candidate update.
- Hidden Layer: The output of the LSTM, a \(\tanh\)-squashed transformation of the current state scaled by the output gate (the full update equations are written out below).
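Written out, a standard formulation of these updates (with bias terms \(b_i,b_f,b_g,b_o\); \([x^t,h^{t-1}]\) denotes the concatenation of the current input and the previous output) is:

\[\begin{aligned}
i^t &= \sigma(W_i[x^t,h^{t-1}]+b_i)\\
f^t &= \sigma(W_f[x^t,h^{t-1}]+b_f)\\
g^t &= \phi(W_g[x^t,h^{t-1}]+b_g)\\
o^t &= \sigma(W_o[x^t,h^{t-1}]+b_o)\\
c^t &= f^t\cdot c^{t-1}+i^t\cdot g^t\\
h^t &= o^t\cdot\phi(c^t)
\end{aligned}\]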
where “\(\cdot\)” denotes element-wise multiplication, \(\phi\) is the hyperbolic tangent, and \(\sigma\) is the logistic sigmoid:

\[\phi(x)=\frac{e^x-e^{-x}}{e^x+e^{-x}},\qquad\sigma(x)=\frac{1}{1+e^{-x}}\]

2. Parallelized Computation
The input gate, forget gate, cell update, and output gate can all be computed in parallel by stacking their weight matrices into a single matrix \(W\):
\[\begin{bmatrix} i^t\\ f^t\\ g^t\\ o^t \end{bmatrix} =\begin{bmatrix}\sigma\\ \sigma\\ \phi\\ \sigma\end{bmatrix} W\begin{bmatrix}x^t\\ h^{t-1}\end{bmatrix}\]
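A minimal NumPy sketch of this stacked computation (the names `lstm_step`, `W`, and `b` are illustrative; `W` stacks the weight rows for the input gate, forget gate, cell update, and output gate, so one matrix product yields all four pre-activations):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step; W has shape (4*H, D+H) and b has shape (4*H,)."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b  # all four pre-activations at once
    i = sigmoid(z[0:H])          # input gate
    f = sigmoid(z[H:2 * H])      # forget gate
    g = np.tanh(z[2 * H:3 * H])  # cell update (candidate)
    o = sigmoid(z[3 * H:4 * H])  # output gate
    c_t = f * c_prev + i * g     # internal state update
    h_t = o * np.tanh(c_t)       # hidden output
    return h_t, c_t
```

This is why most implementations store the four weight matrices as one block: the four matrix-vector products collapse into a single, larger one.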
3. LSTM Network for Semantic Analysis

- Model Architecture: LSTM layer → Average Pooling → Logistic Regression
- Input sequence: \(x_0,x_1,x_2,\cdots,x_n\)
- Representation sequence: \(h_0,h_1,h_2,\cdots,h_n\)
This representation sequence is then averaged over all timesteps, resulting in the representation \(h\):

\[h=\frac{1}{n+1}\sum\limits_{i=0}^{n}{h_i}\]
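A rough sketch of the pooling and classification steps (the function `classify_sequence` and the parameters `w` and `b` are illustrative; `hs` is assumed to hold the hidden states produced by running the LSTM over the input sequence):

```python
import numpy as np

def classify_sequence(hs, w, b):
    """Mean-pool the representation sequence, then apply logistic regression.

    hs: array of shape (num_steps, hidden_dim) with the LSTM hidden states
    w, b: logistic-regression weight vector and bias (illustrative)
    """
    h = hs.mean(axis=0)                        # average over all timesteps
    return 1.0 / (1.0 + np.exp(-(w @ h + b)))  # probability of the positive class
```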