The main feature of an RNN is its hidden state, which captures information about the sequence seen so far. An RNN performs the same task for every element of a sequence, has a "memory" of previous elements, and shares the same parameters (U, V, W) across all steps.
- DNN: all inputs (and outputs) are independent of each other
- RNN: the output depends on previous computations
y = model.next_step(x, R)
- y - output
- x - input
- R - memory (the hidden state)
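
A minimal NumPy sketch of what such a step could do internally. U, V, W are the shared parameters from above; the shapes, the tanh activation, and the initialization are illustrative assumptions, and here the step returns the updated memory explicitly:

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 8, 16, 4

# The same three parameter matrices are reused at every time step.
U = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input -> hidden
W = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # memory -> hidden
V = rng.normal(scale=0.1, size=(output_dim, hidden_dim))  # hidden -> output

def next_step(x, R):
    """One RNN step: fold input x into memory R, then emit output y."""
    R_new = np.tanh(U @ x + W @ R)  # new hidden state mixes input and memory
    y = V @ R_new                   # output depends on the updated memory
    return y, R_new

R = np.zeros(hidden_dim)                   # empty memory at the start
for x in rng.normal(size=(5, input_dim)):  # a toy sequence of 5 inputs
    y, R = next_step(x, R)
```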
LSTMs:
- have a different way of computing the hidden state R (through gating)
- are much better at capturing long-term dependencies than vanilla RNNs are
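
To make the "different way of computing the hidden state" concrete, here is a rough NumPy sketch of one LSTM step. The gate structure is the standard one; the shapes and initialization are illustrative assumptions, and biases are omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(1)
input_dim, hidden_dim = 8, 16

# Each gate has its own weight matrix over the concatenated [h; x] vector.
Wf, Wi, Wo, Wg = (rng.normal(scale=0.1, size=(hidden_dim, hidden_dim + input_dim))
                  for _ in range(4))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c):
    """One LSTM step: gates decide what to forget, write, and expose."""
    z = np.concatenate([h, x])
    f = sigmoid(Wf @ z)          # forget gate: how much old cell state to keep
    i = sigmoid(Wi @ z)          # input gate: how much of the candidate to write
    o = sigmoid(Wo @ z)          # output gate: how much cell state to expose
    g = np.tanh(Wg @ z)          # candidate values
    c_new = f * c + i * g        # additive update helps gradients survive long spans
    h_new = o * np.tanh(c_new)   # the hidden state is a gated view of the cell
    return h_new, c_new

h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
for x in rng.normal(size=(5, input_dim)):
    h, c = lstm_step(x, h, c)
```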
Backpropagation is a method to calculate the gradient of the loss function; it is used, for example, in the gradient descent algorithm. For an RNN it is unrolled across time steps (backpropagation through time, BPTT).
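
A toy, self-contained example of the idea (not RNN-specific): the backward pass computes dLoss/dw via the chain rule, and gradient descent uses that gradient to update the parameter. The data and learning rate here are made-up values for illustration:

```python
x, target = 2.0, 10.0
w = 1.0       # the single parameter we are fitting
lr = 0.05     # gradient descent step size (an assumed value)

for _ in range(100):
    y = w * x                     # forward pass
    loss = (y - target) ** 2      # squared-error loss
    grad = 2 * (y - target) * x   # backward pass: dLoss/dw via the chain rule
    w -= lr * grad                # gradient descent update

print(w)  # approaches target / x = 5.0
```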