In this tutorial, we will cover recurrent neural networks in detail and their implementation on a real-world dataset.
In the previous tutorial, we discussed the convolutional neural network, a feedforward neural network that takes a fixed-size vector as input and generates a fixed-size output.
What makes Recurrent Networks so special?
As we have seen, Convolutional Neural Networks (a kind of feedforward neural network) accept a fixed-sized vector as input (for example, an image) and produce a fixed-sized vector as output (for example, probabilities of different classes). In addition, these models perform this mapping using a fixed number of computational steps (e.g. the number of layers in the model).
What makes recurrent nets far more interesting is that they allow us to operate over sequences of vectors: sequences in the input, in the output, or, in the most general case, both.
After reading this tutorial, you will be able to answer all of these questions.
What is a Recurrent Neural Network?
Suppose your senior chef has made a schedule for you. The menu repeats every three days: on the first day, Monday, you cook pizza; on the second day you make an omelette; and on the third day, Wednesday, you make chicken.
Now suppose you are unavailable for a day. To figure out what to cook the next day, you have to recall what was cooked the day before yesterday and step forward from there; this is why the order of the sequence matters.
There is one more situation, in which you are unavailable for more than a week. In that case, you have to keep track of the menu from the last day you cooked (that is, you have to roll back over time to work out what was cooked on that day).
In deep learning, we use a recurrent neural network to solve this kind of problem. Recurrent networks contain loops that allow information to persist from one step to the next.
In a recurrent neural network, you receive new information at time t and combine it with the information from the previous time step, t-1, to make the prediction at time t.
Let us understand this mathematically.
Here x(0), x(1), x(2) are the three inputs at timestamps t = 0, 1, 2.
If you want to calculate h(0) and y(0), the equations are given as follows.
Here Wi is the weight matrix at the input layer, Wh is the weight matrix applied to the previous hidden state, Wy is the weight matrix at the output layer, h is the hidden state, y is the output, and g is the activation function. The complete equations look like this:
h(t) = g(Wi · x(t) + Wh · h(t-1))
y(t) = softmax(Wy · h(t))
There are various activation functions we can use, such as the following (see the short sketch after the list):
- Sigmoid activation function
- Tanh activation function
- Rectified linear unit activation function
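To get a feel for how these differ, here is a quick sketch comparing the three on a few sample pre-activation values (using PyTorch, since we use it for the implementation later):

```python
import torch

x = torch.linspace(-3, 3, 7)  # a few sample pre-activation values

print(torch.sigmoid(x))  # squashes values into (0, 1)
print(torch.tanh(x))     # squashes values into (-1, 1); the usual default for vanilla RNNs
print(torch.relu(x))     # zeroes out negatives, leaves positives unchanged
```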
Some important points to note are:
- Wi, Wh, and Wy (sometimes written as U, W, and V) are weight matrices, and they are shared across every time step rather than being different for each one.
- We can even calculate the hidden states at all time steps first and then calculate the y values.
- The weight matrices are initialized randomly.
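To make the equations concrete, here is a minimal sketch of the forward pass in plain NumPy. The names (Wi, Wh, Wy) and sizes are illustrative choices, not part of any standard API:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

input_size, hidden_size, output_size = 4, 8, 3

# Weight matrices are initialized randomly and shared across all time steps
rng = np.random.default_rng(0)
Wi = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input -> hidden
Wh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden
Wy = rng.normal(scale=0.1, size=(output_size, hidden_size))  # hidden -> output

xs = [rng.normal(size=input_size) for _ in range(3)]  # x(0), x(1), x(2)
h = np.zeros(hidden_size)  # initial hidden state

for t, x in enumerate(xs):
    h = np.tanh(Wi @ x + Wh @ h)   # h(t) = g(Wi · x(t) + Wh · h(t-1))
    y = softmax(Wy @ h)            # y(t) = softmax(Wy · h(t))
    print(f"t={t}, y={y}")
```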
Once the feedforward pass is done, we need to calculate the error and propagate it back through the network using backpropagation.
Training a Recurrent Neural Network
An RNN uses the backpropagation algorithm for training. For RNNs it is called backpropagation through time (BPTT), because the backpropagation algorithm is applied at every timestamp of the unrolled sequence.
In the backpropagation step, we calculate the gradient of the error with respect to all the weights in the network and use it to update them.
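In PyTorch, BPTT falls out of autograd automatically: if we accumulate the loss over every timestamp and then call backward() once, gradients flow back through the whole unrolled sequence. A minimal sketch, with made-up sizes and random targets purely for illustration:

```python
import torch
import torch.nn as nn

rnn_cell = nn.RNNCell(input_size=4, hidden_size=8)
readout = nn.Linear(8, 3)
loss_fn = nn.CrossEntropyLoss()

xs = torch.randn(5, 4)               # a sequence of 5 input vectors
targets = torch.randint(0, 3, (5,))  # one target class per timestamp (random here)

h = torch.zeros(1, 8)  # initial hidden state
loss = 0.0
for t in range(5):
    h = rnn_cell(xs[t].unsqueeze(0), h)  # h(t) from x(t) and h(t-1)
    loss = loss + loss_fn(readout(h), targets[t:t + 1])

loss.backward()  # backpropagation through time: gradients flow through every timestamp
```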
Implementation of an RNN in the PyTorch framework
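Below is a minimal, self-contained sketch of what such an implementation could look like, using PyTorch's built-in nn.RNN module for a many-to-one classifier. The class name, sizes, and hyperparameters are illustrative assumptions, and the data here is random rather than a real-world dataset:

```python
import torch
import torch.nn as nn

class SimpleRNNClassifier(nn.Module):
    """Many-to-one RNN: reads a whole sequence, predicts one class."""

    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, seq_len, input_size)
        out, h_n = self.rnn(x)      # out: (batch, seq_len, hidden_size)
        return self.fc(out[:, -1])  # classify from the last hidden state

model = SimpleRNNClassifier(input_size=4, hidden_size=16, num_classes=3)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One dummy training step on random data, just to show the loop
x = torch.randn(32, 10, 4)  # batch of 32 sequences, 10 timestamps each
y = torch.randint(0, 3, (32,))
loss = loss_fn(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(loss.item())
```

Using out[:, -1] picks the hidden state after the final timestamp, which is the usual choice for a many-to-one setup; for a synced many-to-many setup you would keep the full out tensor instead.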
RNN applications
Each block is a vector and arrows represent functions. The red arrows represent input vectors, the blue arrows represent output vectors, and the green arrows carry the RNN's state.
There are five types of RNN setups in total:
1- The first one maps a fixed-size vector as input to a fixed-size vector as output, like a convolutional neural network (one to one).
2- The second one takes an image as input and outputs a sentence of words, e.g. image captioning. It is also called a one-to-many model.
3- The third one takes a sentence as input and outputs a positive or negative sentiment (many to one).
4- The fourth one takes a sentence as input and outputs another sentence, e.g. translating a sentence from English to French (many to many).
5- The fifth one is synced sequence input and output (e.g. video classification, where we wish to label each frame of the video).
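To tie these setups back to code: with PyTorch's nn.RNN, the difference between many-to-one and synced many-to-many is simply which part of the output tensor you feed to the readout layer. A small sketch (all sizes are illustrative):

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=4, hidden_size=16, batch_first=True)
x = torch.randn(8, 10, 4)  # 8 sequences, 10 timestamps each

out, h_n = rnn(x)  # out: (8, 10, 16), h_n: (1, 8, 16)

readout = nn.Linear(16, 3)
many_to_one = readout(out[:, -1])  # e.g. sentiment: one label per sequence -> (8, 3)
many_to_many = readout(out)        # e.g. video tagging: one label per frame -> (8, 10, 3)
```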
Wrapping Up the Session
In this tutorial, we learned the following:
- Difference between recurrent neural network and convolutional neural network
- What is a Recurrent neural network
- Mathematics behind RNN
- Implementation of an RNN in the PyTorch framework
- Applications of Recurrent neural network
LSTMs were a big step in what we can accomplish with RNNs. It's natural to wonder: is there another big step? A common opinion among researchers is: "Yes! There is a next step, and it's attention!" The idea is to let every step of an RNN pick information to look at from some larger collection of information. For example, if you are using an RNN to create a caption describing an image, it might pick a part of the image to look at for every word it outputs. In fact, Xu et al. (2015) do exactly this; it might be a fun starting point if you want to explore attention! There have been a number of really exciting results using attention, and it seems like a lot more are around the corner…