In this tutorial, you will learn how to perform video classification using FastAI, Python, and Deep Learning.
A video is a sequence of frames, so the usual approach to this problem is:
- Loop over all frames in the video file and convert them into images.
- For each frame, pass the frame through the CNN
- Classify each frame separately
- Choose the label with the maximum probability
- Mark the frame and write the output frame to disk
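The per-frame approach above can be sketched in plain Python. This is a minimal sketch under stated assumptions: `classify_video_frames` and `classify_frame` are hypothetical names, `classify_frame` stands in for the CNN and returns one probability per class, and reading frames from disk and writing annotated frames back out are left out.

```python
from typing import Callable, List, Sequence

def classify_video_frames(
    frames: Sequence[object],
    classify_frame: Callable[[object], List[float]],  # hypothetical stand-in for the CNN
    labels: Sequence[str],
) -> List[str]:
    """Classify every frame independently and return the top label for each one."""
    predicted = []
    for frame in frames:
        probs = classify_frame(frame)  # one probability per class
        best = max(range(len(probs)), key=lambda i: probs[i])  # label with maximum probability
        predicted.append(labels[best])
    return predicted

# Toy usage: each "frame" is already a probability vector, so the classifier is the identity.
toy_frames = [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1], [0.1, 0.1, 0.8]]
print(classify_video_frames(toy_frames, lambda f: f, ["A", "B", "C"]))  # ['A', 'B', 'C']
```

Each frame is labelled on its own, with no information shared between neighbouring frames.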
Introduction
Let us learn what FastAI is.
FastAI is a deep learning library built on top of PyTorch. Free tutorials and courses for FastAI are available online. I am currently enrolled in the Practical Deep Learning for Coders course.
This course covers a series of topics, from image classification to neural style transfer. Its main benefit is that you learn to build a model in just a few lines of code.
Videos are often understood as a series of individual images, so many deep learning practitioners are quick to treat video classification as performing image classification N times, where N is the total number of frames in the video.
There’s a problem with that approach though.
Video classification is more than simple image classification: with video, we can typically assume that subsequent frames are correlated in their semantic content. Classifying each frame independently ignores this, so the predicted label can flicker from frame to frame even when the whole video shows a single class.
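One common way to exploit this correlation, which is not part of the FastAI pipeline used later in this tutorial, is a rolling average: average the class probabilities of the last few frames before picking a label. The sketch below assumes per-frame probability vectors are already available; `smooth_predictions` and the window size are hypothetical choices for illustration.

```python
from collections import deque
from typing import Iterable, List, Sequence

def smooth_predictions(
    frame_probs: Iterable[Sequence[float]], labels: Sequence[str], window: int = 5
) -> List[str]:
    """Label each frame by averaging class probabilities over the last `window` frames."""
    recent = deque(maxlen=window)  # keeps only the most recent `window` probability vectors
    smoothed = []
    for probs in frame_probs:
        recent.append(probs)
        n_classes = len(probs)
        avg = [sum(p[c] for p in recent) / len(recent) for c in range(n_classes)]
        smoothed.append(labels[max(range(n_classes), key=lambda c: avg[c])])
    return smoothed

# The third frame alone would be classified as "B",
# but averaging over a 3-frame window keeps every label at "A".
probs = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.85, 0.15]]
print(smooth_predictions(probs, ["A", "B"], window=3))  # ['A', 'A', 'A', 'A']
```

A larger window gives steadier labels but reacts more slowly when the content of the video actually changes.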
In this article, we'll learn how to use FastAI to work through a computer vision example. After reading it, you will know how to perform the following steps:
1- Download the video data
2- Convert the video data into frames of images using ffmpeg library
3- Load the data and view it
4- Create a model and do initial training
5- Interpret the results
6- Make predictions on the video data
7- Summary
Installation
The FastAI environment can be set up using either conda or pip.
conda install -c pytorch pytorch-cpu torchvision
conda install -c fastai fastai
or
pip install http://download.pytorch.org/whl/cpu/torch-1.0.0-cp36-cp36m-linux_x86_64.whl
pip install fastai
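For more information about installation, see the official guide. Note that this tutorial uses the fastai v1 API (ImageDataBunch, create_cnn), which is not available in fastai v2, so install a 1.x release of fastai (for example, pip install "fastai<2").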
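If you set up everything correctly, you can import FastAI and check its version:
from fastai.vision import *
import fastai
print(fastai.__version__)
You also have to install ffmpeg to convert the videos into frames. ffmpeg is a command-line tool rather than a Python library (the pip package named ffmpeg does not ship the ffmpeg binary), so install it with conda or your system package manager, for example:
conda install -c conda-forge ffmpeg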
Download the video data
The dataset is the Indian Sign Language dataset. You can download the data and the code from the link in the summary section.
To save time and computational resources, and to focus on the video classification algorithm itself (the actual point of this tutorial), we'll train on a subset of the sign language dataset with three classes:
1- A
2- B
3- C
Convert the video data into frames
To convert the video data into frames/images, we use the ffmpeg library.
The procedure is as follows:
1- We loop over all the videos and use ffmpeg to convert each one into images.
2- We create train and valid folders and put the images into them, with one subfolder per class.
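Step 1 can be sketched by calling the ffmpeg command-line tool from Python. This is a minimal sketch, not the author's script: it assumes the raw videos live in a hypothetical `videos/<class>/*.mp4` layout and samples 5 frames per second (an arbitrary choice) using ffmpeg's `fps` video filter. The helper names `build_ffmpeg_cmd` and `extract_all` are also hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List

def build_ffmpeg_cmd(video: Path, out_dir: Path, fps: int = 5) -> List[str]:
    """ffmpeg command that samples `fps` frames per second from `video` as numbered JPEGs."""
    pattern = out_dir / f"{video.stem}_%04d.jpg"  # e.g. a1_0001.jpg, a1_0002.jpg, ...
    return ["ffmpeg", "-i", str(video), "-vf", f"fps={fps}", str(pattern)]

def extract_all(video_root: Path, out_root: Path, fps: int = 5) -> None:
    """Convert every video_root/<class>/*.mp4 into frames under out_root/<class>/."""
    for class_dir in sorted(p for p in video_root.iterdir() if p.is_dir()):
        out_dir = out_root / class_dir.name
        out_dir.mkdir(parents=True, exist_ok=True)
        for video in sorted(class_dir.glob("*.mp4")):
            # check=True raises CalledProcessError if ffmpeg fails on a file
            subprocess.run(build_ffmpeg_cmd(video, out_dir, fps), check=True)

# Usage (assumed layout): extract_all(Path("videos"), Path("data"))
```

Keeping the command builder separate from the loop makes it easy to print or test the exact ffmpeg invocation before converting a whole dataset.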
Load the data and view it
FastAI uses data objects called DataBunches to train a classification model. A DataBunch is usually created in one of two ways.
The first way is to use a problem-specific method such as ImageDataBunch.from_folder, which can load data with the following structure.
The project structure looks like this:
data-dir/
    Train/
        Class1/
        Class2/
        ...
    Valid/
        Class1/
        Class2/
        ...
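Before training, it can help to check that each class folder in this structure actually contains images and that the classes are roughly balanced. The following is a minimal sketch: `count_by_class` and `count_images_per_class` are hypothetical helpers, and the set of image extensions is an assumption.

```python
from collections import Counter
from pathlib import Path
from typing import Dict, Iterable

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed image extensions

def count_by_class(image_paths: Iterable[Path]) -> Dict[str, int]:
    """Count images per class, using each file's parent folder name as its class."""
    return dict(Counter(p.parent.name for p in image_paths if p.suffix.lower() in IMAGE_SUFFIXES))

def count_images_per_class(split_dir: Path) -> Dict[str, int]:
    """Scan split_dir/<class>/ (for example data-dir/Train) and count the images in each class."""
    return count_by_class(p for p in split_dir.glob("*/*") if p.is_file())

# Usage (assumed path): print(count_images_per_class(Path("data-dir/Train")))
```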
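With FastAI, loading the data is easy. We can load it with ImageDataBunch.from_folder, or alternatively with the data block API. In the code below, all images sit in one folder per class directly under path (train='.'), and valid_pct=0.2 randomly holds out 20% of them for validation, so separate Train and Valid folders are not required. Let's load the data and explore it.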
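np.random.seed(42)  # makes the random train/validation split reproducible
path = Path('data')  # assumed location of the class folders A, B and C
data = ImageDataBunch.from_folder(path, train='.', valid_pct=0.2,
                                  ds_tfms=get_transforms(), size=224, num_workers=4).normalize(imagenet_stats)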
data.classes
#Out: ['A', 'B', 'C']
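To illustrate what valid_pct=0.2 together with a fixed seed achieves, here is a plain-Python sketch of a reproducible 80/20 split. It is only an illustration: fastai performs the split internally with NumPy's random generator, so the actual files chosen will differ, and `random_split` is a hypothetical name.

```python
import random
from typing import List, Sequence, Tuple

def random_split(items: Sequence[str], valid_pct: float = 0.2, seed: int = 42) -> Tuple[List[str], List[str]]:
    """Shuffle with a fixed seed and hold out valid_pct of the items for validation."""
    rng = random.Random(seed)  # local generator, so the split is reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_pct)
    return shuffled[n_valid:], shuffled[:n_valid]  # (train, valid)

images = [f"img{i}.jpg" for i in range(10)]
train, valid = random_split(images)
print(len(train), len(valid))  # 8 2
```

Fixing the seed matters because otherwise every run would validate on a different subset, which makes results hard to compare.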
Now we display a random batch of images from our dataset:
data.show_batch(rows=3, figsize=(7, 8))
Creating a model and initial training
The FastAI library is designed to let you create models (FastAI calls them learners) with only a few lines of code. It provides a function called create_cnn (renamed cnn_learner in later fastai v1 releases), which creates a convolutional neural network.
The function needs two arguments, the data and the architecture, but it also accepts many other parameters that can be used to customize the model for a given problem.
from fastai.metrics import error_rate  # error_rate = 1 - accuracy
learn = create_cnn(data, models.resnet34, metrics=error_rate)
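The error_rate metric is simply the fraction of misclassified examples, that is, 1 minus accuracy. The sketch below computes it on plain label lists to show the arithmetic. fastai's own error_rate works on tensors of model outputs and targets, and this hypothetical helper is not that implementation.

```python
from typing import Sequence

def error_rate(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of misclassified examples, i.e. 1 - accuracy."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return wrong / len(actual)

print(error_rate(["A", "B", "C", "A"], ["A", "B", "C", "B"]))  # 0.25
```

An accuracy above 99% corresponds to an error_rate below 0.01.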
The model uses the ResNet-34 architecture, with weights pretrained on the ImageNet dataset.
Now we make sure that we are using the GPU and train the model for four epochs with the one-cycle policy (fit_one_cycle). At this stage only the newly added head is trained, because create_cnn freezes the pretrained layers by default.
defaults.device=torch.device('cuda') # makes sure the gpu is used
learn.fit_one_cycle(4)
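The one-cycle policy behind fit_one_cycle first raises the learning rate and then anneals it back down. The sketch below is a simplified version of that schedule, not fastai's implementation: it uses cosine interpolation in both phases, and the defaults (pct_start=0.3, div_factor=25, final_div=1e4) are assumptions loosely modeled on fastai v1. It also ignores the momentum schedule that fastai varies alongside the learning rate.

```python
import math
from typing import List

def cos_anneal(start: float, end: float, pct: float) -> float:
    """Cosine interpolation from `start` (pct=0) to `end` (pct=1)."""
    return end + (start - end) / 2 * (1 + math.cos(math.pi * pct))

def one_cycle_lrs(n_steps: int, max_lr: float = 1e-3, pct_start: float = 0.3,
                  div_factor: float = 25.0, final_div: float = 1e4) -> List[float]:
    """Learning rate for each training step under a simplified one-cycle policy."""
    warm = max(1, int(n_steps * pct_start))  # number of warm-up steps
    lrs = []
    for i in range(n_steps):
        if i < warm:
            # Phase 1: rise from max_lr / div_factor up to max_lr.
            lrs.append(cos_anneal(max_lr / div_factor, max_lr, i / warm))
        else:
            # Phase 2: anneal from max_lr down to max_lr / final_div.
            pct = (i - warm) / max(1, n_steps - warm - 1)
            lrs.append(cos_anneal(max_lr, max_lr / final_div, pct))
    return lrs

lrs = one_cycle_lrs(100)
print(lrs.index(max(lrs)))  # 30: the peak is reached after 30% of the steps
```

The short warm-up avoids large, destabilizing updates early on, and the long annealing phase lets the model settle into a good minimum.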
Because the data is very clean, we get an accuracy of more than 99%.
Now that the fully-connected layers are well trained we can unfreeze the other layers and train the whole network.
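FastAI also provides a technique that improves transfer learning, called differential learning rates, which lets us set different learning rates for different parts of the network. To use it, we pass a slice(lower, upper) to the max_lr argument of fit_one_cycle: the earliest layers get the lower rate and the final layers get the upper one.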
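The idea of spreading a slice of learning rates across the network can be sketched as follows. In fastai v1, a model is split into layer groups, and a slice(lo, hi) is spread geometrically across those groups. The function below is a hypothetical sketch of that spacing, not the library code, and the values 1e-6 and 1e-4 are example numbers.

```python
from typing import List

def layer_group_lrs(lo: float, hi: float, n_groups: int) -> List[float]:
    """Spread learning rates geometrically from lo (first layer group) to hi (last group)."""
    if n_groups == 1:
        return [hi]
    step = (hi / lo) ** (1 / (n_groups - 1))  # constant ratio between neighbouring groups
    return [lo * step ** i for i in range(n_groups)]

print(layer_group_lrs(1e-6, 1e-4, 3))  # approximately [1e-06, 1e-05, 0.0001]
```

Early layers learn generic features such as edges that transfer well from ImageNet, so they get the smallest updates, while the new head gets the largest.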
To find good learning rates, we can use lr_find and plot the results. After training, the model can be saved using the save method.
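learn.unfreeze()  # must be done before calling lr_find
learn.lr_find()
learn.recorder.plot()  # plot loss against learning rate to choose a range
learn.fit_one_cycle(2, max_lr=slice(1e-6, 1e-4))  # example epochs and range; pick the range from the plot
learn.save('model-1')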
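lr_find runs a learning-rate range test: it trains briefly while increasing the learning rate exponentially and records the loss at each step. The sketch below reproduces the exponential sweep, with start/end values and step count assumed to mirror fastai v1's defaults. It also shows one common heuristic for reading the curve, picking the rate where the loss falls fastest. The heuristic and both function names are illustrative assumptions, not fastai's code.

```python
import math
from typing import List, Sequence

def lr_sweep(start_lr: float = 1e-7, end_lr: float = 10.0, num_it: int = 100) -> List[float]:
    """Learning rates that grow exponentially from start_lr to end_lr over num_it steps."""
    ratio = end_lr / start_lr
    return [start_lr * ratio ** (i / (num_it - 1)) for i in range(num_it)]

def steepest_descent_lr(lrs: Sequence[float], losses: Sequence[float]) -> float:
    """Heuristic: return the lr where the loss drops fastest per unit of log(lr)."""
    slopes = [
        (losses[i + 1] - losses[i]) / (math.log(lrs[i + 1]) - math.log(lrs[i]))
        for i in range(len(lrs) - 1)
    ]
    best = min(range(len(slopes)), key=lambda i: slopes[i])  # most negative slope
    return lrs[best]

# Toy curve: the loss falls fastest between 1e-3 and 1e-2, then blows up at 1e-1.
print(steepest_descent_lr([1e-4, 1e-3, 1e-2, 1e-1], [1.0, 0.9, 0.4, 2.0]))  # 0.001
```

In practice you read the plot from learn.recorder.plot() and choose a range just before the loss starts to rise.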
Interpret the results
Lastly, we can use FastAI's ClassificationInterpretation class to interpret our results.
To create an interpretation object, we call the from_learner method and pass it our learner. Then we can use methods such as plot_confusion_matrix, plot_top_losses or most_confused.
Plot the confusion matrix:
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()
Figure 9: Confusion matrix
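To make clear what the confusion matrix counts, here is a plain-Python sketch. It uses the same convention as fastai's plot (rows are actual classes, columns are predicted classes) and includes a hypothetical most_confused-style helper that lists the off-diagonal cells. It works on label lists rather than on a learner and is not fastai's implementation.

```python
from typing import List, Sequence, Tuple

def confusion_matrix(actual: Sequence[str], predicted: Sequence[str],
                     classes: Sequence[str]) -> List[List[int]]:
    """Rows are actual classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

def most_confused(matrix: List[List[int]], classes: Sequence[str],
                  min_val: int = 1) -> List[Tuple[str, str, int]]:
    """(actual, predicted, count) for off-diagonal cells, most frequent mistakes first."""
    pairs = [
        (classes[i], classes[j], matrix[i][j])
        for i in range(len(classes)) for j in range(len(classes))
        if i != j and matrix[i][j] >= min_val
    ]
    return sorted(pairs, key=lambda t: (-t[2], t[0], t[1]))

m = confusion_matrix(["A", "A", "B", "C", "C"], ["A", "B", "B", "C", "A"], ["A", "B", "C"])
print(m)  # [[1, 1, 0], [0, 1, 0], [1, 0, 1]]
```

With near-perfect accuracy, almost all counts sit on the diagonal.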
Make predictions on video data
Now we are going to make predictions on our video.
The first step is to load the saved model:
model = learn.load('model-1')
Then define the video path:
video_path = 'video/A.mp4'
After that, read a frame from the video and make a prediction on it:
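import cv2
cap = cv2.VideoCapture(video_path)
ok, frame = cap.read()  # ok is False if no frame could be read
assert ok, f'could not read a frame from {video_path}'
frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # OpenCV returns BGR; the model expects RGB
img_t = pil2tensor(frame, np.float32)  # HxWxC array -> CxHxW float tensor
img_t.div_(255.0)  # scale pixel values to [0, 1]
image = Image(img_t)
pred_class, pred_idx, probs = model.predict(image)
print(pred_class, probs[pred_idx].item())
cap.release()
#Out (rounded): A 0.9988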
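The code above classifies a single frame. To get one label for the whole video, you could call cap.read() in a loop until it returns False, collect each frame's probabilities (for example probs.tolist()), and average them. The sketch below shows only the averaging step. `video_label` is a hypothetical helper, and averaging over all frames is one simple choice among several (a majority vote would also work).

```python
from typing import Sequence, Tuple

def video_label(frame_probs: Sequence[Sequence[float]], labels: Sequence[str]) -> Tuple[str, float]:
    """Average per-frame class probabilities over the whole video and return the top class."""
    if not frame_probs:
        raise ValueError("need at least one frame")
    n_classes = len(labels)
    avg = [sum(p[c] for p in frame_probs) / len(frame_probs) for c in range(n_classes)]
    best = max(range(n_classes), key=lambda c: avg[c])
    return labels[best], avg[best]

print(video_label([[0.9, 0.1, 0.0], [0.6, 0.3, 0.1]], ["A", "B", "C"]))
```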
Wrap up the Session
In this blog post, we learned how to perform video classification using the FastAI library.
You can download the code and the dataset using this link.
You’ve come a long way in understanding one of the most important areas of deep learning! If you have questions or comments, then please put them in the comments section below.
If you liked this blog post, please like it and subscribe to the DataSpoof community to get real-time updates. You can also follow our Facebook page to get notified whenever we publish a new post, so you never miss an update from us.