Video classification with FastAI and Deep Learning

In this tutorial, you will learn how to perform video classification using FastAI, Python, and Deep Learning.

A video is a sequence of frames, so the usual approach to this problem is:

  1. Loop over all frames in the video file and convert them into images.
  2. Pass each frame through the CNN.
  3. Classify each frame separately.
  4. Choose the label with the maximum probability.
  5. Mark the frame with the label and write the output frame to disk.
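
As a rough illustration of steps 2 to 4, the sketch below classifies each frame independently and keeps the most probable label. It is plain Python rather than FastAI code: `classify_frames` and the `predict_probs` callable are hypothetical names that stand in for whatever model you use, and a frame can be any object that callable accepts.

```python
from typing import Callable, Iterable, List, Sequence, Tuple


def classify_frames(
    frames: Iterable[object],
    predict_probs: Callable[[object], Sequence[float]],
    classes: Sequence[str],
) -> List[Tuple[str, float]]:
    """Classify every frame on its own and keep the most probable label.

    `predict_probs` is a hypothetical stand-in for the CNN: it takes one
    frame and returns one probability per entry in `classes`.
    """
    results = []
    for frame in frames:
        probs = predict_probs(frame)
        # Index of the largest probability (step 4 in the list above).
        best = max(range(len(classes)), key=lambda i: probs[i])
        results.append((classes[best], probs[best]))
    return results
```

Each entry of the returned list pairs a frame's label with its probability, which is what step 5 would draw on the frame before writing it to disk.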

Introduction

Let us first look at what FastAI is.

FastAI is a deep learning library built on top of PyTorch. There are freely available tutorials and courses for FastAI. I am also currently enrolled in the Practical Deep Learning for Coders course.

The course offers a series of tutorials, from image classification to neural style transfer. Its main benefit is that you learn to build a model in just a few lines of code.

Videos are often understood as a series of individual images, and thus many deep learning practitioners would be quick to treat video classification as performing image classification a total of N times, where N is the total number of frames in a video.

There’s a problem with that approach though.

Video classification is more than simple image classification: with video, we can typically assume that subsequent frames are correlated with respect to their semantic contents.
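
One common way to exploit this correlation (not something the code in this article does) is to average the class probabilities over the last few frames before picking a label, which damps flicker between neighbouring frames. The sketch below is plain Python; the class name `RollingAverager` and the default window size are illustrative assumptions.

```python
from collections import deque
from typing import Deque, List, Sequence


class RollingAverager:
    """Average per-class probabilities over the most recent frames."""

    def __init__(self, window: int = 5) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._history: Deque[Sequence[float]] = deque(maxlen=window)

    def update(self, probs: Sequence[float]) -> List[float]:
        """Add one frame's probabilities and return the running mean."""
        self._history.append(probs)
        n = len(self._history)
        return [sum(column) / n for column in zip(*self._history)]
```

A larger window gives steadier labels but reacts more slowly when the content of the video really changes.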

In this article, we’ll use FastAI to work through a computer vision example. After this article you will know how to perform the following steps:

1- Download the video data

2- Convert the video data into frames of images using ffmpeg library

3- Load the data and view it

4- Creating a model and initial training

5- Interpret the results

6- Make predictions on the video data

7- Summary

Installation

The FastAI environment can be set up using either conda or pip.

conda install -c pytorch pytorch-cpu torchvision

conda install -c fastai fastai

or

pip install http://download.pytorch.org/whl/cpu/torch-1.0.0-cp36-cp36m-linux_x86_64.whl

pip install fastai

For more information about the installation visit the official guide.

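If everything is set up correctly, the FastAI import below runs without errors, and you can check the installed version (for example with fastai.__version__).

from fastai.vision import *

You also need to install the ffmpeg library with pip to convert the videos into frames. If the ffmpeg command itself is still not available afterwards, install it with your system package manager.

pip install ffmpeg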

Download the video data

The dataset is the Indian Sign Language dataset. You can download the data and the code from the link in the final section.

To save time and computational resources, and to focus on the video classification algorithm itself (the actual point of this tutorial), we’ll train on a subset of the sign language dataset with three classes:

1- A

2- B

3- C

Convert the video data into frames

To convert the video data into frames (images), we use the ffmpeg library.

The procedure is as follows:

1- We loop over all the videos and use ffmpeg to convert them into images.

2- We create train and test folders and put the images into them.
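
A minimal sketch of step 1, assuming the ffmpeg command-line tool is on your PATH. The helper names, the folder layout, and the sampling rate of 5 frames per second are illustrative assumptions, not the author's exact script.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ffmpeg_command(video: Path, out_dir: Path, fps: int = 5) -> List[str]:
    """Return the ffmpeg call that dumps `fps` frames per second as JPEGs."""
    pattern = out_dir / f"{video.stem}_%04d.jpg"  # e.g. A_0001.jpg, A_0002.jpg
    return ["ffmpeg", "-i", str(video), "-vf", f"fps={fps}", str(pattern)]


def extract_frames(video_dir: Path, frames_dir: Path, fps: int = 5) -> None:
    """Convert every .mp4 in `video_dir` into images, one folder per video."""
    for video in sorted(video_dir.glob("*.mp4")):
        # Assumption: the video file name (A.mp4) is also the class label.
        out_dir = frames_dir / video.stem
        out_dir.mkdir(parents=True, exist_ok=True)
        subprocess.run(build_ffmpeg_command(video, out_dir, fps), check=True)
```

Calling extract_frames(Path('video'), Path('frames')) would then produce one folder of images per video, for example frames/A for video/A.mp4.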

Load the data and view it

FastAI has specific data objects called databunches, which are needed to train a classification model. These databunches can be created in two main ways.

The first way is to use problem-specific methods such as ImageDataBunch.from_folder, which can be used to load data that has the following structure.

Project structure:

data-dir
  -Train
     -Class1
     -Class2
     -...
  -Valid
     -Class1
     -Class2
     -...
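
If you want to build this layout yourself, the following sketch shows one way to do it with the standard library. It assumes the extracted frames already sit in one folder per class; `split_into_train_valid` and its parameters are hypothetical names, and the 20% validation share is only an example.

```python
import random
import shutil
from pathlib import Path


def split_into_train_valid(src: Path, dst: Path, valid_pct: float = 0.2,
                           seed: int = 42) -> None:
    """Copy src/<class>/*.jpg into dst/Train/<class> and dst/Valid/<class>."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    for class_dir in sorted(p for p in src.iterdir() if p.is_dir()):
        images = sorted(class_dir.glob("*.jpg"))
        rng.shuffle(images)
        n_valid = int(len(images) * valid_pct)
        for i, image in enumerate(images):
            # The first n_valid shuffled images go to Valid, the rest to Train.
            subset = "Valid" if i < n_valid else "Train"
            target = dst / subset / class_dir.name
            target.mkdir(parents=True, exist_ok=True)
            shutil.copy2(image, target / image.name)
```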

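Thanks to the FastAI library, loading the data is very easy. We can load it with ImageDataBunch.from_folder, or alternatively with the data block API. Here we pass train='.' and valid_pct=0.2, so FastAI reads every class folder under path and holds out a random 20% of the images for validation instead of relying on a separate Valid folder. Let us now explore the data.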

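np.random.seed(42)  # fix the seed so the random validation split is reproducible
# path is assumed to point to the folder that holds one subfolder per class
data = ImageDataBunch.from_folder(path, train='.', valid_pct=0.2,
                                  ds_tfms=get_transforms(), size=224,
                                  num_workers=4).normalize(imagenet_stats)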

data.classes

#Out: ['A', 'B', 'C']

Now we display a random batch of images from our dataset.

data.show_batch(rows=3, figsize=(7, 8))

Creating a model and initial training

The FastAI library is designed to let you create models (FastAI calls them learners) with only a couple of lines of code. It provides a method called create_cnn (renamed cnn_learner in later FastAI v1 releases), which can be used to create a convolutional neural network.

The method needs two arguments, the data, and the architecture, but also supports many other parameters that can be used to customize the model for a given problem.

from fastai.metrics import error_rate  # error_rate = 1 - accuracy
learn = create_cnn(data, models.resnet34, metrics=error_rate)

The model uses the resnet34 architecture, with weights pretrained on the ImageNet dataset.

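Next, we make sure that the GPU is used and train the model for four epochs with the fit_one_cycle method. At this stage the pretrained layers are frozen, so only the newly added layers are trained. The max_lr argument of fit_one_cycle, which we leave at its default here, is what later lets us train with differential learning rates.

defaults.device = torch.device('cuda')  # ask FastAI to use the GPU
learn.fit_one_cycle(4)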

Because the data is very clean, we get an accuracy of more than 99%.

Now that the fully connected layers are well trained, we can unfreeze the other layers and train the whole network.

FastAI provides another technique to improve transfer learning, called differential learning rates, which allows us to set different learning rates for different parts of the network.

To find suitable learning rates we can use lr_find. After that, the model can be saved using the save method.

learn.unfreeze()  # unfreeze first so lr_find runs on the whole network
learn.lr_find()

learn.save('model-1')  # same name that is passed to learn.load later
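
To make the idea of differential learning rates concrete: in FastAI v1, passing a range such as max_lr=slice(1e-6, 1e-4) to fit_one_cycle gives the earliest layer group the smallest rate and the last group the largest, with the groups in between spaced geometrically. The plain-Python sketch below mimics that spacing as we understand it; `spread_learning_rates` is a hypothetical helper, not a FastAI function, and the example values are illustrative rather than taken from this experiment.

```python
from typing import List


def spread_learning_rates(lowest: float, highest: float, n_groups: int) -> List[float]:
    """Return n_groups learning rates spaced geometrically from lowest to highest."""
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    if n_groups == 1:
        return [highest]
    # Constant ratio between neighbouring layer groups.
    ratio = (highest / lowest) ** (1 / (n_groups - 1))
    return [lowest * ratio ** i for i in range(n_groups)]
```

With FastAI itself, the corresponding call would look like learn.fit_one_cycle(2, max_lr=slice(1e-6, 1e-4)), where the two bounds are normally read off the plot produced after lr_find.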

Interpret the results

Lastly, we can use FastAI's ClassificationInterpretation class to interpret our results.

To create an interpretation object, we call the from_learner method and pass it our learner/model. Then we can use methods like plot_confusion_matrix, plot_top_losses, or most_confused.

Plot confusion matrix:

interp = ClassificationInterpretation.from_learner(learn)

interp.plot_confusion_matrix()

Figure 9: Confusion matrix
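
For readers unfamiliar with it, a confusion matrix simply counts how often each true class was predicted as each class. The sketch below builds one in plain Python from two lists of labels. It illustrates the idea and is not FastAI's implementation; the function names are chosen for this example only.

```python
from typing import Dict, List, Sequence, Tuple


def confusion_matrix(actual: Sequence[str], predicted: Sequence[str],
                     classes: Sequence[str]) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    index: Dict[str, int] = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, guess in zip(actual, predicted):
        matrix[index[truth]][index[guess]] += 1
    return matrix


def most_confused(matrix: List[List[int]],
                  classes: Sequence[str]) -> List[Tuple[str, str, int]]:
    """List (true, predicted, count) for off-diagonal cells, largest first."""
    pairs = [(classes[i], classes[j], count)
             for i, row in enumerate(matrix)
             for j, count in enumerate(row)
             if i != j and count > 0]
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

A perfect classifier puts all counts on the diagonal, so the off-diagonal cells show directly which signs the model mixes up.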

Make predictions on video data

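Now we are going to make predictions on our video.

The first step is to load the model.

model = learn.load('model-1')

Then define the video path.

video_path = 'video/A.mp4'

After that, capture a frame from the video and make the prediction. The snippet below reads only the first frame.

import cv2

cap = cv2.VideoCapture(video_path)
ok, frame = cap.read()  # ok is False if no frame could be read
cap.release()
frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; the model expects RGB
img_t = pil2tensor(frame, np.float32)  # HxWxC array -> CxHxW float tensor
img_t.div_(255.0)  # scale pixel values to [0, 1]
image = Image(img_t)
pred = model.predict(image)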

Out: [‘A’: 0.9988]
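
The prediction above covers a single frame. To label the whole video you could read frames until the capture is exhausted and combine the per-frame labels, for example by majority vote. The sketch below assumes an object with a read() method that returns an (ok, frame) pair, as cv2.VideoCapture does; `iter_frames`, `majority_label`, and the `label_of` callable are hypothetical names.

```python
from collections import Counter
from typing import Any, Callable, Iterable, Iterator


def iter_frames(capture: Any) -> Iterator[Any]:
    """Yield frames until read() reports that no frame is left."""
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        yield frame


def majority_label(frames: Iterable[Any], label_of: Callable[[Any], str]) -> str:
    """Classify each frame and return the label that occurs most often."""
    votes = Counter(label_of(frame) for frame in frames)
    if not votes:
        raise ValueError("no frames to classify")
    return votes.most_common(1)[0][0]
```

Here label_of would wrap the colour conversion, tensor conversion, and model.predict call from the single-frame example and return the predicted class name as a string.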

Wrap up the Session

In this blog post, we learned how to perform video classification using the FastAI library.

You can download the code and the dataset using this link.

You’ve come a long way in understanding one of the most important areas of deep learning! If you have questions or comments, then please put them in the comments section below.
