How to Classify the Paintings of an Artist Using a CNN

In this article, I will explore how we can use PyTorch to solve a multi-class image classification problem. PyTorch comes with many tools and libraries that help with this task.

PyTorch provides modules ranging from high-level ones such as torch.nn (used for building neural networks) to low-level autograd functions.

Many deep learning researchers use the PyTorch framework for their work. PyTorch is written mainly in Python, C++, and CUDA, and it exposes a Python API that makes our life easier.

Recently I visited an art gallery with 100 paintings hanging on its walls. It is very difficult for most people to tell which artist painted which painting. This is where deep learning can help: it can identify the artist behind a painting.

Image by Pexels. : Link - https://www.pexels.com/photo/photography-of-brick-wall-with-graffiti-1022934/

The article is divided into seven sections, each focusing on a particular aspect of this workflow.

Our multi-class classification dataset

The dataset we will use in this PyTorch multi-class classification tutorial contains paintings by different painters, which we will classify by artist.

The dataset is publicly available on Kaggle as Impressionist_Classifier_Data. You can download the dataset and proceed further.

Our dataset consists of 5000 images across 10 classes.

The goal of our convolutional neural network is to predict which painter created a given painting.

Downloading the dataset takes approximately 5–10 minutes, depending on your internet speed.

Configure your Virtual environment

To configure your environment, I recommend following these steps.

How to install PyTorch on Windows

PyTorch without CUDA:

conda install pytorch torchvision cpuonly -c pytorch

PyTorch with CUDA 10.1:

conda install pytorch torchvision cudatoolkit=10.1 -c pytorch

How to install PyTorch on macOS

pip install torch torchvision

Use whichever method matches your operating system.
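After installing, a quick check like the one below can confirm that PyTorch imports correctly and whether a CUDA GPU is visible. This is only a minimal sketch; the helper name describe_install is my own and not part of PyTorch.

```python
import torch


def describe_install():
    """Return the installed PyTorch version and whether a CUDA GPU is visible."""
    return {
        "torch_version": str(torch.__version__),
        "cuda_available": torch.cuda.is_available(),
    }


info = describe_install()
print(info)
```

If cuda_available is False on a machine with an NVIDIA GPU, the CPU-only build was probably installed.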

Preparing an image classification Convolutional Neural Network (CNN) and training it on the following architecture

A) Using pre-trained networks such as ResNet18, VGG19, AlexNet, and many more.

The CNN architecture we use in this tutorial is ResNet18. There are several variants of ResNet, such as ResNet34, ResNet50, ResNet101, and ResNet152. Each ResNet block is either 2 layers deep (ResNet18, ResNet34) or 3 layers deep (ResNet50, ResNet101, ResNet152).

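The reason for choosing ResNet is that, once deeper plain networks start converging, a degradation problem appears: as depth increases, accuracy saturates and then degrades rapidly. ResNet addresses this with skip (residual) connections, which make deep networks easier to train.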
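To illustrate what a 2-layer ResNet block looks like, here is a simplified sketch. It is not torchvision's implementation: it assumes the input and output have the same number of channels and omits stride and downsampling. The class name BasicBlock and the tensor sizes are chosen only for illustration.

```python
import torch
import torch.nn as nn


class BasicBlock(nn.Module):
    """Simplified two-layer residual block: output = relu(F(x) + x)."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        identity = x                              # the skip connection keeps the input
        out = self.relu(self.bn1(self.conv1(x)))  # first conv layer
        out = self.bn2(self.conv2(out))           # second conv layer
        out = out + identity                      # add the input back before the final ReLU
        return self.relu(out)


block = BasicBlock(channels=64)
x = torch.randn(1, 64, 56, 56)   # hypothetical feature map
y = block(x)
print(y.shape)
```

Because the block only has to learn the difference between its input and the desired output, stacking many such blocks is less prone to the degradation described above.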

Import all the required libraries

So the first step is to import all the required libraries:

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import torch
import torchvision
from torchvision import datasets, models, transforms
import torch.utils.data as data
from torch.utils.tensorboard import SummaryWriter
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import time, os, copy, argparse
import multiprocessing
from matplotlib import pyplot as plt

Load the Dataset

The second step is to set the paths to our dataset:

# Loading the data
train_directory = '../input/impressionist-classifier-data/training/training'
valid_directory = '../input/impressionist-classifier-data/validation/validation'
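Since the data is later read with ImageFolder, each of these directories is expected to contain one sub-folder per painter. A quick way to sanity-check the download is to count the images in each sub-folder. The sketch below uses only the standard library; the helper name count_images_per_class and the list of file extensions are my own assumptions.

```python
from pathlib import Path


def count_images_per_class(root, extensions=(".jpg", ".jpeg", ".png")):
    """Count image files in each class sub-folder of `root`."""
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir()
                if f.suffix.lower() in extensions
            )
    return counts


train_directory = '../input/impressionist-classifier-data/training/training'
if Path(train_directory).is_dir():   # only runs where the dataset is present
    print(count_images_per_class(train_directory))
```

A class with far fewer images than the others would be worth noticing before training.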

Data Visualization

Let us look at a few of the images in our dataset. We define a function, plotImages, that displays five sample images.

def plotImages(images_arr):
    fig, axes = plt.subplots(1, 5, figsize=(20,20))
    axes = axes.flatten()
    for img, ax in zip( images_arr, axes):
        ax.imshow(img)
        ax.axis('off')
    plt.tight_layout()
    plt.show()

Images by Impressionist classifier dataset

Data Augmentation

After that, we apply data transformations to the training and validation folders so that the images have the right shape. For that, we need to define the transforms. We use the torchvision.transforms module to convert the dataset.

image_transforms = { 
    'train': transforms.Compose([
        transforms.RandomResizedCrop(size=256, scale=(0.8, 1.0)),
        transforms.RandomRotation(degrees=15),
        transforms.RandomHorizontalFlip(),
        transforms.CenterCrop(size=224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])
    ]),
    'valid': transforms.Compose([
        transforms.Resize(size=256),
        transforms.CenterCrop(size=224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])
    ])
}

1- transforms.RandomResizedCrop(): crops a random region covering 80–100% of the image area and resizes it to 256*256 pixels.

2- transforms.RandomRotation(): rotates the image by a random angle between -15 and +15 degrees.

3- transforms.RandomHorizontalFlip(): flips the image horizontally with a given probability (0.5 by default).

4- transforms.CenterCrop(): crops the image to 224*224 pixels around the center.

5- transforms.ToTensor(): converts the image into a PyTorch tensor with values scaled to the range [0, 1].

6- transforms.Normalize(): normalizes the pixel values using a mean and a standard deviation per channel. Here we pass 3 mean values and 3 standard deviation values because the images are in RGB format.
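To make the normalization step concrete, the sketch below applies the same formula, (value - mean) / std, to a single RGB pixel in plain Python. The mean and standard deviation are the values used above (the commonly used ImageNet statistics); the helper name normalize_pixel and the example pixel are made up for illustration.

```python
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Apply (value - mean) / std to one RGB pixel already scaled to [0, 1]."""
    return [(v - m) / s for v, m, s in zip(rgb, MEAN, STD)]


# A mid-gray pixel: 128/255 in every channel after ToTensor scaling.
gray = [128 / 255] * 3
print([round(v, 3) for v in normalize_pixel(gray)])

# A pixel exactly equal to the channel means maps to zero in every channel.
print(normalize_pixel(MEAN))
```

After this step the channel values are centered around zero, which is the input range the pre-trained network was trained on.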

Data loader

The next step is to load the data from the training and validation folders and then calculate the size of each set.

# Load data from folders
dataset = {
    'train': datasets.ImageFolder(root=train_directory, transform=image_transforms['train']),
    'valid': datasets.ImageFolder(root=valid_directory, transform=image_transforms['valid'])
}
 
# Size of train and validation data
dataset_sizes = {
    'train':len(dataset['train']),
    'valid':len(dataset['valid'])
}

The next step is to prepare the train and validation loaders. We set the argument batch_size=64 so that training fetches the data in batches of 64.
For both loaders, we set shuffle=True so that the samples arrive in a different random order every epoch, which prevents the model from picking up on the order of the data.

We set pin_memory=True so that the data loaders place the fetched tensors in pinned memory, which enables faster data transfer to a CUDA GPU.

The last argument is drop_last=True, which drops the last incomplete batch if the dataset size is not divisible by the batch size.

After that, we print the target class names and set the default device to the GPU if one is available, otherwise the CPU.

# Batch size used by both data loaders
bs = 64

# Create iterators for data loading
dataloaders = {
    'train':data.DataLoader(dataset['train'], batch_size=bs, shuffle=True, pin_memory=True, drop_last=True),
    'valid':data.DataLoader(dataset['valid'], batch_size=bs, shuffle=True, pin_memory=True, drop_last=True)
}

# Class names or target labels
class_names = dataset['train'].classes
print("Classes:", class_names)
 
# Print the train and validation data sizes
print("Training-set size:",dataset_sizes['train'],
      "\nValidation-set size:", dataset_sizes['valid'])

# Set default device as gpu, if available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
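The effect of drop_last=True can be worked out with simple arithmetic. The sketch below computes how many batches an epoch contains and how many samples are actually used; the helper name batches_per_epoch and the dataset size of 1000 are hypothetical and chosen only to show the calculation.

```python
def batches_per_epoch(num_samples, batch_size, drop_last):
    """Return (number of batches, number of samples actually used)."""
    full, remainder = divmod(num_samples, batch_size)
    if drop_last or remainder == 0:
        return full, full * batch_size
    return full + 1, num_samples


# Hypothetical dataset size, chosen only for illustration.
print(batches_per_epoch(1000, 64, drop_last=True))    # (15, 960)
print(batches_per_epoch(1000, 64, drop_last=False))   # (16, 1000)
```

With drop_last=True, up to batch_size - 1 images are skipped each epoch. If accuracy is later computed by dividing the number of correct predictions by the full dataset size, those skipped images count as misses, so the reported figure is a slight underestimate.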

Transfer Learning

Now we load the pre-trained model, ResNet18, passing pretrained=True so that the function downloads the pre-trained ResNet weights.
Then we modify the fully connected layer to match num_classes.

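# Loading the pre-trained models
# (newer torchvision releases prefer the weights= argument over pretrained=True)
model_ft = models.resnet18(pretrained=True)

# Modify fully connected layers to match num_classes
num_classes = len(dataset['train'].classes)  # one output per painter
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, num_classes)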

The first time this code runs, torchvision downloads the ResNet18 weights and prints the download progress.

Once the ResNet weights are downloaded, we can proceed with the other steps. If you want, you can also inspect the model using the summary function from the torchsummary package, as follows. We also load the model onto the GPU.

# Summary of the model
from torchsummary import summary

# Move the model to the device first so it matches the input that summary() creates
model_ft = model_ft.to(device)

print('Model Summary:-\n')
for num, (name, param) in enumerate(model_ft.named_parameters()):
    print(num, name, param.requires_grad)

summary(model_ft, input_size=(3, 224, 224), device=device.type)
print(model_ft)

You can see the output below:

Image by author: Model Summary of pre-trained weights

Set the loss criterion, optimizer, and learning rate decay

Since this is a multi-class problem, we use CrossEntropyLoss().
We also need to define an optimizer. In our case, we use the Stochastic Gradient Descent (SGD) optimizer with a learning rate of 0.001 and a momentum of 0.9.
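To make the loss concrete, here is a plain-Python sketch of what cross-entropy computes for a single sample: the negative log of the softmax probability assigned to the correct class. The function name cross_entropy and the example scores are hypothetical; nn.CrossEntropyLoss applies the same formula to raw model outputs and, by default, averages it over the batch.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    m = max(logits)   # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


logits = [2.0, 0.5, -1.0]   # hypothetical raw scores for three classes
print(cross_entropy(logits, target=0))   # correct class has the highest score: small loss
print(cross_entropy(logits, target=2))   # correct class has the lowest score: large loss
```

A model that assigns equal scores to all 10 painters would have a loss of log(10), roughly 2.30, which is a useful reference point when reading the training log.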

Additionally, we define a learning rate decay schedule with step_size=7 and gamma=0.1, which multiplies the learning rate by 0.1 every 7 epochs.

# Loss function
criterion = nn.CrossEntropyLoss()

# Optimizer 
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)

# Learning rate decay
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)
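The schedule produced by StepLR can be written in closed form as initial_lr * gamma ** (epoch // step_size). The sketch below evaluates that formula in plain Python for the settings used here; the helper name step_lr is my own, and the 10-epoch range is just an example.

```python
def step_lr(initial_lr, epoch, step_size=7, gamma=0.1):
    """Learning rate used during `epoch` (0-based) under a step decay schedule."""
    return initial_lr * gamma ** (epoch // step_size)


for epoch in range(10):
    print(epoch, step_lr(0.001, epoch))
```

For a 10-epoch run this means epochs 0 to 6 train with a learning rate of 0.001, and epochs 7 to 9 train with a rate about ten times smaller.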

Training

Now we define the train_model function, call it with num_epochs=10, and print the loss and accuracy for each epoch.

# Model training routine
print("\nTraining:-\n")

def train_model(model, criterion, optimizer, scheduler, num_epochs=30):
    since = time.time()
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    # Tensorboard summary
    writer = SummaryWriter()

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'valid']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device, non_blocking=True)
                labels = labels.to(device, non_blocking=True)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            # decay the learning rate once per training epoch
            if phase == 'train':
                scheduler.step()

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))

            # Record training loss and accuracy for each phase
            if phase == 'train':
                writer.add_scalar('Train/Loss', epoch_loss, epoch)
                writer.add_scalar('Train/Accuracy', epoch_acc, epoch)
                writer.flush()
            else:
                writer.add_scalar('Valid/Loss', epoch_loss, epoch)
                writer.add_scalar('Valid/Accuracy', epoch_acc, epoch)
                writer.flush()

            # deep copy the model
            if phase == 'valid' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model

# Train for 10 epochs, as described above
num_epochs = 10
model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler, num_epochs=num_epochs)

# Save the model
PATH = "model_1.pth"
print("\nSaving the model...")
torch.save(model_ft, PATH)

Finally, we save the model as a .pth file.

As you can see, we trained the network for 10 epochs, achieving:
1- 87.96% multi-class classification accuracy on the training set
2- 77.17% multi-class classification accuracy on the validation set

Applying PyTorch multi-class classification to new images

Now that our multi-class classification PyTorch model is trained, let us apply it to new images of paintings.

First, we import the necessary packages for the script.
Then we download the test images and load the saved .pth model.

import numpy as np
import torch
import torchvision
from torchvision import datasets, models, transforms
import torch.utils.data as data
import multiprocessing
from sklearn.metrics import confusion_matrix

#Loading the testing images
!git clone https://github.com/abhisingh007224/work.git

#Loading the saved model
EVAL_MODEL= '/kaggle/working/model/model/model_1.pth'
model = torch.load(EVAL_MODEL)
model.eval()
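The code above saves and loads the whole model object. A common alternative, shown here only as a sketch and not as what this tutorial does, is to save just the weights (the state_dict) and load them into a freshly built model. To keep the example small and self-contained it uses a tiny nn.Linear stand-in and an in-memory buffer instead of the fine-tuned ResNet18 and a file on disk.

```python
import io

import torch
import torch.nn as nn

# Tiny stand-in model; in the tutorial this would be the fine-tuned ResNet18.
model = nn.Linear(4, 2)

buffer = io.BytesIO()                    # in-memory file, stands in for a .pth file
torch.save(model.state_dict(), buffer)   # save only the weights

buffer.seek(0)
restored = nn.Linear(4, 2)               # rebuild the same architecture first
restored.load_state_dict(torch.load(buffer, map_location="cpu"))
restored.eval()

x = torch.randn(3, 4)
print(torch.equal(model(x), restored(x)))
```

Passing map_location="cpu" lets weights that were saved on a GPU machine be loaded on a machine without one.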

Next, we pre-process the images and prepare batches of size 8 to be passed through the network.

Then we classify the paintings.

bs = 8
EVAL_DIR='/kaggle/working/work/testing/'


# Prepare the eval data loader
eval_transform=transforms.Compose([
        transforms.Resize(size=256),
        transforms.CenterCrop(size=224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])])

eval_dataset=datasets.ImageFolder(root=EVAL_DIR, transform=eval_transform)
eval_loader=data.DataLoader(eval_dataset, batch_size=bs, shuffle=True, pin_memory=True)

# Enable gpu mode, if cuda available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = model.to(device)  # keep the model on the same device as the input batches

# Number of classes and dataset-size
num_classes=len(eval_dataset.classes)
dsize=len(eval_dataset)

# Class label names
class_names=['Cezanne', 'Degas', 'Gauguin', 'Hassam', 'Matisse', 'Monet', 'Pissarro', 'Renoir', 'Sargent', 'VanGogh']

# Initialize the prediction and label lists
predlist=torch.zeros(0,dtype=torch.long, device='cpu')
lbllist=torch.zeros(0,dtype=torch.long, device='cpu')

# Evaluate the model accuracy on the dataset
correct = 0
total = 0
with torch.no_grad():
    for images, labels in eval_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
        predlist=torch.cat([predlist,predicted.view(-1).cpu()])
        lbllist=torch.cat([lbllist,labels.view(-1).cpu()])

# Overall accuracy
overall_accuracy=100 * correct / total
print('Accuracy of the network on the {:d} test images: {:.2f}%'.format(dsize, 
    overall_accuracy))

# Confusion matrix
conf_mat=confusion_matrix(lbllist.numpy(), predlist.numpy())
print('Confusion Matrix')
print('-'*16)
print(conf_mat,'\n')

The output is shown below

Image by author: Accuracy and confusion matrix of test dataset
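The confusion matrix can also be turned into a per-painter accuracy by dividing each diagonal entry by its row total (rows are the true labels, columns are the predictions). The sketch below shows the calculation on a small made-up matrix; the helper name per_class_accuracy and the numbers are hypothetical and are not results from this model.

```python
def per_class_accuracy(conf_mat, class_names):
    """Fraction of each true class predicted correctly (diagonal / row total)."""
    result = {}
    for i, name in enumerate(class_names):
        row_total = sum(conf_mat[i])
        result[name] = conf_mat[i][i] / row_total if row_total else float("nan")
    return result


# Hypothetical 3-class confusion matrix (rows = true label, columns = prediction).
names = ["Cezanne", "Degas", "Monet"]
matrix = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]
print(per_class_accuracy(matrix, names))
# {'Cezanne': 0.8, 'Degas': 0.6, 'Monet': 0.9}
```

Looking at the accuracy per painter shows which artists the network confuses most, something the single overall accuracy figure hides.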

Wrap up the Session

In this tutorial, we learned how to classify paintings using a CNN in the PyTorch framework. You can download this notebook from my GitHub.

If you liked the article, please clap for it, and if you have any problems with the implementation, feel free to comment.

Some related research works:

Artist Identification with Convolutional Neural Networks

Art Painting Identification using Convolutional Neural Network