In this article, I will explore how we can use PyTorch to solve a multi-class image classification problem. PyTorch comes with many tools and libraries that help with this task.
PyTorch provides modules ranging from the high-level torch.nn module (used for building neural networks) down to low-level autograd functions.
Many deep learning researchers use the PyTorch framework for their work. PyTorch is written in Python, C++, and CUDA.
I recently visited an art gallery where 100 paintings were hanging on the walls. It is very difficult for most people to tell which artist painted which work. This is where deep learning can help: it can identify paintings by different artists.

The article is divided into seven parts, each focusing on a particular aspect of this workflow.
Our multi-class classification dataset
The dataset we will use in this PyTorch multi-class classification tutorial consists of paintings by different painters.
The dataset is publicly available on Kaggle as Impressionist_Classifier_Data. Download it before proceeding; this takes approximately 5–10 minutes, depending on your internet speed.
Our dataset consists of 5000 images across 10 classes.
The goal of our convolutional neural network is to predict which painter created a given painting.
Configure your virtual environment
To configure your environment, I recommend following these steps.
How to install PyTorch on Windows
PyTorch without CUDA:
```shell
conda install pytorch torchvision cpuonly -c pytorch
```
PyTorch with CUDA 10.1:
```shell
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
```
How to install PyTorch on macOS
```shell
pip install torch torchvision
```
Use whichever method matches your operating system.
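After installing, it can be useful to confirm that the packages are visible to the Python interpreter of the active environment. The snippet below is a minimal sketch that uses only the standard library; the helper name is_installed is my own and not part of PyTorch.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found by the import system."""
    return importlib.util.find_spec(package_name) is not None


for package in ("torch", "torchvision"):
    status = "installed" if is_installed(package) else "missing"
    print(f"{package}: {status}")
```

If either package is reported as missing, the environment in which you ran the install command is probably not the one that is currently active.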
Preparing an image classification Convolutional Neural Network (CNN) and training it on the following architecture
A) Using pre-trained networks like ResNet18, VGG19, AlexNet, and many more.
The CNN architecture we use in this tutorial is ResNet18. ResNet comes in several variants, such as ResNet34, ResNet50, ResNet101, and ResNet152. Each ResNet block is either 2 layers deep (ResNet18, ResNet34) or 3 layers deep (ResNet50, ResNet101, ResNet152).
The reason for choosing ResNet is that it addresses the degradation problem: as plain networks get deeper, their accuracy saturates and then degrades rapidly. ResNet's residual (skip) connections make deeper networks easier to train.
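ResNet tackles the degradation of deep networks with residual connections: a block outputs F(x) + x instead of F(x) alone, so if the stacked layers contribute nothing, the block simply passes its input through. The toy sketch below illustrates only that idea on plain Python lists. The names residual_block and zero_transform are hypothetical, and a real ResNet block uses convolutions, batch normalization, and ReLU.

```python
def residual_block(x, transform):
    """Return transform(x) + x, element by element (identity shortcut)."""
    fx = transform(x)
    return [f_i + x_i for f_i, x_i in zip(fx, x)]


def zero_transform(x):
    """A stand-in for stacked layers that have learned to output zeros."""
    return [0.0 for _ in x]


features = [0.5, -1.0, 2.0]
# With a zero transform the block behaves like the identity function.
print(residual_block(features, zero_transform))
```

Because the shortcut carries the input forward unchanged, adding more blocks does not have to make the network worse, which is the intuition behind training very deep ResNets.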
Import all the required libraries
So the first step is to import all the required libraries:
```python
import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import os
import torch
import torchvision
from torchvision import datasets, models, transforms
import torch.utils.data as data
from torch.utils.tensorboard import SummaryWriter
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import time, copy, argparse
import multiprocessing
from matplotlib import pyplot as plt
```
Load the Dataset
The second step is to load our dataset:
```python
# Loading the data
train_directory = '../input/impressionist-classifier-data/training/training'
valid_directory = '../input/impressionist-classifier-data/validation/validation'
```
Data Visualization
Let us look at a few of the images in our dataset. We define a function plotImages that displays five sample images.
```python
def plotImages(images_arr):
    fig, axes = plt.subplots(1, 5, figsize=(20, 20))
    axes = axes.flatten()
    for img, ax in zip(images_arr, axes):
        ax.imshow(img)
        ax.axis('off')
    plt.tight_layout()
    plt.show()
```

Data Augmentation
After that, we apply data transformations to the training and validation folders so that the images have the right shape. For that we need to define a transformer, which we build with the torchvision.transforms module.
```python
image_transforms = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(size=256, scale=(0.8, 1.0)),
        transforms.RandomRotation(degrees=15),
        transforms.RandomHorizontalFlip(),
        transforms.CenterCrop(size=224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])
    ]),
    'valid': transforms.Compose([
        transforms.Resize(size=256),
        transforms.CenterCrop(size=224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])
    ])
}
```
1- transforms.RandomResizedCrop(): crops a random region of the image (here 80% to 100% of its area) and resizes it to 256*256 pixels.
2- transforms.RandomRotation(): rotates the image by a random angle between -15 and +15 degrees.
3- transforms.RandomHorizontalFlip(): flips the image horizontally with a given probability (0.5 by default).
4- transforms.CenterCrop(): crops the image to 224*224 pixels around the center.
5- transforms.ToTensor(): converts the image into a PyTorch tensor with values scaled to the range [0, 1].
6- transforms.Normalize(): normalizes the pixel values using a mean and a standard deviation per channel. We pass 3 means and 3 standard deviations because the images are in RGB format.
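To make the normalization step concrete, here is a small worked example of the per-channel arithmetic, (value - mean) / std, applied to a single RGB pixel whose values are already in [0, 1]. It is a plain-Python sketch of the formula, not the torchvision implementation, and normalize_pixel is a hypothetical helper name.

```python
MEAN = [0.485, 0.456, 0.406]  # per-channel means used in the transforms
STD = [0.229, 0.224, 0.225]   # per-channel standard deviations


def normalize_pixel(rgb, mean=MEAN, std=STD):
    """Apply (value - mean) / std to each channel of one RGB pixel in [0, 1]."""
    return [(value - m) / s for value, m, s in zip(rgb, mean, std)]


white = normalize_pixel([1.0, 1.0, 1.0])
black = normalize_pixel([0.0, 0.0, 0.0])
print("white ->", [round(v, 3) for v in white])
print("black ->", [round(v, 3) for v in black])
```

After normalization a pixel equal to the channel mean maps to 0, brighter pixels become positive, and darker pixels become negative, which matches the input statistics the pre-trained network was trained on.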
Data loader
The next step is to load the data from the training and validation folders and then calculate the size of the training and validation data.
```python
# Load data from folders
dataset = {
    'train': datasets.ImageFolder(root=train_directory, transform=image_transforms['train']),
    'valid': datasets.ImageFolder(root=valid_directory, transform=image_transforms['valid'])
}

# Size of train and validation data
dataset_sizes = {
    'train': len(dataset['train']),
    'valid': len(dataset['valid'])
}
```
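ImageFolder expects one subfolder per class and derives the labels from the folder names in sorted order. The sketch below mimics that mapping with the standard library so you can see which index each painter would receive. The helper find_classes and the three folder names are illustrative assumptions, created in a temporary directory rather than read from the real dataset.

```python
import os
import tempfile


def find_classes(root):
    """List class folders in sorted order and map each name to an index."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    class_to_idx = {name: index for index, name in enumerate(classes)}
    return classes, class_to_idx


# Build a throwaway directory tree with three hypothetical painter folders.
with tempfile.TemporaryDirectory() as root:
    for painter in ("Monet", "Cezanne", "Degas"):
        os.makedirs(os.path.join(root, painter))
    classes, class_to_idx = find_classes(root)

print(classes)
print(class_to_idx)
```

The sorted order matters later: any hand-written list of class names used at evaluation time has to follow the same alphabetical order as the folders.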
The next step is to prepare the train and validation loaders. We set the argument batch_size=64, so that the data is fetched in batches of 64.
For both loaders we set shuffle=True, so that the samples are drawn in a new random order every epoch and the model cannot pick up on the order of the data.
We set pin_memory=True so that the data loaders place the fetched tensors in pinned (page-locked) memory, which enables faster data transfer to a CUDA GPU.
The last argument is drop_last=True, which drops the last incomplete batch if the dataset size is not divisible by the batch size.
After that, we print the target class names and set the default device to the GPU if one is available, otherwise the CPU.
```python
# Batch size
bs = 64

# Create iterators for data loading
dataloaders = {
    'train': data.DataLoader(dataset['train'], batch_size=bs, shuffle=True,
                             pin_memory=True, drop_last=True),
    'valid': data.DataLoader(dataset['valid'], batch_size=bs, shuffle=True,
                             pin_memory=True, drop_last=True)
}

# Class names or target labels
class_names = dataset['train'].classes
print("Classes:", class_names)

# Print the train and validation data sizes
print("Training-set size:", dataset_sizes['train'],
      "\nValidation-set size:", dataset_sizes['valid'])

# Set default device as gpu, if available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
```
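The effect of drop_last is easy to check with a little arithmetic. The sketch below counts how many batches one pass over the data yields with and without the flag. The dataset size of 1000 is a made-up number for illustration, not the real size of the training set, and num_batches is a hypothetical helper.

```python
import math


def num_batches(dataset_size, batch_size, drop_last):
    """Number of batches one pass over the data yields."""
    if drop_last:
        return dataset_size // batch_size
    return math.ceil(dataset_size / batch_size)


dataset_size = 1000  # hypothetical size, not the real training-set size
batch_size = 64

kept = num_batches(dataset_size, batch_size, drop_last=False)
dropped = num_batches(dataset_size, batch_size, drop_last=True)
print("batches without drop_last:", kept)
print("batches with drop_last:   ", dropped)
print("samples skipped per epoch:", dataset_size - dropped * batch_size)
```

Because shuffle=True reorders the data every epoch, the skipped samples differ from epoch to epoch, so over several epochs every training image is still likely to be seen.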
Transfer Learning
Now we load the pre-trained ResNet18 model, passing pretrained=True so that the function downloads the weights of the ResNet model.
Then we modify the fully connected layer to match num_classes.
```python
# Loading the pre-trained model
# (newer torchvision releases use the weights= argument instead of pretrained=)
model_ft = models.resnet18(pretrained=True)

# Modify fully connected layers to match num_classes
num_classes = len(class_names)
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, num_classes)
```
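Replacing the final layer changes only a small part of the network. As a rough check, the sketch below counts the parameters of a fully connected layer (weights plus biases) for the original 1000-class ImageNet head and for a 10-class head, using the 512 input features of ResNet18's fc layer. linear_param_count is a hypothetical helper, not a PyTorch function.

```python
def linear_param_count(in_features, out_features):
    """Weights (in_features * out_features) plus one bias per output."""
    return in_features * out_features + out_features


resnet18_fc_inputs = 512  # in_features of ResNet18's final layer
original_head = linear_param_count(resnet18_fc_inputs, 1000)  # ImageNet head
painter_head = linear_param_count(resnet18_fc_inputs, 10)     # 10 painters

print("original fc parameters:", original_head)
print("replaced fc parameters:", painter_head)
```

The new head starts from random weights, while every other layer keeps its pre-trained values; that is what makes this a transfer learning setup.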
The first time models.resnet18 is called with pre-trained weights, the weights are downloaded and a progress bar is printed.

Once the ResNet weights are downloaded, we can proceed with the other steps. If you want, you can also inspect the model with the summary function from the torchsummary package, as follows. After that we move the model to the GPU.
```python
# Summary of the model
from torchsummary import summary

print('Model Summary:-\n')
for num, (name, param) in enumerate(model_ft.named_parameters()):
    print(num, name, param.requires_grad)
summary(model_ft, input_size=(3, 224, 224))
print(model_ft)

model_ft = model_ft.to(device)
```
You can see the output below:

Now we set the loss criterion, optimizer, and learning rate decay
Since this is a multi-class problem, we use CrossEntropyLoss().
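For intuition, cross-entropy on raw scores (logits) is the negative log of the softmax probability assigned to the correct class. The following plain-Python sketch computes it for a single sample. It illustrates the formula and is not a substitute for nn.CrossEntropyLoss; the logits are hypothetical values.

```python
import math


def cross_entropy(logits, target_index):
    """Negative log of the softmax probability assigned to the target class."""
    largest = max(logits)  # subtract the max for numerical stability
    exp_scores = [math.exp(score - largest) for score in logits]
    target_probability = exp_scores[target_index] / sum(exp_scores)
    return -math.log(target_probability)


logits = [2.0, 0.5, -1.0]  # hypothetical raw scores for three classes
print("loss when class 0 is correct:", round(cross_entropy(logits, 0), 4))
print("loss when class 2 is correct:", round(cross_entropy(logits, 2), 4))
```

The loss is small when the highest score belongs to the true class and grows as the model puts its confidence elsewhere. This is also why the network outputs raw scores and no softmax layer is added before the loss.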
We also need to define an optimizer. In our case we use the Stochastic Gradient Descent (SGD) optimizer with a learning rate of 0.001 and momentum=0.9.
Additionally, we define learning rate decay with step_size=7 and gamma=0.1, which multiplies the learning rate by 0.1 every 7 epochs.
```python
# Loss function
criterion = nn.CrossEntropyLoss()

# Optimizer
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)

# Learning rate decay
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)
```
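The step decay configured above follows a simple closed form: the learning rate equals the initial rate times gamma raised to the number of completed steps. The sketch below evaluates that formula for a few epochs. step_lr is a hypothetical helper that mirrors the schedule and does not call the PyTorch scheduler.

```python
def step_lr(initial_lr, epoch, step_size=7, gamma=0.1):
    """Learning rate after `epoch` completed epochs under step decay."""
    return initial_lr * gamma ** (epoch // step_size)


for epoch in (0, 6, 7, 13, 14):
    print(f"epoch {epoch:2d}: lr = {step_lr(0.001, epoch):.6f}")
```

With only 10 epochs of training, this means the rate stays at 0.001 for epochs 0 to 6 and drops once, to 0.0001, for the remaining epochs.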
Training
Now we define the train_model function, call it with num_epochs=10, and print the loss and accuracy for every epoch.
```python
# Model training routine
print("\nTraining:-\n")

num_epochs = 10  # number of training epochs used in this tutorial


def train_model(model, criterion, optimizer, scheduler, num_epochs=30):
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    # Tensorboard summary
    writer = SummaryWriter()

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'valid']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device, non_blocking=True)
                labels = labels.to(device, non_blocking=True)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            if phase == 'train':
                scheduler.step()

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))

            # Record training loss and accuracy for each phase
            if phase == 'train':
                writer.add_scalar('Train/Loss', epoch_loss, epoch)
                writer.add_scalar('Train/Accuracy', epoch_acc, epoch)
                writer.flush()
            else:
                writer.add_scalar('Valid/Loss', epoch_loss, epoch)
                writer.add_scalar('Valid/Accuracy', epoch_acc, epoch)
                writer.flush()

            # deep copy the model
            if phase == 'valid' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model


model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler,
                       num_epochs=num_epochs)

# Save the model
PATH = "model_1.pth"
print("\nSaving the model...")
torch.save(model_ft, PATH)
```
The last lines of the script save the trained model as a .pth file.
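One detail of the training loop is worth checking by hand. The loaders are created with drop_last=True, while the running totals are divided by dataset_sizes[phase], so samples in a dropped batch count in the denominator but never in the numerator. The sketch below shows the size of that effect with made-up numbers (1000 samples, a perfect classifier); epoch_accuracy is a hypothetical helper.

```python
def epoch_accuracy(correct_per_batch, denominator):
    """Sum the per-batch correct counts and divide by the given denominator."""
    return sum(correct_per_batch) / denominator


dataset_size = 1000  # hypothetical, as is the perfect score below
batch_size = 64
full_batches = dataset_size // batch_size          # batches kept by drop_last
correct_per_batch = [batch_size] * full_batches    # every prediction correct
samples_seen = full_batches * batch_size

print("divided by dataset size:", epoch_accuracy(correct_per_batch, dataset_size))
print("divided by samples seen:", epoch_accuracy(correct_per_batch, samples_seen))
```

The reported loss and accuracy can therefore come out slightly lower than the values over the samples actually processed. Dividing by the number of samples seen, or setting drop_last=False for the validation loader, would remove the discrepancy.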
As you can see, we trained the network for 10 epochs, achieving:
1- 87.96% multi-class classification accuracy on the training set
2- 77.17% multi-class classification accuracy on the validation set
Applying PyTorch multi-class classification to new images
Now that our multi-class classification PyTorch model is trained, let us apply it to new images of paintings.
First, we import the necessary packages for the script.
Next, we fetch the testing images; they are preprocessed for classification in the following step.
Then we load the saved model, which is the .pth file, and switch it to evaluation mode.
```python
import numpy as np
import torch
import torchvision
from torchvision import datasets, models, transforms
import torch.utils.data as data
import multiprocessing
from sklearn.metrics import confusion_matrix

# Loading the testing images (notebook shell command)
!git clone https://github.com/abhisingh007224/work.git

# Loading the saved model
EVAL_MODEL = '/kaggle/working/model/model/model_1.pth'
model = torch.load(EVAL_MODEL)
model.eval()
```
Next, we preprocess the images and prepare batches of size 8 to be passed through the network.
Then we classify the paintings, compute the overall accuracy, and build a confusion matrix.
```python
bs = 8
EVAL_DIR = '/kaggle/working/work/testing/'

# Prepare the eval data loader
eval_transform = transforms.Compose([
    transforms.Resize(size=256),
    transforms.CenterCrop(size=224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225])])

eval_dataset = datasets.ImageFolder(root=EVAL_DIR, transform=eval_transform)
eval_loader = data.DataLoader(eval_dataset, batch_size=bs, shuffle=True,
                              pin_memory=True)

# Enable gpu mode, if cuda available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = model.to(device)  # keep the model on the same device as the inputs

# Number of classes and dataset-size
num_classes = len(eval_dataset.classes)
dsize = len(eval_dataset)

# Class label names
class_names = ['Cezanne', 'Degas', 'Gauguin', 'Hassam', 'Matisse',
               'Monet', 'Pissarro', 'Renoir', 'Sargent', 'VanGogh']

# Initialize the prediction and label lists
predlist = torch.zeros(0, dtype=torch.long, device='cpu')
lbllist = torch.zeros(0, dtype=torch.long, device='cpu')

# Evaluate the model accuracy on the dataset
correct = 0
total = 0
with torch.no_grad():
    for images, labels in eval_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)

        total += labels.size(0)
        correct += (predicted == labels).sum().item()

        predlist = torch.cat([predlist, predicted.view(-1).cpu()])
        lbllist = torch.cat([lbllist, labels.view(-1).cpu()])

# Overall accuracy
overall_accuracy = 100 * correct / total
print('Accuracy of the network on the {:d} test images: {:.2f}%'.format(dsize, overall_accuracy))

# Confusion matrix
conf_mat = confusion_matrix(lbllist.numpy(), predlist.numpy())
print('Confusion Matrix')
print('-' * 16)
print(conf_mat, '\n')
```
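A confusion matrix also tells you how well each individual painter is recognized: with true labels on the rows and predictions on the columns, the accuracy for a class is its diagonal entry divided by its row total. The sketch below applies this to a small made-up 3-class matrix. The numbers are illustrative and are not results from this model, and per_class_accuracy is a hypothetical helper.

```python
def per_class_accuracy(conf_mat):
    """Diagonal entry divided by the row total, for each true class."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(conf_mat)]


# Hypothetical 3-class confusion matrix: rows are true labels, columns predictions.
conf_mat = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]
names = ["Cezanne", "Degas", "Gauguin"]

for name, accuracy in zip(names, per_class_accuracy(conf_mat)):
    print(f"{name}: {accuracy:.2f}")
```

Looking at the off-diagonal entries of a row shows which painters a given artist is most often confused with, which the single overall accuracy figure hides.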
Running the evaluation script prints the overall accuracy followed by the confusion matrix.

Wrap up the Session
In this tutorial, we learned how to classify paintings using a CNN in the PyTorch framework. You can download this notebook from my GitHub.
If you liked the article, please clap for it, and if you have any problems with the implementation, feel free to comment.
Some related research works are:
Artist Identification with Convolutional Neural Networks
Art Painting Identification using Convolutional Neural Network