Linear regression is one of the most basic algorithms in machine learning. In the last tutorial, we discussed the basic operations in PyTorch.
In the machine learning series, we covered the concepts behind linear regression and its basic implementation using Sklearn. In this tutorial, we are going to perform linear regression in the PyTorch framework.
So, let us start with the definition of linear regression.
What is Linear Regression?
Linear regression is a statistical model that examines the linear relationship between two or more variables: a dependent variable and one or more independent variables.
For example, the price of milk is directly proportional to the quantity of milk that we buy. In this situation, we can see a positive correlation between the price of milk and the quantity of milk.
Let us take a different situation, in which the price of milk is inversely proportional to the production of milk. This means that when the production of milk is high, the price of milk is low, and vice versa.
From these two examples, we can conclude that a linear relationship can be either positively or negatively correlated.
When to Use Linear Regression
Linear regression is used when we want to predict the value of one variable based on another. The variable that we want to predict is called the dependent variable or outcome variable, and the variable we use to make the prediction is called the independent variable.
The mathematical equation of linear regression is Y = mX + b
where
- Y is the outcome variable,
- X is the input variable that we use to make the prediction,
- m is the slope, which determines the effect of X on Y,
- and b is the bias, i.e., how much our prediction differs from the actual output.
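To make the equation concrete, here is a tiny sketch in plain Python with made-up values for the slope and the bias:

# A quick check of Y = mX + b with made-up values m = 2 and b = 1
m, b = 2.0, 1.0
X = 3.0
Y = m * X + b
print(Y)  # prints 7.0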
Assumptions of Linear Regression
There are four assumptions associated with linear regression.
Independence of Errors
The distribution of errors should be random: the error in one observation should not be affected by, or correlated with, the errors in prior observations.
How to check: We can plot a scatterplot of the residuals (for example, against the order of the observations) and look for patterns to check this assumption.
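As an illustration, here is a minimal sketch of such a plot; the residuals are randomly generated stand-ins, so in practice you would use y_true - y_pred from your own model:

# Sketch: plotting residuals in observation order to look for patterns
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
residuals = rng.normal(size=100)  # stand-in residuals; use y_true - y_pred in practice

plt.scatter(np.arange(len(residuals)), residuals)
plt.axhline(0, color="red", linewidth=1)
plt.xlabel("observation order")
plt.ylabel("residual")
plt.show()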
The Outlier Condition
An outlier is a data point that differs markedly from the rest of the population. If you do have outliers in your dataset, it is good practice to deal with them before running the regression.
How to check: To detect outliers, you can use the box-plot method, or you can use a mathematical approach called the elliptic envelope (this method assumes that the data is normally distributed; it labels a point +1 if it is an inlier and -1 if it is an outlier).
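For reference, here is a minimal sketch of the elliptic envelope approach using scikit-learn's EllipticEnvelope class; the data and contamination level are made up for illustration:

# Sketch: outlier detection with EllipticEnvelope (assumes roughly Gaussian data)
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(42)
data = rng.normal(size=(100, 1))      # toy data; use your own feature matrix here
data[:3] += 8                         # inject a few obvious outliers

detector = EllipticEnvelope(contamination=0.05)
labels = detector.fit_predict(data)   # +1 = inlier, -1 = outlier
print(np.where(labels == -1)[0])      # indices of the points flagged as outliers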
Homoscedasticity
Homoscedasticity describes a situation in which the error terms have the same variance across all values of the independent variables.
How to check: We can plot a scatterplot of the normalized residuals against the fitted values. If heteroscedasticity exists, the plot will have a funnel shape. We can also use the Cook-Weisberg test to detect this phenomenon.
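As an illustration, the following sketch fabricates residuals whose spread grows with the fitted values, which produces the funnel shape mentioned above; all values are made up:

# Sketch: residuals vs. fitted values; a funnel shape indicates heteroscedasticity
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
fitted = np.linspace(0, 10, 100)                  # stand-in fitted values
residuals = rng.normal(scale=0.2 + 0.5 * fitted)  # spread grows with the fit: funnel

plt.scatter(fitted, residuals)
plt.axhline(0, color="red", linewidth=1)
plt.xlabel("fitted values")
plt.ylabel("normalized residual")
plt.show()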
Normality of Error Distribution
It means we assume that the errors follow a normal distribution. When they do not, confidence intervals become unstable, which leads to trouble in estimating coefficients based on the minimization of least squares.
How to check: We can make a QQ plot, or we can perform statistical tests of normality such as the Kolmogorov-Smirnov test or the Shapiro-Wilk test.
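Here is a minimal sketch of both checks, using SciPy (an extra dependency not used elsewhere in this tutorial) on randomly generated stand-in residuals:

# Sketch: QQ plot plus normality tests on stand-in residuals
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(2)
residuals = rng.normal(size=100)   # stand-in residuals

stats.probplot(residuals, dist="norm", plot=plt)  # QQ plot against a normal
plt.show()

print(stats.shapiro(residuals))           # Shapiro-Wilk test
print(stats.kstest(residuals, "norm"))    # Kolmogorov-Smirnov test vs. N(0, 1)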
Implementation of Linear Regression in PyTorch
Let us start coding!
Import all the required libraries.
# Importing all the required libraries
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
# Use a GPU if one is available, otherwise fall back to the CPU
DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
- torch- This module is used to import all the torch dependencies.
- torch.nn- We use the nn module to create the neural network.
- torch.nn.functional- This module contains useful functions such as activation functions and convolution operations.
- torch.optim- This package provides many optimization algorithms such as stochastic gradient descent, Adam, LBFGS, and many more.
- numpy- It is used to perform numerical computations such as mean, variance, array creation, and many more.
- matplotlib- It is used for visualization purposes.
- sklearn.datasets- We use the sklearn.datasets module to create a toy dataset for our regression model.
Now we create a toy dataset for our regression model.
# Creating a toy dataset with a linear relationship
n_features = 1
n_samples = 100

X, y = make_regression(
    n_samples=n_samples,
    n_features=n_features,
    noise=20,
    random_state=42,
)

# Plotting the generated data
fig, ax = plt.subplots()
ax.plot(X, y, ".")
In the first two lines, we set the number of features to 1 and the number of samples to 100. Then we use the make_regression() function to create a dataset with a linear relationship between the features and the targets, and we plot it using the matplotlib library.
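As an optional sanity check (not part of the original code), you can print the array shapes to confirm what make_regression() returned:

# Optional: confirm the shapes of the generated arrays
print(X.shape)  # (100, 1)
print(y.shape)  # (100,)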
Now we convert the numpy arrays into tensors using the torch.from_numpy() function. We also reshape y to shape (n_samples, 1) so that it matches the shape of the model's output.
X = torch.from_numpy(X).float()
# Reshape y to (n_samples, 1) so it matches the model's output shape
y = torch.from_numpy(y.reshape((n_samples, n_features))).float()
Now we define the linear regression class.
class LinReg(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        # A single linear layer computes y = X @ weight.T + bias
        self.beta = nn.Linear(input_dim, 1)

    def forward(self, X):
        return self.beta(X)

model = LinReg(n_features)

criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.00001)
In the __init__ constructor, we instantiate a single nn.Linear module and store it as self.beta. Next, in the forward function, we accept a tensor of input data and return a tensor of output data. Then we instantiate our linear regression model.
In the next lines, we construct our loss function and an optimizer. The call to model.parameters() in the SGD constructor hands over the learnable parameters (the weight and bias) of the nn.Linear module.
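If you are curious which parameters the optimizer will update, you can list them; this optional inspection snippet is an addition, not part of the original code:

# Optional: inspect the learnable parameters handed to the optimizer
for name, param in model.named_parameters():
    print(name, tuple(param.shape))
# beta.weight (1, 1)
# beta.bias (1,)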
Next we train our linear regression model.
# Training our linear regression model (a single optimization step)
model.train()              # put the model in training mode
y_ = model(X)              # forward pass: compute predictions
loss = criterion(y_, y)    # compute the mean squared error
optimizer.zero_grad()      # clear any previously accumulated gradients
loss.backward()            # backward pass: compute gradients
optimizer.step()           # update the weights
Here we compute the predicted y by passing X to the model, and then we compute the loss. We zero the gradients, perform the backward pass, and update the weights. Note that this is a single optimization step; in practice you would repeat it many times, as sketched below.
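A minimal sketch of such a training loop; the epoch count and print frequency are arbitrary illustrative choices, not values from the original tutorial:

# Sketch: repeating the training step in a loop (hyperparameters are illustrative)
for epoch in range(200):
    optimizer.zero_grad()
    y_ = model(X)
    loss = criterion(y_, y)
    loss.backward()
    optimizer.step()
    if epoch % 50 == 0:
        print(f"epoch {epoch}: loss = {loss.item():.1f}")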
Now we evaluate our model and plot the result.
# Evaluation of our model
model.eval()             # put the model in evaluation mode
with torch.no_grad():    # no gradients are needed for inference
    y_ = model(X)

# Visualising the result
fig, ax = plt.subplots()
ax.plot(X.numpy(), y_.numpy(), ".", label="pred")
ax.plot(X.numpy(), y.numpy(), ".", label="data")
ax.set_title(f"MSE: {loss.item():0.1f}")
ax.legend();
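Optionally, you can read the learned slope and bias directly from the nn.Linear layer to see what line the model settled on (this inspection step is an addition, not part of the original code):

# Optional: inspect the learned slope (weight) and bias of the linear layer
print(model.beta.weight.item())  # learned m
print(model.beta.bias.item())    # learned b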
Wrap up the session
Finally, we made it to the end of the tutorial.
In this tutorial, we have learned what linear regression is, seen some examples, discussed when to use it and its assumptions, and implemented linear regression in PyTorch.
If you have any problems regarding the implementation, feel free to comment.
You can also join our Telegram channel to get free cheat sheets, projects, ebooks, and study material related to machine learning, deep learning, data science, natural language processing, Python programming, R programming, big data, and many more.