
Polynomial regression in R


In this article, we’ll explore how polynomial regression can enhance our machine learning projects in R. While linear regression is a staple in machine learning, there are situations where polynomial regression outshines it. Just like we have different tools for different tasks, having a variety of regression algorithms allows us to tackle diverse problems efficiently.

For nonlinear data, polynomial regression can capture the underlying trend while remaining cheap to fit, often giving better results than a plain linear model. This article will discuss polynomial regression’s pros and cons and where to use it. Let’s dive in!

Need for polynomial regression

Let us start by discussing where the need for polynomial regression arises.

Linear regression falls short when dealing with non-linear data. If we try to apply it without adjustments, the results can be poor. That’s where polynomial regression comes in. It’s like having a more flexible tool in our machine learning toolbox.


In polynomial regression, we model the relationship between variables using a polynomial equation. This allows us to capture non-linear relationships more accurately. By adding polynomial terms to the regression, we can better fit the data and improve our predictions.

     y = b0 + b1 × x + b2 × x^2 + b3 × x^3 + ... + bn × x^n

Think of it this way:

  • Linear regression is like fitting a straight line to data points, while polynomial regression is like fitting a curve.
  • This flexibility makes polynomial regression suitable for scenarios where the relationship between variables is curved or nonlinear, as the short sketch below illustrates.
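To make the contrast concrete, here is a minimal R sketch on simulated data (the data-generating curve and its coefficients are purely illustrative) comparing a straight-line fit with a degree-2 polynomial fit:

# simulated curved data (illustrative): y depends on x quadratically
set.seed(42)
x = seq(-3, 3, length.out = 100)
y = 2 + 0.5 * x + 1.5 * x^2 + rnorm(100, sd = 1)
d = data.frame(x = x, y = y)

# fit a straight line and a degree-2 polynomial to the same data
linear_fit = lm(y ~ x, data = d)
poly_fit = lm(y ~ poly(x, 2, raw = TRUE), data = d)

# compare how much variance each model explains
summary(linear_fit)$r.squared
summary(poly_fit)$r.squared

On curved data like this, the polynomial fit explains far more of the variance than the straight line.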

Whether it’s analyzing the cost of a product over time or predicting housing prices, different datasets require different approaches. Polynomial regression gives us another option to tackle regression problems in machine learning, alongside linear and multiple regression methods.

How do polynomial regression algorithms work?

Polynomial regression in R is sometimes called polynomial linear regression because the model stays linear in its coefficients; the polynomial terms simply enter as extra predictors. To get started, we install helpful packages like caret for a smoother modelling workflow and tidyverse for data manipulation and visualization.

Once the packages are installed, we organize our data by splitting it into training and testing datasets. We visualize the data using plots and graphs to understand its characteristics better.

To perform polynomial regression, we use the lm() function in R. We build a formula with terms like l and I(l^2) to represent the predictor variable and its squared value, respectively.

For example, the polynomial regression equation could look like:

  m = b0 + b1 × l + b2 × l^2

where m represents the median value of the dataset and l is the predictor variable.

We fit the polynomial regression model using the lm() function with the appropriate formula, such as:

    lm(m ~ l + I(l^2), data = train.data)

Once the model is fitted, we can visualize the polynomial regression curve using the ggplot() function.
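Putting the pieces together, here is a self-contained sketch of that workflow; train.data, l, and m are illustrative stand-ins built from simulated values rather than a real dataset:

library(ggplot2)

# illustrative stand-in for train.data: predictor l, response m
set.seed(1)
l = seq(0, 10, length.out = 80)
m = 3 + 2 * l - 0.4 * l^2 + rnorm(80, sd = 2)
train.data = data.frame(l = l, m = m)

# fit the quadratic model from the formula above
fit = lm(m ~ l + I(l^2), data = train.data)

# plot the raw points and the fitted curve
ggplot(train.data, aes(x = l, y = m)) +
  geom_point(colour = 'blue') +
  geom_line(aes(y = predict(fit)), colour = 'red')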

It’s important to remember that polynomial regression still rests on the usual regression assumptions about the dataset. A plain linear fit is simpler, but when the data is curved, the polynomial terms give a noticeably better fit and more satisfactory results.

Use cases of polynomial regression

Polynomial regression is similar to linear regression but with the added benefit of handling non-linear data. This makes it useful in various scenarios where the relationship between variables isn’t straight.

For example, scientists can use polynomial regression to analyze experimental data, like studying isotopes in sediment samples. It’s also handy for understanding complex relationships, such as how diseases spread in populations.

In essence, polynomial regression provides a clear way to model relationships between variables, making it valuable for a wide range of research and analysis tasks.

Features of polynomial regression

In polynomial regression, the best-fit line is determined by the degree of the polynomial equation. The higher the degree, the more flexible the curve.

We use the poly() function to expand a predictor into a matrix of polynomial terms up to a chosen degree. This helps us model complex relationships between variables.
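For example, applied to a small toy vector, poly() returns a matrix with one column per degree; raw = TRUE gives the plain powers, while the default returns orthogonal columns that are less collinear:

x = 1:5

# plain powers: columns x, x^2, x^3
poly(x, degree = 3, raw = TRUE)

# orthogonal polynomial columns (the default)
poly(x, degree = 3)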

By visualizing the data with scatter plots, we can understand the curve’s nature and how variables relate to each other. This visualization aids decision-making and helps us choose the appropriate degree for our polynomial regression model.

Implementation of Polynomial regression in R

The dataset that we are going to use is the pressure dataset, which ships with R by default. It has two columns: temperature and pressure.

Step 1- Load the dataset

First, we install and load the libraries required for the analysis, and then load the dataset on which we will apply polynomial regression.

# install.packages('caTools')
# install.packages('Metrics')
library(caTools)   # for sample.split()
library(Metrics)   # for rmse() and mae()


# Load the built-in pressure dataset
data = pressure
head(data)

Step 2- Visualize the data

Next, to decide if a polynomial model is appropriate for our dataset, we use a scatter plot and visualize the relationship between dependent and independent variables.

# plotting the graph
library(ggplot2)
ggplot() +
  geom_point(aes(x = data$temperature, y = data$pressure),
             colour = 'blue')

From the above plot, we can observe that there is a nonlinear relationship between the dependent and independent variables. Therefore we can use the polynomial regression model.

Step 3- Data preprocessing

The third step is to preprocess the data: clean any missing values, scale the dataset if needed, and define our input and target variables. After analyzing the dataset, we divide it into training and testing sets. The training dataset is used to train the model, and the testing dataset is then used to evaluate it.

# fix the random seed so the split is reproducible
set.seed(123)
split = sample.split(data$pressure, SplitRatio = 2/3)
training = subset(data, split == TRUE)
testing = subset(data, split == FALSE)

Step 4- Applying polynomial regression model

The fourth step is to fit our polynomial regression model. Before that, we add the polynomial terms to the dataset as new columns and re-split it so that the training and testing sets include them.

data$temperature2 = data$temperature^2
data$temperature3 = data$temperature^3
data$temperature4 = data$temperature^4
# re-split so the training and testing sets carry the new columns
training = subset(data, split == TRUE)
testing = subset(data, split == FALSE)

Next, we call the lm() function, passing the formula and the training dataset, and then inspect the fit with the summary() function. The results are shown below.

polynomial_reg = lm(formula = pressure ~ ., data = training)
summary(polynomial_reg)
[Output: summary of the polynomial regression model]

Step 5- Plot and Evaluate the model

The fifth step is to plot the model’s fit, forecast on the test data, and evaluate the polynomial regression model using metrics like mean squared error, root mean squared error, and mean absolute error.

library(ggplot2)
x_grid = seq(min(data$temperature), max(data$temperature), 0.1)
ggplot() +
  geom_point(aes(x = data$temperature, y = data$pressure),
             colour = 'red') +
  geom_line(aes(x = x_grid, y = predict(polynomial_reg,
                                        newdata = data.frame(temperature = x_grid,
                                                             temperature2 = x_grid^2,
                                                             temperature3 = x_grid^3,
                                                             temperature4 = x_grid^4))),
            colour = 'blue') +
  ggtitle('Real vs. Predicted (Polynomial Regression)') +
  xlab('temperature') +
  ylab('pressure')
[Plot: actual values in red with the fitted polynomial curve in blue]

Now we make predictions on the test data and evaluate the model.

# Making predictions on the test data
poly_pred <- predict(polynomial_reg, newdata = testing)

RMSE <- rmse(testing$pressure, poly_pred)
RMSE

MAE <- mae(testing$pressure, poly_pred)
MAE
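The step description also mentions mean squared error; the Metrics package provides mse() with the same calling pattern, so it can be computed alongside the other two:

# Mean squared error on the test data
MSE <- mse(testing$pressure, poly_pred)
MSE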

Advantages and disadvantages of Polynomial regression

Advantages:

  1. Polynomial regression can work on a dataset of any size.
  2. Polynomial regression can fit non-linear relationships very well.

Disadvantages:

  1. One of the main disadvantages of using polynomial regression is that we need to choose the right polynomial degree to get a good bias-variance trade-off, as the sketch below shows.
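As a rough illustration of how to manage this trade-off (the candidate range of degrees 1 to 5 is arbitrary), we can refit the model from the implementation above at several degrees and compare the test RMSE:

# compare test RMSE across candidate polynomial degrees
for (d in 1:5) {
  fit = lm(pressure ~ poly(temperature, d, raw = TRUE), data = training)
  pred = predict(fit, newdata = testing)
  cat("degree", d, "- test RMSE:", rmse(testing$pressure, pred), "\n")
}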

Conclusion

From this article, you have learned how to analyze data using polynomial regression models in R. You have also learned when to apply polynomial regression and what the advantages and disadvantages of using it are.