How to evaluate the performance of a regression model

One of the major steps in training a model is evaluating its performance, which is done using the testing dataset and a few evaluation metrics. Regression models are evaluated with regression metrics, each of which has its own strengths and weaknesses. The most common ones are explained here.

Mean Absolute Error (MAE):

Mean absolute error is the average of the absolute differences between the model's predicted values and the actual values. It is given by the equation:

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|$$

Where,

N is the total number of data points,
y_i is the actual value, and
ŷ_i is the predicted value.

A small MAE means the model's predictions are close to the actual values on average, while a large MAE means the predictions are far off and the model does not perform well. An MAE of 0 means every prediction is exactly right. Because MAE is expressed in the same units as the target variable, whether a value counts as small or large has to be judged against the scale of the target.
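The MAE definition above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name and the sample values are assumptions chosen for the example and do not come from the article.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |y_i - y_hat_i| over all N data points."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred))
    return total / len(y_true)


# Illustrative values only (assumed for this example)
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)
print(mae)  # 0.5
```

The absolute errors are 0.5, 0.5, 0 and 1, so their average is 0.5, in the same units as the target.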

Mean Squared Error (MSE):

Mean squared error, or MSE, measures the quality of a regression model by averaging the squared differences between the predicted values and the actual values. The lower the MSE, the better the regression model performs. MSE is calculated using the equation:

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2$$

Where,

N is the total number of data points,
y_i is the actual value, and
ŷ_i is the predicted value.

MSE is not robust to outliers: because every error is squared, a few large errors are penalized far more heavily than many small ones and can dominate the result, producing a much larger MSE. MAE has an advantage here, since it weights each error only in proportion to its size.
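To make the outlier point concrete, the sketch below compares MAE and MSE on two sets of predictions that differ in a single badly missed point. The helper names and all numbers are made up for illustration.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between actual and predicted values."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)


# Illustrative values only (assumed for this example)
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred_clean = [11.0, 21.0, 31.0, 41.0]    # every prediction is off by 1
y_pred_outlier = [11.0, 21.0, 31.0, 50.0]  # the last prediction is off by 10

mae_clean = mean_absolute_error(y_true, y_pred_clean)
mse_clean = mean_squared_error(y_true, y_pred_clean)
mae_outlier = mean_absolute_error(y_true, y_pred_outlier)
mse_outlier = mean_squared_error(y_true, y_pred_outlier)

print(mae_clean, mse_clean)      # 1.0 1.0
print(mae_outlier, mse_outlier)  # 3.25 25.75
```

One large miss raises MAE from 1.0 to 3.25, but it raises MSE from 1.0 to 25.75, because the error of 10 contributes 100 to the sum of squares.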

Root Mean Squared Error (RMSE):

RMSE is obtained by taking the square root of the mean squared error (MSE), which brings the metric back to the same units as the target variable. A lower RMSE means the model is performing better, and a higher RMSE indicates a large deviation between the actual and the predicted values. RMSE is calculated using the formula:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2}$$

Where,

N is the total number of data points,
y_i is the actual value, and
ŷ_i is the predicted value.
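As a rough sketch of the formula above, RMSE can be computed by taking the square root of the MSE with the standard `math` module. The function name and the sample values are assumptions for this example.

```python
import math


def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared difference."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Illustrative values only (assumed for this example)
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

rmse = root_mean_squared_error(y_true, y_pred)
print(round(rmse, 4))  # 0.6124
```

Here the MSE is 0.375, and its square root is about 0.6124, expressed in the same units as the target values.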

Max Error:

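The max error metric captures the worst-case error of a regression model: it is the largest absolute difference between a true value and its predicted value across the dataset, that is, the maximum of |y_i - ŷ_i| over all data points. Unlike MAE, MSE, and RMSE, it does not average over the data points, so it is determined entirely by the single worst prediction. Max error can be used as an alternative or a complement to the RMSE score when you need to know how wrong a single prediction can be, since an averaged score can hide one large miss.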
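A minimal sketch of this metric, assuming the usual definition of max error as the largest absolute residual. The function name and the sample values are illustrative.

```python
def max_error(y_true, y_pred):
    """Largest absolute difference between an actual and a predicted value."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return max(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred))


# Illustrative values only (assumed for this example)
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

worst = max_error(y_true, y_pred)
print(worst)  # 1.0
```

The absolute errors are 0.5, 0.5, 0 and 1, so the worst-case error is 1.0, coming from the last data point.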

R² score aka coefficient of determination:

The R² score, or coefficient of determination, measures the proportion of the variance in the dependent variable that is explained by the independent variables in the model. A score of 1 means the model explains all of the variance, a score of 0 means it does no better than always predicting the mean of the actual values, and the score becomes negative for a model that does worse than that. The R² score is calculated using the following equation:

$$R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}$$

Here SSE is the sum of the squared differences between the predicted values and the actual values, and SST is the total sum of the squared differences between the actual values and their mean. SSE and SST are calculated using the following equations:

$$\mathrm{SSE} = \sum_{i=1}^{M} \left( y_i - \hat{y}_i \right)^2$$

$$\mathrm{SST} = \sum_{i=1}^{M} \left( y_i - \bar{y} \right)^2$$

Where,

M represents the number of observations,
ȳ is the mean of the observed target values,
y_i is the observed target value, and
ŷ_i is the predicted value.
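The SSE and SST definitions above translate directly into code. The sketch below is one way to write it in plain Python. The function name, the sample values, and the decision to raise an error when SST is zero are assumptions for this example.

```python
def r2_score(y_true, y_pred):
    """R^2 = 1 - SSE / SST, following the definitions above."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    y_bar = sum(y_true) / len(y_true)
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    sst = sum((y - y_bar) ** 2 for y in y_true)
    if sst == 0:
        # All actual values are identical, so R^2 is undefined here.
        raise ValueError("R^2 is undefined when the actual values are constant")
    return 1 - sse / sst


# Illustrative values only (assumed for this example)
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

r2 = r2_score(y_true, y_pred)
print(round(r2, 4))  # 0.9486

# Always predicting the mean of the actual values gives an R^2 of 0.
y_bar = sum(y_true) / len(y_true)
r2_baseline = r2_score(y_true, [y_bar] * len(y_true))
print(r2_baseline)  # 0.0
```

In this example SSE is 1.5 and SST is 29.1875, so about 95% of the variance in the actual values is explained by the predictions.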

Adjusted R²:

The adjusted R² metric is a modified version of the standard R² that penalizes a model for each additional feature. The standard R² never decreases when an independent variable is added, even if that variable does not increase the explanatory power of the regression model. Adjusted R² counters this problem: it rises only when a new variable improves the fit by more than would be expected by chance. Adjusted R² is calculated using the formula:

$$R^2_{\mathrm{adj}} = 1 - \frac{\left( 1 - R^2 \right) \left( N - 1 \right)}{N - K - 1}$$

Where,

N represents the number of data points, and
K is the number of independent variables present in the model.
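The penalty can be seen by plugging numbers into the formula. The sketch below assumes the R² value has already been computed. The function name and the example figures (an R² of 0.9 on 100 data points) are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (N - 1) / (N - K - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("need more data points than independent variables plus one")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical figures: the same R^2 of 0.9 on 100 data points,
# once with 5 independent variables and once with 20.
adj_few = adjusted_r2(0.9, n=100, k=5)
adj_many = adjusted_r2(0.9, n=100, k=20)

print(round(adj_few, 4))   # 0.8947
print(round(adj_many, 4))  # 0.8747
```

With the same R², the model that uses 20 independent variables gets a lower adjusted R² than the one that uses 5, which is the penalty described above.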

Conclusion

In this blog we have learned how to evaluate the performance of a regression model using various metrics. You can implement them using the scikit-learn library.
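As a rough sketch of what that can look like, the example below uses functions from `sklearn.metrics` together with NumPy. The sample values and the number of independent variables `k` are assumptions. RMSE and adjusted R² are derived by hand from the other scores here, since this sketch does not rely on a ready-made scikit-learn function for either.

```python
import numpy as np
from sklearn.metrics import (
    max_error,
    mean_absolute_error,
    mean_squared_error,
    r2_score,
)

# Illustrative values only (assumed for this example)
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)  # square root of the MSE
worst = max_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)

# Adjusted R^2 computed from the formula in this article.
n = len(y_true)  # number of data points
k = 1            # assumed number of independent variables in the model
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(f"MAE: {mae:.4f}")
print(f"MSE: {mse:.4f}")
print(f"RMSE: {rmse:.4f}")
print(f"Max error: {worst:.4f}")
print(f"R2: {r2:.4f}")
print(f"Adjusted R2: {adj_r2:.4f}")
```

In practice `y_true` would be the targets of your testing dataset and `y_pred` the output of your fitted model's `predict` method on the matching features.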
