
Leveraging the Bayesian Regression algorithm in Python

Statistics underpins much of machine learning, and there are two main statistical approaches used in the field: the Bayesian approach and the frequentist approach. Both are widely used, and it is fair to say that machine learning is heavily grounded in statistics. Algorithms such as logistic regression and least squares regression belong to the frequentist approach, in which the parameters are treated as fixed constants while the data are random.

The Bayesian regression model is a powerful statistical technique in which the predictors are treated as constants while the parameters are treated as random variables following probability distributions with a mean and variance. This blog introduces the Bayesian Regression algorithm in Python as a statistical model for machine learning, compares the frequentist and Bayesian approaches, lists the advantages of the Bayesian approach over the frequentist one, and introduces the key concepts in Bayesian regression.

Understanding Bayesian Regression algorithm in Python

Bayesian regression is a popular statistical modelling technique that applies Bayesian inference to estimate the parameters of a regression model. In the frequentist approach, model parameters are estimated by maximum likelihood; Bayesian regression instead incorporates prior knowledge about the parameters, and this knowledge is updated as more data is gathered.

The Bayesian regression model assumes that the parameters are random variables with prior probability distributions, while the predictors are treated as fixed constants. The prior distributions express our existing knowledge or beliefs about the parameters, and Bayes' theorem is used to update those beliefs and obtain a posterior distribution representing the updated knowledge given the observed data.

Bayesian regression is based on Bayes' theorem, a widely used concept in machine learning and a simple yet effective way of making precise, accurate predictions. Bayes' theorem determines conditional probabilities: the probability of one event occurring given that another event has already occurred.
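As a quick illustration, the short snippet below applies Bayes' theorem to a classic diagnostic-test scenario; the prevalence and test accuracies are made-up values chosen only for the example.

# Bayes' theorem: P(disease | positive) = P(positive | disease) * P(disease) / P(positive)
p_disease = 0.01              # prior: 1% prevalence (illustrative value)
p_pos_given_disease = 0.95    # test sensitivity (illustrative value)
p_pos_given_healthy = 0.05    # false-positive rate (illustrative value)

# Marginal probability of a positive test (the normalization constant)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Posterior probability of disease given a positive test
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"P(disease | positive) = {p_disease_given_pos:.3f}")  # about 0.161

Even with a 95%-accurate test, the posterior probability stays low because the 1% prior pulls it down; this prior-times-likelihood update is exactly what Bayesian regression performs over model parameters.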

The Rising Importance of the Bayesian Approach in Machine Learning

The Bayesian approach plays an important role in machine learning, and its use has been growing for several reasons:

  1. Incorporates Prior Knowledge: Real-world problems often come with prior knowledge or beliefs about the problem that can help solve it. The Bayesian approach incorporates this prior knowledge into the modelling process, leading to more efficient models and better results.
  2. Estimates Uncertainty: The Bayesian approach provides a probabilistic framework for model estimation, allowing the uncertainty of model parameters and predictions to be quantified, which is very useful for risk assessment and decision making.
  3. Updates Models with New Data: Bayesian models can be updated as soon as new data becomes available, which is useful for applications such as personalized medicine recommendations or adaptive control systems.
  4. Handles Small Datasets Efficiently: Bayesian models handle small datasets more efficiently than many other models and, by incorporating prior knowledge, provide more accurate parameter estimates even when data is limited.
  5. Offers Flexible Modelling: The Bayesian approach is flexible enough to be used across a wide range of problems, handles complex models and data structures, and supports model comparison and selection.

Thus, the Bayesian approach provides a flexible and powerful framework for machine learning, leading to more accurate and efficient models, especially when prior knowledge is available or data is limited.

Bayesian vs Frequentist Regression

There are two very popular approaches to statistical modelling in machine learning: the frequentist approach and the Bayesian approach, which differ in their underlying methods and assumptions. Algorithms such as logistic regression and least squares regression belong to the frequentist approach, in which the parameters are fixed constants and the data are random. In Bayesian regression, by contrast, the predictors are treated as constants while the parameters are random variables following distributions with a mean and variance. This section compares the two approaches and outlines the advantages and real-world applications of Bayesian regression.

A. Comparison of Bayesian and Frequentist approaches

Both the Bayesian and frequentist approaches are popular and powerful when applied to suitable problems and can achieve accurate results. However, they differ in the methods and assumptions used to obtain those results. Some of the key differences are:

  1. Prior knowledge: Bayesian regression incorporates prior knowledge and beliefs about the model parameters, while the frequentist approach does not. Prior knowledge is expressed as a probability distribution over the model parameters, which allows more flexibility in the modelling process and yields more accurate estimates from smaller samples, an advantage over the frequentist approach.
  2. Probability interpretation: The interpretation of probability is the fundamental difference between the two approaches. Bayesian regression interprets probability as a measure of belief or uncertainty about an event, while the frequentist approach treats probability as the long-run frequency of an event over a large number of trials.
  3. Parameter estimation: The frequentist approach uses maximum likelihood estimation, seeking the parameter values that maximize the likelihood of the observed data. Bayesian regression instead uses Bayesian inference, updating prior beliefs about the parameters based on the observed data.
  4. Uncertainty estimation: Bayesian regression uses a probabilistic framework for estimating uncertainty, while frequentist regression does not. The Bayesian model provides a posterior distribution over the model parameters, from which the uncertainty in predictions and parameters can be estimated; the frequentist model provides only point estimates and is weaker in this respect.
  5. Model selection: Bayesian regression provides a natural framework for comparing and selecting models, while the frequentist approach does not. Different models can be compared on the basis of their posterior probability, which balances model complexity against fit to the data.

Both models have their own strengths and weaknesses and offer advantages in different fields. Bayesian regression allows more flexibility in the modelling process and provides more accurate estimates, along with uncertainty estimates, when prior knowledge is available or the sample size is small. The frequentist model, on the other hand, is widely used as a simpler, more straightforward approach to parameter estimation.
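To make the contrast concrete, the sketch below fits the same synthetic data with ordinary least squares (a frequentist point estimate) and with scikit-learn's BayesianRidge, which returns a predictive standard deviation alongside each prediction. The data and model choices are illustrative assumptions, not part of the original comparison.

import numpy as np
from sklearn.linear_model import LinearRegression, BayesianRidge

# Illustrative synthetic data: y = 2x + 1 + noise
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(50, 1))
y = 2 * X.ravel() + 1 + rng.normal(0, 1, size=50)

# Frequentist fit: a single point estimate of the coefficients
ols = LinearRegression().fit(X, y)

# Bayesian fit: a posterior over coefficients, hence predictive uncertainty
bayes = BayesianRidge().fit(X, y)
y_mean, y_std = bayes.predict([[5.0]], return_std=True)

print("OLS prediction at x=5:     ", ols.predict([[5.0]])[0])
print("Bayesian prediction at x=5:", y_mean[0], "+/-", y_std[0])

The frequentist model answers with a single number; the Bayesian model answers with a distribution, summarized here by a mean and a standard deviation.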

B. Advantages of Bayesian Regression algorithm in Python

The comparison in the previous sub-section showed that Bayesian regression is better suited to solving complex problems with smaller datasets. Beyond those comparison points, Bayesian regression offers several further advantages:

  1. Bayesian regression is more flexible than frequentist regression, as it allows the estimation of complex models that are not possible with frequentist methods.
  2. Bayesian regression allows prior knowledge or beliefs about the parameters of interest to be incorporated, improving the accuracy and reliability of the estimates. This is especially helpful when the sample size is small, since the data can be augmented with prior knowledge.
  3. The Bayesian model provides a natural framework for uncertainty estimation, useful for assessing the robustness and reliability of the estimates. The posterior distributions of the parameters of interest can be used to compute credible intervals and other uncertainty measures.
  4. Bayesian regression provides a natural framework for model comparison and selection, allowing the best-performing model to be identified among a set of candidates.
  5. Bayesian regression handles missing data more effectively than the frequentist approach, supporting imputation techniques that account for the uncertainty in the missing values.

C. Real-world applications of Bayesian Regression algorithm in Python

Bayesian regression can be used to solve a wide range of real-world problems. Some examples are:

  1. Finance: Finance involves a great deal of time-series analysis, and Bayesian regression can be used for risk management, portfolio optimization, and modelling stock prices, including estimating the parameters of financial models and modelling the volatility and correlation of financial returns.
  2. Marketing: Bayesian models can be used in marketing to model consumer behaviour, optimize pricing strategies, and estimate demand. They can also model the relationship between sales and marketing activities to predict the impact of different strategies.
  3. Environmental modelling: Bayesian regression can be applied to weather prediction, estimating the impacts of pollution, and modelling ecological systems. It can also model the relationship between environmental variables and the occurrence of natural disasters.
  4. Image and speech recognition: Bayesian regression helps model the patterns, features, and relationships in datasets used for image and speech recognition.
  5. Healthcare: Bayesian regression can be used in healthcare to predict disease outcomes and estimate treatment effects. It can also estimate the probability of adverse events and uncover relationships between patient characteristics and treatment outcomes.

These applications, among many others, make Bayesian regression a popular and valuable tool for data scientists.

Key Concepts in Bayesian Regression algorithm in Python

Bayesian regression rests on several key concepts, and understanding them is essential to understanding how the algorithm works. They are discussed below:

A. Bayes’ theorem

Bayes' theorem is a fundamental concept in Bayesian regression, as it provides the mechanism for updating beliefs about the parameters of interest as new data is observed. It is expressed mathematically by the following equation:

P(θ | D) = P(D | θ) * P(θ) / P(D)

where:

  • P(θ | D) represents the posterior probability distribution of the parameters θ given the data D
  • P(D | θ) represents the likelihood of the data D given the parameters θ
  • P(θ) represents the prior probability distribution of the parameters θ
  • P(D) represents the marginal likelihood of the data D, which acts as a normalization constant

Bayesian regression estimates the posterior distribution of the model parameters, where the likelihood function describes the probability of the observed data for a given set of parameter values. Bayes' theorem is thus the key concept in Bayesian regression: it provides a principled way to incorporate prior knowledge and update beliefs based on new data in order to estimate the posterior distribution of the model parameters.
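For a single parameter, the update P(θ | D) ∝ P(D | θ) * P(θ) can be carried out numerically on a grid. The sketch below, with made-up data and an assumed Normal prior over the slope, evaluates prior times likelihood over a grid of slope values and normalizes to obtain the posterior.

import numpy as np
from scipy.stats import norm

# Illustrative data generated from y = 2x + noise
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = 2 * x + rng.normal(0, 0.5, size=20)

theta_grid = np.linspace(0, 4, 400)               # candidate slope values
prior = norm.pdf(theta_grid, loc=1.0, scale=2.0)  # assumed prior belief about the slope

# Likelihood P(D | theta): product of Normal densities of the residuals
likelihood = np.array([norm.pdf(y, loc=t * x, scale=0.5).prod() for t in theta_grid])

# Posterior via Bayes' theorem; dividing by the sum plays the role of P(D)
posterior = prior * likelihood
posterior /= posterior.sum()

print("Posterior mean slope:", (theta_grid * posterior).sum())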

B. Prior probability

Prior probability in Bayesian regression refers to the probability distribution assigned to the model parameters before observing the data, and it reflects our existing knowledge or beliefs about the parameter values. The prior is chosen based on a variety of considerations:

  1. Expert knowledge: When prior knowledge about the parameter values exists, this expert knowledge can be encoded in the prior distribution.
  2. Non-informative priors: When no prior knowledge about the parameters is available, non-informative priors are used; these are designed to have minimal influence on the posterior.
  3. Empirical Bayes: This technique estimates the prior distribution from the data itself: the parameters of the prior are estimated from the observed data, and the estimated prior is then used to compute the posterior distribution.

Thus, the prior probability is a key input to the analysis and can influence the model fit, the uncertainty intervals, and the estimated parameter values.
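These considerations map directly onto prior declarations in a probabilistic programming library. Below is a minimal sketch using PyMC3 (the library used in the implementation later in this article); the specific distributions and their parameters are illustrative assumptions, not prescriptions.

import pymc3 as pm

with pm.Model() as model:
    # Informative prior: expert knowledge suggests the slope is near 2
    slope_informative = pm.Normal('slope_informative', mu=2.0, sigma=0.5)

    # Non-informative (flat) prior: no preference over slope values
    slope_flat = pm.Flat('slope_flat')

    # Weakly informative prior for a noise scale, which must be positive
    sigma = pm.HalfNormal('sigma', sigma=1.0)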

C. Likelihood

The likelihood is the probability distribution of the observed data given the model parameters. It is denoted P(D|θ) and describes the probability of observing the data D for a particular set of parameter values θ. It is a key component of Bayesian regression and is used to update prior beliefs about the parameter values based on the observed data: the likelihood function is multiplied by the prior probability distribution of the parameters and then normalized by the marginal likelihood to obtain the posterior probability distribution. The likelihood function for linear regression can be written as:

P(D|θ) = N(y | Xθ, σ^2 I)

where:

  • y is the vector of observed response values
  • X is the design matrix of predictor variables
  • θ is the vector of unknown regression coefficients
  • σ^2 is the variance of the error term
  • I represents the identity matrix

The likelihood function provides a measure of model fit: it describes how well the model, with the given parameter values, predicts the observed data.
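For concreteness, the likelihood N(y | Xθ, σ^2 I) can be evaluated directly with SciPy; every number below is an illustrative assumption.

import numpy as np
from scipy.stats import multivariate_normal

# Illustrative design matrix (intercept + one predictor), coefficients, and data
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
theta = np.array([1.0, 2.0])      # candidate coefficient values
sigma2 = 0.25                     # assumed error variance
y = np.array([1.1, 2.9, 5.2])     # observed responses

# P(D | theta) = N(y | X theta, sigma^2 I)
likelihood = multivariate_normal.pdf(y, mean=X @ theta, cov=sigma2 * np.eye(3))
print("Likelihood of the data under theta:", likelihood)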

D. Posterior probability

Posterior probability refers to the probability distribution of the model parameters after observing the data. It is obtained by applying Bayes' theorem, which updates our prior beliefs about the parameter values based on the observed data.

The Posterior probability distribution is denoted as P(θ|D) and represented using the formula:

P(θ|D) = P(D|θ) * P(θ) / P(D)

where:

  • P(D|θ) is the likelihood function,
  • P(θ) is the prior probability distribution of the parameters,
  • P(D) is the marginal likelihood.

After the data is observed, the posterior probability distribution summarizes what is known about the parameter values and provides a complete description of the uncertainty in the parameter estimates, which can be used to make predictions about future observations. It is typically characterized by its mean and variance, which provide point estimates and a basis for predicting future observations.
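Given posterior samples (for example from MCMC, discussed below), these summaries follow directly; the samples here are simulated purely for illustration.

import numpy as np

# Hypothetical posterior samples for one coefficient (e.g., drawn by MCMC)
samples = np.random.default_rng(1).normal(loc=2.0, scale=0.3, size=5000)

print("Posterior mean:    ", samples.mean())
print("Posterior variance:", samples.var())
# 95% credible interval from the 2.5th and 97.5th percentiles
print("95% credible interval:", np.percentile(samples, [2.5, 97.5]))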

[Figure: Steps of the Bayesian Regression algorithm in Python]

E. Conjugate priors

In Bayesian regression, a conjugate prior is a prior probability distribution that, when combined with the likelihood function, yields a posterior distribution in the same family as the prior; conjugate priors are useful because they make the posterior easy to compute. For linear regression with normally distributed errors, the conjugate prior for the regression coefficients is a normal distribution. Using a conjugate prior simplifies the computation of the posterior and allows prior beliefs about the model parameters to be interpreted and updated easily, as the closed-form sketch below shows.
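The sketch implements the standard Normal-prior update for the coefficients when the error variance σ^2 is known; the data, prior variance, and noise level are illustrative assumptions.

import numpy as np

# Illustrative data: intercept + one predictor, true coefficients (1, 2)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(30), rng.uniform(0, 5, 30)])
true_theta = np.array([1.0, 2.0])
sigma2 = 0.5 ** 2                 # known error variance (assumed)
y = X @ true_theta + rng.normal(0, 0.5, 30)

tau2 = 10.0                       # prior: theta ~ N(0, tau^2 I)

# Conjugate update: the posterior is Normal with closed-form mean and covariance
post_cov = np.linalg.inv(np.eye(2) / tau2 + X.T @ X / sigma2)
post_mean = post_cov @ (X.T @ y / sigma2)   # prior mean is zero

print("Posterior mean of coefficients:", post_mean)
print("Posterior covariance:\n", post_cov)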

F. Markov Chain Monte Carlo (MCMC) sampling

Markov Chain Monte Carlo (MCMC) sampling is a popular computational method for estimating the posterior probability distribution in Bayesian regression, and it is especially useful when the posterior is complex. MCMC generates a sequence of samples whose distribution converges to the posterior distribution. The key idea is to use a Markov chain to generate the samples, where the transition probabilities between states depend only on the current state and not on previous states. Given the observed data and the prior distribution, MCMC can generate samples from the posterior distribution of the model parameters, providing a flexible and computationally efficient way to obtain point estimates as well as full posterior distributions with measures of uncertainty. The user must verify that the algorithm has converged to the true posterior distribution and that, through good mixing of the Markov chain, the samples are not biased or overly correlated.
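To show the mechanics, here is a minimal random-walk Metropolis sampler for the slope of a one-parameter regression. It is a sketch under illustrative settings (data, prior, proposal width, burn-in); real analyses should rely on a mature library such as PyMC3.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
y = 2 * x + rng.normal(0, 0.5, 30)   # illustrative data, true slope = 2

def log_posterior(theta):
    # log prior (Normal(0, 10)) plus log likelihood (Normal errors, sigma = 0.5)
    return norm.logpdf(theta, 0, 10) + norm.logpdf(y, theta * x, 0.5).sum()

samples, theta = [], 0.0
for _ in range(5000):
    proposal = theta + rng.normal(0, 0.3)   # random-walk proposal
    # Accept with probability min(1, posterior ratio); depends only on the current state
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(theta):
        theta = proposal
    samples.append(theta)

print("Posterior mean slope:", np.mean(samples[1000:]))  # discard burn-in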

Implementation of Bayesian Regression in Python

Problem Statement

A central problem in financial investing is constructing a portfolio that balances risk and reward. Conventional approaches such as Markowitz's Modern Portfolio Theory estimate asset returns and volatility from historical data but ignore the uncertainty in those estimates.

The goal here is to use Bayesian regression to create a more reliable portfolio optimization method. By accounting for the uncertainty in asset returns, this method provides a more dynamic, probabilistic evaluation of portfolio risk and return, ultimately leading to better-informed investment decisions.

Dataset Description:

The dataset contains information about Australian stock prices.

Link to the dataset- https://www.kaggle.com/datasets/ashbellett/australian-historical-stock-prices

Implementation:

The first step is to import all the required libraries

import numpy as np
import pandas as pd
import pymc3 as pm

The second step is to load the CSV dataset

# Individual stock market datasets that will be combined into one DataFrame
files = ['WOR.csv', 'WOW.csv', 'WPL.csv', 'WTC.csv']

The third step is to perform an outer join on all the files based on the Adjusted Close column.

data = pd.DataFrame()

for file in files:
    stock_data = pd.read_csv(file, index_col='Date', parse_dates=True)
    stock_data = stock_data[['Adj Close']]  # Keep only the adjusted close column
    stock_data.columns = [file.split('.')[0]]  # Rename the column to the ticker symbol
    if data.empty:
        data = stock_data
    else:
        data = data.join(stock_data, how='outer')

The fourth step is to compute the daily returns and their covariance matrix, and to define the Bayesian regression model.

# Calculate daily returns and drop missing values
returns = data.pct_change().dropna()

# Calculate the covariance matrix of the daily returns
cov_matrix = returns.cov()

# Define Bayesian regression model for portfolio optimization
n_assets = len(data.columns)

The fifth step is to define the model so that sampling favours portfolio weights with a high Sharpe ratio.

with pm.Model() as portfolio_model:
    # Define prior distributions for the asset weights
    weights = pm.Dirichlet('weights', a=np.ones(n_assets), shape=n_assets)
    
    # Calculate the portfolio returns using the weights
    portfolio_return = pm.Deterministic('portfolio_return', pm.math.dot(returns.mean(), weights))
    
    # Calculate the portfolio volatility using the weights and covariance matrix
    portfolio_volatility = pm.Deterministic('portfolio_volatility', 
                                            pm.math.sqrt(pm.math.dot(pm.math.dot(weights, cov_matrix), weights.T)))
    
    # Define the utility function (e.g., Sharpe ratio) to optimize
    sharpe_ratio = pm.Deterministic('sharpe_ratio', portfolio_return / portfolio_volatility)
    
    # Bias the sampler toward portfolios with a high Sharpe ratio
    # (named differently from the Deterministic above to avoid a duplicate-name error)
    pm.Potential('sharpe_potential', sharpe_ratio)

Now draw posterior samples (MCMC) to estimate the optimal weights for each asset

# Draw posterior samples to estimate the optimal portfolio weights
with portfolio_model:
    trace = pm.sample(draws=2000, tune=1000, target_accept=0.95, return_inferencedata=False)
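Before using the draws, it is worth checking that the sampler converged. A minimal check, assuming the trace from the previous step, is to inspect the r_hat column of the summary (values near 1.0 indicate convergence):

# Summarize the posterior for the weights and inspect the convergence diagnostics
print(pm.summary(trace, var_names=['weights']))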

The last step is to take the posterior mean of the sampled weights as the optimal allocation and display the output of the Bayesian regression model

# The posterior mean of the sampled weights serves as the optimal allocation
optimal_weights = trace['weights'].mean(axis=0)

# Display the results
for file, weight in zip(files, optimal_weights):
    print(f"{file.split('.')[0]}: {weight:.4f}")

print("\nOptimal Portfolio Weights: ", optimal_weights)

The output will look as follows:

[Output: optimal portfolio weights produced by the Bayesian regression model]

Conclusion

This blog compared the Bayesian method with the frequentist method to show why Bayesian regression can be more efficient, listed the advantages of Bayesian regression, and walked through an implementation that solves a real-world problem. We also discussed the key components of Bayesian regression, which should help data scientists understand the technique more closely and implement it with greater confidence. This blog should be helpful both for aspiring data scientists and for practitioners who want to apply Bayesian regression.
