Bayesian Linear Regression in Python

Machine learning relies heavily on statistics for many of its tasks. There are two main approaches to statistics in machine learning: the Bayesian approach and the Frequentist approach. Both are widely respected and widely used, but they differ significantly in how they treat parameters and data.

  • Frequentist Approach: Assumes model parameters are fixed and the data varies.
  • Bayesian Approach: Assumes data is fixed and parameters are uncertain, described by probability distributions.

This blog explains Bayesian Linear Regression in Python, compares it with the Frequentist method, and discusses real-world applications.

Understanding Bayesian Linear Regression in Python

The Frequentist approach relies solely on observed data and does not incorporate any prior beliefs or external knowledge about the parameters. In contrast, Bayesian regression incorporates prior knowledge about the parameters, and this knowledge is updated as more data is gathered.

In Bayesian regression, the prior distributions reflect our initial beliefs or existing knowledge about the model parameters. As new data becomes available, Bayes’ theorem is used to update these beliefs, resulting in a posterior distribution that represents the updated knowledge based on the observed data. While Frequentist models treat parameters as fixed constants, Bayesian models treat them as random variables, offering greater flexibility and a more comprehensive understanding of uncertainty in model estimation.

In other words:

  • Frequentist: Parameters are fixed; data is random.
  • Bayesian: Parameters are random; data is fixed.

Bayesian regression is based on Bayes’ theorem, a widely used concept in machine learning and a simple yet effective basis for making precise, accurate predictions. Bayes’ theorem is used to determine conditional probabilities: the probability of an event occurring given that another event has already occurred.

The Rising Importance of the Bayesian Approach in Machine Learning

The Bayesian approach is increasingly important in machine learning, and its use has been growing because of several advantages it provides:

  • Uses Prior Knowledge: Bayesian models can include existing knowledge or beliefs, helping improve model accuracy in real-world problems.
  • Estimates Uncertainty: It gives a clear measure of uncertainty in predictions, which is useful for risk assessment and decision-making.
  • Learns from New Data: Models can be easily updated as new data comes in, making them suitable for applications like personalized medicine.
  • Works Well with Small Data: Even with limited data, Bayesian models perform well by using prior information.
  • Flexible and Powerful: The approach can handle complex data and model structures, making it suitable for a wide range of problems.

Thus, the Bayesian approach provides a flexible and powerful framework for machine learning, leading to more accurate and efficient models in situations where prior knowledge is available and data is limited.

Bayesian vs Frequentist Regression

There are two very popular approaches to statistical modelling, both powerful and frequently used in machine learning: the Frequentist approach and the Bayesian approach. They differ in their underlying methods and assumptions.

Algorithms such as logistic regression and least-squares regression are part of the Frequentist approach to statistics, in which the parameters are always treated as constants while the observed data (the predictor and response values) are random.

The Bayesian regression model is a powerful statistical technique in which the observed data is treated as fixed while the parameter estimates are random, following a distribution with a mean and variance. This section compares the Bayesian and Frequentist approaches and presents the advantages of Bayesian regression along with its real-world applications.

Comparison of Bayesian and Frequentist approaches

Both the Bayesian and Frequentist approaches are popular and powerful when applied to suitable problems and can achieve accurate results. However, they differ in the methods and assumptions used to get those results. Some of the key differences between the two methods are:

| Aspect | Bayesian Approach | Frequentist Approach |
|---|---|---|
| Prior knowledge | Used | Not used |
| Interpretation of probability | Degree of belief | Long-run frequency |
| Parameter estimation | Posterior distribution | Point estimates (MLE) |
| Uncertainty estimation | Explicitly quantified | Limited or asymptotic |
| Model update | Easy with new data | Retrain from scratch |
| Handling small data | Better with priors | Needs larger samples |
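
To make this contrast concrete, here is a minimal sketch using scikit-learn (a library not used elsewhere in this post) on synthetic data: the Frequentist model returns a single point estimate, while the Bayesian model returns a posterior that also yields predictive uncertainty.

import numpy as np
from sklearn.linear_model import LinearRegression, BayesianRidge

# Synthetic data: y = 2x + 1 + noise
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(50, 1))
y = 2 * X.ravel() + 1 + rng.normal(0, 1, size=50)

# Frequentist: a single point estimate of the coefficient
ols = LinearRegression().fit(X, y)
print("OLS coefficient:", ols.coef_[0])

# Bayesian: a posterior over coefficients, so predictions carry uncertainty
bayes = BayesianRidge().fit(X, y)
mean, std = bayes.predict(X[:5], return_std=True)
print("Posterior-mean coefficient:", bayes.coef_[0])
print("Predictive std of first 5 predictions:", std)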

Real-world applications of Bayesian Linear Regression in Python

Bayesian regression has many real-world applications and can be used to solve a wide range of problems. Some examples of the domains and problems it can address are:

| Domain | Applications of Bayesian Regression |
|---|---|
| Finance | Time series analysis; risk management and portfolio optimization; modeling stock prices; estimating parameters of financial models; modeling volatility and correlations of returns |
| Marketing | Modeling consumer behavior; optimizing pricing strategies; estimating product demand; predicting the impact of sales and marketing strategies |
| Environmental Modeling | Weather prediction; estimating pollution impacts; modeling ecological systems; predicting natural disasters based on environmental variables |
| Image and Speech Recognition | Modeling complex patterns in datasets; feature extraction and pattern recognition; enhancing recognition accuracy through probabilistic modeling |
| Healthcare | Predicting disease outcomes; estimating treatment effects; assessing probability of adverse events; modeling relationships between treatment and patient characteristics |

Beyond these applications, the Bayesian regression approach offers several further advantages in real-world problems, which makes it a useful and popular tool for data scientists.

Key Concepts in Bayesian Linear Regression in Python

The Bayesian regression model rests on several key concepts, and understanding them is essential to understanding how Bayesian regression works. They are discussed below.

Bayes’ theorem

Bayes’ theorem is an important and fundamental concept of Bayesian regression, as it provides the mechanism for updating beliefs about the parameters of interest as new data is observed. It is expressed by the following equation:

P(θ | D) = P(D | θ) * P(θ) / P(D)

where:

  • P(θ | D) represents the posterior probability distribution of the parameters θ given the data D
  • P(D | θ) represents the likelihood of the data D given the parameters θ
  • P(θ) represents the prior probability distribution of the parameters θ
  • P(D) represents the marginal likelihood of the data D, which acts as a normalization constant

Bayesian regression estimates the posterior distribution of the model parameters, where the likelihood function describes the probability of the observed data given a set of parameter values. Bayes’ theorem thus provides a principled way to incorporate prior knowledge and update beliefs based on new data when estimating the posterior distribution of the model parameters.
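
As a toy illustration of the theorem in code, consider a hypothetical coin-bias example (not part of the original analysis): the posterior over a parameter θ can be computed on a discrete grid by multiplying likelihood and prior and then normalizing.

import numpy as np
from scipy.stats import binom

# Hypothetical data: 7 heads in 10 coin flips; infer the bias theta
theta = np.linspace(0.01, 0.99, 99)        # grid of candidate parameter values
prior = np.ones_like(theta) / len(theta)   # uniform (non-informative) prior

# Likelihood P(D | theta): binomial probability of 7 heads in 10 flips
likelihood = binom.pmf(7, 10, theta)

# Bayes' theorem: posterior = likelihood * prior / P(D)
unnormalized = likelihood * prior
posterior = unnormalized / unnormalized.sum()   # dividing by P(D) normalizes

print("Posterior mean of theta:", (theta * posterior).sum())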

Prior probability

The prior probability in Bayesian regression is the probability distribution assigned to the model parameters before observing any data; it reflects our knowledge or beliefs about the parameter values. The prior is chosen based on a variety of considerations:

  1. Expert knowledge: When prior knowledge about the parameter values exists, it can be encoded directly in the prior distribution.
  2. Non-informative priors: When no prior knowledge about the parameters is available, non-informative (flat or weakly informative) priors are used; these are designed to exert little influence on the posterior.
  3. Empirical Bayes: A technique that estimates the prior distribution from the data itself: the parameters of the prior are estimated from the observed data, and the resulting prior is then used to compute the posterior distribution.

Thus, the prior is a key input to the analysis: it can influence model fit, uncertainty intervals, and the estimated parameter values.
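
As a sketch of how these choices look in pymc3 (the parameter values here are hypothetical and chosen purely for illustration):

import pymc3 as pm

with pm.Model() as model:
    # Informative prior: suppose expert knowledge places the slope near 2
    slope = pm.Normal('slope', mu=2.0, sigma=0.5)
    # Weakly informative (near-flat) prior: little is known about the intercept
    intercept = pm.Normal('intercept', mu=0.0, sigma=100.0)
    # The noise scale must be positive, so it gets a half-normal prior
    sigma = pm.HalfNormal('sigma', sigma=1.0)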

Likelihood

The likelihood is the probability distribution of the observed data given the model parameters, denoted P(D|θ): it describes the probability of observing data D for a particular set of parameter values θ. It is a key component of Bayesian regression and is used to update the prior beliefs about the parameter values in light of the observed data. The likelihood function is multiplied by the prior probability distribution of the parameters and then normalized by the marginal likelihood to obtain the posterior probability distribution. For linear regression with normally distributed errors, the likelihood can be written as:

P(y | X, θ, σ^2) = N(y | Xθ, σ^2 I)

where:

  • y is the vector of observed response values
  • X is the design matrix of predictor variables
  • θ is the vector of unknown regression coefficients
  • σ^2 is the variance of the error term
  • I represents the identity matrix

The likelihood function provides a measure of model fit: it describes how well the model, with given parameter values, predicts the observed data.
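
A minimal sketch of evaluating this Gaussian likelihood with scipy (the design matrix, coefficients, and noise level below are hypothetical):

import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical design matrix (intercept + one predictor), coefficients, noise
X = np.column_stack([np.ones(5), np.arange(5.0)])
theta = np.array([1.0, 2.0])
sigma2 = 0.25
y = X @ theta + np.random.default_rng(0).normal(0, np.sqrt(sigma2), 5)

# P(y | X, theta, sigma^2) = N(y | X @ theta, sigma^2 * I)
likelihood = multivariate_normal.pdf(y, mean=X @ theta, cov=sigma2 * np.eye(5))
print("Likelihood of the observed y:", likelihood)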

Posterior probability

The posterior probability is the probability distribution of the model parameters after observing the data. It is obtained by applying Bayes’ theorem, which lets us update our prior beliefs about the parameter values based on the observed data.

The Posterior probability distribution is denoted as P(θ|D) and represented using the formula:

P(θ|D) = P(D|θ) * P(θ) / P(D)

where:

  • P(D|θ) is the likelihood function,
  • P(θ) is the prior probability distribution of the parameters,
  • P(D) is the marginal likelihood.

After the data is observed, the posterior distribution summarizes what is known about the parameter values and provides a complete description of the uncertainty in the parameter estimates, which can be used to make predictions about future observations. It is often summarized by its mean and variance, which supply point estimates and uncertainty measures for prediction.
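
For example, once posterior samples are available (e.g., from the MCMC methods discussed below), they are easy to summarize; the samples here are simulated purely for illustration:

import numpy as np

# Hypothetical posterior samples for a single regression coefficient
posterior_samples = np.random.default_rng(0).normal(2.0, 0.3, size=4000)

point_estimate = posterior_samples.mean()   # posterior mean
uncertainty = posterior_samples.std()       # posterior standard deviation
credible_interval = np.percentile(posterior_samples, [2.5, 97.5])

print(f"Estimate: {point_estimate:.2f} +/- {uncertainty:.2f}")
print("95% credible interval:", credible_interval)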

[Figure: Steps of the Bayesian regression algorithm in Python]

Conjugate priors

In Bayesian regression, a conjugate prior is a prior probability distribution that, when combined with the likelihood function, yields a posterior distribution from the same family. Conjugate priors are useful because they make the posterior easy to compute in closed form. For linear regression with normally distributed errors (and known noise variance), the conjugate prior for the regression coefficients is a normal distribution. Using a conjugate prior simplifies the computation of the posterior and allows prior beliefs about the model parameters to be updated and interpreted easily.
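
Here is a sketch of the conjugate normal-normal update for a single coefficient with known noise variance (all numbers are hypothetical); because the prior is conjugate, the posterior is available in closed form:

import numpy as np

# Simple model: y_i = theta * x_i + noise, with known noise variance sigma2
rng = np.random.default_rng(1)
x = rng.uniform(0, 5, 30)
sigma2 = 1.0
y = 2.5 * x + rng.normal(0, np.sqrt(sigma2), 30)

# Normal prior on theta: theta ~ N(mu0, tau0_sq)
mu0, tau0_sq = 0.0, 10.0

# Conjugacy: posterior precision = prior precision + data precision
post_prec = 1.0 / tau0_sq + (x @ x) / sigma2
post_var = 1.0 / post_prec
post_mean = post_var * (mu0 / tau0_sq + (x @ y) / sigma2)

print(f"Posterior for theta: N({post_mean:.3f}, {post_var:.4f})")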

Markov Chain Monte Carlo (MCMC) sampling

Markov Chain Monte Carlo (MCMC) sampling is a popular computational method for estimating the posterior probability distribution in Bayesian regression, and it is especially useful when the posterior is complex. MCMC generates a sequence of samples from the posterior distribution so that, as more samples are drawn, their empirical distribution approximates the posterior increasingly well.

The key idea is to use a Markov chain to generate the samples, where the transition probabilities between states of the chain depend only on the current state and not on previous states. MCMC can thus generate samples from the posterior distribution of the model parameters given the observed data and the prior distribution.

It provides a flexible and computationally practical approach for estimating the posterior distribution in Bayesian regression, yielding both point estimates and full posterior distributions with measures of uncertainty. The user must check that the algorithm has converged to the true posterior and that the Markov chain is well mixed, so that the samples are not biased or strongly autocorrelated.
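
A minimal pymc3 sketch of MCMC for a linear regression, with a standard convergence check via arviz (which is installed alongside pymc3); the data is synthetic and the variable names are illustrative:

import numpy as np
import pymc3 as pm
import arviz as az

# Synthetic regression data
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, size=100)

with pm.Model() as model:
    intercept = pm.Normal('intercept', mu=0, sigma=10)
    slope = pm.Normal('slope', mu=0, sigma=10)
    sigma = pm.HalfNormal('sigma', sigma=1)
    pm.Normal('y_obs', mu=intercept + slope * x, sigma=sigma, observed=y)

    # NUTS (an MCMC algorithm) draws samples from the posterior
    idata = pm.sample(2000, tune=1000, return_inferencedata=True)

# r_hat values close to 1 indicate the chains have converged
print(az.summary(idata, var_names=['intercept', 'slope', 'sigma']))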

Implementation of Bayesian Linear Regression in Python

Problem Statement

A central problem in the world of financial investing is creating an ideal portfolio that balances risk and reward. Conventional approaches, such as Markowitz’s Modern Portfolio Theory, estimate asset returns and volatility from historical data but treat those estimates as if they were known exactly, ignoring their uncertainty.

The goal here is to use Bayesian regression to build a more reliable portfolio optimization method. By accounting for the uncertainty in asset returns, this method provides a more dynamic, probabilistic evaluation of portfolio risk and return, ultimately supporting better-informed investment decisions.

Dataset Description:

The dataset contains information about Australian stock prices.

Link to the dataset- https://www.kaggle.com/datasets/ashbellett/australian-historical-stock-prices

Note- Create a virtual environment with a Python version that pymc3 supports (pymc3 is no longer actively developed and generally requires an older interpreter; Python 3.8 or 3.9 is a safe choice). After that, run the code below.

Implementation:

The first step is to import all the required libraries

import numpy as np
import pandas as pd
import pymc3 as pm

The second step is to list the CSV files that make up the dataset.

# Individual stock market datasets to be combined into one DataFrame
files = ['WOR.csv', 'WOW.csv', 'WPL.csv', 'WTC.csv']

The third step is to read each file, keep only its 'Adj Close' column, and outer-join the files on their Date index.

data = pd.DataFrame()

for file in files:
    stock_data = pd.read_csv(file, index_col='Date', parse_dates=True)
    stock_data = stock_data[['Adj Close']]  # Keep only the 'Adjusted Close' column
    stock_data.columns = [file.split('.')[0]]  # Rename the column to the company's name
    if data.empty:
        data = stock_data
    else:
        data = data.join(stock_data, how='outer')

The fourth step is to compute the daily returns, the covariance matrix of those returns, and the number of assets, in preparation for defining the Bayesian regression model.

# Calculate daily returns and drop missing values
returns = data.pct_change().dropna()

# Calculate the covariance matrix of the daily returns
cov_matrix = returns.cov()

# Define Bayesian regression model for portfolio optimization
n_assets = len(data.columns)

The fifth step is to define and run the model so that sampling favors portfolio weights with a high Sharpe ratio.

with pm.Model() as portfolio_model:
    # Define prior distributions for the asset weights
    weights = pm.Dirichlet('weights', a=np.ones(n_assets), shape=n_assets)
    
    # Calculate the portfolio returns using the weights
    portfolio_return = pm.Deterministic('portfolio_return', pm.math.dot(returns.mean(), weights))
    
    # Calculate the portfolio volatility using the weights and covariance matrix
    portfolio_volatility = pm.Deterministic('portfolio_volatility', 
                                            pm.math.sqrt(pm.math.dot(pm.math.dot(weights, cov_matrix), weights.T)))
    
    # Define the utility function (e.g., Sharpe ratio) to optimize
    sharpe_ratio = pm.Deterministic('sharpe_ratio', portfolio_return / portfolio_volatility)
    
    # Bias the sampler toward portfolios with a higher Sharpe ratio
    # (the Potential needs a name distinct from the Deterministic above)
    pm.Potential('sharpe_potential', sharpe_ratio)

Now draw posterior samples to estimate the optimal weights for each asset.

# Draw posterior samples to estimate the optimal portfolio weights
with portfolio_model:
    trace = pm.sample(draws=2000, tune=1000, target_accept=0.95, return_inferencedata=False)

Last but not least, compute the optimal weights (here taken as the posterior mean of the sampled weights) and display the output of the Bayesian regression model.

# Use the posterior mean of the sampled weights as the point estimate
optimal_weights = trace['weights'].mean(axis=0)

# Display the results
for file, weight in zip(files, optimal_weights):
    print(f"{file.split('.')[0]}: {weight:.4f}")

print("\nOptimal Portfolio Weights: ", optimal_weights)

The output will look as follows:

Output of Portfolio optimization using Bayesian Regression

Conclusion

This blog presented the advantages of using Bayesian regression and compared the Bayesian method with the Frequentist method to show where the Bayesian approach can be more effective. It listed the benefits of Bayesian regression and discussed its implementation for solving a real-world problem.

Finally, we discussed the key components of Bayesian regression, which should help data scientists understand the method more closely and implement it with greater confidence. This blog is aimed at aspiring data scientists as well as practitioners who want to apply Bayesian regression.
