
Machine Learning - Bayesian Optimisation

Updated: Apr 7, 2022

The previous models were optimised with Random Search and Grid Search, so I decided to use Bayesian Optimisation to improve the hyperparameters, in order to increase the accuracy on the test dataset.


Bayesian optimization is a sequential design strategy for global optimization of black-box functions that does not assume any functional forms. It is usually employed to optimize expensive-to-evaluate functions.


Bayesian optimization is particularly advantageous for problems where f(x) is difficult to evaluate, is a black box with some unknown structure, has fewer than 20 dimensions, and where derivatives are not available.

Since the objective function is unknown, the Bayesian strategy is to treat it as a random function and place a prior over it. The prior captures beliefs about the behavior of the function. After gathering the function evaluations, which are treated as data, the prior is updated to form the posterior distribution over the objective function. The posterior distribution, in turn, is used to construct an acquisition function (often also referred to as infill sampling criteria) that determines the next query point.
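To make that loop concrete, here is a minimal sketch of the prior → posterior → acquisition cycle, using a Gaussian Process surrogate from scikit-learn and expected improvement as the acquisition function. The toy objective, bounds, and candidate-sampling scheme are illustrative assumptions, not the setup used for the models in this post:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(X_cand, gp, y_best):
    # Posterior mean and std of the surrogate at the candidate points
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    imp = y_best - mu                 # improvement over best observed value (minimisation)
    z = imp / sigma
    return imp * norm.cdf(z) + sigma * norm.pdf(z)

def bayesian_optimise(f, lo, hi, n_init=5, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(lo, hi, size=(n_init, 1))       # initial random design
    y = np.array([f(x[0]) for x in X])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    for _ in range(n_iter):
        gp.fit(X, y)                                # update the posterior with all data so far
        cand = rng.uniform(lo, hi, size=(1000, 1))  # candidate pool for the acquisition
        x_next = cand[np.argmax(expected_improvement(cand, gp, y.min()))]
        X = np.vstack([X, x_next])                  # evaluate the acquisition maximiser next
        y = np.append(y, f(x_next[0]))
    return X[np.argmin(y)], y.min()

# Toy 1-D objective to minimise
best_x, best_y = bayesian_optimise(lambda x: np.sin(3 * x) + 0.1 * x ** 2, -3.0, 3.0)
```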

There are several methods used to define the prior/posterior distribution over the objective function. The two most common use Gaussian Processes, in a method called Kriging. Another, less expensive, method uses the Tree-structured Parzen Estimator (TPE) to construct two distributions for 'high' and 'low' points, and then finds the location that maximises the expected improvement.
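The TPE approach is what the hyperopt library implements. As a quick, self-contained illustration with a made-up one-dimensional objective (not the weather models from this post):

```python
from hyperopt import fmin, tpe, hp, Trials

# Toy objective; hyperopt minimises the returned value
def objective(params):
    x = params["x"]
    return (x - 2) ** 2

space = {"x": hp.uniform("x", -10, 10)}   # prior over the search space
trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=100, trials=trials)
print(best)   # dict with the best value found for "x", close to 2
```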

Standard Bayesian optimization relies upon each point x in A being easy to evaluate, and problems that deviate from this assumption are known as exotic Bayesian optimization problems. Optimization problems become exotic if the evaluations are noisy or run in parallel, if the quality of an evaluation involves a tradeoff between difficulty and accuracy, if random environmental conditions are present, or if the evaluation involves derivatives.


Here is an animation to show the process:

It is also considerably faster than Grid Search or Random Search at finding the best accuracy and lowest loss.



Here is the optimised XGBoost:
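The post doesn't reproduce the search code itself, but a Bayesian search over XGBoost hyperparameters can be set up along these lines with scikit-optimize's BayesSearchCV; the parameter names are real XGBoost ones, while the ranges and X_train/y_train are placeholder assumptions:

```python
from skopt import BayesSearchCV
from skopt.space import Real, Integer
from xgboost import XGBClassifier

xgb_search = BayesSearchCV(
    estimator=XGBClassifier(eval_metric="mlogloss"),
    search_spaces={                                   # hypothetical ranges
        "max_depth": Integer(3, 10),
        "learning_rate": Real(1e-3, 0.3, prior="log-uniform"),
        "n_estimators": Integer(100, 1000),
        "subsample": Real(0.5, 1.0),
    },
    n_iter=30,        # number of Bayesian optimisation steps
    cv=5,
    scoring="accuracy",
)
# xgb_search.fit(X_train, y_train)   # X_train/y_train: the weather training data
# print(xgb_search.best_params_, xgb_search.best_score_)
```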

Here is the optimised RFC:
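The same BayesSearchCV pattern works for the Random Forest Classifier; again, the ranges below are illustrative rather than the ones actually used:

```python
from skopt import BayesSearchCV
from skopt.space import Integer, Categorical
from sklearn.ensemble import RandomForestClassifier

rfc_search = BayesSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    search_spaces={                                   # hypothetical ranges
        "n_estimators": Integer(100, 1000),
        "max_depth": Integer(3, 30),
        "min_samples_split": Integer(2, 10),
        "max_features": Categorical(["sqrt", "log2"]),
    },
    n_iter=30,
    cv=5,
    scoring="accuracy",
)
# rfc_search.fit(X_train, y_train)
```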


The overall accuracy is higher than with the same estimators optimised via Grid Search or Random Search. However, looking at the per-class performance, the Rain and Cloudy classes have very high accuracy while every other class scores at or near zero.


Because of that, I will be keeping the Grid Search optimised estimators.

