Do Ensemble Models Always Improve Accuracy?

Satish Kumar · Published in Analytics Vidhya · Oct 12, 2020 · 9 min read

Introduction

Machine Learning Models:

In general, a machine learning model works in a sequence: we provide an input, the classifier fits on the data, and the model produces an output. The input data comes from a single sample and generally consists of independent variables (X) and a dependent variable (Y). The model's predictions are represented as ŷ.

In ensemble machine learning models, several samples are drawn from a single population and a separate classifier is fit on each sample. Each classifier produces its own prediction, and the final output of the ensemble model is an aggregate of these individual predictions.

So how does aggregating the outputs of multiple model classifiers help? And why do we need an ensemble model?

Certain machine learning models, like decision trees, are prone to overfitting: they learn the training data very well but do not perform well on new (test) data. We say that the variance of such models is high, meaning their outputs and accuracy vary a lot from sample to sample. Ensemble models tend to reduce this variance.

In real-world scenarios, we generally have only a single sample to feed into a machine learning model. As discussed above, ensemble models fit multiple classifiers on multiple samples of the same size drawn from a population. So how do we get multiple samples from the same population? This is where the statistical concept of bootstrapping comes in.

Bootstrapping

Bootstrapping is a sampling technique in which multiple samples are drawn from the same population with replacement. Mathematically, it can be shown that a bootstrap sample contains approximately 63% of the original observations as unique values. The central limit theorem also applies to these bootstrap samples: if all possible samples are taken from a population, the sample means follow a normal distribution, the population mean equals the mean of the sample means, and the standard error of the sample mean is the population standard deviation divided by the square root of the sample size. In ensemble models, bootstrap samples of the same size as the training data are drawn to fit the individual classifiers. Let's look at two ensemble modelling techniques: bagging and boosting.
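Before looking at those, here is a minimal sketch of the ~63% figure; the toy dataset of 1,000 observations is an assumption purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
data = np.arange(1000)  # a toy "training set" of 1,000 observations

# Draw a bootstrap sample: n observations sampled with replacement from the original n
bootstrap_sample = rng.choice(data, size=len(data), replace=True)

# Roughly 63% of the original observations appear in the bootstrap sample at least once
unique_fraction = len(np.unique(bootstrap_sample)) / len(data)
print(f"Fraction of unique observations: {unique_fraction:.2f}")  # close to 0.63
```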

Bootstrap Aggregating (Bagging)


Bootstrapping solves the problem of generating multiple samples from a single sample of a population. In the bagging model, suppose our training dataset is a sample of size n; we can generate k bootstrap samples from it and fit k classifiers, one on each sample. Each classifier produces an output, and the aggregate of all the outputs is the output of the bagging algorithm.

Mathematically, the bagging prediction for classification can be written as a majority vote over the k classifiers:

ŷ_bag(x) = argmax_y Σ_{i=1..k} I(ŷ_i(x) = y)

where I is the indicator function, which is 1 if its argument is true and 0 if false, x is the input vector, and ŷ_i(x) is the predicted value from the i-th classifier.

So how does bagging reduce variance? Intuitively, suppose we have n random variables X1, X2, …, Xn that are independent and identically distributed, each following a normal distribution N(μ, σ²). Then their average

X̄ = (X1 + X2 + … + Xn) / n

follows N(μ, σ²/n). As the central limit theorem also suggests, aggregation reduces variance, so theoretically bagging improves accuracy. Bagging does, however, make interpretation difficult, since it fits a number of classifiers on different bootstrap samples and gives an aggregated output. Let us now discuss another ensemble model called boosting.
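Before moving on, this variance reduction can be checked with a short simulation; the values of n and the number of trials below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, trials = 0.0, 1.0, 25, 100_000

# Single N(mu, sigma^2) draws vs. averages of n such draws
single_draws = rng.normal(mu, sigma, size=trials)
averaged_draws = rng.normal(mu, sigma, size=(trials, n)).mean(axis=1)

print(np.var(single_draws))    # close to sigma^2 = 1
print(np.var(averaged_draws))  # close to sigma^2 / n = 0.04
```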

Boosting

Boosting is another ensemble modelling technique used to improve the accuracy of predictions by combining a number of classifiers and making predictions based on their aggregated output. So why do we need another ensemble model? To answer this, let us list some of the shortcomings of the bagging algorithm:

1. In bagging, we fit a number of classifiers on different bootstrap samples. Some of these classifiers classify observations accurately and some do not, yet the bagging algorithm gives all classifiers equal importance, whether a classifier is good or bad.

2. No additional emphasis is given to learning the difficult-to-classify observations so that they can be correctly classified.

The boosting algorithm rectifies these shortcomings. Let us understand how it works with a small example. Say we have a dataset with six observations, X1 to X6. In round 1 of boosting we take a probability sample, unlike the bagging algorithm, which takes bootstrap samples, and equal weights are assigned to all observations at the start. Suppose we fit three classifiers on this data, C1, C2 and C3, each of which produces predicted values. The classifier C1 predicts X3, X4 and X5 correctly, while C2 predicts X2, X3, X4, X5 and X6 correctly. So C2 turns out to be the better classifier, since it classifies most of the observations correctly. The observation X1 is the most problematic one, since neither C1 nor C2 classifies it correctly; only the classifier C3 does.

In the next round of boosting, the weights of observations that have been misclassified are increased and the weights of observations that have been classified correctly are decreased. If the weights of the classifiers C1, C2 and C3 are α1, α2 and α3, the final prediction of the boosting algorithm is the sign of the weighted sum of the individual predictions (a minimal sketch of this weight update follows the list below). Some key points about the boosting algorithm:

1. The boosting algorithm considers the predictions made by classifier C2 important, because C2 predicted most observations correctly.

2. It also considers C3 important, since C3 was able to classify the most problematic observation (X1) correctly.

3. It treats the predictions made by C1 as least important, since C1 misclassified most observations.

4. In general, classifiers are weighted as more or less important based on how accurately they classify observations.
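The weight update described above follows the standard (discrete) AdaBoost recipe. The sketch below is a minimal illustration of one round, not the full algorithm: the six true labels, the predictions attributed to C2, and the ±1 label coding are all assumptions made for the example.

```python
import numpy as np

def adaboost_round(y_true, y_pred, weights):
    """One AdaBoost-style round: score a classifier and re-weight observations.

    Labels are assumed to be in {-1, +1}; weights sum to 1.
    """
    miss = (y_pred != y_true)
    err = np.sum(weights[miss])                        # weighted error of this classifier
    alpha = 0.5 * np.log((1 - err) / (err + 1e-10))    # classifier weight: large when err is small
    # Increase weights of misclassified observations, decrease weights of correct ones
    new_weights = weights * np.exp(np.where(miss, alpha, -alpha))
    return alpha, new_weights / new_weights.sum()

# Six observations X1..X6 with hypothetical true labels and equal starting weights
y_true = np.array([+1, -1, +1, -1, +1, -1])
weights = np.full(6, 1 / 6)

# Hypothetical predictions from classifier C2: correct on X2..X6 but wrong on X1
c2_pred = np.array([-1, -1, +1, -1, +1, -1])
alpha2, weights = adaboost_round(y_true, c2_pred, weights)
print(alpha2)   # about 0.80: C2 gets a large weight because its weighted error is low
print(weights)  # X1 now carries half the total weight: [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]
```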

So, do ensemble models always improve accuracy?

The basic objective of ensemble models, as we have seen above, is to reduce the variance of a model and improve the accuracy of its predictions. So, do you think ensemble models always improve the accuracy of our predictions? Let's explore this by implementing the bagging and boosting algorithms on different datasets in Python and checking whether this holds true.

First, we look at the IBM attrition data, which contains 1,470 observations and 35 variables. We select attrition as our target variable: it is a binary variable that is "yes" for an employee who is an attrition case and "no" otherwise. Some of the X variables are education, monthly income, total working years, and so on. We import the necessary libraries, read the data, separate the X and Y variables, and split the data into train and test sets using scikit-learn. We then fit a single decision tree classifier on the train data, make predictions on the test data, generate a confusion matrix, and calculate the accuracy of the predictions. The accuracy of the decision tree model was 84.80 percent.
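A hedged sketch of this workflow follows. The CSV file name and the column names are assumptions based on the publicly available IBM HR attrition dataset, and the exact accuracy depends on the train/test split and tree settings, so 84.80 percent is the figure reported here rather than something the sketch guarantees.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, accuracy_score

# File and column names are assumptions based on the public IBM HR attrition dataset
df = pd.read_csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")
y = df["Attrition"]                                   # "Yes" / "No" target
X = pd.get_dummies(df.drop(columns=["Attrition"]))    # one-hot encode the categorical predictors

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

tree = DecisionTreeClassifier(random_state=42)
tree.fit(X_train, y_train)
y_pred = tree.predict(X_test)

print(confusion_matrix(y_test, y_pred))
print(accuracy_score(y_test, y_pred))   # the article reports about 84.8% on its split
```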

Now we apply the bagging algorithm to the same dataset using scikit-learn. The n_estimators hyperparameter is the number of trees the bagging algorithm uses in the ensemble. Using this model to predict our outputs, we see that the accuracy of the bagged model is slightly lower than that of the decision tree model (83.90 percent).
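Continuing from the previous sketch (same train/test split), a minimal bagging version; n_estimators=100 is an assumed value, and scikit-learn's BaggingClassifier uses a decision tree as its default base estimator.

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score

# Bagged decision trees: the default base estimator is a decision tree.
# n_estimators is the number of bootstrap samples / trees; 100 is an assumed value here.
bagging = BaggingClassifier(n_estimators=100, random_state=42)
bagging.fit(X_train, y_train)
print(accuracy_score(y_test, bagging.predict(X_test)))   # the article reports about 83.9%
```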

Now let's test the boosting algorithm on the same dataset by fitting an AdaBoost model on the data. Its accuracy is better than the bagging model and almost the same as the individual decision tree model.
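A matching AdaBoost sketch on the same split, again with an assumed n_estimators; scikit-learn's AdaBoostClassifier uses shallow decision trees (stumps) as its default base estimator.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score

# AdaBoost with its default base estimator (a decision stump); n_estimators is an assumed value
ada = AdaBoostClassifier(n_estimators=100, random_state=42)
ada.fit(X_train, y_train)
print(accuracy_score(y_test, ada.predict(X_test)))
```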

Now let's implement these algorithms on a smaller dataset. The iris dataset consists of 150 observations, and its variables are sepal length, sepal width, petal length, petal width, and the species of the flower, which takes three values. First, we import the necessary libraries, read the data, and split it into train and test sets, using stratified sampling on the y variable (the species) so that the classes are proportionally represented in both sets. We then fit a decision tree model on the training data using scikit-learn, predict the values on the test data, and generate a confusion matrix to calculate the model's accuracy. The decision tree classifier classified 42 of the 45 test observations correctly, an accuracy of 93.33 percent.
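A minimal sketch of this iris workflow: loading the data from scikit-learn instead of a CSV is a shortcut taken here, and test_size=0.3 is assumed because it yields the 45 test observations mentioned above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, accuracy_score

iris = load_iris()   # loading from sklearn instead of a CSV is an assumption for this sketch
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, stratify=iris.target, random_state=42
)  # stratified split; 30% of 150 gives the 45 test observations mentioned above

tree = DecisionTreeClassifier(random_state=42)
tree.fit(X_train, y_train)
y_pred = tree.predict(X_test)

print(confusion_matrix(y_test, y_pred))
print(accuracy_score(y_test, y_pred))   # the article reports 42/45 correct, about 93.33%
```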

Now let's use the bagging algorithm on the same data, make predictions on the test data, and calculate the accuracy of the model. Surprisingly, the accuracy of the bagged model is the same as that of our decision tree model!

Now let's apply the boosting algorithm and see what happens. Here the accuracy drops by a large margin: the AdaBoost model reaches only 84.4 percent, whereas the decision tree and the bagging model both reach 93.33 percent. The AdaBoost model classified only 8 observations of the third class correctly, while the decision tree and bagging models each classified 12 observations of the third class correctly.
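To reproduce this comparison, the same pattern can be run for bagging and AdaBoost on the stratified iris split from the previous sketch. The n_estimators values are assumptions, and the exact accuracies and per-class counts depend on the split and hyperparameters; 93.33 percent for bagging and about 84.4 percent for AdaBoost are the figures reported here.

```python
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier
from sklearn.metrics import confusion_matrix, accuracy_score

# Reuses X_train, X_test, y_train, y_test from the previous iris sketch
for name, model in [
    ("Bagging", BaggingClassifier(n_estimators=100, random_state=42)),
    ("AdaBoost", AdaBoostClassifier(n_estimators=100, random_state=42)),
]:
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(name, accuracy_score(y_test, y_pred))
    print(confusion_matrix(y_test, y_pred))   # per-class counts show where the misclassifications fall
```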

So, we started this section with the question: do ensemble models always improve accuracy? As the examples above show, the answer is no!

Conclusion

In this article we looked at why we need ensemble models and how they reduce the variance of certain models by aggregating multiple classifiers, thereby improving the accuracy of predictions. We explored two popular ensemble methods, bagging and boosting, learnt how these algorithms work, and covered the statistical sampling technique behind them, bootstrapping. We then returned to the question we set out to answer: do ensemble models always improve accuracy? With the help of a few practical examples, we showed that they do not. In any modelling exercise it is important to start simple and only then move on to more advanced modelling techniques. If a decision tree is adequately making predictions for your dataset, there is no need to use an ensemble model.
