Bias-variance decomposition#
Resources:
- [Hastie et al., 2009] (pp. 219-228)
- ChatGPT-generated overview (below)
Bias-variance decomposition is a concept that is closely related to ensembling methods in machine learning. Ensembling methods, such as bagging and boosting, are designed to address the bias-variance trade-off and can help reduce both bias and variance in predictive models. Here’s how bias-variance decomposition and ensembling are connected:
Bias-Variance Trade-Off:
Bias-variance decomposition explains the trade-off between bias and variance in predictive models. High bias results in underfitting (oversimplification), while high variance leads to overfitting (capturing noise).
Ensembling methods aim to strike a balance between bias and variance by combining multiple base models, each with its own bias and variance characteristics.
Ensemble Methods and Bias-Variance Decomposition:
Bagging (Bootstrap Aggregating):
Bagging reduces variance by training multiple base models independently on bootstrapped subsets of the training data and then averaging their predictions (for regression) or taking a majority vote (for classification).
Averaging or voting over many independently trained models cancels much of their individual variance: the base models may each have their own bias, but the combined prediction is more stable than any single base model's.
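The bootstrap-and-average step can be sketched in a few lines of plain Python (function names and the linear base learner are illustrative; real bagging ensembles typically use decision trees):

```python
import random
from statistics import fmean

def fit_line(xs, ys):
    """Least-squares fit of y ≈ a*x + b for 1-D data (the base learner here)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:                      # degenerate bootstrap sample: constant x
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx

def bagging_predict(xs, ys, x_new, n_models=50, seed=0):
    """Average the predictions of base models trained on bootstrap resamples."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap: sample with replacement
        a, b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(a * x_new + b)
    return fmean(preds)                              # aggregate by averaging
```

For classification the final line would take a majority vote over the base predictions instead of the mean.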
Boosting:
Boosting reduces bias by iteratively training base models to focus on data points that previous models handled poorly (misclassified points in classification, large residuals in regression). This process reduces the overall bias of the ensemble.
Boosting can also reduce variance to some extent by combining the predictions of multiple base models.
The iterative nature of boosting allows the ensemble to improve its overall performance by reducing bias while maintaining or even reducing variance.
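The iterative residual-fitting loop can be sketched for squared loss with regression stumps as base learners (all names are illustrative; this is a minimal gradient-boosting sketch, not a production implementation):

```python
from statistics import fmean

def fit_stump(xs, ys):
    """Best single-split regression stump: a threshold plus two leaf means."""
    best = None
    for t in sorted(set(xs))[1:]:                              # candidate thresholds
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        lm, rm = fmean(left), fmean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x < t else rm

def boost(xs, ys, n_rounds=100, lr=0.1):
    """Gradient boosting for squared loss: each stump is fit to the residuals."""
    f0 = fmean(ys)                                             # initial constant model
    resid = [y - f0 for y in ys]
    stumps = []
    for _ in range(n_rounds):
        h = fit_stump(xs, resid)
        stumps.append(h)
        resid = [r - lr * h(x) for r, x in zip(resid, xs)]     # shrink the residuals
    return lambda x: f0 + lr * sum(h(x) for h in stumps)
```

Each round shrinks the remaining residuals, so the ensemble's bias decreases as rounds are added; the learning rate `lr` controls how aggressively.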
Ensemble Models as a Solution:
Ensemble methods, such as Random Forest (a bagging ensemble) and Gradient Boosting (a boosting ensemble), are popular solutions to the bias-variance trade-off.
These ensemble models combine multiple decision trees (base models) to create a more accurate and robust final model.
Random Forest, for example, reduces variance by averaging many decorrelated decision trees, while its bias stays close to that of a single deep tree.
Hyperparameter Tuning:
Hyperparameter tuning in ensembling methods, such as selecting the number of base models or adjusting their weights, can help control the balance between bias and variance.
Properly tuned ensembles can achieve lower overall bias and variance compared to individual models.
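Tuning the number of base models can be sketched as a simple grid search scored on held-out data (a hypothetical helper; `train_fn` stands for any routine that fits an ensemble of `n` base models and returns a predictor):

```python
from statistics import fmean

def validation_mse(predict, xs_val, ys_val):
    """Mean squared error of a fitted predictor on held-out data."""
    return fmean((y - predict(x)) ** 2 for x, y in zip(xs_val, ys_val))

def tune_ensemble_size(train_fn, xs_val, ys_val, sizes=(1, 5, 25, 125)):
    """Pick the ensemble size with the lowest held-out error.

    train_fn(n) must return a fitted predict(x) built from n base models.
    """
    scores = {n: validation_mse(train_fn(n), xs_val, ys_val) for n in sizes}
    return min(scores, key=scores.get), scores
```

For bagging-style ensembles, more base models typically lowers variance with diminishing returns, so the held-out error usually flattens out as `n` grows.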
Ensemble Diversity:
The diversity of base models within an ensemble is essential for effectively reducing bias and variance. Diverse models are less likely to make the same errors.
Techniques like feature bagging (random feature selection) in Random Forests and adaptive boosting in boosting methods promote diversity among base models.
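Feature bagging can be sketched as follows (a hypothetical helper; `fit` is any base-model trainer, and the point is only that each base model sees a different random subset of feature columns):

```python
import random

def feature_bagged_fits(X, y, fit, n_models=25, n_features=2, seed=0):
    """Fit each base model on a random subset of feature columns,
    so the models make decorrelated errors (feature bagging)."""
    rng = random.Random(seed)
    d = len(X[0])
    models = []
    for _ in range(n_models):
        cols = sorted(rng.sample(range(d), n_features))        # random feature subset
        Xi = [[row[c] for c in cols] for row in X]             # project onto the subset
        models.append((cols, fit(Xi, y)))                      # remember which columns
    return models
```

This is the same idea as the `max_features` setting in Random Forests: because different base models never see the same columns, their errors are less correlated and averaging cancels more variance.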
In summary, ensembling methods like bagging and boosting are powerful tools for addressing the bias-variance trade-off in machine learning. By combining multiple base models with varying bias and variance characteristics, ensembles can reduce both sources of error and improve overall predictive performance on a wide range of tasks. Properly designed and tuned ensemble models can achieve better generalization while mitigating the overfitting problem associated with high-variance models.
Bias-variance decomposition for MSE#
Consider a regression model

\[
    y = f(x) + \varepsilon, \quad \mathbb E\varepsilon = 0, \quad \mathbb V\varepsilon = \sigma^2
\]

(cp. with probabilistic model for linear regression).
To estimate the prediction error, calculate the expectation of the MSE:

\[
    \mathbb E(y - \widehat y)^2.
\]

Note that the prediction \(\widehat y\) is a random variable depending on the training dataset. Thus, the expectation is taken both over the noise \(\varepsilon\) and over the training datasets.
The bias-variance decomposition:

\[
    \mathbb E(y - \widehat y)^2
    = \underbrace{(f(x) - \mathbb E\widehat y)^2}_{\text{bias}^2}
    + \underbrace{\mathbb V\widehat y}_{\text{variance}}
    + \underbrace{\sigma^2}_{\text{noise}}.
\]
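The decomposition can be checked numerically by Monte-Carlo simulation: fit the same model class on many independently drawn training sets and estimate each term at a single test point. The sketch below uses a quadratic target, a linear least-squares fit (so there is some bias), and Gaussian noise; all constants are illustrative.

```python
import random
from statistics import fmean

def fit_line(xs, ys):
    """Least-squares fit of y ≈ a*x + b for 1-D data."""
    mx, my = fmean(xs), fmean(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

rng = random.Random(0)
f = lambda x: x * x                       # true regression function
sigma, x0 = 0.5, 0.8                      # noise level and test point
xs = [i / 10 for i in range(11)]          # fixed design grid on [0, 1]

preds = []
for _ in range(2000):                     # many independent training sets
    ys = [f(x) + rng.gauss(0, sigma) for x in xs]
    a, b = fit_line(xs, ys)
    preds.append(a * x0 + b)              # prediction at x0 from this training set

mean_pred = fmean(preds)
bias2 = (f(x0) - mean_pred) ** 2                              # squared bias
variance = fmean((p - mean_pred) ** 2 for p in preds)         # variance of prediction
mse = fmean((f(x0) + rng.gauss(0, sigma) - p) ** 2 for p in preds)

# mse should approximately equal bias2 + variance + sigma**2
```

Up to Monte-Carlo error, the estimated MSE matches the sum of the three terms, with the irreducible `sigma**2` dominating in this particular setup.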