Bias-variance decomposition
Resources:
[Hastie et al., 2009] (pp. 219-228)
Bias-variance decomposition for MSE
Consider a regression model
\[
y = f(\boldsymbol x) + \varepsilon, \quad \mathbb E \varepsilon = 0,\quad \mathbb V \varepsilon = \sigma^2
\]
(cf. the probabilistic model for linear regression).
To estimate the expected prediction error at a point \(\boldsymbol x\), take the expectation of the squared error. Since \(\varepsilon\) has zero mean and is independent of the prediction \(\widehat y\), the cross term vanishes:
\[
\mathbb E\big[(y - \widehat y)^2 \big] = \mathbb E\big[(f(\boldsymbol x) + \varepsilon - \widehat y)^2 \big] = \mathbb E\big[(f(\boldsymbol x) - \widehat y)^2 \big] + \sigma^2.
\]
Note that the prediction \(\widehat y = \widehat y(\boldsymbol x)\) is a random variable, since it depends on the training dataset. Adding and subtracting \(\mathbb E\widehat y\),
\[
\mathbb E\big[(f(\boldsymbol x) - \widehat y)^2 \big]=
\mathbb E\big[(f(\boldsymbol x) - \mathbb E\widehat y + \mathbb E\widehat y - \widehat y)^2 \big] = \big(\underbrace{f(\boldsymbol x) - \mathbb E\widehat y}_{\mathrm{bias}}\big)^2 + \underbrace{\mathbb E\big[(\widehat y - \mathbb E\widehat y)^2\big]}_{\mathrm{variance}},
\]
where the cross term vanishes because \(f(\boldsymbol x) - \mathbb E\widehat y\) is deterministic and \(\mathbb E[\widehat y - \mathbb E\widehat y] = 0\). Combining the two displays gives the bias-variance decomposition:
\[
\mathrm{MSE} = \mathrm{bias}^2 + \mathrm{variance} + \sigma^2.
\]
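The decomposition can be checked by simulation: train the same estimator on many independently drawn datasets, then compare the empirical \(\mathrm{bias}^2 + \mathrm{variance} + \sigma^2\) at a test point with the empirical MSE against fresh noisy observations. The concrete choices below (a cubic polynomial fit, \(f(x) = \sin x\), the test point, and sample sizes) are illustrative assumptions, not part of the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return np.sin(x)          # true regression function (illustrative choice)

sigma = 0.3                   # noise level: V[eps] = sigma^2
degree = 3                    # degree of the fitted polynomial (assumption)
n_train = 30                  # training points per dataset
n_datasets = 2000             # number of independently drawn training sets
x0 = 1.0                      # test point where bias and variance are measured

# Prediction at x0 from each training dataset: hat{y} is a random variable
# because it depends on the (random) training data.
preds = np.empty(n_datasets)
for i in range(n_datasets):
    x = rng.uniform(0.0, 2 * np.pi, n_train)
    y = f(x) + sigma * rng.normal(size=n_train)
    coefs = np.polyfit(x, y, degree)        # least-squares polynomial fit
    preds[i] = np.polyval(coefs, x0)

bias2 = (f(x0) - preds.mean()) ** 2         # (f(x0) - E hat{y})^2
variance = preds.var()                      # E[(hat{y} - E hat{y})^2]

# Empirical MSE at x0 against fresh noisy observations y = f(x0) + eps:
mse = np.mean((f(x0) + sigma * rng.normal(size=n_datasets) - preds) ** 2)

print("bias^2 + variance + sigma^2 =", bias2 + variance + sigma**2)
print("empirical MSE              =", mse)   # the two should nearly coincide
```

Note that even though the cubic model is misspecified for \(\sin x\) (nonzero bias), the identity still holds: the decomposition is exact for any estimator, not only unbiased ones.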