Bias-variance decomposition#

Bias-variance decomposition for MSE#

Consider a regression model

\[ y = f(\boldsymbol x) + \varepsilon, \quad \mathbb E \varepsilon = 0,\quad \mathbb V \varepsilon = \sigma^2 \]

(cf. the probabilistic model for linear regression).
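
As a quick illustration, here is a minimal sketch of sampling one training set from this model; the specific choices f(x) = sin x and sigma = 0.3 are purely illustrative assumptions, not part of the derivation:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # hypothetical true regression function (assumption for illustration)
    return np.sin(x)

sigma = 0.3                                   # noise standard deviation (assumption)
x = rng.uniform(0, 2 * np.pi, size=50)        # inputs
y = f(x) + rng.normal(0, sigma, size=50)      # y = f(x) + eps, E eps = 0, V eps = sigma^2
```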

To quantify the prediction error at a point \(\boldsymbol x\), take the expectation of the squared error:

\[ \mathbb E\big[(y(\boldsymbol x) - \widehat y(\boldsymbol x))^2 \big] = \mathbb E\big[(f(\boldsymbol x) + \varepsilon - \widehat y)^2 \big] = \mathbb E\big[(f(\boldsymbol x) - \widehat y)^2 \big] + \sigma^2. \]

The cross term \(2\,\mathbb E\big[\varepsilon\,(f(\boldsymbol x) - \widehat y)\big]\) vanishes because \(\mathbb E\varepsilon = 0\) and the noise at the test point is independent of the training data. Note also that the prediction \(\widehat y = \widehat y(\boldsymbol x)\) is a random variable: it depends on the training dataset. Thus,

\[ \mathbb E\big[(f(\boldsymbol x) - \widehat y)^2 \big] = \mathbb E\big[(f(\boldsymbol x) - \mathbb E\widehat y + \mathbb E\widehat y - \widehat y)^2 \big] = \big(\underbrace{f(\boldsymbol x) - \mathbb E\widehat y}_{\mathrm{bias}}\big)^2 + \underbrace{\mathbb E\big[(\mathbb E\widehat y - \widehat y)^2\big]}_{\mathrm{variance}}, \]

since the remaining cross term \(2\,\mathbb E\big[(f(\boldsymbol x) - \mathbb E\widehat y)(\mathbb E\widehat y - \widehat y)\big] = 2\big(f(\boldsymbol x) - \mathbb E\widehat y\big)\,\mathbb E\big[\mathbb E\widehat y - \widehat y\big] = 0\), and \(f(\boldsymbol x) - \mathbb E\widehat y\) is deterministic, so the squared bias needs no expectation.

The bias-variance decomposition:

\[ \mathrm{MSE} = \mathrm{bias}^2 + \mathrm{variance} + \sigma^2. \]
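
The identity can be checked empirically with a small Monte Carlo experiment. The sketch below assumes a hypothetical setup (true function sin x, Gaussian noise, a degree-3 polynomial fit via `numpy.polyfit`, a single test point x0); over many resampled training sets, the estimated MSE and the sum bias² + variance + σ² should agree up to simulation noise:

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.sin                  # hypothetical true regression function (assumption)
sigma = 0.3                 # noise standard deviation (assumption)
n, degree = 30, 3           # training-set size and polynomial degree (assumptions)
x0 = 1.0                    # test point at which the error is decomposed
n_trials = 5000

preds, errors = [], []
for _ in range(n_trials):
    # draw a fresh training set from y = f(x) + eps
    x = rng.uniform(0, 2 * np.pi, size=n)
    y = f(x) + rng.normal(0, sigma, size=n)
    # fit a polynomial regression and predict at the test point
    coef = np.polyfit(x, y, deg=degree)
    y_hat = np.polyval(coef, x0)
    preds.append(y_hat)
    # squared error against a noisy test label y(x0) = f(x0) + eps
    y0 = f(x0) + rng.normal(0, sigma)
    errors.append((y0 - y_hat) ** 2)

preds = np.array(preds)
bias2 = (f(x0) - preds.mean()) ** 2     # squared bias of the estimator at x0
variance = preds.var()                  # variance over training sets
print("MSE                   ", np.mean(errors))
print("bias^2 + var + sigma^2", bias2 + variance + sigma**2)
```

Increasing the polynomial degree shifts weight from the bias term to the variance term, while the irreducible σ² stays fixed.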