Polynomial regression
An obvious way to enhance the simple regression model (9) is to add higher powers of the predictor \(\boldsymbol x\). For example, consider quadratic regression:
\[
    y = w_0 + w_1 x + w_2 x^2.
\]
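Now the model has three parameters \(\boldsymbol w = (w_0, w_1, w_2)\), which can also be fitted by minimizing the MSE:
\[
    \frac 1n \sum_{i=1}^n \big(y_i - w_0 - w_1 x_i - w_2 x_i^2\big)^2 \to \min\limits_{\boldsymbol w}.
\]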
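As a minimal sketch of this fit, the snippet below estimates the three parameters on synthetic data. The data, the "true" coefficients `w_true`, and the noise level are assumptions made up for illustration; they are not taken from the Boston dataset. It relies on `np.polyfit`, which minimizes the sum of squared residuals and therefore has the same minimizer as the MSE.

```python
import numpy as np

# Synthetic data (an assumption for illustration only)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
w_true = np.array([3.0, -2.0, 0.5])  # hypothetical (w_0, w_1, w_2)
y = w_true[0] + w_true[1] * x + w_true[2] * x**2 + rng.normal(0, 0.5, size=200)

# np.polyfit returns coefficients from the highest power down to the constant
w2, w1, w0 = np.polyfit(x, y, deg=2)
w_hat = np.array([w0, w1, w2])

# MSE of the fitted quadratic on the same data
y_pred = w_hat[0] + w_hat[1] * x + w_hat[2] * x**2
mse = np.mean((y - y_pred) ** 2)

print("estimated w:", w_hat)
print("MSE:", mse)
```

Since the fitted parameters minimize the MSE on this sample, their MSE cannot exceed the MSE obtained with `w_true` on the same data.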
Revisit Boston dataset
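The data look quite suitable for a quadratic regression. Let’s do some simple feature engineering and add a new feature: the squares of the predictor. Now the design matrix has two columns:
\[
    \boldsymbol X = \begin{pmatrix}
    x_1 & x_1^2 \\
    \vdots & \vdots \\
    x_n & x_n^2
    \end{pmatrix}
\]
To fit the linear regression on the new dataset, once again use sklearn: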
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
# Load the data: lstat is the predictor, medv is the target
boston = pd.read_csv("../ISLP_datsets/Boston.csv")
x = boston['lstat']
y = boston['medv']
LR = LinearRegression()
# Design matrix with two columns: x and x^2
x_reshaped = x.values.reshape(-1, 1)
x_train = np.hstack([x_reshaped, x_reshaped**2])
LR.fit(x_train, y)
print("intercept:", LR.intercept_)
print("coefficients:", LR.coef_)
# LinearRegression.score returns the coefficient of determination R^2
print("r-score:", LR.score(x_train, y))
print("MSE:", np.mean((LR.predict(x_train) - y) ** 2))
intercept: 42.86200732816936
coefficients: [-2.3328211 0.04354689]
r-score: 0.6407168971636612
MSE: 30.330520075853713
Our metrics have improved. Now plot the fitted curve over the data:
import matplotlib.pyplot as plt
%config InlineBackend.figure_format = 'svg'
plt.scatter(x, y, s=10, c='b', alpha=0.7)
xs = np.linspace(x.min(), x.max(), num=100)
plt.plot(xs, LR.intercept_ + LR.coef_[0]*xs + LR.coef_[1]*xs**2, c='r', lw=2)
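The squared column does not have to be built by hand. The sketch below, on synthetic data invented for illustration (the coefficients and noise level are assumptions, not the Boston values), compares the manual `np.hstack` construction with scikit-learn's `PolynomialFeatures` inside a pipeline. Passing `include_bias=False` avoids a redundant column of ones, since `LinearRegression` already fits the intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (an assumption for illustration only)
rng = np.random.default_rng(1)
x = rng.uniform(1, 40, size=300)
y = 40 - 2.0 * x + 0.04 * x**2 + rng.normal(0, 2.0, size=300)
X = x.reshape(-1, 1)

# Manual feature engineering: columns x and x^2
X_manual = np.hstack([X, X**2])
lr_manual = LinearRegression().fit(X_manual, y)

# The same two columns generated by PolynomialFeatures
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
model.fit(X, y)
lr_pipe = model[-1]  # the LinearRegression step of the pipeline

print("manual:  ", lr_manual.intercept_, lr_manual.coef_)
print("pipeline:", lr_pipe.intercept_, lr_pipe.coef_)
```

Both approaches fit the same model. The pipeline has the practical advantage that `model.predict` accepts the raw one-column input and applies the feature expansion itself.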
General case
Of course, the degree of the polynomial can be any number \(m\in\mathbb N\). The polynomial regression model is
\[
    y = w_0 + w_1 x + w_2 x^2 + \ldots + w_m x^m = \sum_{k=0}^m w_k x^k.
\]
Q. How many parameters does this model have?
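In the case of MSE loss, the model is fitted by minimizing the function
\[
    \mathcal L(\boldsymbol w) = \frac 1n \sum_{i=1}^n \Big(y_i - \sum_{k=0}^m w_k x_i^k\Big)^2.
\]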
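A possible way to carry out this minimization for an arbitrary degree is sketched below. The helper names `fit_polynomial` and `mse` are hypothetical, and the data are synthetic; the sketch builds a matrix of powers with `np.vander` and solves the least-squares problem with `np.linalg.lstsq`, which is one of several ways to minimize the MSE.

```python
import numpy as np

def fit_polynomial(x, y, m):
    """Least-squares fit of a degree-m polynomial; returns (w_0, ..., w_m)."""
    # Columns of the matrix are x^0, x^1, ..., x^m
    X = np.vander(x, N=m + 1, increasing=True)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def mse(x, y, w):
    """Mean squared error of the polynomial with coefficients w."""
    X = np.vander(x, N=len(w), increasing=True)
    return np.mean((y - X @ w) ** 2)

# Synthetic data (an assumption for illustration only)
rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, size=100)
y = np.sin(3 * x) + rng.normal(0, 0.1, size=100)

for m in range(1, 6):
    w = fit_polynomial(x, y, m)
    print(f"m={m}: training MSE = {mse(x, y, w):.4f}")
```

On the training data the MSE cannot increase as \(m\) grows, because every polynomial of degree \(m\) is also a polynomial of degree \(m+1\) with a zero leading coefficient. This says nothing about the error on unseen data.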
