Multinomial logistic regression#
Let \(\mathcal Y\) be a finite set, e.g. \(\mathcal Y=\{1, 2, \ldots, K\}\), and the training dataset
Multinomial logistic regression predicts a vector of probabilities
How can we predict these probabilities? We can use a linear model whose output is a vector of \(K\) different numbers:
Now convert the vector \(\boldsymbol z \in \mathbb R^K\) to the vector of probabilities \(\boldsymbol{\widehat y}\) via softmax:
If we need to pick a class, we can choose the most probable one:
The parameters \(\boldsymbol w_k\) naturally form a matrix
Q. What is the shape of this matrix? How many parameters does multinomial logistic regression have?
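To make the chain above concrete, here is a minimal NumPy sketch of scores → softmax → argmax; the sizes and numbers are made up for illustration and are not part of the original notebook:
import numpy as np

def softmax(z):
    # subtract the max for numerical stability; the probabilities are unchanged
    e = np.exp(z - z.max())
    return e / e.sum()

# toy example: d = 3 features, K = 4 classes (illustrative numbers only)
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))    # one weight column per class
x = rng.normal(size=3)         # a single input vector

z = x @ W                      # vector of K scores
y_hat = softmax(z)             # vector of K probabilities, sums to 1
predicted_class = np.argmax(y_hat)
print(y_hat, y_hat.sum(), predicted_class)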
Loss function#
The optimal parameters \(\boldsymbol W\) are solutions of the following optimization problem:
If the targets \(y_k\) are one-hot encoded, then they form a matrix
Accordingly, the loss function (30) can be written as
which is essentially the cross-entropy loss (7) taken with the opposite sign.
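As an illustration of this loss (not part of the original notebook), the following sketch evaluates the cross-entropy for a few made-up probability rows and one-hot targets; whether (31) sums or averages over the samples is not shown here, so the sketch simply averages:
import numpy as np

# predicted probabilities, one row per sample (each row sums to 1); numbers are made up
Y_hat = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4]])
# one-hot encoded targets
Y = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]])

# cross-entropy: average over samples of -sum_k y_ik * log(y_hat_ik)
loss = -np.mean(np.sum(Y * np.log(Y_hat), axis=1))
print(loss)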
Regularized version:
Q. Why is there a minus sign before the regularization term?
Example: MNIST#
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
%config InlineBackend.figure_format = 'svg'
X, Y = fetch_openml('mnist_784', return_X_y=True, parser='auto')
X = X.astype(float).values / 255
Y = Y.astype(int).values
Visualize data#
Visualize some random samples:
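The plotting cell is not reproduced here; one possible implementation of the plot_digits helper used below and at the end of this section (assuming 28×28 grayscale images and an optional vector of predictions) is:
def plot_digits(X, y_true, y_pred=None, n=10):
    # show n random digits with their labels (and predictions, if given)
    idx = np.random.choice(len(X), size=n, replace=False)
    fig, axes = plt.subplots(1, n, figsize=(1.5 * n, 2))
    for ax, i in zip(axes, idx):
        ax.imshow(X[i].reshape(28, 28), cmap="gray")
        title = str(y_true[i])
        if y_pred is not None:
            title += f" / {y_pred[i]}"
        ax.set_title(title)
        ax.axis("off")
    plt.show()

plot_digits(X, Y)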
Splitting into train and test#
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=10000)
X_train.shape, X_test.shape, y_train.shape, y_test.shape
((60000, 784), (10000, 784), (60000,), (10000,))
Check that the classes are balanced:
np.unique(y_test, return_counts=True)
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
array([ 945, 1233, 995, 1036, 972, 903, 970, 1000, 950, 996]))
Fit and evaluate#
Fit the logistic regression:
%%time
LR = LogisticRegression(max_iter=100)
LR.fit(X_train, y_train)
CPU times: user 35.3 s, sys: 641 ms, total: 36 s
Wall time: 19 s
/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
LogisticRegression()
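The warning above can be addressed by giving lbfgs more iterations or by scaling the features first; a sketch of both options (not applied here, so the results below correspond to the original fit; the names LR_more_iter and LR_scaled are illustrative) is:
# possible remedies for the ConvergenceWarning (not used for the results below)
LR_more_iter = LogisticRegression(max_iter=1000)   # give lbfgs more iterations

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
LR_scaled = make_pipeline(StandardScaler(), LogisticRegression(max_iter=100))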
Make predictions:
y_hat = LR.predict(X_test)
y_hat
array([5, 4, 4, ..., 3, 2, 3])
We can also predict probabilities:
y_proba = LR.predict_proba(X_test)
y_proba[:3]
array([[1.65095773e-04, 1.43385260e-07, 1.14780168e-05, 2.99636017e-01,
5.98926346e-07, 6.39615667e-01, 1.80372814e-07, 1.76480310e-07,
5.86493420e-02, 1.92130044e-03],
[1.21475181e-04, 1.49673625e-06, 1.08891109e-03, 1.24093043e-05,
9.14920772e-01, 1.95035318e-04, 1.16798504e-04, 1.85831413e-04,
1.03925873e-02, 7.29646828e-02],
[9.63965076e-09, 9.52335705e-09, 1.80277138e-07, 4.04423620e-09,
9.99908083e-01, 2.32334829e-06, 7.00253101e-06, 8.06028699e-06,
6.62953819e-06, 6.76977960e-05]])
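As a quick consistency check (an illustrative addition, not part of the original notebook), each row of y_proba should sum to one, and its argmax should reproduce LR.predict:
print(np.allclose(y_proba.sum(axis=1), 1))              # rows are probability vectors
print(np.array_equal(y_proba.argmax(axis=1), y_hat))    # argmax agrees with predict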
Calculate metrics:
print("Accuracy:", accuracy_score(y_test, y_hat))
Accuracy: 0.9215
Visualize performance#
plt.figure(figsize=(10, 8))
plt.title("Logistic regression on MNIST")
sns.heatmap(confusion_matrix(y_test, y_hat), annot=True);
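Reading the confusion matrix row by row also gives per-class accuracies; this small addition (not in the original notebook) divides the diagonal counts by the row totals:
cm = confusion_matrix(y_test, y_hat)
per_class_accuracy = cm.diagonal() / cm.sum(axis=1)   # correct / total per true class
print(np.round(per_class_accuracy, 3))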
Plot some samples with predictions and ground truths:
plot_digits(X_test, y_test, y_hat)
Exercises#
Denote \(\boldsymbol{\widehat Y} = (\widehat y_{ik}) = \mathrm{Softmax}(\boldsymbol {XW})\) (softmax is applied to each row). Rewrite the loss function (31) in matrix form.