Singular Value Decomposition (SVD)#

Any rectangular matrix \(\boldsymbol A \in \mathbb R^{m\times n}\) has a singular value decomposition

(59)#\[\begin{split}\boldsymbol A = \boldsymbol U \boldsymbol \Sigma \boldsymbol V^\mathsf{T} = [\boldsymbol u_1 \ldots \boldsymbol u_m] \begin{pmatrix} \boldsymbol \Lambda & \boldsymbol 0 \\ \boldsymbol 0 & \boldsymbol 0 \end{pmatrix} \begin{pmatrix} \boldsymbol v_1^\mathsf{T}\\ \vdots \\ \boldsymbol v_n^\mathsf{T} \end{pmatrix}.\end{split}\]

Here

  • \(\boldsymbol U = [\boldsymbol u_1 \ldots \boldsymbol u_m] \in\mathbb R^{m\times m}\) is an orthogonal matrix;

  • \(\boldsymbol \Sigma = \begin{pmatrix} \boldsymbol \Lambda & \boldsymbol 0 \\ \boldsymbol 0 & \boldsymbol 0 \end{pmatrix} \in \mathbb R^{m\times n}\);

  • \(\boldsymbol \Lambda = \mathrm{diag}\{\sigma_1, \ldots, \sigma_r\}\), \(r \leqslant \min\{m, n\}\), \(\sigma_1 \geqslant \sigma_2 \geqslant \ldots \geqslant \sigma_r > 0\);

  • \(\boldsymbol V = [\boldsymbol v_1 \ldots \boldsymbol v_n] \in\mathbb R^{n\times n}\) is an orthogonal matrix.

The numbers \(\sigma_1, \ldots, \sigma_r\) are called the singular values of \(\boldsymbol A\).

Q. What are the shapes of all the blocks of the matrix \(\boldsymbol \Sigma\)?

SVD in NumPy#

import numpy as np
A = np.arange(6).reshape(2, 3)
A
array([[0, 1, 2],
       [3, 4, 5]])

To compute the SVD, use np.linalg.svd. Note that it returns the singular values as a vector and the third factor as \(\boldsymbol V^\mathsf{T}\), not \(\boldsymbol V\):

U, S, Vt = np.linalg.svd(A)
print("U:")
print(U)
print("Singular values:", S)
print("V^T:")
print(Vt)
U:
[[-0.27472113 -0.96152395]
 [-0.96152395  0.27472113]]
Singular values: [7.34846923 1.        ]
V^T:
[[-0.39254051 -0.56077215 -0.7290038 ]
 [ 0.82416338  0.13736056 -0.54944226]
 [ 0.40824829 -0.81649658  0.40824829]]
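
Both factors are orthogonal matrices, as claimed above; a quick check (reusing the U and Vt just computed):

print(np.allclose(U @ U.T, np.eye(2)))    # columns of U are orthonormal
print(np.allclose(Vt @ Vt.T, np.eye(3)))  # rows of Vt, i.e. columns of V, are orthonormal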

Check the decomposition:

Sigma = np.hstack((np.diag(S), np.zeros(shape=(2, 1))))  # pad with a zero column to make Sigma 2x3
U @ Sigma @ Vt
array([[9.6365988e-16, 1.0000000e+00, 2.0000000e+00],
       [3.0000000e+00, 4.0000000e+00, 5.0000000e+00]])

Check that \(\Vert \boldsymbol A \Vert_F = \Vert S\Vert_2\), where S is the vector of singular values:

np.linalg.norm(S), np.linalg.norm(A)
(7.416198487095663, 7.416198487095663)

If \(\boldsymbol A\) is a symmetric positive semidefinite matrix (\(m=n\)), then by the spectral theorem

\[ \boldsymbol A = \boldsymbol Q \boldsymbol \Lambda \boldsymbol Q^\mathsf{T} \]

where \(\boldsymbol Q\) is orthogonal and \(\boldsymbol \Lambda\) is diagonal with nonnegative elements on the main diagonal. Hence, the spectral decomposition of a symmetric positive semidefinite matrix is a special case of the SVD.

A = np.array([[3, 2, 1], [2, 3, 2], [1, 2, 3]])
A
array([[3, 2, 1],
       [2, 3, 2],
       [1, 2, 3]])

The spectral decomposition can also be computed with np.linalg.svd:

U, S, Vt = np.linalg.svd(A)
U, S, Vt
(array([[-5.41774320e-01,  7.07106781e-01,  4.54401349e-01],
        [-6.42620551e-01,  3.88578059e-16, -7.66184591e-01],
        [-5.41774320e-01, -7.07106781e-01,  4.54401349e-01]]),
 array([6.37228132, 2.        , 0.62771868]),
 array([[-5.41774320e-01, -6.42620551e-01, -5.41774320e-01],
        [ 7.07106781e-01,  5.55111512e-17, -7.07106781e-01],
        [ 4.54401349e-01, -7.66184591e-01,  4.54401349e-01]]))

Here \(\boldsymbol U = \boldsymbol V\) (recall that np.linalg.svd returns \(\boldsymbol V^\mathsf{T}\), so we compare U with Vt.T):

np.linalg.norm(U - Vt.T)
8.563843312570936e-16
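
Since the spectral decomposition and the SVD coincide here, the eigenvalues of \(\boldsymbol A\) must equal its singular values. A quick check with np.linalg.eigh, which returns the eigenvalues in ascending order (the cell reuses A and S from above):

eigvals, Q = np.linalg.eigh(A)            # eigendecomposition of the symmetric matrix A
print(eigvals[::-1])                      # eigenvalues in descending order
print(np.allclose(eigvals[::-1], S))      # should coincide with the singular values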

Reduced form of SVD#

The SVD (59) is equivalent to the reduced form \(\boldsymbol{A} = \boldsymbol U_r \boldsymbol \Lambda \boldsymbol V_r^\mathsf{T}\), where

\[ \boldsymbol{U}_r = [\boldsymbol u_1 \ldots \boldsymbol u_r],\quad \boldsymbol{V}_r = [\boldsymbol v_1 \ldots \boldsymbol v_r]. \]

Also, this can be written as

\[\begin{split} \boldsymbol{A} = [\sigma_1 \boldsymbol {u}_1 \ldots \sigma_r \boldsymbol {u}_r] \begin{pmatrix} \boldsymbol v_1^\mathsf{T} \\ \vdots \\ \boldsymbol v_r^\mathsf{T} \end{pmatrix} = \sum\limits_{i=1}^r \sigma_i \boldsymbol u_i\boldsymbol v_i^\mathsf{T}. \end{split}\]

Hence, the SVD represents a matrix as a sum of \(r\) rank-one matrices.
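
In NumPy the reduced factors can be obtained directly with full_matrices=False. Below is a minimal sketch of the rank-one expansion on the \(2\times 3\) example from above (rebuilt here so that the cell is self-contained):

import numpy as np

A = np.arange(6).reshape(2, 3)                        # the 2x3 example from above
U_r, s, Vt_r = np.linalg.svd(A, full_matrices=False)  # reduced SVD: U_r is 2x2, Vt_r is 2x3
# reconstruct A as the sum of rank-one matrices sigma_i * u_i v_i^T
A_sum = sum(s[i] * np.outer(U_r[:, i], Vt_r[i]) for i in range(len(s)))
print(np.allclose(A, A_sum))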

Truncated SVD#

Let \(\boldsymbol A = \boldsymbol U \boldsymbol \Sigma \boldsymbol V^\mathsf{T}\) be the SVD of \(\boldsymbol A\). The truncated SVD of rank \(K\) is

\[ \hat{\boldsymbol A}_K = \boldsymbol U_K \boldsymbol \Sigma_K \boldsymbol V_K^\mathsf{T},\quad \boldsymbol \Sigma_K = \mathrm{diag}\{\sigma_1, \ldots, \sigma_K\},\quad 1 \leqslant K \leqslant r. \]

Here \(\boldsymbol U_K\) and \(\boldsymbol V_K\) consist of the first \(K\) columns of \(\boldsymbol U\) and \(\boldsymbol V\), respectively. Note that \(\hat{\boldsymbol A}_K\) has rank \(K\) and \(\hat{\boldsymbol A}_K = \boldsymbol A\) if \(K=r\). It turns out that \(\hat{\boldsymbol A}_K\) is the best approximation of \(\boldsymbol A\) among all matrices of rank at most \(K\) (the Eckart–Young theorem):

\[ \min\limits_{\mathrm{rank}(\boldsymbol B) \leqslant K} \Vert \boldsymbol A - \boldsymbol B\Vert_F^2 = \Vert \boldsymbol A - \hat{\boldsymbol A}_K\Vert_F^2 = \sum\limits_{i=K+1}^r \sigma_i^2. \]
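
A numerical illustration of this error formula on a small random matrix (the matrix size and the value of \(K\) below are arbitrary choices):

import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((6, 4))               # a random 6x4 matrix, full rank almost surely
U, s, Vt = np.linalg.svd(B, full_matrices=False)

K = 2
B_K = U[:, :K] @ np.diag(s[:K]) @ Vt[:K, :]   # truncated SVD of rank K
err = np.linalg.norm(B - B_K, 'fro') ** 2     # squared Frobenius error
print(np.isclose(err, np.sum(s[K:] ** 2)))    # equals the sum of the discarded sigma_i^2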

Image compression#

Based on [Murphy, 2022], section 7.5.

import numpy as np
import matplotlib.pyplot as plt 
import requests
import io
from PIL import Image
from sys import getsizeof

%config InlineBackend.figure_format = 'svg'

r = requests.get('https://github.com/probml/probml-data/blob/main/data/clown.png?raw=true', stream=True)
img = Image.open(io.BytesIO(r.content))
plt.imshow(img);
X = np.array(img)
rk = np.linalg.matrix_rank(X)
print("Shape:", X.shape)
print("Rank:", rk)
Shape: (200, 320)
Rank: 200
def compress_img(X, r):
    # Rank-r approximation: keep only the r largest singular values
    U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
    X_hat = U[:, :r] @ np.diag(sigma[:r]) @ Vt[:r, :]
    plt.imshow(X_hat, cmap='gray')

\(1\)-rank approximation:

compress_img(X, 1)

\(2\)-rank approximation:

compress_img(X, 2)

\(5\)-rank approximation:

compress_img(X, 5)

\(10\)-rank approximation:

compress_img(X, 10)

\(20\)-rank approximation already looks quite acceptable:

compress_img(X, 20)

Now plot the approximation error as a function of \(K\).
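
The helper plot_SVD_approx used below is not defined in the cells above; a minimal sketch, assuming it plots the squared Frobenius error of the rank-\(K\) truncation against \(K\) (the max_rank parameter is an assumption), could look like this:

def plot_SVD_approx(X, max_rank=50):
    # Squared Frobenius error of the rank-K truncation, computed from the
    # tail singular values: ||X - X_K||_F^2 = sum_{i > K} sigma_i^2
    sigma = np.linalg.svd(X, compute_uv=False)
    errors = [np.sum(sigma[K:] ** 2) for K in range(1, max_rank + 1)]
    plt.plot(range(1, max_rank + 1), errors)
    plt.xlabel('$K$')
    plt.ylabel(r'$\|X - \hat{X}_K\|_F^2$')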

plot_SVD_approx(X)

Exercises#

  1. Show that \(\Vert \boldsymbol A \Vert_F^2 = \sum\limits_{i = 1}^r \sigma_i^2\).

  2. Find the SVD of a rank-one matrix \(\boldsymbol A = \boldsymbol{uv}^\mathsf{T}\), \(\boldsymbol u \in \mathbb R^m\), \(\boldsymbol v \in \mathbb R^n\).

  3. Prove that \(\boldsymbol u_i\) are eigenvectors of \(\boldsymbol {AA}^\mathsf{T}\), whereas \(\boldsymbol v_i\) are eigenvectors of \(\boldsymbol A^\mathsf{T} \boldsymbol A\). What about eigenvalues?

  4. Show that \(\Vert \boldsymbol A - \hat{\boldsymbol A}_K\Vert_F^2 = \sum\limits_{i=K+1}^r \sigma_i^2\).