Singular Value Decomposition (SVD)#

The SVD separates any matrix \(\boldsymbol A \in \mathbb R^{m\times n}\) into rank-one pieces

\[ \boldsymbol{uv}^\mathsf{T} = \mathrm{column} \times \mathrm{row}, \quad \boldsymbol u \in \mathbb R^m, \quad \boldsymbol v \in \mathbb R^n. \]

The columns \(\boldsymbol u\) and the rows \(\boldsymbol v^\mathsf{T}\) are eigenvectors of the symmetric matrices \(\boldsymbol{AA}^\mathsf{T}\) and \(\boldsymbol A^\mathsf{T} \boldsymbol A\), respectively.
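A quick numerical illustration of this fact (a minimal sketch; np.linalg.svd is discussed below, and the matrix here is an arbitrary example):

import numpy as np

A = np.arange(6).reshape(2, 3)
U, S, Vt = np.linalg.svd(A)
# eigenvalues of A A^T in descending order equal the squared singular values
print(np.allclose(np.sort(np.linalg.eigvalsh(A @ A.T))[::-1], S**2))  # True
# each column u_i of U satisfies (A A^T) u_i = sigma_i^2 u_i
print(np.allclose(A @ A.T @ U, U * S**2))  # True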

Singular value theorem#

Any rectangular matrix \(\boldsymbol A \in \mathbb R^{m\times n}\) has a singular value decomposition

(61)#\[\begin{split}\boldsymbol A = \boldsymbol U \boldsymbol \Sigma \boldsymbol V^\mathsf{T} = [\boldsymbol u_1 \ldots \boldsymbol u_m] \begin{pmatrix} \boldsymbol \Lambda_r & \boldsymbol 0 \\ \boldsymbol 0 & \boldsymbol 0 \end{pmatrix} \begin{pmatrix} \boldsymbol v_1^\mathsf{T}\\ \vdots \\ \boldsymbol v_n^\mathsf{T} \end{pmatrix} = \sum\limits_{i=1}^r \sigma_i \boldsymbol u_i\boldsymbol v_i^\mathsf{T}.\end{split}\]

Here

  • \(\boldsymbol U = [\boldsymbol u_1 \ldots \boldsymbol u_m] \in\mathbb R^{m\times m}\) is an orthogonal matrix;

  • \(\boldsymbol \Sigma = \begin{pmatrix} \boldsymbol \Lambda_r & \boldsymbol 0 \\ \boldsymbol 0 & \boldsymbol 0 \end{pmatrix} \in \mathbb R^{m\times n}\);

  • \(\boldsymbol \Lambda_r = \mathrm{diag}\{\sigma_1, \ldots, \sigma_r\}\), \(r \leqslant \min\{m, n\}\), \(\sigma_1 \geqslant \sigma_2 \geqslant \ldots \geqslant \sigma_r > 0\);

  • \(\boldsymbol V = [\boldsymbol v_1 \ldots \boldsymbol v_n] \in\mathbb R^{n\times n}\) is an orthogonal matrix.

The numbers \(\sigma_1, \ldots, \sigma_r\) are called the singular values of \(\boldsymbol A\).

Q. What are the shapes of the blocks of the matrix \(\boldsymbol \Sigma\)?

SVD in NumPy#

import numpy as np
A = np.arange(6).reshape(2, 3)
A
array([[0, 1, 2],
       [3, 4, 5]])

To find the SVD, use np.linalg.svd. Note that it returns the singular values as a vector and the matrix \(\boldsymbol V^\mathsf{T}\) (not \(\boldsymbol V\)) as its third output:

U, S, Vt = np.linalg.svd(A)
print("U:")
print(U)
print("Singular values:", S)
print("V^T:")
print(Vt)
U:
[[-0.27472113 -0.96152395]
 [-0.96152395  0.27472113]]
Singular values: [7.34846923 1.        ]
V^T:
[[-0.39254051 -0.56077215 -0.7290038 ]
 [ 0.82416338  0.13736056 -0.54944226]
 [ 0.40824829 -0.81649658  0.40824829]]
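Both factors are orthogonal matrices, as the theorem requires; a quick check:

print(np.allclose(U @ U.T, np.eye(2)))    # True
print(np.allclose(Vt @ Vt.T, np.eye(3)))  # True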

Check the decomposition:

Sigma = np.hstack((np.diag(S), np.zeros(shape=(2, 1))))  # pad diag(S) to the 2×3 shape of A
U @ Sigma @ Vt
array([[9.6365988e-16, 1.0000000e+00, 2.0000000e+00],
       [3.0000000e+00, 4.0000000e+00, 5.0000000e+00]])

Check that \(\Vert \boldsymbol A \Vert_F = \Vert S\Vert_2\), where \(S\) is the vector of singular values:

np.linalg.norm(S), np.linalg.norm(A)
(7.416198487095663, 7.416198487095663)

If \(\boldsymbol A\) is a positive semidefinite square matrix (\(m=n\)), then according to the spectral theorem

\[ \boldsymbol A = \boldsymbol Q \boldsymbol \Lambda \boldsymbol Q^\mathsf{T} \]

where \(\boldsymbol Q\) is orthogonal and \(\boldsymbol \Lambda\) is diagonal with nonnegative elements on the main diagonal. After sorting the eigenvalues in descending order, this is exactly an SVD of \(\boldsymbol A\) with \(\boldsymbol U = \boldsymbol V = \boldsymbol Q\).

A = np.array([[3, 2, 1], [2, 3, 2], [1, 2, 3]])
A
array([[3, 2, 1],
       [2, 3, 2],
       [1, 2, 3]])

The spectral decomposition can therefore also be computed by np.linalg.svd:

U, S, Vt = np.linalg.svd(A)
U, S, Vt
(array([[-5.41774320e-01,  7.07106781e-01,  4.54401349e-01],
        [-6.42620551e-01,  3.88578059e-16, -7.66184591e-01],
        [-5.41774320e-01, -7.07106781e-01,  4.54401349e-01]]),
 array([6.37228132, 2.        , 0.62771868]),
 array([[-5.41774320e-01, -6.42620551e-01, -5.41774320e-01],
        [ 7.07106781e-01,  5.55111512e-17, -7.07106781e-01],
        [ 4.54401349e-01, -7.66184591e-01,  4.54401349e-01]]))

Here \(\boldsymbol U = \boldsymbol V\) (recall that np.linalg.svd returns \(\boldsymbol V^\mathsf{T}\), stored in Vt):

np.linalg.norm(U - Vt.T)
8.563843312570936e-16
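For comparison, the spectral decomposition of a symmetric matrix can be computed directly with np.linalg.eigh; a minimal sketch (note that eigh returns the eigenvalues in ascending order):

lam, Q = np.linalg.eigh(A)
print(lam[::-1])                               # matches the singular values above
print(np.allclose(Q @ np.diag(lam) @ Q.T, A))  # True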

Reduced form of SVD#

The SVD (61) is equivalent to \(\boldsymbol{A} = \boldsymbol U_r \boldsymbol \Lambda_r \boldsymbol V_r^\mathsf{T}\) where

\[ \boldsymbol{U}_r = [\boldsymbol u_1 \ldots \boldsymbol u_r],\quad \boldsymbol{V}_r = [\boldsymbol v_1 \ldots \boldsymbol v_r]. \]

Also, this can be written as

\[\begin{split} \boldsymbol{A} = [\sigma_1 \boldsymbol {u}_1 \ldots \sigma_r \boldsymbol {u}_r] \begin{pmatrix} \boldsymbol v_1^\mathsf{T} \\ \vdots \\ \boldsymbol v_r^\mathsf{T} \end{pmatrix} = \sum\limits_{i=1}^r \sigma_i \boldsymbol u_i\boldsymbol v_i^\mathsf{T}. \end{split}\]

Hence, the SVD represents a matrix as a sum of rank-one matrices.
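In NumPy the reduced factors can be obtained with full_matrices=False; a minimal sketch on a random matrix (for which \(r = \min\{m, n\}\) almost surely):

B = np.random.default_rng(0).normal(size=(5, 3))
U_r, S, Vt_r = np.linalg.svd(B, full_matrices=False)
print(U_r.shape, S.shape, Vt_r.shape)  # (5, 3) (3,) (3, 3)
print(np.allclose(U_r * S @ Vt_r, B))  # True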

Truncated SVD#

Let \(\boldsymbol A = \boldsymbol U \boldsymbol \Sigma \boldsymbol V^\mathsf{T}\) be the SVD of \(\boldsymbol A\). The truncated SVD of rank \(K\) is

\[ \hat{\boldsymbol A}_K = \boldsymbol U_K \boldsymbol \Sigma_K \boldsymbol V_K^\mathsf{T},\quad \boldsymbol \Sigma_K = \mathrm{diag}\{\sigma_1, \ldots, \sigma_K\},\quad 1 \leqslant K \leqslant r, \]

where \(\boldsymbol U_K = [\boldsymbol u_1 \ldots \boldsymbol u_K]\) and \(\boldsymbol V_K = [\boldsymbol v_1 \ldots \boldsymbol v_K]\).

Note that \(\hat{\boldsymbol A}_K\) has rank \(K\) and \(\hat{\boldsymbol A}_K = \boldsymbol A\) if \(K=r\). It turns out (the Eckart–Young theorem) that \(\hat{\boldsymbol A}_K\) is the best approximation of \(\boldsymbol A\) among all matrices of rank at most \(K\):

\[ \min\limits_{\mathrm{rank}(\boldsymbol B) \leqslant K} \Vert \boldsymbol A - \boldsymbol B\Vert_F^2 = \Vert \boldsymbol A - \hat{\boldsymbol A}_K\Vert_F^2 = \sum\limits_{i=K+1}^r \sigma_i^2. \]
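A quick numerical check of this error formula (a sketch on a random test matrix B):

B = np.random.default_rng(1).normal(size=(6, 4))
U, S, Vt = np.linalg.svd(B, full_matrices=False)
K = 2
B_hat = U[:, :K] * S[:K] @ Vt[:K, :]  # truncated SVD of rank K
print(np.isclose(np.linalg.norm(B - B_hat)**2, np.sum(S[K:]**2)))  # True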

Image compression#

Based on [Murphy, 2022], section 7.5.

import numpy as np
import matplotlib.pyplot as plt 
import requests
import io
from PIL import Image
from sys import getsizeof

%config InlineBackend.figure_format = 'svg'

r = requests.get('https://github.com/probml/probml-data/blob/main/data/clown.png?raw=true', stream=True)
img = Image.open(io.BytesIO(r.content))
plt.imshow(img);
X = np.array(img)
rk = np.linalg.matrix_rank(X)
print("Shape:", X.shape)
print("Rank:", rk)
Shape: (200, 320)
Rank: 200
def compress_img(X, r):
    # np.linalg.svd returns the singular values in descending order
    # and V^T (not V) as its third output
    U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
    # keep only the r leading singular triplets
    X_hat = U[:, :r] @ np.diag(sigma[:r]) @ Vt[:r, :]
    plt.imshow(X_hat, cmap='gray')
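Storing the rank-\(r\) factors takes \(r(m + n + 1)\) numbers instead of \(mn\), which is where the compression comes from; a rough estimate for this image:

m, n = X.shape
for r in (1, 2, 5, 10, 20):
    print(f"rank {r:2d}: {100 * r * (m + n + 1) / (m * n):.1f}% of original storage")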

Rank-\(1\) approximation:

compress_img(X, 1)

Rank-\(2\) approximation:

compress_img(X, 2)

Rank-\(5\) approximation:

compress_img(X, 5)

Rank-\(10\) approximation:

compress_img(X, 10)

The rank-\(20\) approximation is already quite acceptable:

compress_img(X, 20)

Now plot the approximation error.
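The helper plot_SVD_approx is not defined above; a minimal sketch, assuming it plots the squared error \(\sum_{i=K+1}^r \sigma_i^2\) of the rank-\(K\) truncation against \(K\):

def plot_SVD_approx(X):
    # only the singular values are needed; compute_uv=False skips U and V
    sigma = np.linalg.svd(X, compute_uv=False)
    # tail[k] = sum of sigma_i^2 over i >= k, i.e. the rank-k truncation error
    tail = np.cumsum((sigma**2)[::-1])[::-1]
    ranks = np.arange(1, len(sigma))
    plt.plot(ranks, tail[1:])
    plt.xlabel("$K$")
    plt.ylabel(r"$\|X - \hat{X}_K\|_F^2$")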

plot_SVD_approx(X)

Exercises#

  1. Show that \(\Vert \boldsymbol A \Vert_F^2 = \sum\limits_{i = 1}^r \sigma_i^2\).

  2. Find the SVD of a rank-one matrix \(\boldsymbol A = \boldsymbol{uv}^\mathsf{T}\), \(\boldsymbol u \in \mathbb R^m\), \(\boldsymbol v \in \mathbb R^n\).

  3. Prove that \(\boldsymbol u_i\) are eigenvectors of \(\boldsymbol {AA}^\mathsf{T}\).

  4. Explain why \(\mathrm{span}(\boldsymbol v_1, \ldots, \boldsymbol v_r) = C(\boldsymbol A^\mathsf{T})\).

  5. Find the SVD of the matrices

    \[\begin{split} \begin{pmatrix} 3 & 0 \\ 4 & 5 \end{pmatrix}, \quad \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \\ \end{pmatrix}, \quad \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}. \end{split}\]
  6. Show that \(\Vert \boldsymbol A - \hat{\boldsymbol A}_K\Vert_F^2 = \sum\limits_{i=K+1}^r \sigma_i^2\).