Singular Value Decomposition (SVD)#

The SVD separates any matrix \(\boldsymbol A \in \mathbb R^{m\times n}\) into rank-one pieces

\[ \boldsymbol{uv}^\mathsf{T} = \mathrm{column} \times \mathrm{row}, \quad \boldsymbol u \in \mathbb R^m, \quad \boldsymbol v \in \mathbb R^n. \]

The columns \(\boldsymbol u\) and the rows \(\boldsymbol v^\mathsf{T}\) are eigenvectors of the symmetric matrices \(\boldsymbol{AA}^\mathsf{T}\) and \(\boldsymbol A^\mathsf{T} \boldsymbol A\), respectively.
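A quick numerical illustration of this fact (a minimal sketch; np.linalg.svd is discussed below, and the matrix here is an arbitrary example):

import numpy as np

A = np.arange(6).reshape(2, 3)
U, S, Vt = np.linalg.svd(A)
# eigenvalues of A A^T in descending order equal the squared singular values
print(np.allclose(np.sort(np.linalg.eigvalsh(A @ A.T))[::-1], S**2))  # True
# each column u_i of U satisfies (A A^T) u_i = sigma_i^2 u_i
print(np.allclose(A @ A.T @ U, U * S**2))  # True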

Singular value theorem#

Any rectangular matrix \(\boldsymbol A \in \mathbb R^{m\times n}\) has a singular value decomposition

(61)#\[\begin{split}\boldsymbol A = \boldsymbol U \boldsymbol \Sigma \boldsymbol V^\mathsf{T} = [\boldsymbol u_1 \ldots \boldsymbol u_m] \begin{pmatrix} \boldsymbol \Lambda_r & \boldsymbol 0 \\ \boldsymbol 0 & \boldsymbol 0 \end{pmatrix} \begin{pmatrix} \boldsymbol v_1^\mathsf{T}\\ \vdots \\ \boldsymbol v_n^\mathsf{T} \end{pmatrix} = \sum\limits_{i=1}^r \sigma_i \boldsymbol u_i\boldsymbol v_i^\mathsf{T}.\end{split}\]

Here

  • \(\boldsymbol U = [\boldsymbol u_1 \ldots \boldsymbol u_m] \in\mathbb R^{m\times m}\) is an orthogonal matrix;

  • \(\boldsymbol \Sigma = \begin{pmatrix} \boldsymbol \Lambda_r & \boldsymbol 0 \\ \boldsymbol 0 & \boldsymbol 0 \end{pmatrix} \in \mathbb R^{m\times n}\);

  • \(\boldsymbol \Lambda_r = \mathrm{diag}\{\sigma_1, \ldots, \sigma_r\}\), \(r \leqslant \min\{m, n\}\), \(\sigma_1 \geqslant \sigma_2 \geqslant \ldots \geqslant \sigma_r > 0\);

  • \(\boldsymbol V = [\boldsymbol v_1 \ldots \boldsymbol v_n] \in\mathbb R^{n\times n}\) is an orthogonal matrix.

The numbers \(\sigma_1, \ldots, \sigma_r\) are called the singular values of \(\boldsymbol A\).

Q. What are the shapes of the blocks of the matrix \(\boldsymbol \Sigma\)?

SVD in NumPy#

import numpy as np
A = np.arange(6).reshape(2, 3)
A
array([[0, 1, 2],
       [3, 4, 5]])

To find the SVD, use np.linalg.svd. Note that it returns the singular values as a vector and the matrix \(\boldsymbol V^\mathsf{T}\) (not \(\boldsymbol V\)) as its third output:

U, S, Vt = np.linalg.svd(A)
print("U:")
print(U)
print("Singular values:", S)
print("V^T:")
print(Vt)
U:
[[-0.27472113 -0.96152395]
 [-0.96152395  0.27472113]]
Singular values: [7.34846923 1.        ]
V^T:
[[-0.39254051 -0.56077215 -0.7290038 ]
 [ 0.82416338  0.13736056 -0.54944226]
 [ 0.40824829 -0.81649658  0.40824829]]
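Both factors are orthogonal matrices, as the theorem requires; a quick check:

print(np.allclose(U @ U.T, np.eye(2)))    # True
print(np.allclose(Vt @ Vt.T, np.eye(3)))  # True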

Check the decomposition:

Sigma = np.hstack((np.diag(S), np.zeros(shape=(2, 1))))  # pad diag(S) to the 2×3 shape of A
U @ Sigma @ Vt
array([[9.6365988e-16, 1.0000000e+00, 2.0000000e+00],
       [3.0000000e+00, 4.0000000e+00, 5.0000000e+00]])

Check that \(\Vert \boldsymbol A \Vert_F = \Vert S\Vert_2\), where \(S\) is the vector of singular values:

np.linalg.norm(S), np.linalg.norm(A)
(7.416198487095663, 7.416198487095663)

If \(\boldsymbol A\) is a positive semidefinite square matrix (\(m=n\)), then according to the spectral theorem

\[ \boldsymbol A = \boldsymbol Q \boldsymbol \Lambda \boldsymbol Q^\mathsf{T} \]

where \(\boldsymbol Q\) is orthogonal and \(\boldsymbol \Lambda\) is diagonal with nonnegative elements on the main diagonal. After sorting the eigenvalues in descending order, this is exactly an SVD of \(\boldsymbol A\) with \(\boldsymbol U = \boldsymbol V = \boldsymbol Q\).

A = np.array([[3, 2, 1], [2, 3, 2], [1, 2, 3]])
A
array([[3, 2, 1],
       [2, 3, 2],
       [1, 2, 3]])

The spectral decomposition can therefore also be computed by np.linalg.svd:

U, S, Vt = np.linalg.svd(A)
U, S, Vt
(array([[-5.41774320e-01,  7.07106781e-01,  4.54401349e-01],
        [-6.42620551e-01,  3.88578059e-16, -7.66184591e-01],
        [-5.41774320e-01, -7.07106781e-01,  4.54401349e-01]]),
 array([6.37228132, 2.        , 0.62771868]),
 array([[-5.41774320e-01, -6.42620551e-01, -5.41774320e-01],
        [ 7.07106781e-01,  5.55111512e-17, -7.07106781e-01],
        [ 4.54401349e-01, -7.66184591e-01,  4.54401349e-01]]))

Here \(\boldsymbol U = \boldsymbol V\) (recall that np.linalg.svd returns \(\boldsymbol V^\mathsf{T}\), stored in Vt):

np.linalg.norm(U - Vt.T)
8.563843312570936e-16
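For comparison, the spectral decomposition of a symmetric matrix can be computed directly with np.linalg.eigh; a minimal sketch (note that eigh returns the eigenvalues in ascending order):

lam, Q = np.linalg.eigh(A)
print(lam[::-1])                               # matches the singular values above
print(np.allclose(Q @ np.diag(lam) @ Q.T, A))  # True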

Reduced form of SVD#

The SVD (61) is equivalent to \(\boldsymbol{A} = \boldsymbol U_r \boldsymbol \Lambda_r \boldsymbol V_r^\mathsf{T}\) where

\[ \boldsymbol{U}_r = [\boldsymbol u_1 \ldots \boldsymbol u_r],\quad \boldsymbol{V}_r = [\boldsymbol v_1 \ldots \boldsymbol v_r]. \]

Also, this can be written as

\[\begin{split} \boldsymbol{A} = [\sigma_1 \boldsymbol {u}_1 \ldots \sigma_r \boldsymbol {u}_r] \begin{pmatrix} \boldsymbol v_1^\mathsf{T} \\ \vdots \\ \boldsymbol v_r^\mathsf{T} \end{pmatrix} = \sum\limits_{i=1}^r \sigma_i \boldsymbol u_i\boldsymbol v_i^\mathsf{T}. \end{split}\]

Hence, the SVD represents a matrix as a sum of rank-one matrices.
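In NumPy the reduced factors can be obtained with full_matrices=False; a minimal sketch on a random matrix (for which \(r = \min\{m, n\}\) almost surely):

B = np.random.default_rng(0).normal(size=(5, 3))
U_r, S, Vt_r = np.linalg.svd(B, full_matrices=False)
print(U_r.shape, S.shape, Vt_r.shape)  # (5, 3) (3,) (3, 3)
print(np.allclose(U_r * S @ Vt_r, B))  # True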

Truncated SVD#

Let \(\boldsymbol A = \boldsymbol U \boldsymbol \Sigma \boldsymbol V^\mathsf{T}\) be the SVD of \(\boldsymbol A\). The truncated SVD of rank \(K\) is

\[ \hat{\boldsymbol A}_K = \boldsymbol U_K \boldsymbol \Sigma_K \boldsymbol V_K^\mathsf{T},\quad \boldsymbol \Sigma_K = \mathrm{diag}\{\sigma_1, \ldots, \sigma_K\},\quad 1 \leqslant K \leqslant r, \]

where \(\boldsymbol U_K = [\boldsymbol u_1 \ldots \boldsymbol u_K]\) and \(\boldsymbol V_K = [\boldsymbol v_1 \ldots \boldsymbol v_K]\).

Note that \(\hat{\boldsymbol A}_K\) has rank \(K\) and \(\hat{\boldsymbol A}_K = \boldsymbol A\) if \(K=r\). It turns out (the Eckart–Young theorem) that \(\hat{\boldsymbol A}_K\) is the best approximation of \(\boldsymbol A\) among all matrices of rank at most \(K\):

\[ \min\limits_{\mathrm{rank}(\boldsymbol B) \leqslant K} \Vert \boldsymbol A - \boldsymbol B\Vert_F^2 = \Vert \boldsymbol A - \hat{\boldsymbol A}_K\Vert_F^2 = \sum\limits_{i=K+1}^r \sigma_i^2. \]
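A quick numerical check of this error formula (a sketch on a random test matrix B):

B = np.random.default_rng(1).normal(size=(6, 4))
U, S, Vt = np.linalg.svd(B, full_matrices=False)
K = 2
B_hat = U[:, :K] * S[:K] @ Vt[:K, :]  # truncated SVD of rank K
print(np.isclose(np.linalg.norm(B - B_hat)**2, np.sum(S[K:]**2)))  # True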

Image compression#

Based on [Murphy, 2022], section 7.5.

import numpy as np
import matplotlib.pyplot as plt 
import requests
import io
from PIL import Image
from sys import getsizeof

%config InlineBackend.figure_format = 'svg'

r = requests.get('https://github.com/probml/probml-data/blob/main/data/clown.png?raw=true', stream=True)
img = Image.open(io.BytesIO(r.content))
plt.imshow(img);
X = np.array(img)
rk = np.linalg.matrix_rank(X)
print("Shape:", X.shape)
print("Rank:", rk)
Shape: (200, 320)
Rank: 200
def compress_img(X, r):
    # np.linalg.svd returns the singular values in descending order
    # and V^T (not V) as its third output
    U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
    # keep only the r leading singular triplets
    X_hat = U[:, :r] @ np.diag(sigma[:r]) @ Vt[:r, :]
    plt.imshow(X_hat, cmap='gray')
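Storing the rank-\(r\) factors takes \(r(m + n + 1)\) numbers instead of \(mn\), which is where the compression comes from; a rough estimate for this image:

m, n = X.shape
for r in (1, 2, 5, 10, 20):
    print(f"rank {r:2d}: {100 * r * (m + n + 1) / (m * n):.1f}% of original storage")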

Rank-\(1\) approximation:

compress_img(X, 1)

Rank-\(2\) approximation:

compress_img(X, 2)

Rank-\(5\) approximation:

compress_img(X, 5)

Rank-\(10\) approximation:

compress_img(X, 10)

The rank-\(20\) approximation is already quite acceptable:

compress_img(X, 20)

Now plot the approximation error.
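The helper plot_SVD_approx is not defined above; a minimal sketch, assuming it plots the squared error \(\sum_{i=K+1}^r \sigma_i^2\) of the rank-\(K\) truncation against \(K\):

def plot_SVD_approx(X):
    # only the singular values are needed; compute_uv=False skips U and V
    sigma = np.linalg.svd(X, compute_uv=False)
    # tail[k] = sum of sigma_i^2 over i >= k, i.e. the rank-k truncation error
    tail = np.cumsum((sigma**2)[::-1])[::-1]
    ranks = np.arange(1, len(sigma))
    plt.plot(ranks, tail[1:])
    plt.xlabel("$K$")
    plt.ylabel(r"$\|X - \hat{X}_K\|_F^2$")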

plot_SVD_approx(X)

Exercises#

  1. Show that \(\Vert \boldsymbol A \Vert_F^2 = \sum\limits_{i = 1}^r \sigma_i^2\).

  2. Find the SVD of a rank-one matrix \(\boldsymbol A = \boldsymbol{uv}^\mathsf{T}\), \(\boldsymbol u \in \mathbb R^m\), \(\boldsymbol v \in \mathbb R^n\).

  3. Prove that \(\boldsymbol u_i\) are eigenvectors of \(\boldsymbol {AA}^\mathsf{T}\).

  4. Explain why \(\mathrm{span}(\boldsymbol v_1, \ldots, \boldsymbol v_r) = C(\boldsymbol A^\mathsf{T})\).

  5. Find the SVD of the matrices

    \[\begin{split} \begin{pmatrix} 3 & 0 \\ 4 & 5 \end{pmatrix}, \quad \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \\ \end{pmatrix}, \quad \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}. \end{split}\]
  6. Show that \(\Vert \boldsymbol A - \hat{\boldsymbol A}_K\Vert_F^2 = \sum\limits_{i=K+1}^r \sigma_i^2\).