HW1#

Deadline: 06.10.2024 23:59 (GMT+6).

General recommendations#

  • Do not erase any existing cells

  • Write solutions of the math problems in markdown cells of HW notebook using LaTeX. If you are not familiar with LaTeX, see a 2-page cheat sheet for a quick start

  • Provide your solution with understandable comments; do not submit tons of formulas and/or code cells without any text description of what you are doing

  • Readability counts! In case of poor writings you may receive penalty up to one point

Task 1.1 (1 point)#

Let \(\boldsymbol A \in\mathbb R^{m\times n}\), \(\boldsymbol B \in\mathbb R^{n\times m}\). Prove that \(\mathrm{tr}(\boldsymbol{AB}) = \mathrm{tr}(\boldsymbol{BA})\). Using this property, calculate \(\mathrm{tr}(\boldsymbol{uv}^\mathsf{T})\) if \(\boldsymbol u, \boldsymbol v \in\mathbb R^n\), \(\boldsymbol u \perp \boldsymbol v\).

YOUR SOLUTION HERE#

Task 1.2 (1 point)#

Let \(\boldsymbol A \in \mathbb R^{m\times n}\). Prove that \(N(\boldsymbol A) = N(\boldsymbol A^{\mathsf T}\boldsymbol A)\), i.e., matrices \(\boldsymbol A\) and \(\boldsymbol A^{\mathsf T} \boldsymbol A\) have same nullspaces.

YOUR SOLUTION HERE#

Task 1.3 (0.5 points)#

Show that similar matrices have equal determinants: \(\det \boldsymbol A = \det \boldsymbol B\) if \(\boldsymbol A \sim \boldsymbol B\).

YOUR SOLUTION HERE#

Task 1.4 (Sherman—Morrison formula, 1.5 points)#

Let \(\boldsymbol A \in \mathbb R^{n\times n}\) be an invertible matrix, \(\boldsymbol u, \boldsymbol v \in \mathbb R^n\) and \(\boldsymbol v^\mathsf{T}\boldsymbol A^{-1} \boldsymbol u \ne -1\) . Prove that

\[ (\boldsymbol A + \boldsymbol u \boldsymbol v^\mathsf{T})^{-1} = \boldsymbol A^{-1} - \frac{\boldsymbol A^{-1} \boldsymbol u \boldsymbol v^\mathsf{T} \boldsymbol A^{-1}}{1 + \boldsymbol v^\mathsf{T}\boldsymbol A^{-1} \boldsymbol u }. \]

YOUR SOLUTION HERE#

Task 1.5 (programming, 1.5 points)#

Apply k-NN algorithm to MNIST dataset and measure its performance:

  • split the datasert into train and test parts

  • train several models with different hyperparameters (take \(1\leqslant k \leqslant 20\) and different distance metrics (\(p=1\), \(p=2\), \(p=+\infty\)))

  • visualize several test samples and their predictions (see code below)

  • plot train and test accuracies for each model on the same graph

  • find a model with best test accuracy

Fetch MNIST dataset:

from sklearn.datasets import fetch_openml
X, Y = fetch_openml('mnist_784', return_X_y=True, parser='auto')
X = X.astype(float).values / 255
Y = Y.astype(int).values
X.shape, Y.shape
((70000, 784), (70000,))

Visualize it:

import numpy as np
import matplotlib.pyplot as plt

%config InlineBackend.figure_format = 'svg'

def plot_digits(X, y_true, y_pred=None, n=4, random_state=123):
    np.random.seed(random_state)
    indices = np.random.choice(np.arange(X.shape[0]), size=n*n, replace=False)
    plt.figure(figsize=(10, 10))
    for i in range(n*n):
        plt.subplot(n, n, i+1)
        plt.xticks([])
        plt.yticks([])
        plt.grid(False)
        plt.imshow(X[indices[i]].reshape(28, 28), cmap='gray')
        # plt.imshow(train_images[i], cmap=plt.cm.binary)
        if y_pred is None:
            title = str(y_true[indices[i]])
        else:
            title = f"y={y_true[indices[i]]}, ŷ={y_pred[indices[i]]}"
        plt.title(title, size=20)
    plt.show()

plot_digits(X, Y)
../_images/aa3c8f7e49fc43f3a670f481f62a33228ddec649cd5061b04f278c4e9a550823.svg
# YOUR CODE HERE

Task 1.6 (programming, 1.5 points)#

Compare the performance of different algorithms for calculation matrix product

\[ \boldsymbol C = \boldsymbol {AB}, \quad \boldsymbol A \in \mathbb R^{m\times n}, \quad \boldsymbol B \in \mathbb R^{n\times p}. \]

Try the following methods:

  • nested pythonic loop implementing the formula \(C_{ik} = \sum\limits_{j=1}^n A_{ij}B_{jk}\)

  • replace the inner for loop by np.dot for calculating \(\boldsymbol a_i^\mathsf{T}\boldsymbol b_k\)

  • calculate \(\sum\limits_{j=1}^n \boldsymbol a_j \boldsymbol b_j^\mathsf{T}\) (use np.outer)

  • just use numpy and calculate A @ B

The plan (see aslo dot product demo below):

  • implement these four functions

  • test that they return the same result

  • measure their performance on a square matrix of shape \(100\times 100\)

  • plot graphs of the execution time versus \(n\)

import numpy as np

def mat_mul_loop(A, B: np.array) -> np.array:
    # YOUR CODE HERE
    pass 

def mat_mul_inner(A, B: np.array) -> np.array:
    # YOUR CODE HERE
    pass 

def mat_mul_outer(A, B: np.array) -> np.array:
    # YOUR CODE HERE
    pass 

def mat_mul_np(A, B: np.array) -> np.array:
    # YOUR CODE HERE
    pass 
# TESTING AREA

def TestMatrices(n_max):
    for n in range(1, n_max):
        m = np.random.randint(1, n_max)
        p = np.random.randint(1, n_max)
        A = np.random.randn(m, n)
        B = np.random.randn(n, p)
        prod_loop = mat_mul_loop(A, B)
        prod_inner = mat_mul_inner(A, B)
        prod_outer = mat_mul_outer(A, B)
        prod_np = mat_mul_np(A, B)
        assert np.allclose(prod_loop, prod_inner)
        assert np.allclose(prod_loop, prod_np)
        assert np.allclose(prod_loop, prod_outer)

Measure performance on two square matrices:

n = 100
A = np.random.randn(n, n)
B = np.random.randn(n, n)
# timeit!
# YOUR CODE HERE

Plot the graphs:

import matplotlib.pyplot as plt
from time import time

%config InlineBackend.figure_format = 'svg'

def measure_time(func, m, n, p, n_samples=10):
    result = np.zeros(n_samples)
    for i in range(n_samples):
        begin = time()
        func(np.random.randn(m, n), np.random.randn(n, p))
        result[i] = time() - begin
    return result.mean()

def get_times_lists(func, step=10, max_size=200, n_samples=20):
    times = []
    sizes = np.arange(step, max_size + 1, step)
    for size in sizes:
        times.append(measure_time(func, size, size, size, n_samples))
    return np.array(times)

def plot_time_vs_size(step=10, max_size=100, n_samples=20):
    loop_times = 1000*get_times_lists(mat_mul_loop, step, max_size, n_samples)
    inner_times = 1000*get_times_lists(mat_mul_inner, step, max_size, n_samples)
    outer_times = 1000*get_times_lists(mat_mul_outer, step, max_size, n_samples)
    np_times = 1000*get_times_lists(mat_mul_np, step, max_size, n_samples)
    sizes = np.arange(step, max_size + 1, step)
    plt.semilogy(sizes, loop_times, c='r', lw=2, label="loop")
    plt.semilogy(sizes, inner_times, c='b', lw=2, label="inner")
    plt.semilogy(sizes, outer_times, c='g', lw=2, label="outer")
    plt.semilogy(sizes, np_times, c='m', lw=2, label="np")
    plt.xlim(0, max_size)
    plt.title("Matrix product")
    plt.legend()
    plt.xlabel("size")
    plt.ylabel("time, ms")
    plt.grid(ls=":");
# YOUR CODE HERE

YOUR CONCLUSIONS HERE

Demo for task 1.6: dot product performance#

Compare several implementations of the dot product.

import numpy as np

def dot_loop(a, b):
    result = 0
    for i in range(len(a)):
        result += a[i] * b[i]
    return result

def dot_sum(a, b):
    return sum(a * b)

def dot_np(a, b):
    return a @ b

Test that all this functions produce the same result:

for n in range(1, 101):
    a = np.random.randn(n)
    b = np.random.randn(n)
    assert np.allclose(dot_loop(a, b), dot_sum(a, b))
    assert np.allclose(dot_loop(a, b), dot_np(a, b))
    assert np.allclose(dot_np(a, b), dot_sum(a, b))

Measure performance:

n = 1000
a = np.random.randn(n)
b = np.random.randn(n)
%%timeit
dot_loop(a, b)
356 µs ± 6.31 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%%timeit
dot_sum(a, b)
89.3 µs ± 8.92 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
%%timeit
dot_np(a, b)
1.61 µs ± 69.2 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

This is why you should almost always prefer numpy to pythonic loops! To emphasize this effect, plot the graphs of execution time versus array size.

import matplotlib.pyplot as plt
from time import time

%config InlineBackend.figure_format = 'svg'

def measure_time(func, size, n_samples=10):
    result = np.zeros(n_samples)
    for i in range(n_samples):
        begin = time()
        func(np.random.randn(size), np.random.randn(size))
        result[i] = time() - begin
    return result.mean()

def get_times_lists(func, step=20, max_size=1000, n_samples=20):
    times = []
    sizes = np.arange(20, max_size + 1, 20)
    for size in sizes:
        times.append(measure_time(func, size, n_samples))
    return np.array(times)

def plot_time_vs_size(step=20, max_size=1000, n_samples=50):
    loop_times = 1000*get_times_lists(dot_loop, step, max_size, n_samples)
    sum_times = 1000*get_times_lists(dot_sum, step, max_size, n_samples)
    np_times = 1000*get_times_lists(dot_np, step, max_size, n_samples)
    sizes = np.arange(step, max_size + 1, step)
    plt.plot(sizes, loop_times, c='r', lw=2, label="loop")
    plt.plot(sizes, sum_times, c='b', lw=2, label="sum")
    plt.plot(sizes, np_times, c='g', lw=2, label="np")
    plt.xlim(0, max_size)
    plt.title("Dot product")
    plt.legend()
    plt.xlabel("size")
    plt.ylabel("time, ms")
    plt.grid(ls=":");
plot_time_vs_size()
../_images/e585819aaaf6f7ad3996b08f843254c416d4d427963080701c2a314782447488.svg