HW1#
Deadline: 06.10.2024 23:59 (GMT+5).
General recommendations#
- Do not erase any existing cells.
- Write solutions to the math problems in the markdown cells of the HW notebook using LaTeX. If you are not familiar with LaTeX, see a 2-page cheat sheet for a quick start.
- Provide your solutions with understandable comments; do not submit tons of formulas and/or code cells without any text description of what you are doing.
- Readability counts! In case of poor writing you may receive a penalty of up to one point.
Task 1.1 (1 point)#
Let \(\boldsymbol A \in\mathbb R^{m\times n}\), \(\boldsymbol B \in\mathbb R^{n\times m}\). Prove that \(\mathrm{tr}(\boldsymbol{AB}) = \mathrm{tr}(\boldsymbol{BA})\). Using this property, calculate \(\mathrm{tr}(\boldsymbol{uv}^\mathsf{T})\) if \(\boldsymbol u, \boldsymbol v \in\mathbb R^n\), \(\boldsymbol u \perp \boldsymbol v\).
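A quick numerical sanity check of both claims (not a substitute for the proof; the matrix sizes and the orthogonal pair are arbitrary choices):

import numpy as np

A = np.random.randn(3, 5)
B = np.random.randn(5, 3)
# the trace is invariant under swapping the factors
assert np.isclose(np.trace(A @ B), np.trace(B @ A))

u = np.array([1.0, 1.0, 0.0])
v = np.array([1.0, -1.0, 2.0])   # u and v are orthogonal
print(np.trace(np.outer(u, v)))  # tr(uv^T) = v^T u = 0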
YOUR SOLUTION HERE#
Task 1.2 (1 point)#
Let \(\boldsymbol A \in \mathbb R^{m\times n}\). Prove that \(N(\boldsymbol A) = N(\boldsymbol A^{\mathsf T}\boldsymbol A)\), i.e., matrices \(\boldsymbol A\) and \(\boldsymbol A^{\mathsf T} \boldsymbol A\) have same nullspaces.
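A numerical illustration (assuming scipy is available; scipy.linalg.null_space returns an orthonormal basis of the nullspace):

import numpy as np
from scipy.linalg import null_space

A = np.random.randn(4, 6)  # more columns than rows, so the nullspace is nontrivial
N_A, N_AtA = null_space(A), null_space(A.T @ A)
assert N_A.shape[1] == N_AtA.shape[1]  # the nullspaces have the same dimension
assert np.allclose(A @ N_AtA, 0)       # basis of N(A^T A) lies in N(A)
assert np.allclose(A.T @ A @ N_A, 0)   # basis of N(A) lies in N(A^T A)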
YOUR SOLUTION HERE#
Task 1.3 (0.5 points)#
Show that similar matrices have equal determinants: \(\det \boldsymbol A = \det \boldsymbol B\) if \(\boldsymbol A \sim \boldsymbol B\).
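A quick numerical check on a random similarity transform:

import numpy as np

A = np.random.randn(4, 4)
P = np.random.randn(4, 4)     # almost surely invertible
B = np.linalg.inv(P) @ A @ P  # B is similar to A
assert np.isclose(np.linalg.det(A), np.linalg.det(B))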
YOUR SOLUTION HERE#
Task 1.4 (Sherman—Morrison formula, 1.5 points)#
Let \(\boldsymbol A \in \mathbb R^{n\times n}\) be an invertible matrix, \(\boldsymbol u, \boldsymbol v \in \mathbb R^n\), and \(\boldsymbol v^\mathsf{T}\boldsymbol A^{-1} \boldsymbol u \ne -1\). Prove that

\[
(\boldsymbol A + \boldsymbol u \boldsymbol v^\mathsf{T})^{-1} = \boldsymbol A^{-1} - \frac{\boldsymbol A^{-1} \boldsymbol u \boldsymbol v^\mathsf{T} \boldsymbol A^{-1}}{1 + \boldsymbol v^\mathsf{T} \boldsymbol A^{-1} \boldsymbol u}.
\]
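A quick numerical check of the identity (a sanity test, not a proof):

import numpy as np

n = 5
A = np.random.randn(n, n)  # almost surely invertible
u, v = np.random.randn(n), np.random.randn(n)
A_inv = np.linalg.inv(A)
lhs = np.linalg.inv(A + np.outer(u, v))
rhs = A_inv - (A_inv @ np.outer(u, v) @ A_inv) / (1 + v @ A_inv @ u)
assert np.allclose(lhs, rhs)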
YOUR SOLUTION HERE#
Task 1.5 (programming, 1.5 points)#
Apply the k-NN algorithm to the MNIST dataset and measure its performance:

- split the dataset into train and test parts
- train several models with different hyperparameters: take \(1\leqslant k \leqslant 20\) and different distance metrics (\(p=1\), \(p=2\), \(p=+\infty\)); a sketch of the experiment loop is given below
- visualize several test samples and their predictions (see code below)
- plot train and test accuracies for each model on the same graph
- find the model with the best test accuracy
Fetch the MNIST dataset:
from sklearn.datasets import fetch_openml
X, Y = fetch_openml('mnist_784', return_X_y=True, parser='auto')
X = X.astype(float).values / 255
Y = Y.astype(int).values
X.shape, Y.shape
((70000, 784), (70000,))
Visualize it:
import numpy as np
import matplotlib.pyplot as plt
%config InlineBackend.figure_format = 'svg'
def plot_digits(X, y_true, y_pred=None, n=4, random_state=123):
    np.random.seed(random_state)
    indices = np.random.choice(np.arange(X.shape[0]), size=n*n, replace=False)
    plt.figure(figsize=(10, 10))
    for i in range(n*n):
        plt.subplot(n, n, i+1)
        plt.xticks([])
        plt.yticks([])
        plt.grid(False)
        plt.imshow(X[indices[i]].reshape(28, 28), cmap='gray')
        if y_pred is None:
            title = str(y_true[indices[i]])
        else:
            title = f"y={y_true[indices[i]]}, ŷ={y_pred[indices[i]]}"
        plt.title(title, size=20)
    plt.show()
plot_digits(X, Y)
# YOUR CODE HERE
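A minimal sketch of the experiment loop, assuming sklearn's KNeighborsClassifier and train_test_split (the subsample size is an assumption to keep the grid over \(k\) and \(p\) tractable; the full 70000 samples work too, just slower):

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# hypothetical subsample; enlarge or drop it to use the whole dataset
X_train, X_test, y_train, y_test = train_test_split(
    X[:10000], Y[:10000], test_size=0.25, random_state=42)

# p = +inf corresponds to the Chebyshev metric
metrics = {1: dict(metric="minkowski", p=1),
           2: dict(metric="minkowski", p=2),
           "inf": dict(metric="chebyshev")}

accuracies = {}
for p_name, kwargs in metrics.items():
    for k in range(1, 21):
        model = KNeighborsClassifier(n_neighbors=k, **kwargs)
        model.fit(X_train, y_train)
        accuracies[(p_name, k)] = (model.score(X_train, y_train),
                                   model.score(X_test, y_test))

The predictions of the best model can then be passed to plot_digits(X_test, y_test, model.predict(X_test)) for the visualization step.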
Task 1.6 (programming, 1.5 points)#
Compare the performance of different algorithms for calculating the matrix product.
Try the following methods:
- a nested pythonic loop implementing the formula \(C_{ik} = \sum\limits_{j=1}^n A_{ij}B_{jk}\)
- the same, but with the inner for loop replaced by np.dot for calculating \(\boldsymbol a_i^\mathsf{T}\boldsymbol b_k\)
- calculate \(\sum\limits_{j=1}^n \boldsymbol a_j \boldsymbol b_j^\mathsf{T}\) (use np.outer)
- just use numpy and calculate A @ B
The plan (see also the dot product demo below):

- implement these four functions
- test that they return the same result
- measure their performance on square matrices of shape \(100\times 100\)
- plot graphs of the execution time versus \(n\)
import numpy as np
def mat_mul_loop(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    # YOUR CODE HERE
    pass

def mat_mul_inner(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    # YOUR CODE HERE
    pass

def mat_mul_outer(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    # YOUR CODE HERE
    pass

def mat_mul_np(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    # YOUR CODE HERE
    pass
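If the intended shapes are unclear, here is one possible sketch of the loop and outer-product variants (hypothetical helper names with a leading underscore so they do not clash with your solutions; here \(\boldsymbol a_j\) is the \(j\)-th column of \(\boldsymbol A\) and \(\boldsymbol b_j^\mathsf{T}\) is the \(j\)-th row of \(\boldsymbol B\)):

def _mat_mul_loop_sketch(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    m, n = A.shape
    p = B.shape[1]
    C = np.zeros((m, p))
    for i in range(m):
        for k in range(p):
            for j in range(n):  # the innermost loop computes the sum over j
                C[i, k] += A[i, j] * B[j, k]
    return C

def _mat_mul_outer_sketch(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    C = np.zeros((A.shape[0], B.shape[1]))
    for j in range(A.shape[1]):
        C += np.outer(A[:, j], B[j, :])  # rank-one update a_j b_j^T
    return C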
# TESTING AREA
def test_matrices(n_max):
    for n in range(1, n_max):
        m = np.random.randint(1, n_max)
        p = np.random.randint(1, n_max)
        A = np.random.randn(m, n)
        B = np.random.randn(n, p)
        prod_loop = mat_mul_loop(A, B)
        prod_inner = mat_mul_inner(A, B)
        prod_outer = mat_mul_outer(A, B)
        prod_np = mat_mul_np(A, B)
        assert np.allclose(prod_loop, prod_inner)
        assert np.allclose(prod_loop, prod_np)
        assert np.allclose(prod_loop, prod_outer)

test_matrices(20)  # run the checks on random shapes
Measure performance on two square matrices:
n = 100
A = np.random.randn(n, n)
B = np.random.randn(n, n)
# timeit!
# YOUR CODE HERE
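For example, with the %timeit line magic (assuming the four functions above are implemented):

%timeit mat_mul_loop(A, B)
%timeit mat_mul_inner(A, B)
%timeit mat_mul_outer(A, B)
%timeit mat_mul_np(A, B)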
Plot the graphs:
import matplotlib.pyplot as plt
from time import time
%config InlineBackend.figure_format = 'svg'
def measure_time(func, m, n, p, n_samples=10):
    result = np.zeros(n_samples)
    for i in range(n_samples):
        begin = time()
        func(np.random.randn(m, n), np.random.randn(n, p))
        result[i] = time() - begin
    return result.mean()

def get_times_lists(func, step=10, max_size=200, n_samples=20):
    times = []
    sizes = np.arange(step, max_size + 1, step)
    for size in sizes:
        times.append(measure_time(func, size, size, size, n_samples))
    return np.array(times)
def plot_time_vs_size(step=10, max_size=100, n_samples=20):
    loop_times = 1000*get_times_lists(mat_mul_loop, step, max_size, n_samples)
    inner_times = 1000*get_times_lists(mat_mul_inner, step, max_size, n_samples)
    outer_times = 1000*get_times_lists(mat_mul_outer, step, max_size, n_samples)
    np_times = 1000*get_times_lists(mat_mul_np, step, max_size, n_samples)
    sizes = np.arange(step, max_size + 1, step)
    plt.semilogy(sizes, loop_times, c='r', lw=2, label="loop")
    plt.semilogy(sizes, inner_times, c='b', lw=2, label="inner")
    plt.semilogy(sizes, outer_times, c='g', lw=2, label="outer")
    plt.semilogy(sizes, np_times, c='m', lw=2, label="np")
    plt.xlim(0, max_size)
    plt.title("Matrix product")
    plt.legend()
    plt.xlabel("size")
    plt.ylabel("time, ms")
    plt.grid(ls=":");
# YOUR CODE HERE
YOUR CONCLUSIONS HERE
Demo for task 1.6: dot product performance#
Compare several implementations of the dot product.
import numpy as np
def dot_loop(a, b):
    result = 0
    for i in range(len(a)):
        result += a[i] * b[i]
    return result

def dot_sum(a, b):
    return sum(a * b)

def dot_np(a, b):
    return a @ b
Test that all these functions produce the same result:
for n in range(1, 101):
    a = np.random.randn(n)
    b = np.random.randn(n)
    assert np.allclose(dot_loop(a, b), dot_sum(a, b))
    assert np.allclose(dot_loop(a, b), dot_np(a, b))
    assert np.allclose(dot_np(a, b), dot_sum(a, b))
Measure performance:
n = 1000
a = np.random.randn(n)
b = np.random.randn(n)
%%timeit
dot_loop(a, b)
367 µs ± 17 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%%timeit
dot_sum(a, b)
85.4 µs ± 1.8 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
%%timeit
dot_np(a, b)
1.81 µs ± 140 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
This is why you should almost always prefer numpy to pythonic loops! To emphasize this effect, plot the graphs of execution time versus array size.
import matplotlib.pyplot as plt
from time import time
%config InlineBackend.figure_format = 'svg'
def measure_time(func, size, n_samples=10):
    result = np.zeros(n_samples)
    for i in range(n_samples):
        begin = time()
        func(np.random.randn(size), np.random.randn(size))
        result[i] = time() - begin
    return result.mean()

def get_times_lists(func, step=20, max_size=1000, n_samples=20):
    times = []
    sizes = np.arange(step, max_size + 1, step)  # use step instead of a hardcoded 20
    for size in sizes:
        times.append(measure_time(func, size, n_samples))
    return np.array(times)
def plot_time_vs_size(step=20, max_size=1000, n_samples=50):
    loop_times = 1000*get_times_lists(dot_loop, step, max_size, n_samples)
    sum_times = 1000*get_times_lists(dot_sum, step, max_size, n_samples)
    np_times = 1000*get_times_lists(dot_np, step, max_size, n_samples)
    sizes = np.arange(step, max_size + 1, step)
    plt.plot(sizes, loop_times, c='r', lw=2, label="loop")
    plt.plot(sizes, sum_times, c='b', lw=2, label="sum")
    plt.plot(sizes, np_times, c='g', lw=2, label="np")
    plt.xlim(0, max_size)
    plt.title("Dot product")
    plt.legend()
    plt.xlabel("size")
    plt.ylabel("time, ms")
    plt.grid(ls=":");
plot_time_vs_size()