Vectors#

Possible definitions of a vector:

  1. a directed line segment

  2. an ordered sequence of numbers

  3. an element of a vector space

The definition most suitable for the purposes of machine learning is the second one. Vectors are usually denoted by lowercase bold letters: \(\boldsymbol x\), \(\boldsymbol y\), \(\boldsymbol u\), \(\boldsymbol v\), \(\boldsymbol a\), \(\boldsymbol b, \ldots\). By default, a vector \(\boldsymbol x\) with \(n\) elements is written as a column:

\[\begin{split} \boldsymbol x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \end{split}\]

To obtain a row representation, the vector \(\boldsymbol x\) should be transposed:

\[ \boldsymbol x^\mathsf{T} = (x_1, \ldots, x_n). \]

Sometimes it does not matter whether we write the vector \(\boldsymbol x\) as a row or as a column. In such cases the notation \(\boldsymbol x = (x_1, \ldots, x_n)\) is also admissible.

The set of all vectors with real elements of size \(n\) is denoted as \(\mathbb R^n\).

Vectors in Python#

For numerical operations with matrices and vectors in Python there is the NumPy library.

import numpy as np
vector = np.array([1, 2, 7])
print(vector)
[1 2 7]

The attribute dtype specifies the underlying type of the vector’s elements:

print(vector.dtype)
float_vector = np.array([-0.1, 1.123])
print(float_vector.dtype)
int64
float64
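
The element type can also be set explicitly when the array is created; a small illustration (the variable name explicit_vector is arbitrary):

explicit_vector = np.array([1, 2, 3], dtype=np.float64)  # store the integers as floats
print(explicit_vector.dtype)
float64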

Vector operations#

There are two basic vector operations.

  1. Addition: if

    \[\begin{split} \boldsymbol x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \quad \boldsymbol y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \text{ then } \boldsymbol x + \boldsymbol y = \begin{pmatrix} x_1 + y_1 \\ \vdots \\ x_n + y_n \end{pmatrix}. \end{split}\]
  2. Multiplication by a scalar (number): if

    \[\begin{split} \boldsymbol x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix},\quad \alpha \in \mathbb R, \text{ then } \alpha \boldsymbol x = \begin{pmatrix} \alpha x_1 \\ \vdots \\ \alpha x_n \end{pmatrix}. \end{split}\]

Zero vector:

\[\begin{split} \boldsymbol 0 = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}; \quad \boldsymbol x + \boldsymbol 0 = \boldsymbol x \; \forall \boldsymbol x \in \mathbb R^n. \end{split}\]

Sometimes a vector of all ones is useful:

\[\begin{split} \boldsymbol 1 = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}. \end{split}\]

Two vectors \(\boldsymbol x\) and \(\boldsymbol y\) are collinear if \(\boldsymbol y = \alpha \boldsymbol x\) for some \(\alpha \in \mathbb R\). Collinear vectors lie on the same line which passes through the origin.

If \(\alpha = -1\), then the vector \(\boldsymbol y = (-1)\cdot \boldsymbol x = -\boldsymbol x\) is called the opposite vector to \(\boldsymbol x\).

In NumPy all these operations are straightforward:

x = np.linspace(0, 1, num=5)
y = np.arange(1, 6)
z = np.zeros(5)
o = np.ones(6)
print(x)
print(y)
print("Zero vector:", z)
print("Vector of ones:", o)
[0.   0.25 0.5  0.75 1.  ]
[1 2 3 4 5]
Zero vector: [0. 0. 0. 0. 0.]
Vector of ones: [1. 1. 1. 1. 1. 1.]
print("Sum:", x+y)
print("Diff:", y-x)
Sum: [1.   2.25 3.5  4.75 6.  ]
Diff: [1.   1.75 2.5  3.25 4.  ]
print(-y)
[-1 -2 -3 -4 -5]
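
Multiplication by a scalar uses the usual * operator; a quick check with an arbitrary factor of 2:

print(2 * y)
[ 2  4  6  8 10]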

Vector norm#

If \(\boldsymbol x = (x_1, \ldots, x_n)\), then its length (aka Euclidean norm) is

\[ \Vert \boldsymbol x \Vert = \sqrt{\sum\limits_{k=1}^n x_k^2}. \]

The Euclidean norm is a special case of the \(p\)-norm (aka Minkowski norm)

\[ \Vert \boldsymbol x \Vert_p = \bigg(\sum\limits_{k=1}^n |x_k|^p \bigg)^\frac 1p, \quad p \geqslant 1. \]

The Minkowski norm becomes

  • the Euclidean norm if \(p=2\);

  • the Manhattan norm \(\Vert \boldsymbol x \Vert_1 = \sum\limits_{k=1}^n \vert x_k \vert\) if \(p=1\);

  • the maximum norm \(\Vert \boldsymbol x \Vert_\infty = \max \{\vert x_1 \vert, \ldots, \vert x_n \vert\}\) if \(p\to\infty\).

A unit vector \(\boldsymbol x\) has norm equal to \(1\): \(\Vert \boldsymbol x\Vert = 1\). The unit ball in \(\mathbb R^n\) is

\[ \{\boldsymbol x \in \mathbb R^n\colon \Vert \boldsymbol x \Vert \leqslant 1\}. \]

The shape of the unit ball depends on the norm. For the Euclidean norm it is the usual round ball. Unit disks in two dimensions look different for different values of \(p\); one way to draw them is sketched below.
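
A minimal plotting sketch, assuming matplotlib is available (it is not used elsewhere in this section): each point of the Euclidean unit circle is rescaled so that its \(p\)-norm equals \(1\), which traces out the boundary of the corresponding unit disk.

import matplotlib.pyplot as plt

angles = np.linspace(0, 2 * np.pi, 400)
directions = np.vstack([np.cos(angles), np.sin(angles)])  # points on the Euclidean unit circle
for p in [1, 1.5, 2, 4, np.inf]:
    # dividing each column by its p-norm puts it on the boundary of the p-norm unit disk
    boundary = directions / np.linalg.norm(directions, ord=p, axis=0)
    plt.plot(boundary[0], boundary[1], label=f"p = {p}")
plt.gca().set_aspect("equal")
plt.legend()
plt.show()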

The distance between vectors \(\boldsymbol x\) and \(\boldsymbol y\) equals \(\Vert \boldsymbol x - \boldsymbol y\Vert\). In the case of the Euclidean norm the distance

\[ \Vert\boldsymbol x - \boldsymbol y\Vert_2 = \sqrt{\sum\limits_{i=1}^n (x_i - y_i)^2} \]

is also called Euclidean.

How to calculate a norm in NumPy? Use np.linalg.norm:

x = np.array([1, 2, -2])
np.linalg.norm(x)
3.0

To specify \(p\), use the parameter ord:

print("1-norm =", np.linalg.norm(x, ord=1))
print("2-norm =", np.linalg.norm(x, ord=2))
print("10-norm =", np.linalg.norm(x, ord=10))
print("infinite norm =", np.linalg.norm(x, ord=np.inf))
1-norm = 5.0
2-norm = 3.0
10-norm = 2.143651567459133
infinite norm = 2.0
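
The distance defined above is just the norm of the difference, so it is computed the same way (the vectors u and v below are arbitrary examples):

u = np.array([1.0, 0.0, 2.0])
v = np.array([0.0, 2.0, 0.0])
print(np.linalg.norm(u - v))
3.0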

Inner product#

The inner product (aka dot product) of vectors \(\boldsymbol x, \boldsymbol y \in \mathbb R^n\) equals

(40)#\[ \langle \boldsymbol x, \boldsymbol y \rangle = \sum\limits_{k=1}^n x_k y_k.\]

Alternative notation: \(\boldsymbol x^\mathsf{T} \boldsymbol y\).

The inner product generates the Euclidean norm: if \(\boldsymbol x \in \mathbb R^n\), then

\[ \sqrt{\langle \boldsymbol x, \boldsymbol x \rangle} = \Vert \boldsymbol x \Vert_2. \]
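
A quick numerical check of this identity for the vector x = [1, 2, -2] from above (the inner product is written here as the sum of the elementwise product):

print(np.sqrt(np.sum(x * x)), np.linalg.norm(x))
3.0 3.0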

The Cauchy-Schwarz inequality (42) can be written as

\[ \vert\langle \boldsymbol x, \boldsymbol y \rangle\vert \leqslant \Vert \boldsymbol x \Vert_2 \cdot \Vert \boldsymbol y \Vert_2. \]

This inequality turns into an equality iff the vectors \(\boldsymbol x\) and \(\boldsymbol y\) are collinear.
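
A numerical illustration of the Cauchy-Schwarz inequality (the vectors a and b are arbitrary examples):

a = np.array([1.0, -2.0, 2.0])
b = np.array([3.0, 0.0, -4.0])
print(abs(np.sum(a * b)), np.linalg.norm(a) * np.linalg.norm(b))
5.0 15.0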

The vectors \(\boldsymbol x\) and \(\boldsymbol y\) are called orthogonal if \(\langle\boldsymbol x, \boldsymbol y \rangle = 0\). The angle \(\theta\) between two nonzero vectors \(\boldsymbol x\) and \(\boldsymbol y\) is defined by the equality

(41)#\[ \cos \theta = \frac{\langle \boldsymbol x, \boldsymbol y \rangle}{\Vert \boldsymbol x\Vert_2 \cdot \Vert \boldsymbol y\Vert_2}.\]

The angle is well-defined since this fraction always lies between \(-1\) and \(1\) due to the Cauchy-Schwarz inequality. If \(\langle \boldsymbol x, \boldsymbol y \rangle = 0\), then \(\cos\theta = 0\) and \(\theta = \frac \pi 2\). Hence, the angle between orthogonal vectors equals \(90°\).

In data analysis and machine learning the formula (41) is often used to measure the similarity between vectors \(\boldsymbol x\) and \(\boldsymbol y\): the closer \(\cos \theta\) is to \(1\), the more similar the vectors are. For the same reason the quantity \(1-\cos\theta\) is called the cosine distance between \(\boldsymbol x\) and \(\boldsymbol y\).
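
A minimal sketch of these quantities based on formula (41); the helper name cosine_similarity and the sample vectors are chosen purely for illustration:

def cosine_similarity(a, b):
    # cosine of the angle between nonzero vectors a and b, formula (41)
    return np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 0.0])
b = np.array([0.0, 2.0])  # orthogonal to a
print(cosine_similarity(a, b))      # cosine similarity
print(1 - cosine_similarity(a, b))  # cosine distance
0.0
1.0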

Dot product in NumPy#

There are several ways to calculate the inner product of two vectors in Python.

x = np.array([1, 2, 3])
y = np.array([1, -2, 2])
print(np.dot(x, y), x.dot(y), x @ y)
3 3 3
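
Following formula (40) literally, the same value is obtained by summing the elementwise product; np.inner gives it as well:

print(np.sum(x * y), np.inner(x, y))
3 3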

Exercises#

  1. Let \(\boldsymbol x \in \mathbb R^n\). Prove that

    • \(\Vert \boldsymbol x \Vert_\infty \leqslant \Vert \boldsymbol x \Vert_1 \leqslant n\Vert \boldsymbol x \Vert_\infty\);

    • \(\Vert \boldsymbol x \Vert_\infty \leqslant \Vert \boldsymbol x \Vert_2 \leqslant \sqrt{n}\Vert \boldsymbol x \Vert_\infty\);

    • \(\Vert \boldsymbol x \Vert_2 \leqslant \Vert \boldsymbol x \Vert_1 \leqslant \sqrt{n}\Vert \boldsymbol x \Vert_2\).

  2. Prove that the \(p\)-norm for \(1\leqslant p \leqslant \infty\) satisfies the following properties:

    • \(\Vert\boldsymbol x\Vert \geqslant 0\), \(\Vert\boldsymbol x\Vert = 0 \iff \boldsymbol x =\boldsymbol 0\);

    • \(\Vert\alpha \boldsymbol x \Vert = \vert\alpha\vert \Vert\boldsymbol x \Vert\) for all \(\alpha \in \mathbb R\), \(\boldsymbol x \in \mathbb R^n\);

    • \(\Vert\boldsymbol x + \boldsymbol y\Vert \leqslant \Vert\boldsymbol x \Vert + \Vert\boldsymbol y \Vert\) for all \(\boldsymbol x, \boldsymbol y \in \mathbb R^n\) (triangle inequality).

  3. Show that \(\Vert \boldsymbol x \Vert_\infty = \max \{\vert x_1 \vert, \ldots, \vert x_n \vert\}\), i.e.,

    \[ \lim\limits_{p\to+\infty}\bigg(\sum\limits_{k=1}^n |x_k|^p \bigg)^\frac 1p = \max \{\vert x_1 \vert, \ldots, \vert x_n \vert\}. \]
  4. Prove the Cauchy-Schwarz inequality

(42)#\[ \bigg(\sum\limits_{k=1}^n x_k y_k\bigg)^2 \leqslant \sum\limits_{k=1}^n x_k^2 \sum\limits_{k=1}^n y_k^2. \]
  5. Show that the cosine distance between two vectors is always between \(0\) and \(2\).

  6. How many arithmetic operations are required to calculate the dot product by formula (40)?