Possible definitions of a vector:
a directed line segment
an ordered sequence of numbers
an element of a vector space
The second definition is the most suitable for the purposes of machine learning. Vectors are usually denoted by small bold letters: \(\boldsymbol x\), \(\boldsymbol y\), \(\boldsymbol u\), \(\boldsymbol v\), \(\boldsymbol a\), \(\boldsymbol b, \ldots\). By default, a vector \(\boldsymbol x\) with \(n\) elements is written as a column:
\[ \boldsymbol x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \]
To obtain a row representation, the vector \(\boldsymbol x\) should be transposed:
\[ \boldsymbol x^\mathsf{T} = (x_1, \ldots, x_n). \]
Sometimes it does not matter whether we write the vector \(\boldsymbol x\) as a row or a column. In such cases the notation \(\boldsymbol x = (x_1, \ldots, x_n)\) is also admissible.
The set of all vectors of size \(n\) with real elements is denoted by \(\mathbb R^n\).
Vectors in Python#
For numeric operations on matrices and vectors, Python has the NumPy library.
import numpy as np
vector = np.array([1, 2, 7])
print(vector)
[1 2 7]
The attribute dtype specifies the underlying type of the vector’s elements:
float_vector = np.array([-0.1, 1.123])
print(float_vector.dtype)
float64
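The column and row representations discussed earlier can also be made explicit in NumPy. The following is a minimal sketch, assuming that a column is modelled as a 2-D array of shape (n, 1) and a row as a 2-D array of shape (1, n); the names `column` and `row` are illustrative only.

```python
import numpy as np

# A 1-D array has a single axis: it is neither a row nor a column.
x = np.array([1, 2, 7])
print(x.shape)        # (3,)

# Explicit column (n x 1) and row (1 x n) representations as 2-D arrays.
column = x.reshape(-1, 1)
row = column.T
print(column.shape)   # (3, 1)
print(row.shape)      # (1, 3)

# Transposing a 1-D array leaves it unchanged.
print(np.array_equal(x.T, x))   # True
```

For most vector computations the plain 1-D array is enough; the 2-D forms matter mainly when vectors are mixed with matrices.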
Vector operations#
There are two basic vector operations.
Addition: if
\[\begin{split} \boldsymbol x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \quad \boldsymbol y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \text{ then } \boldsymbol x + \boldsymbol y = \begin{pmatrix} x_1 + y_1 \\ \vdots \\ x_n + y_n \end{pmatrix}. \end{split}\]
Multiplication by a scalar (number): if
\[\begin{split} \boldsymbol x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix},\quad \alpha \in \mathbb R, \text{ then } \alpha \boldsymbol x = \begin{pmatrix} \alpha x_1 \\ \vdots \\ \alpha x_n \end{pmatrix}. \end{split}\]
Zero vector:
\[ \boldsymbol 0 = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}. \]
Sometimes a vector of all ones is useful:
\[ \boldsymbol 1 = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}. \]
Two vectors \(\boldsymbol x\) and \(\boldsymbol y\) are collinear if \(\boldsymbol y = \alpha \boldsymbol x\) for some \(\alpha \in \mathbb R\). Collinear vectors lie on the same line which passes through the origin.
If \(\alpha = -1\), then the vector \(\boldsymbol y = (-1)\cdot \boldsymbol x = -\boldsymbol x\) is called the opposite vector of \(\boldsymbol x\).
In NumPy all these operations are straightforward:
x = np.linspace(0, 1, num=5)
y = np.arange(1, 6)
z = np.zeros(5)
o = np.ones(6)
print(x)
print(y)
print("Zero vector:", z)
print("Vector of ones:", o)
[0. 0.25 0.5 0.75 1. ]
[1 2 3 4 5]
Zero vector: [0. 0. 0. 0. 0.]
Vector of ones: [1. 1. 1. 1. 1. 1.]
print("Sum:", x+y)
print("Diff:", y-x)
Sum: [1. 2.25 3.5 4.75 6. ]
Diff: [1. 1.75 2.5 3.25 4. ]
print(-y)
[-1 -2 -3 -4 -5]
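Multiplication by a scalar and the collinearity condition can be illustrated in the same way. This is only a sketch: `are_collinear` is a hypothetical helper, not a NumPy function, and it relies on the fact that two vectors are collinear exactly when the matrix stacked from them has rank at most 1.

```python
import numpy as np

x = np.array([1.0, 2.0, -2.0])
alpha = 2.5

# Multiplication by a scalar acts elementwise.
y = alpha * x
print(y)

# Adding the opposite vector gives the zero vector.
print(np.array_equal(x + (-x), np.zeros(3)))   # True

def are_collinear(u, v):
    """Hypothetical helper: True if u and v lie on one line through the origin."""
    # The 2 x n matrix with rows u and v has rank <= 1 iff they are collinear.
    return np.linalg.matrix_rank(np.vstack([u, v])) <= 1

print(are_collinear(x, y))                           # True
print(are_collinear(x, np.array([1.0, 0.0, 0.0])))   # False
```

The rank test uses a floating-point tolerance internally, so for vectors that are only approximately collinear the answer depends on that tolerance.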
Vector norm#
If \(\boldsymbol x = (x_1, \ldots, x_n)\), then its length (aka Euclidean norm) is
\[ \Vert \boldsymbol x \Vert_2 = \sqrt{\sum\limits_{k=1}^n x_k^2}. \]
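The Euclidean norm is a special case of the \(p\)-norm (aka Minkowski norm)
\[ \Vert \boldsymbol x \Vert_p = \bigg(\sum\limits_{k=1}^n \vert x_k \vert^p \bigg)^\frac 1p, \quad p \geqslant 1. \]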
The Minkowski norm becomes
Euclidean norm if \(p=2\);
Manhattan norm \(\Vert \boldsymbol x \Vert_1 = \sum\limits_{k=1}^n \vert x_k \vert\) if \(p=1\);
maximum norm \(\Vert \boldsymbol x \Vert_\infty = \max \{\vert x_1 \vert, \ldots, \vert x_n \vert\}\) if \(p\to\infty\).
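The three special cases listed above can be checked by computing the \(p\)-norm directly from its definition. The function `p_norm` below is an illustrative helper written for this sketch, not part of NumPy.

```python
import numpy as np

def p_norm(x, p):
    """Minkowski p-norm computed straight from the definition (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    if np.isinf(p):
        # Limiting case p -> infinity: the largest absolute value.
        return np.max(np.abs(x))
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = [3, -4, 1]
print(p_norm(x, 1))        # Manhattan norm: 3 + 4 + 1 = 8
print(p_norm(x, 2))        # Euclidean norm: sqrt(9 + 16 + 1) = sqrt(26)
print(p_norm(x, np.inf))   # maximum norm: 4

# As p grows, the p-norm decreases towards the maximum norm.
for p in (2, 5, 20, 100):
    print(p, p_norm(x, p))
```

Raising entries to a large power can overflow for vectors with large components, so this direct formula is meant for illustration rather than production use.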
A unit vector \(\boldsymbol x\) has norm equal to \(1\): \(\Vert \boldsymbol x\Vert = 1\). The unit ball in \(\mathbb R^n\) is
\[ \{\boldsymbol x \in \mathbb R^n \colon \Vert \boldsymbol x \Vert \leqslant 1\}. \]
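The shape of the unit ball depends on the norm. For the Euclidean norm it is an ordinary round ball. In two dimensions the unit disk is a square rotated by \(45°\) (a diamond) for \(p=1\), a circular disk for \(p=2\), and an axis-aligned square for \(p=\infty\).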
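A small numeric experiment shows how membership in the unit ball changes with \(p\). This sketch assumes the closed unit ball \(\Vert \boldsymbol x \Vert_p \leqslant 1\) and uses NumPy's `np.linalg.norm` with its `ord` parameter; `in_unit_ball` is a hypothetical helper introduced only for this example.

```python
import numpy as np

def in_unit_ball(x, p):
    """Illustrative helper: does x satisfy ||x||_p <= 1?"""
    return np.linalg.norm(x, ord=p) <= 1

# (0.6, 0.6) lies outside the diamond (p = 1) but inside the disk and the square;
# (0.8, 0.8) lies only inside the square (p = infinity).
for point in ([0.6, 0.6], [0.8, 0.8]):
    for p in (1, 2, np.inf):
        print(point, "p =", p, "->", in_unit_ball(np.array(point), p))
```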
The distance between vectors \(\boldsymbol x\) and \(\boldsymbol y\) equals \(\Vert \boldsymbol x - \boldsymbol y\Vert\). In the case of the Euclidean norm the distance
\[ \Vert \boldsymbol x - \boldsymbol y \Vert_2 = \sqrt{\sum\limits_{k=1}^n (x_k - y_k)^2} \]
is also called Euclidean.
To calculate a norm in NumPy, use np.linalg.norm; by default it returns the Euclidean norm of a vector:
x = np.array([1, 2, -2])
print(np.linalg.norm(x))
3.0
To specify \(p\), use the parameter ord:
print("1-norm =", np.linalg.norm(x, ord=1))
print("2-norm =", np.linalg.norm(x, ord=2))
print("10-norm =", np.linalg.norm(x, ord=10))
print("infinite norm =", np.linalg.norm(x, ord=np.inf))
1-norm = 5.0
2-norm = 3.0
10-norm = 2.143651567459133
infinite norm = 2.0
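The distance between two vectors, defined earlier as the norm of their difference, can be computed with the same function. A minimal sketch, with the vector `y` chosen arbitrarily for illustration:

```python
import numpy as np

x = np.array([1, 2, -2])
y = np.array([4, -2, -2])

diff = x - y   # (-3, 4, 0)

# Distances induced by different norms.
print("Euclidean distance =", np.linalg.norm(diff))                     # sqrt(9 + 16) = 5
print("Manhattan distance =", np.linalg.norm(diff, ord=1))              # 3 + 4 + 0 = 7
print("maximum-norm distance =", np.linalg.norm(diff, ord=np.inf))      # max(3, 4, 0) = 4
```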
Inner product#
The inner product (aka dot product) of vectors \(\boldsymbol x, \boldsymbol y \in \mathbb R^n\) equals
\[ \langle \boldsymbol x, \boldsymbol y \rangle = \sum\limits_{k=1}^n x_k y_k. \tag{42} \]
Alternative notation: \(\boldsymbol x^\mathsf{T} \boldsymbol y\).
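The inner product generates the Euclidean norm: if \(\boldsymbol x \in \mathbb R^n\) then
\[ \langle \boldsymbol x, \boldsymbol x \rangle = \sum\limits_{k=1}^n x_k^2 = \Vert \boldsymbol x \Vert_2^2. \]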
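The Cauchy-Schwarz inequality (44) can be written as
\[ \vert \langle \boldsymbol x, \boldsymbol y \rangle \vert \leqslant \Vert \boldsymbol x \Vert_2 \Vert \boldsymbol y \Vert_2. \]
This inequality becomes an equality if and only if the vectors \(\boldsymbol x\) and \(\boldsymbol y\) are collinear.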
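The Cauchy-Schwarz inequality and its equality case can be checked numerically. This is a sketch on one hand-picked pair of vectors, not a proof; it uses `np.dot` for the inner product and `np.linalg.norm` for the Euclidean norm.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, -2.0, 2.0])

lhs = abs(np.dot(x, y))                       # |<x, y>| = |1 - 4 + 6| = 3
rhs = np.linalg.norm(x) * np.linalg.norm(y)   # sqrt(14) * 3
print(lhs <= rhs)   # True

# For collinear vectors both sides coincide (up to rounding).
z = -2.0 * x
print(np.isclose(abs(np.dot(x, z)), np.linalg.norm(x) * np.linalg.norm(z)))   # True
```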
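The vectors \(\boldsymbol x\) and \(\boldsymbol y\) are called orthogonal if \(\langle\boldsymbol x, \boldsymbol y \rangle = 0\). The angle \(\theta \in [0, \pi]\) between two nonzero vectors \(\boldsymbol x\) and \(\boldsymbol y\) is defined by the equality
\[ \cos\theta = \frac{\langle \boldsymbol x, \boldsymbol y \rangle}{\Vert \boldsymbol x \Vert_2 \Vert \boldsymbol y \Vert_2}. \tag{43} \]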
The angle is well-defined since this fraction is always between \(-1\) and \(1\) due to the Cauchy-Schwarz inequality. If \(\langle \boldsymbol x, \boldsymbol y \rangle = 0\), then \(\cos\theta = 0\) and \(\theta = \frac \pi 2\). Hence, the angle between orthogonal vectors equals \(90°\).
In data analysis and machine learning, formula (43) is often used to measure similarity between vectors \(\boldsymbol x\) and \(\boldsymbol y\): the closer \(\cos \theta\) is to \(1\), the more similar the vectors are. For the same reason the quantity \(1-\cos\theta\) is called the cosine distance between \(\boldsymbol x\) and \(\boldsymbol y\).
Dot product in NumPy#
There are several ways to calculate the inner product of two vectors in Python.
x = np.array([1, 2, 3])
y = np.array([1, -2, 2])
print(np.dot(x, y), x.dot(y), x @ y)
3 3 3
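With the dot product available, the cosine similarity, the angle, and the cosine distance described earlier follow in a few lines. This is a minimal sketch; `cosine_similarity` is an illustrative helper written here and assumes both vectors are nonzero.

```python
import numpy as np

def cosine_similarity(x, y):
    """cos(theta) for nonzero vectors x and y (illustrative helper)."""
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

x = np.array([1, 2, 3])
y = np.array([1, -2, 2])

cos_theta = cosine_similarity(x, y)   # 3 / (sqrt(14) * 3) = 1 / sqrt(14)
# Clipping guards against rounding pushing the value slightly outside [-1, 1].
theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
print("cosine similarity:", cos_theta)
print("angle in degrees:", np.degrees(theta))
print("cosine distance:", 1 - cos_theta)

# Orthogonal vectors: zero inner product, hence a right angle.
u = np.array([1, 0])
v = np.array([0, 5])
print(np.degrees(np.arccos(cosine_similarity(u, v))))   # 90 degrees, up to rounding
```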
Let \(\boldsymbol x \in \mathbb R^n\). Prove that
\(\Vert \boldsymbol x \Vert_\infty \leqslant \Vert \boldsymbol x \Vert_1 \leqslant n\Vert \boldsymbol x \Vert_\infty\);
\(\Vert \boldsymbol x \Vert_\infty \leqslant \Vert \boldsymbol x \Vert_2 \leqslant \sqrt{n}\Vert \boldsymbol x \Vert_\infty\);
\(\Vert \boldsymbol x \Vert_2 \leqslant \Vert \boldsymbol x \Vert_1 \leqslant \sqrt{n}\Vert \boldsymbol x \Vert_2\).
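Relations 1 and 2 follow immediately from the inequalities
\[ \max\{a_1, \ldots, a_n\} \leqslant \sum\limits_{k=1}^n a_k \leqslant n \max\{a_1, \ldots, a_n\}, \]
which hold for any nonnegative numbers \(a_1, \ldots, a_n\) (take \(a_k = \vert x_k \vert\) for relation 1 and \(a_k = x_k^2\) for relation 2). The inequality \(\Vert \boldsymbol x \Vert_2 \leqslant \Vert \boldsymbol x \Vert_1\) is equivalent to the inequality
\[ \sum\limits_{k=1}^n x_k^2 \leqslant \bigg(\sum\limits_{k=1}^n \vert x_k \vert \bigg)^2, \]
which obviously holds, since expanding the square on the right-hand side adds only nonnegative cross terms. Finally, the last inequality follows from the Cauchy-Schwarz inequality applied to the vectors \((\vert x_1 \vert, \ldots, \vert x_n \vert)\) and \((1, \ldots, 1)\).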
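The three chains of inequalities can also be sanity-checked on random vectors. The sketch below is not a proof; the dimension, the number of samples, and the tolerance `tol` are arbitrary choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
tol = 1e-12   # slack for floating-point rounding

for _ in range(1000):
    x = rng.normal(size=n)
    n1 = np.linalg.norm(x, ord=1)
    n2 = np.linalg.norm(x, ord=2)
    ninf = np.linalg.norm(x, ord=np.inf)
    # Relation 1: ||x||_inf <= ||x||_1 <= n ||x||_inf
    assert ninf <= n1 + tol and n1 <= n * ninf + tol
    # Relation 2: ||x||_inf <= ||x||_2 <= sqrt(n) ||x||_inf
    assert ninf <= n2 + tol and n2 <= np.sqrt(n) * ninf + tol
    # Relation 3: ||x||_2 <= ||x||_1 <= sqrt(n) ||x||_2
    assert n2 <= n1 + tol and n1 <= np.sqrt(n) * n2 + tol

print("all three chains of inequalities hold on the sampled vectors")
```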
Prove that \(p\)-norm for \(1\leqslant p \leqslant \infty\) satisfies the following properties:
\(\Vert\boldsymbol x\Vert \geqslant 0\), \(\Vert\boldsymbol x\Vert = 0 \iff \boldsymbol x =\boldsymbol 0\);
\(\Vert\alpha \boldsymbol x \Vert = \vert\alpha\vert \Vert\boldsymbol x \Vert\) for all \(\alpha \in \mathbb R\), \(\boldsymbol x \in \mathbb R^n\);
\(\Vert\boldsymbol x + \boldsymbol y\Vert \leqslant \Vert\boldsymbol x \Vert + \Vert\boldsymbol y \Vert\) for all \(\boldsymbol x, \boldsymbol y \in \mathbb R^n\) (triangle inequality).
Show that \(\Vert \boldsymbol x \Vert_\infty = \max \{\vert x_1 \vert, \ldots, \vert x_n \vert\}\), i.e.,
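\[ \lim\limits_{p\to+\infty}\bigg(\sum\limits_{k=1}^n |x_k|^p \bigg)^\frac 1p = \max \{\vert x_1 \vert, \ldots, \vert x_n \vert\}. \]
Prove the Cauchy-Schwarz inequality
\[ \bigg(\sum\limits_{k=1}^n x_k y_k \bigg)^2 \leqslant \sum\limits_{k=1}^n x_k^2 \sum\limits_{k=1}^n y_k^2. \tag{44} \]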
Show that the cosine distance between two vectors is always between \(0\) and \(2\).
What can be said about vectors \(\boldsymbol x\) and \(\boldsymbol y\) if the cosine distance between them equals \(0\)? \(1\)? \(2\)?
How many arithmetic operations are required to calculate the dot product by formula (42)?