Vectors#

Possible definitions of a vector:

  1. a directed line segment

  2. an ordered sequence of numbers

  3. an element of a vector space

The most suitable definition for the purposes of machine learning is the second one. Vectors are usually denoted by lowercase bold letters: \(\boldsymbol x\), \(\boldsymbol y\), \(\boldsymbol u\), \(\boldsymbol v\), \(\boldsymbol a\), \(\boldsymbol b, \ldots\). By default, a vector \(\boldsymbol x\) with \(n\) elements is written as a column:

\[\begin{split} \boldsymbol x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \end{split}\]

To obtain a row representation the vector \(\boldsymbol x\) should be transposed:

\[ \boldsymbol x^\mathsf{T} = (x_1, \ldots, x_n). \]

Sometimes it does not matter whether we write the vector \(\boldsymbol x\) as a row or as a column. In such cases the notation \(\boldsymbol x = (x_1, \ldots, x_n)\) is also admissible.

The set of all vectors of size \(n\) with real elements is denoted by \(\mathbb R^n\).

Vectors in Python#

For numeric operations with matrices and vectors in Python there is the NumPy library.

import numpy as np
vector = np.array([1, 2, 7])
print(vector)
[1 2 7]

The attribute dtype specifies the underlying type of the vector’s elements:

print(vector.dtype)
float_vector = np.array([-0.1, 1.123])
print(float_vector.dtype)
int64
float64
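
Returning to the row/column distinction above: a one-dimensional NumPy array has no row or column orientation. If an explicit column or row is needed, the array can be reshaped; a minimal sketch using the vector defined above:

column = vector.reshape(-1, 1)   # explicit column of shape (3, 1)
row = column.T                   # transposing the column gives a row of shape (1, 3)
print(column.shape, row.shape)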

Vector operations#

There are two basic vector operations; a small worked example follows the list.

  1. Addition: if

    \[\begin{split} \boldsymbol x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \quad \boldsymbol y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \text{ then } \boldsymbol x + \boldsymbol y = \begin{pmatrix} x_1 + y_1 \\ \vdots \\ x_n + y_n \end{pmatrix}. \end{split}\]
  2. Multiplication by a scalar (number): if

    \[\begin{split} \boldsymbol x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix},\quad \alpha \in \mathbb R, \text{ then } \alpha \boldsymbol x = \begin{pmatrix} \alpha x_1 \\ \vdots \\ \alpha x_n \end{pmatrix}. \end{split}\]
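
For example, with \(n = 2\) and \(\alpha = 2\),

\[\begin{split} \begin{pmatrix} 1 \\ 2 \end{pmatrix} + \begin{pmatrix} 3 \\ 4 \end{pmatrix} = \begin{pmatrix} 4 \\ 6 \end{pmatrix}, \qquad 2 \begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 2 \\ 4 \end{pmatrix}. \end{split}\]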

Zero vector:

\[\begin{split} \boldsymbol 0 = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}; \quad \boldsymbol x + \boldsymbol 0 = \boldsymbol x \; \forall \boldsymbol x \in \mathbb R^n. \end{split}\]

Sometimes a vector of all ones is useful:

\[\begin{split} \boldsymbol 1 = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}. \end{split}\]

Two vectors \(\boldsymbol x\) and \(\boldsymbol y\) are collinear if \(\boldsymbol y = \alpha \boldsymbol x\) for some \(\alpha \in \mathbb R\). Collinear vectors lie on the same line which passes through the origin.

If \(\alpha = -1\), then the vector \(\boldsymbol y = (-1)\cdot \boldsymbol x = -\boldsymbol x\) is called the opposite vector to \(\boldsymbol x\).

In NumPy all these operations are straightforward:

x = np.linspace(0, 1, num=5)
y = np.arange(1, 6)
z = np.zeros(5)
o = np.ones(6)
print(x)
print(y)
print("Zero vector:", z)
print("Vector of ones:", o)
[0.   0.25 0.5  0.75 1.  ]
[1 2 3 4 5]
Zero vector: [0. 0. 0. 0. 0.]
Vector of ones: [1. 1. 1. 1. 1. 1.]
print("Sum:", x+y)
print("Diff:", y-x)
Sum: [1.   2.25 3.5  4.75 6.  ]
Diff: [1.   1.75 2.5  3.25 4.  ]
print(-y)
[-1 -2 -3 -4 -5]
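
Multiplication by a scalar is equally straightforward; a small sketch reusing the vectors x and y defined above:

print(2.5 * y)            # [ 2.5  5.   7.5 10.  12.5]
print(0.5 * x + 0.5 * y)  # a linear combination of x and y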

Vector norm#

If \(\boldsymbol x = (x_1, \ldots, x_n)\), then its length (aka Euclidean norm) is

\[ \Vert \boldsymbol x \Vert = \sqrt{\sum\limits_{k=1}^n x_k^2}. \]

The Euclidean norm is a special case of the \(p\)-norm (aka Minkowski norm)

\[ \Vert \boldsymbol x \Vert_p = \bigg(\sum\limits_{k=1}^n |x_k|^p \bigg)^\frac 1p, \quad p \geqslant 1. \]

The Minkowski norm becomes

  • the Euclidean norm if \(p=2\);

  • the Manhattan norm \(\Vert \boldsymbol x \Vert_1 = \sum\limits_{k=1}^n \vert x_k \vert\) if \(p=1\);

  • the maximum norm \(\Vert \boldsymbol x \Vert_\infty = \max \{\vert x_1 \vert, \ldots, \vert x_n \vert\}\) as \(p\to\infty\).

A unit vector \(\boldsymbol x\) has norm equal to \(1\): \(\Vert \boldsymbol x\Vert = 1\). The unit ball in \(\mathbb R^n\) is

\[ \{\boldsymbol x \in \mathbb R^n\colon \Vert \boldsymbol x \Vert \leqslant 1\}. \]

The shape of the unit ball depends on the norm. For the Euclidean norm it is the usual round ball; for other values of \(p\) the two-dimensional unit disks look different, as the sketch below shows.
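
A minimal matplotlib sketch (assuming matplotlib is installed; it is not used elsewhere in this section) that draws the boundaries of two-dimensional unit balls for several values of \(p\):

import matplotlib.pyplot as plt

# Take points on the ordinary circle and rescale each one to have p-norm 1:
# the rescaled points trace out the boundary of the p-norm unit ball.
thetas = np.linspace(0, 2 * np.pi, 400)
points = np.stack([np.cos(thetas), np.sin(thetas)])   # shape (2, 400)
for p in [1, 1.5, 2, 4, np.inf]:
    scaled = points / np.linalg.norm(points, ord=p, axis=0)
    plt.plot(scaled[0], scaled[1], label=f"p = {p}")
plt.gca().set_aspect("equal")
plt.legend()
plt.title("Unit circles for different values of p")
plt.show()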

The distance between vectors \(\boldsymbol x\) and \(\boldsymbol y\) equals \(\Vert \boldsymbol x - \boldsymbol y\Vert\). In the case of the Euclidean norm the distance

\[ \Vert\boldsymbol x - \boldsymbol y\Vert_2 = \sqrt{\sum\limits_{i=1}^n (x_i - y_i)^2} \]

is also called Euclidean.

How to calculate a norm in NumPy? Use np.linalg.norm:

x = np.array([1, 2, -2])
np.linalg.norm(x)
3.0

To specify \(p\), use the parameter ord:

print("1-norm =", np.linalg.norm(x, ord=1))
print("2-norm =", np.linalg.norm(x, ord=2))
print("10-norm =", np.linalg.norm(x, ord=10))
print("infinite norm =", np.linalg.norm(x, ord=np.inf))
1-norm = 5.0
2-norm = 3.0
10-norm = 2.143651567459133
infinite norm = 2.0
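
The same function gives the distance between two vectors as the norm of their difference; a small sketch with a second vector chosen here just for illustration:

y = np.array([4, 0, 0])
print(np.linalg.norm(x - y))   # sqrt((1-4)**2 + 2**2 + (-2)**2) = sqrt(17)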

Inner product#

The inner product (aka dot product) of vectors \(\boldsymbol x, \boldsymbol y \in \mathbb R^n\) equals

(42)#\[ \langle \boldsymbol x, \boldsymbol y \rangle = \sum\limits_{k=1}^n x_k y_k.\]

Alternative notation: \(\boldsymbol x^\mathsf{T} \boldsymbol y\).

The inner product generates the Euclidean norm: if \(\boldsymbol x \in \mathbb R^n\) then

\[ \sqrt{\langle \boldsymbol x, \boldsymbol x \rangle} = \Vert \boldsymbol x \Vert_2. \]

The Cauchy-Schwarz inequality (44) can be written as

\[ \vert\langle \boldsymbol x, \boldsymbol y \rangle\vert \leqslant \Vert \boldsymbol x \Vert_2 \cdot \Vert \boldsymbol y \Vert_2. \]

This inequality turns into equality iff vectors \(\boldsymbol x\) and \(\boldsymbol y\) are collinear.

The vectors \(\boldsymbol x\) and \(\boldsymbol y\) are called orthogonal if \(\langle\boldsymbol x, \boldsymbol y \rangle = 0\). The angle \(\theta\) between two nonzero vectors \(\boldsymbol x\) and \(\boldsymbol y\) is defined by the equality

(43)#\[ \cos \theta = \frac{\langle \boldsymbol x, \boldsymbol y \rangle}{\Vert \boldsymbol x\Vert_2 \cdot \Vert \boldsymbol y\Vert_2}.\]

The angle is well-defined since this fraction is always between \(-1\) and \(1\) due to the Cauchy-Schwarz inequality. If \(\langle \boldsymbol x, \boldsymbol y \rangle = 0\), then \(\cos\theta = 0\) and \(\theta = \frac \pi 2\). Hence, the angle between orthogonal vectors equals \(90°\).

In data analysis and machine learning the formula (43) is often used to measure the similarity between vectors \(\boldsymbol x\) and \(\boldsymbol y\): the closer \(\cos \theta\) is to \(1\), the more similar the vectors are. For the same reason the quantity \(1-\cos\theta\) is called the cosine distance between \(\boldsymbol x\) and \(\boldsymbol y\).

Dot product in NumPy#

There are several ways to calculate the inner product of two vectors in Python.

x = np.array([1, 2, 3])
y = np.array([1, -2, 2])
print(np.dot(x, y), x.dot(y), x @ y)
3 3 3
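
These expressions also let us check numerically that the inner product generates the Euclidean norm and compute the cosine similarity (43) and cosine distance; a small sketch using the vectors x and y defined above:

print(np.sqrt(x @ x), np.linalg.norm(x))    # both equal sqrt(14)
cos_theta = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
print("cosine similarity:", cos_theta)      # 3 / (3 * sqrt(14)) ≈ 0.267
print("cosine distance:", 1 - cos_theta)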

Exercises#

  1. Let \(\boldsymbol x \in \mathbb R^n\). Prove that

    • \(\Vert \boldsymbol x \Vert_\infty \leqslant \Vert \boldsymbol x \Vert_1 \leqslant n\Vert \boldsymbol x \Vert_\infty\);

    • \(\Vert \boldsymbol x \Vert_\infty \leqslant \Vert \boldsymbol x \Vert_2 \leqslant \sqrt{n}\Vert \boldsymbol x \Vert_\infty\);

    • \(\Vert \boldsymbol x \Vert_2 \leqslant \Vert \boldsymbol x \Vert_1 \leqslant \sqrt{n}\Vert \boldsymbol x \Vert_2\).

  2. Prove that the \(p\)-norm for \(1\leqslant p \leqslant \infty\) satisfies the following properties:

    • \(\Vert\boldsymbol x\Vert \geqslant 0\), \(\Vert\boldsymbol x\Vert = 0 \iff \boldsymbol x =\boldsymbol 0\);

    • \(\Vert\alpha \boldsymbol x \Vert = \vert\alpha\vert \Vert\boldsymbol x \Vert\) for all \(\alpha \in \mathbb R\), \(\boldsymbol x \in \mathbb R^n\);

    • \(\Vert\boldsymbol x + \boldsymbol y\Vert \leqslant \Vert\boldsymbol x \Vert + \Vert\boldsymbol y \Vert\) for all \(\boldsymbol x, \boldsymbol y \in \mathbb R^n\) (triangle inequality).

  3. Show that \(\Vert \boldsymbol x \Vert_\infty = \max \{\vert x_1 \vert, \ldots, \vert x_n \vert\}\), i.e.,

    \[ \lim\limits_{p\to+\infty}\bigg(\sum\limits_{k=1}^n |x_k|^p \bigg)^\frac 1p = \max \{\vert x_1 \vert, \ldots, \vert x_n \vert\}. \]
  4. Prove the Cauchy-Schwarz inequality

(44)#\[ \bigg(\sum\limits_{k=1}^n x_k y_k\bigg)^2 \leqslant \sum\limits_{k=1}^n x_k^2 \sum\limits_{k=1}^n y_k^2. \]
  5. Show that the cosine distance between two vectors is always between \(0\) and \(2\).

  6. How many arithmetic operations are required to calculate the dot product by formula (42)?