Random variables#

A random variable is a function \(\xi \colon \Omega \to \mathbb R\), where \(\Omega\) is the set of events or outcomes; \(\Omega\) is also called the support of \(\xi\). Depending on their support \(\Omega\), all random variables of practical interest are divided into two big groups: discrete and continuous.

A lot of useful tools for working with discrete and continuous distributions can be found in scipy.stats.

Discrete distributions#

All discrete random variables have either finite or countable support:

\[ \Omega = \{x_1, \ldots, x_n\} \text{ or } \Omega = \{x_k\}_{k=1}^\infty. \]

A discrete random variable \(\xi\) is defined by its probability mass function (pmf):

\[ p_k = \mathbb P(\xi = x_k). \]

Every pmf must satisfy the following conditions:

\[ p_k \geqslant 0, \quad \sum\limits_k p_k = 1. \]

The cumulative distribution function (cdf) is

\[ F_\xi(x) = \mathbb P(\xi \leqslant x) = \sum\limits_{k \colon x_k\leqslant x} p_k. \]
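
For example, here is a minimal sketch of these definitions for a roll of a fair die, using scipy.stats.randint (the discrete uniform distribution; parameters below are just for illustration):

from scipy.stats import randint
die = randint(1, 7)  # discrete uniform on {1, ..., 6}
print(die.pmf(3))    # P(xi = 3) = 1/6
print(die.cdf(3))    # P(xi <= 3) = p_1 + p_2 + p_3 = 1/2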

Measures of central tendency#

Let \(\xi\) be a discrete random variable.

The expectation (mean) of \(\xi\) is

\[ \mathbb E\xi = \sum\limits_{k} x_k \mathbb P(\xi = x_k) = \sum\limits_{k} p_k x_k. \]

Expectation is a linear operation: \(\mathbb E(a\xi + b\eta) = a \mathbb E \xi + b \mathbb E \eta\).

The variance of \(\xi\) is

\[ \mathbb V\xi = \mathbb E\big(\xi - \mathbb E\xi\big)^2 = \mathbb E\xi^2 - (\mathbb E\xi)^2 = \sum\limits_{k} x_k^2 p_k - \Big(\sum\limits_{k} p_k x_k\Big)^2. \]

Note that \(\mathbb V\xi \geqslant 0\) for all \(\xi\).

Question

In which cases is the equality \(\mathbb V \xi = 0\) possible?

Is variance linear?

Does the equality \(\mathbb V(a\xi + b\eta) = a \mathbb V \xi + b \mathbb V \eta\) hold?

The standard deviation is the square root of the variance: \(\mathrm{sd}(\xi) = \sqrt{\mathbb V\xi}\).

The median of \(\xi\) is defined as a number \(m\) for which

\[ \mathbb P(\xi \leqslant m) \geqslant \frac 12, \quad \mathbb P(\xi \geqslant m) \geqslant \frac 12. \]

The mode of \(\xi\) is a point of maximum of its pmf:

\[ \mathrm{mode}(\xi) = \underset{x_k}{\operatorname{argmax}}\, \mathbb P(\xi = x_k). \]
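
A minimal numpy sketch of all four measures for a toy pmf (the support and probabilities below are chosen arbitrarily for illustration):

import numpy as np
x = np.array([0, 1, 2, 3])          # support
p = np.array([0.2, 0.4, 0.3, 0.1])  # pmf, sums to 1
mean = np.sum(x * p)                            # E xi = 1.3
var = np.sum(x**2 * p) - mean**2                # V xi = 0.81
median = x[np.searchsorted(np.cumsum(p), 0.5)]  # first x_k with F(x_k) >= 1/2, here 1
mode = x[np.argmax(p)]                          # point of maximum of the pmf, here 1
print(mean, var, median, mode)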

Bernoulli distribution#

A Bernoulli trial has two outcomes: \(\Omega = \{0, 1\}\), where \(0\) = "failure" and \(1\) = "success",

\[ \mathbb P(\text{success}) = p, \quad \mathbb P(\text{failure}) = 1 - p, \quad 0 \leqslant p \leqslant 1. \]

A typical example of a Bernoulli trial is tossing a coin. If the coin is fair then \(p=\frac 12\).

A Bernoulli random variable \(\xi \sim \mathrm{Bern}(p)\) is the indicator of success:

\[ \mathbb P(\xi = 1) = p, \quad \mathbb P(\xi = 0) = 1 - p. \]

If \(\xi \sim \mathrm{Bern}(p)\), then

\[ \mathbb E \xi = p, \quad \mathbb V \xi = p(1 - p). \]

In machine learning the Bernoulli distribution models the output of a binary classifier.

Bernoulli sampling in scipy:

from scipy.stats import bernoulli
bernoulli.rvs(0.3, size=15)
array([1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0])

Binomial distribution#

The number of successes in \(n\) independent Bernoulli trials is called a binomial random variable: \(\xi \sim \mathrm{Bin}(n, p)\) if

\[ \xi = \xi_1 + \ldots + \xi_n, \quad \xi_k \sim \mathrm{Bern}(p). \]

The pmf of \(\xi \sim \mathrm{Bin}(n, p)\) is

\[ p_k = \mathbb P(\xi = k) = \binom nk p^k (1-p)^{n-k}, \quad 0 \leqslant k \leqslant n. \]

Due to the binomial theorem this is a valid probability distribution, since

\[ \sum\limits_{k=0}^n\binom nk p^k (1-p)^{n-k} = (p + (1 - p))^n = 1. \]

If \(\xi \sim \mathrm{Bin}(n, p)\) then

\[ \mathbb E \xi = np,\quad \mathbb V\xi = np(1-p). \]
from scipy.stats import binom
binom.rvs(10, 0.5, size=15)
array([7, 6, 4, 6, 4, 9, 6, 6, 3, 4, 4, 5, 4, 2, 7])
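
A quick check of these formulas with a frozen scipy distribution: for \(n = 10\), \(p = 0.5\) as above, we expect \(\mathbb E \xi = 5\) and \(\mathbb V \xi = 2.5\).

from scipy.stats import binom
xi = binom(10, 0.5)
print(xi.mean(), xi.var())  # 5.0 2.5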

Geometric distribution#

The number of independent Bernoulli trials up to and including the first success is called a geometric random variable. Hence, \(\xi \sim \mathrm{Geom}(p)\) if

\[ p_k = \mathbb P(\xi = k) = q^{k-1} p, \quad q = 1-p, \quad k\in\mathbb N. \]

Note that the geometric distribution has countable support. Since \(\sum\limits_{k=1}^\infty q^{k-1} = \frac 1{1-q} = \frac 1p\) for \( 0 < q < 1\), the probabilities \(p_k\) sum to \(1\).

If \(\xi \sim \mathrm{Geom}(p)\) then

\[ \mathbb E \xi = \frac 1p, \quad \mathbb V\xi = \frac{1-p}{p^2}. \]
from scipy.stats import geom
print("High success probability:")
print(geom.rvs(0.8, size=15))
print("Low success probability:")
print(geom.rvs(0.1, size=15))
High success probability:
[1 1 2 1 1 1 1 1 1 2 5 1 1 1 1]
Low success probability:
[10  4 45  2 15  9  4  9  9  7  2  4  8  5  1]

Poisson distribution#

A random variable \(\xi\) has the Poisson distribution \(\mathrm{Pois}(\lambda)\), \(\lambda > 0\), if

\[ p_k = \mathbb P(\xi = k) = \frac{\lambda ^k}{k!} e^{-\lambda}, \quad k \in \mathbb N \cup \{0\}. \]

If \(\xi \sim \mathrm{Pois}(\lambda)\) then

\[ \mathbb E \xi = \lambda, \quad \mathbb V\xi = \lambda. \]
from scipy.stats import poisson
poisson.rvs(10, size=15)
array([11, 10,  4,  4,  7, 17,  8, 11, 10,  9, 13,  7, 20,  7,  7])

In some cases the Poisson distribution can serve as an approximation to the binomial one.

Theorem (Poisson)

Let \(\xi \sim \mathrm{Bin}(n, p_n)\) and \(\lim\limits_{n \to \infty} np_n = \lambda > 0\). Then

\[ \lim\limits_{n \to \infty} \mathbb{P}(\xi = k) = e^{-\lambda} \frac{\lambda^k}{k!}, \quad k \in \mathbb N \cup \{0\}. \]

In other words,

\[ \mathrm{Bin}(n, p_n) \stackrel{D}{\to} \mathrm{Pois}(\lambda) \text { if } np_n \to \lambda. \]
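
A small numerical illustration of the theorem (the parameters \(n = 1000\), \(p_n = 0.003\) are chosen arbitrarily, so \(np_n = 3\)): the two pmfs almost coincide.

from scipy.stats import binom, poisson
n, p = 1000, 0.003
for k in range(5):
    print(k, binom.pmf(k, n, p), poisson.pmf(k, n * p))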

Continuous distributions#

The support \(\Omega\) of a continuous random variable \(\xi\) is usually an interval, a union of intervals, or the whole line \(\mathbb R\). In this case \(\xi\) is specified by its probability density function (pdf) \(p_\xi(x)\) such that

\[ \mathbb P(\xi \in [a, b]) = \int\limits_a^b p_\xi(x)\, dx. \]

Any pdf must be nonnegative and \(\int\limits_{-\infty}^\infty p_\xi(x)\, dx = 1\). Also,

\[ F_\xi(x) = \mathbb P(\xi \leqslant x) = \int\limits_{-\infty}^x p_\xi(t)\,dt, \]

and, consequently, the derivative of the cdf is equal to the pdf: \(F_\xi'(x) = p_\xi(x)\).
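
This identity can be checked numerically by a finite difference; here is a minimal sketch with the standard normal distribution at an arbitrary point \(x = 0.7\):

from scipy.stats import norm
x, h = 0.7, 1e-6
diff = (norm.cdf(x + h) - norm.cdf(x - h)) / (2 * h)  # numerical derivative of the cdf
print(diff, norm.pdf(x))  # the two values agree up to numerical error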

Measures of central tendency#

Let \(\xi\) be a continuous random variable whose pdf is \(p_\xi(x)\).

The expectation (mean) of \(\xi\) is

\[ \mathbb E\xi = \int\limits_{-\infty}^{\infty} xp_\xi(x)\,dx. \]

The variance of \(\xi\) is

\[ \mathbb V\xi = \mathbb E\big(\xi - \mathbb E\xi\big)^2 = \mathbb E\xi^2 - (\mathbb E\xi)^2 = \int\limits_{-\infty}^{\infty} x^2p_\xi(x)\,dx - \Big(\int\limits_{-\infty}^{\infty} xp_\xi(x)\,dx\Big)^2. \]

The standard deviation is the square root of the variance: \(\mathrm{sd}(\xi) = \sqrt{\mathbb V\xi}\).

The median of \(\xi\) is defined as a number \(m\) for which \(F_\xi(m) = \frac 12\).

The mode of \(\xi\) is a point of maximum of its pdf:

\[ \mathrm{mode}(\xi) = \underset{x}{\operatorname{argmax}}\, p_\xi(x). \]

Uniform distribution#

A uniform random variable has a constant pdf on its support: \(\xi \sim U[a, b]\) if

\[\begin{split} p_\xi(x) = \frac {\mathbb I(a\leqslant x \leqslant b)}{b-a} = \begin{cases} \frac 1{b-a},& x\in[a, b],\\ 0, & x \notin [a, b]. \end{cases} \end{split}\]

If \(\xi \sim U[a,b]\), then

\[ \mathbb E \xi = \frac {a+b}2,\quad \mathbb V \xi = \frac{(b-a)^2}{12}. \]

In scipy uniform random samples are drawn from the interval \([\mathrm{loc}, \mathrm{loc} + \mathrm{scale}]\):

from scipy.stats import uniform
uniform.rvs(loc=-5, scale=15, size=10)
array([ 1.85095823, -2.5877448 ,  6.75810859, -3.66084167,  0.56895383,
       -3.94562624,  7.59477299,  3.52603733,  9.08417991,  7.27618305])
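
Checking the formulas above for this sample's distribution \(U[-5, 10]\): the mean should be \(\frac{-5 + 10}{2} = 2.5\) and the variance \(\frac{15^2}{12} = 18.75\).

from scipy.stats import uniform
xi = uniform(loc=-5, scale=15)
print(xi.mean(), xi.var())  # 2.5 18.75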

Normal distribution#

A random variable \(\xi\) has the normal (or Gaussian) distribution \(\mathcal N(\mu, \sigma^2)\) if its pdf equals

\[ p_\xi(x) = \frac 1{\sqrt{2\pi}\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}. \]

The parameters of the normal distribution \(\mathcal N(\mu, \sigma^2)\) are its mean and variance:

\[ \mathbb E \xi = \mu,\quad \mathbb V \xi = \sigma^2. \]

\(\mathcal N(0, 1)\) is called the standard normal distribution.

In scipy.stats.norm the parameter loc is \(\mu\) and scale is \(\sigma\) (the standard deviation, not the variance):

from scipy.stats import norm
print("Samples from N(0,1):")
print(norm.rvs(size=5))
print("Samples from N(5, 100):")
print(norm.rvs(loc=5, scale=10, size=5))
print("Samples from N(0, 0.01):")
print(norm.rvs(scale=0.1, size=5))
Samples from N(0,1):
[ 0.47812683  1.77411747 -1.20850805  0.25000533  0.88742771]
Samples from N(5, 100):
[ 7.3156586  10.44361727 10.07517477 -1.17424834 -0.11639069]
Samples from N(0, 0.01):
[ 0.0469937  -0.11476238 -0.03641318 -0.02443018 -0.02457553]
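
As a sanity check, the standard normal cdf reproduces the familiar 68–95–99.7 rule:

from scipy.stats import norm
for k in [1, 2, 3]:
    print(k, norm.cdf(k) - norm.cdf(-k))  # ~0.6827, ~0.9545, ~0.9973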

Exponential distribution#

An exponential random variable \(\xi \sim \mathrm{Exp}(\lambda)\), \(\lambda > 0\), has pdf

\[ p_\xi(x) = \lambda e^{-\lambda x}, \quad x\geqslant 0. \]

If \(\xi \sim \mathrm{Exp}(\lambda)\) then

\[ \mathbb E \xi = \frac 1\lambda,\quad \mathbb V \xi = \frac 1{\lambda^2}. \]

scipy.stats.expon generates random samples from \(\mathrm{Exp}\big(\frac 1{\mathrm{scale}}\big)\) shifted by loc:

from scipy.stats import expon
expon.rvs(loc=5, scale=6, size=10)
array([ 6.46594756, 20.42456844, 16.54476465, 15.23176985, 12.74708399,
        5.85847301,  7.58332855,  8.1501342 ,  5.48472863, 11.36178256])
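
With loc=5 and scale=6 the sample above comes from \(\mathrm{Exp}(1/6)\) shifted by \(5\), so the mean should be \(5 + 6 = 11\) and the variance \(6^2 = 36\):

from scipy.stats import expon
xi = expon(loc=5, scale=6)
print(xi.mean(), xi.var())  # 11.0 36.0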

Gamma distribution#

A random variable \(\xi\) has the gamma distribution \(\mathrm{Gamma}(\alpha, \beta)\), \(\alpha, \beta > 0\), if

\[ p_\xi(x) = \frac 1{\Gamma(\alpha) \beta^\alpha} x^{\alpha - 1} e^{-\frac x \beta},\quad x \geqslant 0, \]

where \(\Gamma(\alpha) = \int\limits_0^\infty x^{\alpha - 1}e^{-x}\,dx\).

Exercise

Show that \(\mathrm{Exp}(\lambda)\) is a special case of \(\mathrm{Gamma}(\alpha, \beta)\).

If \(\xi \sim \mathrm{Gamma}(\alpha, \beta)\) then

\[ \mathbb E\xi = \alpha\beta, \quad \mathbb V\xi = \alpha\beta^2. \]
from scipy.stats import gamma
a = 1
gamma(a).rvs(size=10)
array([0.87913032, 0.05700231, 0.59966204, 0.01763292, 0.44387269,
       1.61046101, 1.69198435, 0.25095155, 0.52575033, 0.17856823])
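
In scipy.stats.gamma the first argument is \(\alpha\) and scale plays the role of \(\beta\) in the parametrization above; a quick check of the moment formulas with \(\alpha = 2\), \(\beta = 3\) (values chosen for illustration):

from scipy.stats import gamma
xi = gamma(a=2, scale=3)
print(xi.mean(), xi.var())  # 6.0 18.0, i.e. alpha*beta and alpha*beta**2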

Beta distribution#

A random variable \(\xi\) has the beta distribution \(\mathrm{Beta}(\alpha, \beta)\), \(\alpha, \beta > 0\), if

\[ p_\xi(x) = \frac 1{B(\alpha, \beta)} x^{\alpha - 1} (1-x)^{\beta -1}, \quad 0 < x < 1, \]

where \(B(\alpha, \beta) = \int\limits_0^1 x^{\alpha - 1} (1-x)^{\beta - 1}\,dx\).

from scipy.stats import beta
beta.rvs(a=0.5, b=3, size=5)
array([0.03996637, 0.05850063, 0.00144506, 0.33950118, 0.45092543])
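
As a sketch, one can verify numerically that this pdf integrates to \(1\) (the parameters \(\alpha = 2\), \(\beta = 5\) are chosen arbitrarily):

from scipy.integrate import quad
from scipy.stats import beta
integral, err = quad(beta(2, 5).pdf, 0, 1)
print(integral)  # 1.0 up to numerical error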

Student \(t\)-distribution#

A random variable \(\xi\) has the Student \(t\)-distribution with \(\nu\) degrees of freedom if

\[ p_\xi(x) = \frac{\Gamma(\frac{\nu+1}{2})} {\sqrt{\nu\pi}\,\Gamma(\frac{\nu}{2})} \left(1+\frac{x^2}\nu \right)^{-(\nu+1)/2}, \quad \nu > 0. \]

This distribution is similar to \(\mathcal N(0,1)\), but for small values of \(\nu\) it has much heavier tails. Because of this, \(\mathbb V\xi\) does not exist if \(\nu \leqslant 2\), and for \(\nu \leqslant 1\) even \(\mathbb E\xi\) does not exist. For \(\nu > 2\),

\[ \mathbb E \xi = 0, \quad \mathbb V \xi = \frac{\nu}{\nu-2}. \]
from scipy.stats import t
print("Light tails:")
print(t(5).rvs(size=8))
print("Heavy tails:")
print(t(1).rvs(size=10))
print("Extremely heavy tails:")
print(t(0.2).rvs(size=4))
Light tails:
[-0.4534318   0.97552928  0.6662617   1.2678379  -1.43713886 -1.03014751
 -1.25687696  0.89163291]
Heavy tails:
[-0.90401491  1.30719042 -1.44579458 -1.55892165  3.76804303 -0.95646527
  7.23021483  0.81152841  0.53953514 -1.60768315]
Extremely heavy tails:
[  0.26389071  -0.27122753  -0.23447981 -39.09763726]

As \(\nu \to +\infty\), the \(t\)-distribution tends to \(\mathcal N(0,1)\):

nu = 1002
t(nu).mean(), t(nu).var()
(0.0, 1.002)

Laplace distribution#

The pdf of a Laplace random variable \(\xi \sim \mathrm{Laplace}(\mu, b)\), \(b > 0\), is

\[ p_\xi(x) = \frac 1{2b} e^{-\frac{\vert x - \mu\vert}b}. \]

If \(\xi \sim \mathrm{Laplace}(\mu, b)\), then

\[ \mathbb E \xi = \mu, \quad \mathbb V \xi = 2b^2. \]
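
Laplace sampling in scipy (in scipy.stats.laplace, loc is \(\mu\) and scale is \(b\); the values below are chosen arbitrarily):

from scipy.stats import laplace
print(laplace(loc=0, scale=2).var())      # 2 * b**2 = 8.0
print(laplace.rvs(loc=0, scale=2, size=5))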

Exercises#

  1. Choose all possible supports of the uniform distribution:

    • \(\Omega = \varnothing\)

    • \(\Omega = \{1, 2, \ldots, n\}\)

    • \(\Omega = \mathbb N\)

    • \(\Omega = [a, b)\)

    • \(\Omega = [0, +\infty)\)

    • \(\Omega = \mathbb R\)

  2. Every day exactly \(1000\) people see company X's ad in search results. Yesterday \(50\) of them clicked on the ad. What is the probability that at least \(50\) people will click on the ad today?

  3. It is known that on average about \(50\) users per day click on company X's ad in search results. The number of impressions is quite large and can vary from day to day. What is the probability that at least \(50\) clicks on the ads will be made today?

  4. Show that the mean, median, and mode of \(\mathcal N(\mu, \sigma^2)\) are all equal to \(\mu\).

  5. Find the mode of \(\mathrm{Gamma}(\alpha, \beta)\).

  6. Find the mode of \(\mathrm{Beta}(\alpha, \beta)\).

  7. Let \(\xi_\nu \sim \mathrm{Student}(\nu)\), \(\xi \sim \mathcal N(0, 1)\). Prove that \(\xi_\nu \stackrel{D}{\to} \xi\) as \(\nu \to +\infty\), i.e.,

    \[ \lim\limits_{\nu \to +\infty}p_{\xi_\nu}(x) = p_\xi(x) \text{ for all } x\in \mathbb R. \]
A possible solution to exercises 2 and 3:

from scipy.stats import binom, poisson
n = 1000    # impressions per day (exercise 2)
p = 0.05    # click probability estimated as 50/1000
lam = 50    # average number of clicks per day (exercise 3)
xi = binom(n, p)    # binomial model: fixed number of impressions
eta = poisson(lam)  # Poisson model: large, varying number of impressions
print("Binomial answer:", 1 - xi.cdf(49))
print("Poisson answer:", 1 - eta.cdf(49))
Binomial answer: 0.5202589429126758
Poisson answer: 0.5188083154720433