# Estimation

## Bias

Let $X_1, \ldots, X_n$ be an i.i.d. sample from some distribution $F_\theta(x)$. An estimator $\hat\theta = \hat\theta(X_1, \ldots, X_n)$ of $\theta$ is called **unbiased** if $\mathbb E\hat\theta = \theta$. Otherwise $\hat\theta$ is called **biased**, and its bias equals

$$
\operatorname{bias}(\hat\theta) = \mathbb E\hat\theta - \theta.
$$

For example, the sample average $\hat\theta = \overline{X}_n$ is an unbiased estimator of the mean $\theta = \mathbb E X_1$, since

$$
\mathbb E \overline{X}_n = \frac 1n \sum_{k=1}^n \mathbb E X_k = \frac 1n \cdot n\theta = \theta.
$$
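This is easy to check empirically. A minimal Monte Carlo sketch (assuming NumPy is available; the distribution and constants are illustrative choices, not from the text): average many sample means of $N(\theta, 1)$ samples and observe no systematic bias.

```python
import numpy as np

# Draw many i.i.d. samples of size n from N(theta, 1) and compute the
# sample mean of each; the grand average of those means stays close to
# theta, illustrating unbiasedness.
rng = np.random.default_rng(0)
theta, n, reps = 2.0, 50, 10_000
means = rng.normal(loc=theta, scale=1.0, size=(reps, n)).mean(axis=1)
print(means.mean())  # close to theta = 2.0
```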

Sometimes an estimator $\hat\theta_n = \hat\theta(X_1, \ldots, X_n)$ is biased, but the bias vanishes as $n$ becomes large. If $\lim\limits_{n\to\infty} \mathbb E \hat\theta_n = \theta$, then the estimator $\hat\theta_n$ is called **asymptotically unbiased**.

## Consistency

An estimator $\hat\theta_n = \hat\theta(X_1, \ldots, X_n)$ is called **consistent** if it converges to $\theta$ in probability: $\hat\theta_n \xrightarrow{P} \theta$, i.e.,

$$
\lim_{n\to\infty} \mathbb P\big(|\hat\theta_n - \theta| > \varepsilon\big) = 0 \quad \text{for all } \varepsilon > 0.
$$

By the law of large numbers, $\hat\theta = \overline{X}_n$ is a consistent estimator of the expectation $\theta = \mathbb E X_1$ for any i.i.d. sample $X_1, \ldots, X_n$ with finite mean.
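Consistency can also be watched numerically. A sketch (assuming NumPy; Bernoulli data and the tolerance $\varepsilon = 0.1$ are illustrative choices): the empirical probability $\mathbb P(|\overline{X}_n - \theta| > \varepsilon)$ shrinks toward zero as $n$ grows.

```python
import numpy as np

# For Bernoulli(0.5) data, estimate P(|X̄_n − θ| > ε) over many
# replications for increasing n; the probabilities decrease toward 0.
rng = np.random.default_rng(1)
theta, eps, reps = 0.5, 0.1, 5_000
probs = []
for n in (10, 100, 1_000):
    xbar = rng.binomial(1, theta, size=(reps, n)).mean(axis=1)
    probs.append(float((np.abs(xbar - theta) > eps).mean()))
print(probs)  # decreasing toward 0
```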

## Bias-variance decomposition

The mean squared error (MSE) of $\hat\theta$ is

$$
\operatorname{MSE}(\hat\theta) = \mathbb E(\hat\theta - \theta)^2.
$$

Bias-variance decomposition:

$$
\operatorname{MSE}(\hat\theta) = \operatorname{bias}^2(\hat\theta) + \mathbb V(\hat\theta).
$$

If $\lim\limits_{n\to\infty} \operatorname{MSE}(\hat\theta_n) = 0$, then the estimator $\hat\theta_n$ of $\theta$ is asymptotically unbiased and consistent.
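The decomposition can be verified by simulation. A sketch (assuming NumPy; the biased variance estimator that divides by $n$ is an illustrative choice): its empirical MSE coincides with the sum of squared bias and variance.

```python
import numpy as np

# For normal data the variance estimator dividing by n is biased with
# bias = -sigma2 / n; check that MSE = bias² + variance empirically.
rng = np.random.default_rng(2)
sigma2, n, reps = 4.0, 20, 200_000
x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
est = x.var(axis=1)            # divides by n, so it underestimates sigma2
mse = ((est - sigma2) ** 2).mean()
bias = est.mean() - sigma2     # theoretical bias is -sigma2/n = -0.2
var = est.var()
print(mse, bias ** 2 + var)    # the two numbers coincide
```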

In machine learning the bias-variance decomposition is also known as the bias-variance tradeoff:

![Bias-variance tradeoff](https://scott.fortmann-roe.com/docs/docs/BiasVariance/biasvariance.png)

## Asymptotic normality

An estimator $\hat\theta_n$ is **asymptotically normal** if $\dfrac{\hat\theta_n - \theta}{\operatorname{se}(\hat\theta_n)} \xrightarrow{D} N(0, 1)$, i.e.,

$$
\lim_{n\to\infty} \mathbb P\bigg(\frac{\hat\theta_n - \theta}{\operatorname{se}(\hat\theta_n)} \leqslant z\bigg) = \Phi(z), \quad \operatorname{se}(\hat\theta_n) = \sqrt{\mathbb V \hat\theta_n}.
$$

If $X_1, \ldots, X_n$ is an i.i.d. sample from some distribution with finite expectation $\mu$ and variance $\sigma^2$, then by the central limit theorem $\overline{X}_n$ is an asymptotically normal estimator of $\mu$.
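A quick simulation of this (assuming NumPy; the exponential distribution and the $\pm 1.96$ cutoff are illustrative choices): even for skewed data, the standardized sample means fall inside $\pm 1.96$ about $95\%$ of the time.

```python
import numpy as np

# Exponential(1) has mean 1 and sd 1; the standardized sample mean
# (X̄_n − μ) / (σ/√n) should be approximately N(0, 1) for large n.
rng = np.random.default_rng(3)
n, reps = 200, 20_000
x = rng.exponential(scale=1.0, size=(reps, n))
z = (x.mean(axis=1) - 1.0) * np.sqrt(n)
coverage = float((np.abs(z) < 1.96).mean())
print(coverage)  # close to Φ(1.96) − Φ(−1.96) ≈ 0.95
```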

## Maximum likelihood estimation (MLE)

Let $X_1, \ldots, X_n \sim F_\theta(x)$ be an i.i.d. sample. The **likelihood** (likelihood function) of the sample $X_1, \ldots, X_n$ is simply its joint pmf or pdf. Regardless of the type of the distribution, denote the likelihood as

$$
L(\theta) \equiv L(X_1, \ldots, X_n \mid \theta) = p(X_1, \ldots, X_n \mid \theta).
$$

If the sample is i.i.d., the likelihood function factors into a product of one-dimensional functions:

$$
L(X_1, \ldots, X_n \mid \theta) = \prod_{k=1}^n p(X_k \mid \theta).
$$

The **maximum likelihood estimator** (MLE) maximizes the likelihood:

$$
\hat\theta_{\mathrm{ML}} = \arg\max_\theta L(\theta).
$$

Since maximizing a sum is easier than maximizing a product, one usually switches to the logarithm of the likelihood (the log-likelihood). This is especially convenient for an i.i.d. sample, where

$$
\hat\theta_{\mathrm{ML}} = \arg\max_\theta \log L(\theta) = \arg\max_\theta \sum_{k=1}^n \log p(X_k \mid \theta).
$$
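As a small worked sketch (assuming NumPy; the exponential model, sample size, and grid are illustrative choices): maximize the $\operatorname{Exp}(\lambda)$ log-likelihood on a grid and compare with the closed-form MLE $1/\overline{X}_n$.

```python
import numpy as np

rng = np.random.default_rng(4)
lam_true = 2.0
x = rng.exponential(scale=1 / lam_true, size=2_000)

def log_lik(lam):
    # sum over the sample of log(λ · exp(−λ·x_k)) = n·log λ − λ·Σ x_k
    return len(x) * np.log(lam) - lam * x.sum()

grid = np.linspace(0.5, 4.0, 3_501)  # step 0.001
lam_hat = grid[np.argmax([log_lik(l) for l in grid])]
print(lam_hat, 1 / x.mean())  # grid maximizer matches 1/X̄_n
```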

Properties of the MLE:

- consistency: $\hat\theta_{\mathrm{ML}} \xrightarrow{P} \theta$;
- equivariance: if $\hat\theta_{\mathrm{ML}}$ is the MLE of $\theta$, then $\varphi(\hat\theta_{\mathrm{ML}})$ is the MLE of $\varphi(\theta)$;
- asymptotic normality: $\dfrac{\hat\theta_{\mathrm{ML}} - \theta}{\widehat{\operatorname{se}}} \xrightarrow{D} N(0, 1)$;
- asymptotic optimality: for sufficiently large $n$, the estimator $\hat\theta_{\mathrm{ML}}$ has minimal variance.

## Exercises

1. Let $X_1, \ldots, X_n$ be an i.i.d. sample from $U[0, \theta]$ and $\hat\theta = X_{(n)}$. Is this estimator unbiased? Asymptotically unbiased? Consistent?

2. Show that an estimator $\hat\theta_n$ is consistent if it is asymptotically unbiased and $\lim\limits_{n\to\infty} \mathbb V(\hat\theta_n) = 0$.

3. Let $X_1, \ldots, X_n$ be an i.i.d. sample from $U[0, 2\theta]$. Show that the sample median $\operatorname{med}(X_1, \ldots, X_n)$ is an unbiased estimator of $\theta$. See also ML Handbook.

4. Let $X_1, \ldots, X_n$ be an i.i.d. sample from a distribution with finite moments $\mathbb E X_1$ and $\mathbb E X_1^2$. Is the sample variance $S_n$ an unbiased estimator of $\theta = \mathbb V X_1$? Asymptotically unbiased?

5. There are $k$ heads and $n - k$ tails in $n$ independent Bernoulli trials. Find the MLE of the probability of heads.

6. Find the MLE of $\lambda$ if $X_1, \ldots, X_n$ is an i.i.d. sample from $\operatorname{Pois}(\lambda)$.

7. Let $X_1, \ldots, X_n$ be an i.i.d. sample from $N(\mu, \tau)$. Find the MLE of $\mu$ and $\tau$.

8. Find the MLE of $a$ and $b$ if $X_1, \ldots, X_n \sim U[a, b]$.