Probabilistic models
In the probabilistic approach, all quantities are treated as random variables. Each training sample $(\boldsymbol x_i, y_i)$ comes from a joint probability distribution with density $p(\boldsymbol x, y)$. If we are using a machine learning model with parameters $\boldsymbol w$, then this density is conditioned on $\boldsymbol w$:

$$p(\boldsymbol x, y \mid \boldsymbol w).$$
The parametric family $\{p(\boldsymbol x, y \mid \boldsymbol w)\}_{\boldsymbol w}$ is called a probabilistic model of a machine learning problem.
Maximum likelihood estimation
The likelihood of the dataset $\mathcal D = \{(\boldsymbol x_i, y_i)\}_{i=1}^n$ is

$$p(\mathcal D \mid \boldsymbol w).$$
If the samples are i.i.d., then

$$p(\mathcal D \mid \boldsymbol w) = \prod_{i=1}^n p(\boldsymbol x_i, y_i \mid \boldsymbol w).$$
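In practice the raw product underflows in floating point even for moderately sized datasets, which is one reason to work with the log-likelihood instead. A minimal sketch (the standard-normal data here is an arbitrary choice for illustration):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.normal(size=2000)  # 2000 i.i.d. samples from N(0, 1)

# The product of 2000 densities underflows to 0.0 in float64,
# while the equivalent sum of log-densities stays well-behaved.
print(np.prod(norm.pdf(data)))    # 0.0 (underflow)
print(np.sum(norm.logpdf(data)))  # finite log-likelihood
```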
The optimal weights maximize the likelihood or, equivalently, the log-likelihood:

$$\boldsymbol{\widehat w} = \arg\max_{\boldsymbol w} \log p(\mathcal D \mid \boldsymbol w) = \arg\max_{\boldsymbol w} \sum_{i=1}^n \log p(\boldsymbol x_i, y_i \mid \boldsymbol w). \tag{33}$$
Alternatively, one can minimize the negative log-likelihood (NLL):

$$\mathrm{NLL}(\boldsymbol w) = -\sum_{i=1}^n \log p(\boldsymbol x_i, y_i \mid \boldsymbol w).$$
The optimal weights $\boldsymbol{\widehat w}$ maximizing the log-likelihood (33) are called the maximum likelihood estimate, and the procedure itself is maximum likelihood estimation (MLE).
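As a concrete sketch, assume a Gaussian model $p(x \mid \boldsymbol w) = \mathcal N(x \mid \mu, \sigma^2)$ with $\boldsymbol w = (\mu, \sigma)$; the data and optimizer setup below are assumptions for illustration. Minimizing the NLL numerically recovers the well-known closed-form MLE for a Gaussian (sample mean and biased sample standard deviation):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=1000)  # samples from N(2, 1.5^2)

def nll(params):
    mu, log_sigma = params        # optimize log(sigma) to keep sigma > 0
    sigma = np.exp(log_sigma)
    return -norm.logpdf(data, loc=mu, scale=sigma).sum()

res = minimize(nll, x0=np.array([0.0, 0.0]))
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])

# For a Gaussian, the MLE has a closed form: sample mean and biased std.
print(mu_hat, data.mean())    # both close to 2.0
print(sigma_hat, data.std())  # both close to 1.5
```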
Bayesian approach

From (33) we obtain a point estimate $\boldsymbol{\widehat w}$. In the Bayesian framework we estimate not points but distributions!
Assume that the parameters have a prior distribution $p(\boldsymbol w)$. After observing the dataset $\mathcal D$, we can apply the Bayes formula and obtain the posterior distribution

$$p(\boldsymbol w \mid \mathcal D) = \frac{p(\mathcal D \mid \boldsymbol w)\, p(\boldsymbol w)}{p(\mathcal D)} = \frac{p(\mathcal D \mid \boldsymbol w)\, p(\boldsymbol w)}{\int p(\mathcal D \mid \boldsymbol w)\, p(\boldsymbol w)\, d\boldsymbol w}.$$
Maximum a posteriori estimation (MAP) maximizes the posterior distribution:

$$\boldsymbol{\widehat w}_{\mathrm{MAP}} = \arg\max_{\boldsymbol w} p(\boldsymbol w \mid \mathcal D) = \arg\max_{\boldsymbol w} \big(\log p(\mathcal D \mid \boldsymbol w) + \log p(\boldsymbol w)\big),$$

where the second equality holds because the evidence $p(\mathcal D)$ does not depend on $\boldsymbol w$ and can be dropped from the maximization.
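As an illustrative sketch, take a Gaussian likelihood with known noise scale and a Gaussian prior on the mean; the specific numbers (prior $\mathcal N(0, 0.5^2)$, true mean $2.0$, $n = 20$) are assumptions chosen for the example. Conjugacy gives a closed-form MAP estimate to check against a numeric maximization of the log-posterior:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(0)
sigma = 1.0                                   # known noise std (assumed)
data = rng.normal(loc=2.0, scale=sigma, size=20)
m0, s0 = 0.0, 0.5                             # prior N(m0, s0^2) on the mean

def neg_log_posterior(w):
    # -[log p(D | w) + log p(w)], the MAP objective up to a constant
    return -(norm.logpdf(data, loc=w, scale=sigma).sum()
             + norm.logpdf(w, loc=m0, scale=s0))

w_map = minimize_scalar(neg_log_posterior).x

# Conjugacy gives the same answer in closed form:
w_closed = (m0 / s0**2 + data.sum() / sigma**2) / (1 / s0**2 + len(data) / sigma**2)
print(w_map, w_closed, data.mean())  # MAP is pulled from the MLE toward m0
```

With few samples the MAP estimate sits noticeably between the sample mean (the MLE) and the prior mean; as $n$ grows, the likelihood term dominates and MAP approaches MLE.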