Maximum Likelihood Estimation – finding the best Parametric Model

Consider the situation where a collection of data samples has been obtained. It is assumed that this data is drawn from a parametric model. The parameters of this model are unknown.

For example, the collection of data samples can be the heights of all the students in a class. It is assumed that this data fits a Gaussian distribution. A Gaussian distribution is defined by two parameters – the mean \mu and the standard deviation \sigma. The goal is to find the best Gaussian distribution (defined by the parameters) that fits the data.

To find the parameters that best fit the data samples, we compute the likelihood: the probability of the observed data, viewed as a function of the unknown parameters. The goal of Maximum Likelihood Estimation is to find the parameter value(s) that maximize this probability.
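For the height example, maximizing the Gaussian likelihood has a well-known closed form: \hat{\mu} is the sample mean, and \hat{\sigma} is the root-mean-square deviation from it (dividing by n). That result is assumed here rather than derived in this post. The heights below are made-up values, and the function name `gaussian_mle` is hypothetical. This is a minimal sketch, not a general-purpose fitting routine.

```python
import math

def gaussian_mle(samples):
    """Return (mu_hat, sigma_hat), the Gaussian maximum likelihood estimates."""
    n = len(samples)
    mu_hat = sum(samples) / n
    # The MLE of sigma divides by n (not n - 1), i.e. it is the biased estimator.
    sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in samples) / n)
    return mu_hat, sigma_hat

# Hypothetical class heights in centimetres (illustrative data only).
heights = [160.0, 165.0, 170.0, 175.0, 180.0]
mu_hat, sigma_hat = gaussian_mle(heights)
print(mu_hat, sigma_hat)
```

Dividing by n is what maximizing the likelihood produces. The familiar n - 1 version is the unbiased sample standard deviation, which is not the MLE.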

The likelihood is defined as P(\text{data} | p). It depends on both the data and the parameter(s) p, but in estimation the data are held fixed, so the likelihood is treated as a function of p: as the parameter of interest p changes, the likelihood P(\text{data} | p) changes with it.
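To see the likelihood acting as a function of p with the data held fixed, the sketch below evaluates P(\text{data} | p) for the coin data of Example 1 (55 heads in 100 tosses) at a few values of p. It uses only Python's standard library; the function name `likelihood` is hypothetical.

```python
from math import comb

def likelihood(p, heads=55, tosses=100):
    """Binomial likelihood P(data | p) of `heads` heads in `tosses` tosses."""
    return comb(tosses, heads) * p**heads * (1 - p) ** (tosses - heads)

# The data stay fixed; only the parameter p changes.
for p in (0.3, 0.5, 0.55, 0.7):
    print(f"p = {p:.2f}  P(data | p) = {likelihood(p):.6f}")
```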

Maximum Likelihood Estimate (MLE)

The maximum likelihood estimate (MLE) for the parameter p is the value of p that maximizes the likelihood P(\text{data} | p). That is, the MLE is the value of p for which the data is most likely.

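The notation \hat{p} is used for the MLE. When the likelihood is differentiable, the MLE can often be found by taking the derivative of the likelihood function with respect to p, setting it to 0, and solving for p (then checking that the solution is a maximum rather than a minimum).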
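Setting the derivative to 0 needs an equation that can be solved by hand. A cruder numerical alternative, sketched here under the assumption that the parameter lies in a known interval, is to evaluate the likelihood on a fine grid and keep the best point. The helper `grid_mle` is hypothetical, and its answer is only as precise as the grid spacing.

```python
from math import comb

def grid_mle(likelihood, lo=0.0, hi=1.0, steps=10_000):
    """Return the grid point in [lo, hi] with the largest likelihood."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=likelihood)

def coin_likelihood(p):
    # Binomial likelihood of 55 heads in 100 tosses (Example 1).
    return comb(100, 55) * p**55 * (1 - p) ** 45

p_hat = grid_mle(coin_likelihood)
print(p_hat)
```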

Example 1

A coin is flipped 100 times. Given that there were 55 heads, find the maximum likelihood estimate for the probability p of heads on a single toss.

Data

The data is the result of the experiment. In this case it is 55 heads in 100 tosses.

Parameter(s) of Interest

The value of the unknown parameter p

For a given value of p, the probability of getting 55 heads in this experiment is the binomial probability.

    \begin{equation*}     P(55 \text{ heads}|p) = \binom{100}{55}p^{55}(1-p)^{45} \end{equation*}

This is read as “the probability of 55 heads given that the probability of heads on a single toss is p.”

Setting the derivative to 0

    \begin{equation*}      \frac{d}{dp}P(\text{data} | p) = \binom{100}{55}(55p^{54}(1-p)^{45} - 45p^{55}(1-p)^{44}) = 0  \end{equation*}

Solving for p

    \begin{equation*}     55p^{54}(1-p)^{45} - 45p^{55}(1-p)^{44} = 0  \end{equation*}

Dividing both sides by p^{54}(1-p)^{44}, which is nonzero for 0 < p < 1, gives

    \begin{equation*}     55(1-p) = 45p  \end{equation*}

    \begin{equation*}     55 = 100p  \end{equation*}

Thus the MLE is \hat{p} = 0.55, the observed fraction of heads.
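As a quick numerical sanity check (a sketch, not part of the original derivation), the analytic derivative from "Setting the derivative to 0" can be compared with a central finite difference, and its sign should flip from positive to negative across \hat{p} = 0.55. The function names here are hypothetical.

```python
from math import comb

C = comb(100, 55)

def likelihood(p):
    return C * p**55 * (1 - p) ** 45

def derivative(p):
    # Analytic derivative of the likelihood, as derived above.
    return C * (55 * p**54 * (1 - p) ** 45 - 45 * p**55 * (1 - p) ** 44)

def central_difference(f, p, h=1e-6):
    # Symmetric finite-difference approximation of f'(p).
    return (f(p + h) - f(p - h)) / (2 * h)

for p in (0.5, 0.55, 0.6):
    print(p, derivative(p), central_difference(likelihood, p))
```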

Log Likelihood

It is often easier to work with the natural log of the likelihood function, called the log likelihood for short. Taking the log turns products into sums, which are simpler to differentiate. Since \ln(x) is an increasing function, the likelihood and the log likelihood are maximized at the same value of p.
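There is also a practical numerical reason. The likelihood of many independent observations is a product of many numbers below 1, which can underflow to 0.0 in floating point, while the log likelihood is a sum that stays representable. The sketch below assumes 2000 hypothetical independent tosses, each with probability 0.5 under the model.

```python
import math

probs = [0.5] * 2000  # probability of each observed toss under the model

likelihood = math.prod(probs)  # 0.5**2000 is far below the smallest positive float
log_likelihood = sum(math.log(q) for q in probs)

print(likelihood)      # 0.0 (underflow)
print(log_likelihood)  # about -1386.29
```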

Example 2

Redoing Example 1 using the log likelihood

    \begin{equation*} 	\text{ln}(P(55 \text{ heads }|p)) = \text{ln}\Bigg(\binom{100}{55}\Bigg) + 55\text{ ln}(p) + 45\text{ ln}(1-p) \end{equation*}

    \begin{equation*} 	\frac{d}{dp}(\text{log likelihood}) = \frac{d}{dp}\Bigg[\text{ln }\Bigg(\binom{100}{55}\Bigg) + 55\text{ ln}(p) + 45\text{ ln}(1-p)\Bigg] = \frac{55}{p} - \frac{45}{1-p} = 0 \end{equation*}

    \begin{equation*} 	\frac{55}{p} = \frac{45}{1-p} \end{equation*}

    \begin{equation*} 	55(1-p) = 45p \end{equation*}

    \begin{equation*} 	55 = 100p \end{equation*}

Thus the MLE is again \hat{p} = 0.55.
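The same steps work for any counts. Writing k for the number of heads in n tosses (symbols introduced here for the general case, with 0 < k < n), the log likelihood argument gives

    \begin{equation*} \frac{d}{dp}\Big[\text{constant} + k\text{ ln}(p) + (n-k)\text{ ln}(1-p)\Big] = \frac{k}{p} - \frac{n-k}{1-p} = 0 \quad\Longrightarrow\quad \hat{p} = \frac{k}{n} \end{equation*}

The second derivative, -\frac{k}{p^2} - \frac{n-k}{(1-p)^2}, is negative for 0 < p < 1, so this critical point is a maximum. With k = 55 and n = 100 it recovers \hat{p} = 0.55.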

Example 3

Suppose that a particular gene occurs as one of two alleles (A and a), where allele A has frequency \theta in the population. That is, a random copy of the gene is A with probability \theta and a with probability 1 - \theta. Since a diploid genotype consists of two genes, and the two copies are assumed to be drawn independently, the probability of each genotype is given by

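    \begin{equation*} \begin{array}{c|ccc} \text{genotype} & AA & Aa & aa \\ \hline \text{probability} & \theta^2 & 2\theta(1 - \theta) & (1 - \theta)^2 \end{array} \end{equation*}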
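The genotype probabilities follow from drawing the two gene copies independently: the heterozygote Aa arises in two ways (A then a, or a then A), which is where the factor 2 comes from. A minimal sketch, with the hypothetical function name `genotype_probs`, that builds the probabilities by enumerating both draws:

```python
from itertools import product

def genotype_probs(theta):
    """Genotype probabilities from two independently drawn gene copies."""
    allele = {"A": theta, "a": 1 - theta}
    probs = {"AA": 0.0, "Aa": 0.0, "aa": 0.0}
    for first, second in product("Aa", repeat=2):
        # Order does not matter: "aA" and "Aa" are the same heterozygote.
        key = "".join(sorted(first + second))
        probs[key] += allele[first] * allele[second]
    return probs

print(genotype_probs(0.3))
```

Because (\theta + (1 - \theta))^2 = 1, the three probabilities always sum to 1.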

A test of a random sample of people found that k_1 are AA, k_2 are Aa, and k_3 are aa. Find the MLE of \theta.

Data

The data are the genotype counts: k_1 people are AA, k_2 are Aa, and k_3 are aa.

Parameter(s) of Interest

The value of the unknown parameter \theta, the frequency of allele A

The likelihood is given by the multinomial probability

    \begin{equation*}  P(k_1, k_2, k_3| \theta) = \binom{k_1 + k_2 + k_3}{k_1}\binom{k_2 + k_3}{k_2}\binom{k_3}{k_3}\theta^{2k_1}(2\theta(1-\theta))^{k_2}(1-\theta)^{2k_3} \end{equation*}

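The log likelihood is given by

    \begin{equation*}  \text{ln}(P(k_1, k_2, k_3| \theta)) = \text{constant} + 2k_1\text{ ln}(\theta) + k_2\text{ ln}(\theta) + k_2\text{ ln}(1 - \theta) + 2k_3\text{ ln}(1 - \theta) \end{equation*}

where the constant, the log of the binomial coefficients, does not depend on \theta.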

Set the derivative equal to 0

    \begin{equation*}  \frac{2k_1 + k_2}{\theta} - \frac{k_2 + 2k_3}{1 - \theta} = 0 \end{equation*}

Solving for \theta,

    \begin{equation*} 	\hat{\theta} = \frac{2k_1+k_2}{2k_1 + 2k_2 + 2k_3} \end{equation*}

which is simply the fraction of A alleles among all 2(k_1 + k_2 + k_3) genes in the sample: each AA individual carries two copies of A and each Aa individual carries one.
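A quick way to double-check the closed form is to maximize the log likelihood numerically and compare. The sketch below uses hypothetical counts (30 AA, 50 Aa, 20 aa), drops the constant term since it does not affect the maximizer, and searches a grid on the open interval (0, 1). The function names are hypothetical.

```python
import math

def log_likelihood(theta, k1, k2, k3):
    """Log likelihood of the genotype counts, without the constant term."""
    return (2 * k1 + k2) * math.log(theta) + (k2 + 2 * k3) * math.log(1 - theta)

def theta_mle(k1, k2, k3):
    """Closed-form MLE: fraction of A alleles among all 2(k1 + k2 + k3) genes."""
    return (2 * k1 + k2) / (2 * (k1 + k2 + k3))

# Hypothetical counts: 30 AA, 50 Aa, 20 aa.
k1, k2, k3 = 30, 50, 20
closed_form = theta_mle(k1, k2, k3)

# Cross-check with a grid search that avoids the endpoints 0 and 1.
grid = [i / 10_000 for i in range(1, 10_000)]
numeric = max(grid, key=lambda t: log_likelihood(t, k1, k2, k3))
print(closed_form, numeric)
```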

