Maximum Likelihood Estimation - finding the best Parametric Model

Consider the situation where a collection of data samples has been obtained. It is assumed that this data is drawn from a parametric model. The parameters of this model are unknown.

For example, the collection of data samples can be the heights of all the students in a class. It is assumed that this data follows a Gaussian distribution. A Gaussian distribution is defined by two parameters: the mean $\mu$ and the standard deviation $\sigma$. The goal is to find the Gaussian distribution (that is, the values of $\mu$ and $\sigma$) that best fits the data.

To find the parameters that best fit the data samples, the likelihood is computed: the probability of the observed data, viewed as a function of the parameters. The goal of Maximum Likelihood Estimation is to find the parameter value(s) that maximize this probability.
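For the height example, if the samples are assumed to be independent draws from a single Gaussian, the maximizing parameters have a well-known closed form: $\hat{\mu}$ is the sample mean and $\hat{\sigma}$ is the square root of the average squared deviation from it (dividing by $n$, not $n - 1$). The sketch below computes both, along with a log likelihood function to compare against. The heights are made-up illustrative values, not data from any real class.

```python
import math

def gaussian_mle(samples):
    """Return the maximum likelihood estimates (mu_hat, sigma_hat) for i.i.d. Gaussian data."""
    n = len(samples)
    mu_hat = sum(samples) / n
    # The MLE of the variance divides by n (not n - 1), so it is slightly biased.
    var_hat = sum((x - mu_hat) ** 2 for x in samples) / n
    return mu_hat, math.sqrt(var_hat)

def gaussian_log_likelihood(samples, mu, sigma):
    """Natural log of the joint density of the samples under N(mu, sigma^2)."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)
        for x in samples
    )

# Hypothetical heights in centimetres, for illustration only.
heights = [162.0, 170.5, 158.3, 175.2, 168.9, 181.0, 165.4]
mu_hat, sigma_hat = gaussian_mle(heights)
print(f"mu_hat = {mu_hat:.2f} cm, sigma_hat = {sigma_hat:.2f} cm")
```

Perturbing either estimate should lower the log likelihood, which is a quick way to sanity-check the closed form.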

Likelihood

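The likelihood is defined as $P(\text{data} | p)$, the probability of the observed data given the parameter(s) $p$. It depends on both the data and $p$, but once the data has been collected it is held fixed, so the likelihood is treated as a function of $p$ alone: it changes as the parameter of interest $p$ changes.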
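To make "a function of $p$ with the data held fixed" concrete, here is a minimal sketch assuming the data is a sequence of independent coin tosses, so $P(\text{data} | p)$ is a product of one factor per toss. The toss sequence is hypothetical and the helper `likelihood` is an illustrative name, not taken from the source.

```python
def likelihood(p, tosses):
    """P(data | p) for a sequence of independent tosses, each 'H' or 'T'."""
    result = 1.0
    for toss in tosses:
        # Each head contributes a factor p, each tail a factor (1 - p).
        result *= p if toss == "H" else 1 - p
    return result

data = "HHTHTHHH"  # hypothetical data: 6 heads and 2 tails, held fixed
for p in (0.25, 0.5, 0.75, 0.9):
    print(f"p = {p:.2f}  P(data | p) = {likelihood(p, data):.5f}")
```

Only `p` changes between the printed lines; the data never does.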

Maximum Likelihood Estimate (MLE)

The maximum likelihood estimate (MLE) for the parameter $p$ is the value of $p$ that maximizes the likelihood $P(\text{data} | p)$ . That is, the MLE is the value of $p$ for which the data is most likely.

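The notation $\hat{p}$ is used for the MLE. When the likelihood is differentiable, it can often be found by taking the derivative of the likelihood function with respect to $p$, setting it to $0$, and checking that the resulting critical point is a maximum.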
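Setting the derivative to $0$ only locates critical points, and some likelihoods have no convenient closed-form solution. A crude but transparent alternative is to evaluate the likelihood on a fine grid of candidate values and keep the best one. The sketch below does this for a hypothetical sequence of 6 heads and 2 tails; `grid_search_mle` is an illustrative helper, not a library function, and practical code would normally use a proper numerical optimizer instead.

```python
def grid_search_mle(likelihood, steps=10_000):
    """Return the grid point in (0, 1) with the largest likelihood."""
    # Midpoints of `steps` equal cells, so the endpoints 0 and 1 are never evaluated.
    grid = [(i + 0.5) / steps for i in range(steps)]
    return max(grid, key=likelihood)

# Hypothetical data: 6 heads and 2 tails in 8 independent tosses.
heads, tails = 6, 2
p_hat = grid_search_mle(lambda p: p ** heads * (1 - p) ** tails)
print(f"grid-search estimate: {p_hat:.4f}")
```

The grid answer lands next to $6/8 = 0.75$, the value the derivative method gives for this data.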

Example 1

A coin is flipped $100$ times. Given that there were $55$ heads, find the maximum likelihood estimate for the probability $p$ of heads on a single toss.

Data

The data is the result of the experiment. In this case it is 55 heads.

Parameter(s) of Interest

The value of the unknown parameter $p$.

For a given value of $p$, the probability of getting $55$ heads in this experiment is the binomial probability

$\begin{equation*} P(55 \text{ heads}|p) = \binom{100}{55}p^{55}(1-p)^{45} \end{equation*}$

This is read as “the probability of $55$ heads given that the probability of heads on a single toss is $p$.”
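Evaluating this binomial probability at a few candidate values of $p$ already suggests where the maximum lies. A minimal sketch using Python's standard `math.comb`; the function name `prob_55_heads` is illustrative.

```python
from math import comb

def prob_55_heads(p):
    """Binomial likelihood P(55 heads | p) for 100 independent tosses."""
    return comb(100, 55) * p ** 55 * (1 - p) ** 45

for p in (0.45, 0.50, 0.55, 0.60):
    print(f"P(55 heads | p = {p:.2f}) = {prob_55_heads(p):.5f}")
```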

Setting the derivative to $0$

$\begin{equation*} \frac{d}{dp}P(\text{data} | p) = \binom{100}{55}(55p^{54}(1-p)^{45} - 45p^{55}(1-p)^{44}) = 0 \end{equation*}$

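Dropping the constant factor $\binom{100}{55}$ and solving for $p$:

$\begin{equation*} 55p^{54}(1-p)^{45} - 45p^{55}(1-p)^{44} = 0 \end{equation*}$

Dividing both sides by $p^{54}(1-p)^{44}$, which is nonzero for $0 < p < 1$:

$\begin{equation*} 55(1-p) = 45p \end{equation*}$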

$\begin{equation*} 55 = 100p \end{equation*}$

Thus the MLE is $\hat{p} = 0.55$.
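As a numerical check of the derivation, the sketch below evaluates the derivative formula for $P(\text{data} | p)$ given in Example 1 on either side of $0.55$. A positive value before $0.55$ and a negative value after it confirm that the critical point is a maximum rather than a minimum. The function name `dlikelihood` is illustrative.

```python
from math import comb

def dlikelihood(p):
    """Derivative of P(55 heads | p) with respect to p, term by term as in Example 1."""
    return comb(100, 55) * (55 * p ** 54 * (1 - p) ** 45 - 45 * p ** 55 * (1 - p) ** 44)

for p in (0.50, 0.55, 0.60):
    print(f"dP/dp at p = {p:.2f}: {dlikelihood(p):+.3e}")
```

At $0.55$ the printed value is zero up to floating-point rounding.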

Log Likelihood

It is often easier to work with the natural log of the likelihood function, called the log likelihood for short. Since $\ln(x)$ is a strictly increasing function, the likelihood and the log likelihood reach their maximum at the same parameter value. Taking the log also turns products into sums, which makes the derivative simpler.
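The sketch below compares the two on a grid of $p$ values. For Example 1's data they peak at the same point. It also shows a practical reason to prefer logs: with a hypothetical data set 100 times larger (5500 heads, 4500 tails), the raw likelihood underflows to `0.0` in double precision at every grid point, while the log likelihood stays finite and still identifies the maximum. The binomial coefficient is omitted because it does not depend on $p$.

```python
import math

def likelihood(p, heads, tails):
    # Binomial coefficient omitted: it does not depend on p, so it cannot move the maximum.
    return p ** heads * (1 - p) ** tails

def log_likelihood(p, heads, tails):
    return heads * math.log(p) + tails * math.log(1 - p)

grid = [i / 1000 for i in range(1, 1000)]  # p = 0.001, 0.002, ..., 0.999

# Example 1 data: both functions peak at the same grid point.
best_lik = max(grid, key=lambda p: likelihood(p, 55, 45))
best_log = max(grid, key=lambda p: log_likelihood(p, 55, 45))
print(best_lik, best_log)

# Hypothetical larger data set: the raw likelihood underflows to 0.0 ...
print(likelihood(0.55, 5500, 4500))
# ... but the log likelihood still locates the maximum.
best_big = max(grid, key=lambda p: log_likelihood(p, 5500, 4500))
print(best_big)
```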

Example 2

Redoing Example 1 using the log likelihood:

$\begin{equation*} \text{ln}(P(55 \text{ heads }|p)) = \text{ln}\Bigg(\binom{100}{55}\Bigg) + 55\text{ ln}(p) + 45\text{ ln}(1-p) \end{equation*}$

$\begin{equation*} \frac{d}{dp}(\text{log likelihood}) = \frac{d}{dp}\Bigg[\text{ln }\Bigg(\binom{100}{55}\Bigg) + 55\text{ ln}(p) + 45\text{ ln}(1-p)\Bigg] = \frac{55}{p} - \frac{45}{1-p} \end{equation*}$

Setting the derivative to $0$:

$\begin{equation*} \frac{55}{p} = \frac{45}{1-p} \end{equation*}$

$\begin{equation*} 55(1-p) = 45p \end{equation*}$

$\begin{equation*} 55 = 100p \end{equation*}$

Thus the MLE is $\hat{p} = 0.55$.
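Repeating the same algebra with $h$ heads in $n$ tosses in place of $55$ and $100$ gives $\hat{p} = h/n$, the observed fraction of heads. A small sketch of that closed form, with a check that the derivative of the log likelihood, $h/p - (n-h)/(1-p)$, vanishes there. The helper names are illustrative.

```python
def binomial_mle(heads, flips):
    """MLE of the probability of heads: the observed fraction heads / flips."""
    return heads / flips

def log_likelihood_derivative(p, heads, flips):
    """d/dp of the binomial log likelihood: heads/p - (flips - heads)/(1 - p)."""
    return heads / p - (flips - heads) / (1 - p)

p_hat = binomial_mle(55, 100)
print(p_hat, log_likelihood_derivative(p_hat, 55, 100))
```

The derivative prints as zero up to floating-point rounding. (If every toss lands the same way, the maximum sits at the boundary $p = 0$ or $p = 1$, where this derivative is undefined, but $h/n$ still gives the right answer.)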

Example 3

Suppose that a particular gene occurs as one of two alleles ($A$ and $a$), where allele $A$ has frequency $\theta$ in the population. That is, a random copy of the gene is $A$ with probability $\theta$ and $a$ with probability $1 - \theta$. Since a diploid genotype consists of two copies of the gene, assumed to be independent, the probability of each genotype is given by

genotype	$AA$	$Aa$	$aa$
probability	$\theta^2$	$2 \theta (1 - \theta)$	$(1 - \theta)^2$

A test of a random sample of people found that $k_1$ are $AA$, $k_2$ are $Aa$, and $k_3$ are $aa$. Find the MLE of $\theta$.

Data

$k_1$ are $AA$, $k_2$ are $Aa$, and $k_3$ are $aa$.

Parameter(s) of Interest

$\theta$

The likelihood is the multinomial probability

$\begin{equation*} P(k_1, k_2, k_3| \theta) = \binom{k_1 + k_2 + k_3}{k_1}\binom{k_2 + k_3}{k_2}\binom{k_3}{k_3}\theta^{2k_1}(2\theta(1-\theta))^{k_2}(1-\theta)^{2k_3} \end{equation*}$
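The sketch below evaluates this likelihood term by term. The product of the three binomial coefficients equals the multinomial coefficient $(k_1+k_2+k_3)!/(k_1!\,k_2!\,k_3!)$, and summing the likelihood over every possible outcome for a fixed number of people gives $1$, since $\theta^2 + 2\theta(1-\theta) + (1-\theta)^2 = 1$. The counts and $\theta$ values used are hypothetical.

```python
from math import comb, factorial

def genotype_likelihood(theta, k1, k2, k3):
    """P(k1, k2, k3 | theta), following the likelihood formula above term by term."""
    coeff = comb(k1 + k2 + k3, k1) * comb(k2 + k3, k2) * comb(k3, k3)
    return (coeff
            * theta ** (2 * k1)
            * (2 * theta * (1 - theta)) ** k2
            * (1 - theta) ** (2 * k3))

def multinomial_coefficient(k1, k2, k3):
    """(k1 + k2 + k3)! / (k1! k2! k3!): number of ways to assign the genotypes."""
    return factorial(k1 + k2 + k3) // (factorial(k1) * factorial(k2) * factorial(k3))

# Hypothetical counts and parameter value, for illustration only.
print(genotype_likelihood(0.6, 36, 48, 16))

# Sanity check: over all outcomes for n people the probabilities sum to 1.
n, theta = 5, 0.3
total = sum(
    genotype_likelihood(theta, k1, k2, n - k1 - k2)
    for k1 in range(n + 1)
    for k2 in range(n + 1 - k1)
)
print(total)
```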

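Taking the natural log, and collecting every term that does not depend on $\theta$ (the log of the coefficient and $k_2 \ln 2$) into a constant, the log likelihood is

$\begin{equation*} \ln P(k_1, k_2, k_3 | \theta) = \text{constant} + 2k_1\text{ ln}(\theta) + k_2\text{ ln}(\theta) + k_2\text{ ln}(1 - \theta) + 2k_3\text{ ln}(1 - \theta) \end{equation*}$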

Set the derivative equal to $0$

$\begin{equation*} \frac{2k_1 + k_2}{\theta} - \frac{k_2 + 2k_3}{1 - \theta} = 0 \end{equation*}$

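Multiplying through by $\theta(1 - \theta)$ gives $(2k_1 + k_2)(1 - \theta) = (k_2 + 2k_3)\theta$, so $2k_1 + k_2 = (2k_1 + 2k_2 + 2k_3)\theta$. Solving for $\theta$,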

$\begin{equation*} \hat{\theta} = \frac{2k_1+k_2}{2k_1 + 2k_2 + 2k_3} \end{equation*}$

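which is simply the fraction of $A$ alleles among all $2(k_1 + k_2 + k_3)$ gene copies in the sample: each $AA$ individual contributes two $A$ alleles and each $Aa$ individual contributes one.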
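A minimal sketch comparing the closed-form $\hat{\theta}$ with a brute-force grid maximization of the log likelihood (constant dropped). The genotype counts are hypothetical, and `theta_mle` is an illustrative helper name.

```python
import math

def theta_mle(k1, k2, k3):
    """Closed-form MLE of the A-allele frequency from genotype counts (AA, Aa, aa)."""
    return (2 * k1 + k2) / (2 * (k1 + k2 + k3))

def log_likelihood(theta, k1, k2, k3):
    """Log likelihood up to an additive constant that does not depend on theta."""
    return (2 * k1 + k2) * math.log(theta) + (k2 + 2 * k3) * math.log(1 - theta)

# Hypothetical genotype counts, for illustration only.
k1, k2, k3 = 36, 48, 16
theta_hat = theta_mle(k1, k2, k3)

grid = [i / 1000 for i in range(1, 1000)]
theta_grid = max(grid, key=lambda t: log_likelihood(t, k1, k2, k3))
print(theta_hat, theta_grid)
```

Here $\hat{\theta} = (72 + 48)/200 = 0.6$, and the grid search lands on the same value.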

References

https://ocw.mit.edu/courses/mathematics/18-05-introduction-to-probability-and-statistics-spring-2014/readings/MIT18_05S14_Reading10b.pdf

The Beard Sage
