Short Primer on Probability


A random variable is denoted by a capital letter, V_i, and a value it can take is denoted by the corresponding small letter, v_i.

Consider a collection of k random variables (V_1, V_2, \cdots, V_k). Random variables can be thought of as features of a particular domain of interest.

For example, the result of a coin toss can be represented using a single random variable, C. This variable can take either of the categorical values H or T. If the same coin is tossed k times, this can be represented using k variables (C_1, C_2, \cdots C_k). Each of these values can be either H or T.

    \begin{equation*} C_i \in \{H, T\} \end{equation*}

Joint Probability

An expression of the form p(V_1, V_2, \cdots , V_k) is called a joint probability function over the variables V_1, V_2, \cdots , V_k. The probability that the variables (V_1, V_2, \cdots , V_k) take the values (v_1, v_2, \cdots, v_k) respectively is denoted by the expression

    \begin{equation*} p(V_1 = v_1, V_2 = v_2, \cdots , V_k = v_k) \end{equation*}

This is sometimes abbreviated as p(v_1, v_2, \cdots , v_k).

For a fair coin toss, p(H) = p(T) = 1/2. If a fair coin is tossed five times, p(H,T,T,H,T) = 1/32.

The joint probability function satisfies

    \begin{equation*} 0 \leq p(V_1, V_2, \cdots , V_k) \leq 1 \end{equation*}

    \begin{equation*} \sum_{v_1, \cdots , v_k} p(v_1, v_2, \cdots , v_k) = 1 \end{equation*}

where the sum runs over all possible combinations of values.
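These two properties can be illustrated with the coin example. The sketch below (a minimal illustration, not part of the original article) enumerates the joint distribution of five independent fair coin tosses, so each of the 2^5 = 32 sequences has probability 1/32:

```python
from itertools import product

# Joint distribution of five independent fair coin tosses: every
# H/T sequence of length 5 has probability (1/2)^5 = 1/32.
joint = {seq: 0.5 ** 5 for seq in product("HT", repeat=5)}

print(joint[("H", "T", "T", "H", "T")])   # 0.03125, i.e. 1/32
print(sum(joint.values()))                # 1.0: the joint sums to one
```

Every entry lies between 0 and 1, and the sum over all 32 outcomes is exactly 1.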

Marginal Probability

Given the values of all of the joint probabilities for a set of random variables, the marginal probability of any one of them (or of any subset) can be computed.

For example, for four random variables B, M, L and G, the marginal probability p(B=b) is defined to be the sum of all those joint probabilities for which B=b

    \begin{align*}      p(B=b) & = \sum_{M,L,G}p(B=b,M,L,G) \\      p(B=b,M=m) & = \sum_{L,G}p(B=b,M=m,L,G) \\ \end{align*}

When dealing with propositional (True/False) variables, P(B=True, M=False) is denoted as P(B, \neg M).
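Marginalisation is just summation over the unmentioned variables. A minimal sketch, using four propositional variables B, M, L, G with a uniform joint table (the numbers are illustrative, not from the article):

```python
from itertools import product

# Uniform joint distribution over four propositional variables B, M, L, G.
names = ("B", "M", "L", "G")
joint = {a: 1 / 16 for a in product([True, False], repeat=4)}

def marginal(**fixed):
    """Sum the joint probabilities of all assignments that agree with `fixed`."""
    return sum(p for a, p in joint.items()
               if all(a[names.index(n)] == v for n, v in fixed.items()))

print(marginal(B=True))            # 0.5  = p(B=True) under the uniform joint
print(marginal(B=True, M=False))   # 0.25 = p(B=True, M=False)
```

Fixing more variables leaves fewer joint entries in the sum, exactly as in the two equations above.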

Conditional Probabilities

The conditional probability of V_i given V_j is denoted by P(V_i|V_j).

    \begin{equation*} P(V_i|V_j) = \frac{P(V_i,V_j)}{P(V_j)} \end{equation*}

where P(V_i,V_j) is the joint probability of V_i and V_j, and P(V_j) is the (non-zero) marginal probability of V_j. Thus

    \begin{equation*}      P(V_i,V_j) = P(V_i|V_j){P(V_j)} \end{equation*}

Joint conditional probabilities of several variables conditioned on several other variables are expressed as

    \begin{equation*}      P(\neg G, B|\neg M, L) = \frac{P(\neg G, B, \neg M, L)}{P(\neg M, L)} \end{equation*}

[Figure: Venn diagram illustrating conditional probability.]

The Venn diagram also makes it easy to see how a marginal probability follows from joint probabilities: P(B) = P(B,M) + P(B,\neg M).
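The definition P(x|e) = P(x,e)/P(e) can be applied directly to a joint table. A minimal sketch for P(\neg G, B|\neg M, L), again using an illustrative uniform joint over propositional variables B, M, L, G:

```python
from itertools import product

# Uniform joint distribution over four propositional variables (illustrative).
names = ("B", "M", "L", "G")
joint = {a: 1 / 16 for a in product([True, False], repeat=4)}

def prob(**fixed):
    """Sum the joint entries whose values agree with `fixed`."""
    return sum(p for a, p in joint.items()
               if all(a[names.index(n)] == v for n, v in fixed.items()))

# P(not G, B | not M, L) = P(not G, B, not M, L) / P(not M, L)
p = prob(G=False, B=True, M=False, L=True) / prob(M=False, L=True)
print(p)   # 0.25 under this uniform joint

# The marginalisation identity P(B) = P(B, M) + P(B, not M) also holds.
assert prob(B=True) == prob(B=True, M=True) + prob(B=True, M=False)
```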

A joint probability can be expressed in terms of a chain of conditional probabilities.

    \begin{equation*}      P(B,L,G,M) = P(B|L,G,M)P(L|G,M)P(G|M)P(M) \end{equation*}

The general form of this chain rule is

    \begin{equation*}      P(V_1, V_2, \cdots ,V_k) = \prod_{i=1}^k P(V_i|V_{i-1}, \cdots , V_1)  \end{equation*}
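The chain rule can be checked numerically. The sketch below builds an arbitrary joint distribution over B, L, G, M (the weights are made up) and verifies that the product of conditionals recovers the joint entry:

```python
from itertools import product

# A made-up, strictly positive joint distribution over B, L, G, M.
names = ("B", "L", "G", "M")
weights = list(range(1, 17))
joint = {a: w / sum(weights)
         for a, w in zip(product([True, False], repeat=4), weights)}

def prob(**fixed):
    """Marginal probability: sum of joint entries matching `fixed`."""
    return sum(p for a, p in joint.items()
               if all(a[names.index(n)] == v for n, v in fixed.items()))

b, l, g, m = target = (True, False, True, True)
chain = (prob(B=b, L=l, G=g, M=m) / prob(L=l, G=g, M=m)   # P(B|L,G,M)
         * prob(L=l, G=g, M=m) / prob(G=g, M=m)           # P(L|G,M)
         * prob(G=g, M=m) / prob(M=m)                     # P(G|M)
         * prob(M=m))                                     # P(M)
assert abs(chain - joint[target]) < 1e-12   # chain of conditionals = joint
```

The product telescopes: each denominator cancels the next numerator, leaving exactly P(B,L,G,M).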

Bayes Rule

Different orderings of the variables give different chain-rule expressions, but they all have the same value for the same set of variable values. Since the order of the variables is not important,

    \begin{equation*}      P(V_i, V_j) = P(V_i|V_j)P(V_j) = P(V_j|V_i)P(V_i) = P(V_j,V_i)  \end{equation*}

Which gives Bayes’ Rule

    \begin{equation*}      P(V_i| V_j) = \frac{P(V_j|V_i)P(V_i)}{P(V_j)}  \end{equation*}
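As a quick sanity check of Bayes' rule, the following sketch computes P(D|T) for a stylised diagnostic test. The variable names and every number here are made up for illustration:

```python
# Made-up quantities: prior P(D), and the test's behaviour P(T|D), P(T|not D).
p_d = 0.01              # P(D): prior probability of the condition
p_t_given_d = 0.95      # P(T | D)
p_t_given_not_d = 0.05  # P(T | not D)

# P(T) by marginalising over D, then Bayes' rule for P(D | T).
p_t = p_t_given_d * p_d + p_t_given_not_d * (1 - p_d)
p_d_given_t = p_t_given_d * p_d / p_t
print(round(p_d_given_t, 4))   # 0.161
```

Note that the denominator P(T) is itself obtained by summing joint probabilities, tying Bayes' rule back to marginalisation.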

Probabilistic Inference

In set notation, \mathcal{V} = \{V_1, \cdots , V_k\} and P(\mathcal{V}) = P(V_1, \cdots , V_k). The variables V_1, \cdots, V_k taking the values v_1, \cdots , v_k respectively is denoted by \mathcal{V} = \bm{v}, where \mathcal{V} and \bm{v} are ordered lists.

For a set \mathcal{V}, the variables in a subset \mathcal{E} \subseteq \mathcal{V} are given as evidence, with observed values \bm{e}.

    \begin{equation*}  P({V}_i=True|\mathcal{E}=\bm{e}) = \frac{P({V}_i=True,\mathcal{E}=\bm{e})}{P(\mathcal{E}=\bm{e})} \end{equation*}

    \begin{equation*}  P({V}_i=True,\mathcal{E}=\bm{e}) = \sum_{{V}_i=True,\mathcal{E}=\bm{e}}P(V_1, \cdots , V_k) \end{equation*}

For example, consider \mathcal{V} = (V_1, V_2, V_3) = (P, Q, R), where P, Q and R are propositional variables (here P names a random variable, not the probability function). The evidence is that R is False; in other words, \mathcal{E} = \bm{e} equates to \neg R.

    \begin{equation*}  P(Q|\neg R) = \frac{P(Q, \neg R)}{P(\neg R)} = \frac{\sum_{Q, \neg R} P(P, Q, R)}{P(\neg R)} = \frac{P(P,Q, \neg R) + P(\neg P,Q, \neg R)}{P(\neg R)} \end{equation*}

    \begin{equation*}  P(\neg Q|\neg R) = \frac{P(\neg Q, \neg R)}{P(\neg R)} = \frac{\sum_{\neg Q, \neg R} P(P, Q, R)}{P(\neg R)} = \frac{P(P,\neg Q, \neg R) + P(\neg P,\neg Q, \neg R)}{P(\neg R)} \end{equation*}

    \begin{equation*}  P(Q|\neg R) + P(\neg Q|\neg R) = 1 \end{equation*}

Thus P(\neg R) need not be computed: the numerators P(Q, \neg R) and P(\neg Q, \neg R) can simply be normalised so that the two conditional probabilities sum to one.
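A sketch of this inference by enumeration for the P, Q, R example. The joint probabilities are made up; note that P(\neg R) is never computed on its own, the two unnormalised sums are simply divided by their total:

```python
from itertools import product

# Made-up joint distribution over three propositional variables P, Q, R.
names = ("P", "Q", "R")
weights = [3, 1, 4, 1, 5, 9, 2, 6]
joint = {a: w / sum(weights)
         for a, w in zip(product([True, False], repeat=3), weights)}

def prob(**fixed):
    """Sum of joint entries consistent with `fixed` (marginalising the rest)."""
    return sum(pr for a, pr in joint.items()
               if all(a[names.index(n)] == v for n, v in fixed.items()))

u_q = prob(Q=True, R=False)        # P(Q, not R), summing over P
u_not_q = prob(Q=False, R=False)   # P(not Q, not R)
norm = u_q + u_not_q               # this total is exactly P(not R)
print(u_q / norm)                  # P(Q | not R) = 10/17 here, approx. 0.588
```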

Conditional Independence

A set of variables \mathcal{V} is conditionally independent of a set of variables \mathcal{V}_i given a set \mathcal{V}_j if

(1)   \begin{equation*} 	P(\mathcal{V}|\mathcal{V}_i,\mathcal{V}_j) = P(\mathcal{V}|\mathcal{V}_j) \end{equation*}

Knowing \mathcal{V}_i tells nothing more about \mathcal{V} than is already known from \mathcal{V}_j. Similarly, two variables V_i and V_j are conditionally independent given \mathcal{V} if

(2)   \begin{equation*} 	P(V_i,V_j|\mathcal{V}) = P(V_i|V_j,\mathcal{V})P(V_j|\mathcal{V}) = P(V_i|\mathcal{V})P(V_j|\mathcal{V}) \end{equation*}

Saying that V_i is conditionally independent of V_j given \mathcal{V} also means that V_j is conditionally independent of V_i given \mathcal{V}. The same result also applies to sets \mathcal{V}_i and \mathcal{V}_j.

As a generalization of pairwise conditional independence, the variables V_1, \cdots , V_k are mutually conditionally independent given a set \mathcal{V} if each of the variables is conditionally independent of all of the others given \mathcal{V}.

(3)   \begin{equation*}  P(V_1, V_2, \cdots , V_k|\mathcal{V}) = \prod_{i=1}^kP(V_i|V_{i-1}, \cdots , V_1, \mathcal{V}) = \prod_{i=1}^kP(V_i|\mathcal{V}) \end{equation*}

When \mathcal{V} is empty

    \begin{equation*} 	P(V_1, V_2, \cdots , V_k) = P(V_1)P(V_2)\cdots P(V_k) \end{equation*}

This implies that the variables are unconditionally independent.
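Unconditional independence is exactly this factorisation of the joint, which can be checked directly. In the sketch below the three marginals are made up and the joint is built as their product, so the check succeeds:

```python
from itertools import product

# Made-up marginals for three propositional variables.
p1 = {True: 0.3, False: 0.7}
p2 = {True: 0.6, False: 0.4}
p3 = {True: 0.5, False: 0.5}

# A joint built as the product of the marginals: independent by construction.
joint = {(a, b, c): p1[a] * p2[b] * p3[c]
         for a, b, c in product([True, False], repeat=3)}

def marg(i, v):
    """Marginal probability that variable i takes value v."""
    return sum(p for a, p in joint.items() if a[i] == v)

# Independence: every joint entry equals the product of its marginals.
ok = all(abs(joint[a] - marg(0, a[0]) * marg(1, a[1]) * marg(2, a[2])) < 1e-12
         for a in joint)
print(ok)   # True
```

For a joint that does not factor this way, the same check would report False.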

Thank You

  • Mark – for pointing out typos and errors

