Visualizing Hidden Markov Models

A Hidden Markov Model is in essence a finite state machine. The states of the machine output an observable event with some probability. The states themselves are hidden and transition to each other with some probability.


Vocabulary

    \begin{equation*} V = \{v_1, v_2, \cdots, v_V\} \end{equation*}

A vocabulary is the set of all possible observations. For example, here is a sequence of 5 observations.

    \begin{equation*} v_2 \, v_1 \, v_2 \, v_4 \, v_7 \end{equation*}

Observations can repeat, and a sequence need not include every element of the vocabulary.


Observation Sequence

    \begin{equation*} O = o_1, o_2, \cdots, o_T \end{equation*}

A sequence of T observations. Each observation is output by a state. For the above example, T = 5 and

    \begin{equation*} o_1 = v_2, \; o_2 = v_1, \; o_3 = v_2, \; o_4 = v_4, \; o_5 = v_7 \end{equation*}

A run of a Hidden Markov Model produces a sequence of T observations and passes through T + 2 states. The 2 extra states are the start and final states.

Set of States

    \begin{equation*} Q = \{q_1, q_2, \cdots, q_N\} \end{equation*}

Q is a set of N possible states. Special states, q_s and q_f, denote the start and final states respectively.

Hidden States

A Hidden Markov Model will start with q_s, followed by T states, and end with q_f. Here is one possible sequence of hidden states for the above example observations (T = 5):

    \begin{equation*} q_s \; q_3 \; q_1 \; q_3 \; q_4 \; q_2 \; q_f \end{equation*}

States can repeat, and a sequence need not include every element of the set of states.


A state/observation in the sequence of states/observations is indexed by t. Thus,

    \begin{equation*} 1 \leq t \leq T \end{equation*}

Observation o_t is output by state q_t, which can be any state from the set Q. A state in the set of states is indexed by i or j.

    \begin{equation*} 1 \leq i, j \leq N \end{equation*}

When the t^{th} state in the sequence of states is state j, this is written as

    \begin{equation*} q_t = j \end{equation*}

With this notation, a state sequence such as q_s q_3 q_1 q_3 q_4 q_2 q_f can be rewritten as

    \begin{equation*} q_1 = 3, \; q_2 = 1, \; q_3 = 3, \; q_4 = 4, \; q_5 = 2 \end{equation*}

Transition Probability Matrix

Transitions between the states are expressed as probabilities. a_{ij} denotes the probability of moving from state i to state j.

    \begin{equation*} a_{ij} = P(q_{t+1} = j \vert q_t = i) \quad \quad 1 \leq i, j \leq N \end{equation*}

with i = s and j = f also allowed, for transitions out of the start state and into the final state.

A state has a transition probability to every state in Q (including itself) and to the final state q_f.


By the law of total probability

    \begin{equation*} \sum_{j=1}^{N}a_{ij} + a_{if} = 1 \quad \quad \forall i \end{equation*}

The matrix can be defined as follows. Each row must add up to 1.

    \begin{equation*} A = \begin{bmatrix} a_{s1} & a_{s2} & \cdots & a_{sN} & a_{sf} \\ a_{11} & a_{12} & \cdots & a_{1N} & a_{1f} \\ \vdots & \vdots & a_{ij} & \vdots & \vdots \\ a_{N1} & a_{N2} & \cdots & a_{NN} & a_{Nf} \end{bmatrix} \end{equation*}

For simplicity, here is an example configuration with N = 2 states and T = 2 observations.

The Transition Probability Matrix for the above configuration will be

    \begin{equation*} A = \begin{bmatrix} 0.3 & 0.7 & 0 \\ 0.2 & 0.4 & 0.4 \\ 0.3 & 0.5 & 0.2 \end{bmatrix} \end{equation*}
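As a quick sanity check, the row-sum constraint from the law of total probability can be verified numerically. A minimal sketch in plain Python; the nested-list layout, with rows ordered q_s, q_1, q_2 and columns q_1, q_2, q_f, is an assumption for illustration.

```python
# Example transition probabilities from above.
# Rows are "from" states (q_s, q_1, q_2);
# columns are "to" states (q_1, q_2, q_f).
A = [
    [0.3, 0.7, 0.0],  # from the start state q_s
    [0.2, 0.4, 0.4],  # from q_1
    [0.3, 0.5, 0.2],  # from q_2
]

# Each row must sum to 1 by the law of total probability.
for row in A:
    assert abs(sum(row) - 1.0) < 1e-9
```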

Observation Likelihood

The emission of an observation by a state is also expressed as a probability. b_j(o_t) is the probability of an observation o_t being generated from a state j. These are also called emission probabilities.

    \begin{equation*} b_j(o_t) = P(o_t \vert q_t = j) \quad \quad 1 \leq j \leq N \end{equation*}

Every state in Q has an emission probability for observation o_t.


The matrix can be defined as follows

    \begin{equation*} B = \begin{bmatrix} b_{1}(o_1) & b_{1}(o_2) & \cdots & b_{1}(o_t) & \cdots & b_{1}(o_T) \\ b_{2}(o_1) & b_{2}(o_2) & \cdots & b_{2}(o_t) & \cdots & b_{2}(o_T) \\ \vdots & \vdots & b_{j}(o_t) & \vdots & \vdots \\ b_{N}(o_1) & b_{N}(o_2) & \cdots & b_{N}(o_t) & \cdots & b_{N}(o_T) \end{bmatrix} \end{equation*}

Extending the previous example, the Observation Likelihood Matrix for the same configuration will be

    \begin{equation*} B = \begin{bmatrix} 0.7 & 0.5 \\ 0.6 & 0.6 \end{bmatrix} \end{equation*}
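With both matrices in hand, the probability that one particular hidden-state path generated the observed sequence is the product of the transition probabilities along the path and the emission probabilities at each step. A minimal sketch using the example values above; the specific path q_s → q_1 → q_2 → q_f is an arbitrary illustrative choice.

```python
# Transition probabilities a_ij from the example, keyed by (from, to).
# "s" and "f" stand for the start and final states.
a = {
    ("s", 1): 0.3, ("s", 2): 0.7,
    (1, 1): 0.2, (1, 2): 0.4, (1, "f"): 0.4,
    (2, 1): 0.3, (2, 2): 0.5, (2, "f"): 0.2,
}

# Emission probabilities b_j(o_t) from the example: b[j][t - 1].
b = {
    1: [0.7, 0.5],  # b_1(o_1), b_1(o_2)
    2: [0.6, 0.6],  # b_2(o_1), b_2(o_2)
}

# Probability of the path q_s -> q_1 -> q_2 -> q_f emitting o_1, o_2:
# multiply every transition taken and every emission made.
p = a[("s", 1)] * b[1][0] * a[(1, 2)] * b[2][1] * a[(2, "f")]
print(p)  # 0.3 * 0.7 * 0.4 * 0.6 * 0.2 = 0.01008
```

Summing this product over every possible hidden-state path would give the total likelihood of the observation sequence.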

Hidden Markov Model

A Hidden Markov Model \lambda is thus defined by the above two matrices A and B. This is written as

    \begin{equation*} \lambda = (A, B) \end{equation*}
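Since λ = (A, B) fully specifies the model, it can also be run generatively: start in q_s, pick each next state according to A, emit an observation from that state, and stop on reaching q_f. A minimal sketch with illustrative numbers; the emission rows here are made-up distributions over a two-word vocabulary (not the positional b_j(o_t) values above), and the state names are for illustration only.

```python
import random

random.seed(42)  # reproducible runs

# Transition probabilities out of each state (each row sums to 1).
# In this illustrative A, the start state cannot jump straight to "f".
A = {
    "s": {"1": 0.3, "2": 0.7},
    "1": {"1": 0.2, "2": 0.4, "f": 0.4},
    "2": {"1": 0.3, "2": 0.5, "f": 0.2},
}

# Illustrative emission distributions over a small vocabulary.
B = {
    "1": {"v1": 0.7, "v2": 0.3},
    "2": {"v1": 0.6, "v2": 0.4},
}

def draw(dist):
    """Sample one outcome from a {outcome: probability} dict."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return random.choices(outcomes, weights=weights)[0]

state, states, observations = "s", [], []
while True:
    state = draw(A[state])                  # transition
    if state == "f":
        break                               # reached the final state
    states.append(state)
    observations.append(draw(B[state]))     # emission

print("states:      ", states)
print("observations:", observations)
```

Each run produces one hidden-state sequence and one observation sequence of the same length, exactly the q_s … q_f structure described above.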

In addition, a first-order Hidden Markov Model also follows two simplifying assumptions.

Markov Assumption

The probability of a particular state depends only on the previous state.

    \begin{equation*} P(q_t = a \vert q_1 \cdots q_{t-1}, \lambda) = P(q_t = a \vert q_{t-1}, \lambda) \end{equation*}

Output Independence

The probability of an output observation o_t depends only on the state q_t that produced it, and not on any other states or observations.

    \begin{equation*} P(o_t \vert q_1, \cdots, q_t, \cdots, q_T, o_1, \cdots, o_{t-1}, o_{t+1}, \cdots, o_T, \lambda) = P(o_t \vert q_t, \lambda) \end{equation*}
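Taken together, the two assumptions let the joint probability of a state sequence and an observation sequence factor into one transition term and one emission term per step (a sketch of the standard factorization, writing a_{s q_1} and a_{q_T f} for the start and final transitions):

    \begin{equation*} P(O, Q \vert \lambda) = a_{s q_1} \left( \prod_{t=1}^{T-1} b_{q_t}(o_t) \, a_{q_t q_{t+1}} \right) b_{q_T}(o_T) \, a_{q_T f} \end{equation*}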

Thank You

  • Mark – for pointing out typos and errors
