Visualizing Hidden Markov Models

A Hidden Markov Model is in essence a finite state machine. The states of the machine output an observable event with some probability. The states themselves are hidden and transition to each other with some probability.
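
To make this concrete, here is a minimal generative sketch in Python. Every name and number in it is hypothetical (a two-state "Hot"/"Cold" machine emitting the symbols 1, 2, 3), but it shows the mechanism: each hidden state emits an observable symbol, then transitions to the next hidden state.

    import random

    # Hypothetical two-state HMM: the states, symbols and probabilities
    # below are made up purely for illustration.
    transitions = {"Hot": {"Hot": 0.6, "Cold": 0.4},
                   "Cold": {"Hot": 0.5, "Cold": 0.5}}
    emissions = {"Hot": {"1": 0.2, "2": 0.4, "3": 0.4},
                 "Cold": {"1": 0.5, "2": 0.4, "3": 0.1}}

    def sample(dist):
        """Draw one key from an {outcome: probability} mapping."""
        return random.choices(list(dist), weights=list(dist.values()))[0]

    state = "Hot"  # the hidden state; an observer never sees this
    for _ in range(5):
        print(sample(emissions[state]), end=" ")  # observable event
        state = sample(transitions[state])        # hidden transition
    print()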

Vocabulary

    \begin{equation*} V = \{v_1, v_2, \cdots, v_V\} \end{equation*}

A vocabulary is the set of all possible observations. For example, here is a sequence of 5 observations drawn from it.

    \begin{equation*} v_2 \, v_1 \, v_2 \, v_4 \, v_7 \end{equation*}

Observations can repeat, and a sequence need not include every element in the Vocabulary.

Observations

    \begin{equation*} O = o_1, o_2, \cdots, o_T \end{equation*}

A sequence of T observations. Each observation is output by a state. For the above example, T = 5 and

    \begin{equation*} o_1 = v_2, \; o_2 = v_1, \; o_3 = v_2, \; o_4 = v_4, \; o_5 = v_7 \end{equation*}
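
As a concrete sketch in Python (assuming, for illustration only, a vocabulary of exactly 7 symbols; the example above only tells us it has at least 7):

    # The vocabulary: the set of all possible observation symbols.
    vocabulary = {"v1", "v2", "v3", "v4", "v5", "v6", "v7"}

    # The example sequence of T = 5 observations: v2 v1 v2 v4 v7.
    observations = ["v2", "v1", "v2", "v4", "v7"]
    T = len(observations)  # 5

    # Every observation must come from the vocabulary.
    assert all(o in vocabulary for o in observations)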

Length

    \begin{equation*} T \end{equation*}

A Hidden Markov Model generates a sequence of T observations using a sequence of T + 2 states. The 2 extra states are the start and final states.

Set of States

    \begin{equation*} Q = \{q_1, q_2, \cdots, q_N\} \end{equation*}

Q is a set of N possible states. Special states, q_s and q_f, denote the start and final states respectively.

Hidden States

A Hidden Markov Model will start with q_s, pass through T states, and end with q_f. Here is a possible sequence of 5 hidden states for the above example observations:

[Figure: the start state q_s, followed by 5 hidden states (one per observation), ending in the final state q_f]

States can repeat, and a sequence need not include every element in the Set of States.

Indexing

A state/observation in the sequence of states/observations is indexed by t. Thus,

    \begin{equation*} 1 \leq t \leq T \end{equation*}

Observation o_t is output by state q_t, which can be any state from the set Q. A state in the set of states is indexed by i or j.

    \begin{equation*} 1 \leq i, j \leq N \end{equation*}

When the t^{th} state in the sequence of states is state j, it is defined as follows

    \begin{equation*} q_t = j \end{equation*}

The above example can thus be redrawn as

[Figure: the same sequence redrawn with each position t labeled by its state index, q_t = j]

Transition Probability Matrix

Transitions between the states are expressed as probabilities. a_{ij} denotes the probability of moving from state i to state j.

    \begin{equation*} a_{ij} = P(q_{t+1} = j \vert q_t = i) \quad \quad i \in \{s, 1, \cdots, N\}, \; j \in \{1, \cdots, N, f\} \end{equation*}

A state has a transition probability to every state in Q (including itself) and to the final state q_f.

[Figure: a state q_i with outgoing transition probabilities a_{i1} … a_{iN} to every state in Q and a_{if} to the final state q_f]

By the law of total probability

    \begin{equation*} \sum_{j=1}^{N}a_{ij} + a_{if} = 1 \quad \quad \forall i \end{equation*}

The matrix can be defined as follows. Each row must add up to 1.

    \begin{equation*} A = \begin{bmatrix} a_{s1} & a_{s2} & \cdots & a_{sN} & a_{sf} \\ a_{11} & a_{12} & \cdots & a_{1N} & a_{1f} \\ \vdots & \vdots & a_{ij} & \vdots & \vdots \\ a_{N1} & a_{N2} & \cdots & a_{NN} & a_{Nf} \end{bmatrix} \end{equation*}

For simplicity, here is an example configuration with N = 2 states and T = 2 observations

[Figure: an example configuration with N = 2 states, T = 2 observations, and the start and final states]

The Transition Probability Matrix for the above configuration will be

    \begin{equation*} A = \begin{bmatrix} 0.3 & 0.7 & 0 \\ 0.2 & 0.4 & 0.4 \\ 0.3 & 0.5 & 0.2 \end{bmatrix} \end{equation*}
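
As a quick sanity check, here is a sketch (assuming NumPy) that verifies the law of total probability for this matrix:

    import numpy as np

    # Rows: transitions from q_s, q_1, q_2.
    # Columns: transitions to q_1, q_2, q_f.
    A = np.array([[0.3, 0.7, 0.0],
                  [0.2, 0.4, 0.4],
                  [0.3, 0.5, 0.2]])

    # Each row must sum to 1.
    assert np.allclose(A.sum(axis=1), 1.0)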

Observation Likelihood

The emission of an observation by a state is also expressed as a probability. b_j(o_t) is the probability of an observation o_t being generated from a state j. These are also called emission probabilities.

    \begin{equation*} b_j(o_t) = P(o_t \vert q_t = j) \quad \quad 1 \leq j \leq N \end{equation*}

Every state in Q has an emission probability for observation o_t.

[Figure: each state in Q emitting observation o_t with probability b_j(o_t)]

The matrix can be defined as follows

    \begin{equation*} B = \begin{bmatrix} b_{1}(o_1) & b_{1}(o_2) & \cdots & b_{1}(o_t) & \cdots & b_{1}(o_T) \\ b_{2}(o_1) & b_{2}(o_2) & \cdots & b_{2}(o_t) & \cdots & b_{2}(o_T) \\ \vdots & \vdots & & b_{j}(o_t) & & \vdots \\ b_{N}(o_1) & b_{N}(o_2) & \cdots & b_{N}(o_t) & \cdots & b_{N}(o_T) \end{bmatrix} \end{equation*}

Extending the previous example

[Figure: the N = 2, T = 2 configuration extended with emission probabilities for each observation]

The Observation Likelihood Matrix for the above configuration will be

    \begin{equation*} B = \begin{bmatrix} 0.7 & 0.5 \\ 0.6 & 0.6 \end{bmatrix} \end{equation*}
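
One way to read B: column t picks out, for the symbol observed at time t, each state's emission probability from a full state-by-vocabulary emission table. Here is a sketch of that relationship (the table, its 4-symbol vocabulary, and its values are hypothetical, not derived from the example above):

    import numpy as np

    # Hypothetical emission table: rows are states q_1, q_2; columns are
    # a 4-symbol vocabulary.  Each row is a distribution, so it sums to 1.
    E = np.array([[0.1, 0.7, 0.0, 0.2],
                  [0.2, 0.6, 0.1, 0.1]])
    assert np.allclose(E.sum(axis=1), 1.0)

    # Observing the sequence v2 v1 (0-based columns 1, 0) selects the
    # corresponding columns of E, giving the N x T matrix B.
    B = E[:, [1, 0]]
    print(B)  # [[0.7 0.1]
              #  [0.6 0.2]]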

Hidden Markov Model

A Hidden Markov Model \lambda is thus defined by the above two matrices A and B. This is written as

    \begin{equation*} \lambda = (A, B) \end{equation*}

In addition, a first-order Hidden Markov Model also follows two simplifying assumptions.

Markov Assumption

The probability of a particular state depends only on the previous state

    \begin{equation*} P(q_t = a \vert q_1 \cdots q_{t-1}, \lambda) = P(q_t = a \vert q_{t-1}, \lambda) \end{equation*}
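
Applied at every position, this assumption factors the probability of an entire state sequence into pairwise transition probabilities:

    \begin{equation*} P(q_1, q_2, \cdots, q_T \vert \lambda) = P(q_1 \vert \lambda) \prod_{t=2}^{T} P(q_t \vert q_{t-1}, \lambda) \end{equation*}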

Output Independence

The probability of an output observation o_t depends only on the state that produced the observation, q_t, and not on any other states or observations

    \begin{equation*} P(o_t \vert q_1, \cdots, q_t, \cdots, q_T, o_1, \cdots, o_{t-1}, o_{t+1}, \cdots, o_T, \lambda) = P(o_t \vert q_t, \lambda) \end{equation*}
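
Taken together, the two assumptions let the joint probability of a state sequence and its observations factor entirely into entries of A and B:

    \begin{equation*} P(O, Q \vert \lambda) = a_{s q_1} \left( \prod_{t=1}^{T-1} b_{q_t}(o_t) \, a_{q_t q_{t+1}} \right) b_{q_T}(o_T) \, a_{q_T f} \end{equation*}

For the T = 2 example above, with hidden states q_1 = 1 and q_2 = 2, this gives 0.3 × 0.7 × 0.4 × 0.6 × 0.2 = 0.01008. Here is a sketch (assuming NumPy and 0-based state indices) that computes it:

    import numpy as np

    # A rows: from q_s, q_1, q_2.  Columns: to q_1, q_2, q_f.
    A = np.array([[0.3, 0.7, 0.0],
                  [0.2, 0.4, 0.4],
                  [0.3, 0.5, 0.2]])
    # B rows: states q_1, q_2.  Columns: observations o_1, o_2.
    B = np.array([[0.7, 0.5],
                  [0.6, 0.6]])

    def joint_probability(states, A, B):
        """P(O, Q | lambda) for a 0-based hidden state sequence.

        Relies on the Markov assumption (transitions depend only on the
        previous state) and output independence (emissions depend only
        on the state that produced them)."""
        p = A[0, states[0]]                   # a_{s q_1}: leave the start state
        for t, j in enumerate(states):
            p *= B[j, t]                      # b_{q_t}(o_t)
            if t + 1 < len(states):
                p *= A[j + 1, states[t + 1]]  # a_{q_t q_{t+1}}
        p *= A[states[-1] + 1, -1]            # a_{q_T f}: enter the final state
        return p

    # Hidden states q_1 = 1, q_2 = 2 are [0, 1] with 0-based indices.
    print(joint_probability([0, 1], A, B))    # 0.3*0.7*0.4*0.6*0.2 = 0.01008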


Thank You

  • Mark – for pointing out typos and errors

