Hidden Markov Models Decoding – The Viterbi Algorithm


Given as input an HMM \lambda = (A, B) and a sequence of observations O = o_1,o_2, \cdots ,o_T, determine the most probable sequence of states Q = q_1,q_2,\cdots, q_T that produced those observations.

Viterbi Algorithm

The Viterbi algorithm is another instance of dynamic programming, like the minimum edit distance algorithm. It is identical to the forward algorithm except that it takes the max over the previous path probabilities, whereas the forward algorithm takes the sum.

v_t(j) represents the probability that the HMM is in state j after seeing the first t observations and passing through the most probable state sequence q_1,\cdots,q_{t-1}, given the HMM \lambda. The most probable path is represented by taking the maximum over all possible previous state sequences.

    \begin{align*}
    v_t(j) &= \max_{q_0,\dots,q_{t-1}} P(q_0,\dots,q_{t-1},o_1,\dots,o_t,q_t = j \mid \lambda) \\
    &= \max_{i=1}^{N}\max_{q_0,\dots,q_{t-2}} P(q_0,\dots,q_{t-2},o_1,\dots,o_{t-1},o_t,q_{t-1} = i,q_t = j \mid \lambda) \\
    &= \max_{i=1}^{N}\max_{q_0,\dots,q_{t-2}} P(q_0,\dots,q_{t-2},o_1,\dots,o_{t-1},q_{t-1} = i \mid \lambda)\, P(o_t,q_t=j \mid q_0,\dots,q_{t-2},o_1,\dots,o_{t-1},q_{t-1} = i,\lambda) \\
    &= \max_{i=1}^{N}\max_{q_0,\dots,q_{t-2}} P(q_0,\dots,q_{t-2},o_1,\dots,o_{t-1},q_{t-1} = i \mid \lambda)\, P(o_t,q_t=j \mid q_{t-1} = i,\lambda) \\
    &= \max_{i=1}^{N} v_{t-1}(i)\, P(o_t,q_t=j \mid q_{t-1} = i,\lambda) \\
    &= \max_{i=1}^{N} v_{t-1}(i)\, P(o_t \mid q_t=j,\lambda)\, P(q_t=j \mid q_{t-1} = i,\lambda) \\
    &= \max_{i=1}^{N} v_{t-1}(i)\, a_{ij}\, b_j(o_t)
    \end{align*}

The second-to-last step uses the chain rule followed by the Markov assumptions: the joint probability of (o_t, q_t = j) depends only on the previous state q_{t-1}, the next state depends only on the current state, and the observation depends only on the current state. Once the second factor no longer depends on the earlier path q_0,\dots,q_{t-2}, the inner max applies only to the first factor, which is exactly v_{t-1}(i).

where v_{t-1}(i) is the Viterbi path probability from the previous time step.
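A single recursion step can be sketched in NumPy. The numbers below are purely illustrative (a hypothetical 2-state HMM, not an example from the text); the broadcasting builds the full N × N matrix of scores v_{t-1}(i) a_{ij} b_j(o_t), then maxes over the previous state i.

```python
import numpy as np

# Hypothetical 2-state HMM; all numbers are illustrative only.
A = np.array([[0.7, 0.3],        # a_ij: transition probabilities
              [0.4, 0.6]])
b_o_t = np.array([0.5, 0.1])     # b_j(o_t): emission probs for the current observation
v_prev = np.array([0.32, 0.02])  # v_{t-1}(i): previous Viterbi path probabilities

# scores[i, j] = v_{t-1}(i) * a_ij * b_j(o_t)
scores = v_prev[:, None] * A * b_o_t[None, :]

v_t = scores.max(axis=0)      # v_t(j)  = max_i scores[i, j]
bt_t = scores.argmax(axis=0)  # bt_t(j) = argmax_i scores[i, j] (backpointer)
```

Here `scores.max(axis=0)` and `scores.argmax(axis=0)` compute v_t(j) and the backpointer bt_t(j) for all states j at once.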


The Viterbi algorithm also keeps track of backpointers so that the most likely state sequence can be recovered. At the end, following these backpointers from the final state back to the beginning yields the best path.

bt_t(j) stores the state at time t-1 that maximizes the probability that the system was in state j at time t (given the observed sequence).



Visualizing the computation of a single element v_t(j) in the trellis by finding the maximum among all the paths from v_{t-1}(i), weighted by their transition probabilities a_{ij}, and multiplying by the observation probability b_j(o_{t}). Hidden states are in circles, observations in squares. Shaded nodes are included in the probability computation for v_t(j). As each path is extended to a new state to account for the next observation, a backpointer (shown in red) keeps track of the best path that led to this state.


    \begin{align*}
    v_1(j) &= P(o_1,q_1 = j \mid \lambda) = P(o_1 \mid q_1 = j,\lambda)\, P(q_1=j \mid \lambda) = a_{0j}\, b_j(o_1) \qquad 1 \leq j \leq N \\
    bt_1(j) &= 0 \qquad 1 \leq j \leq N
    \end{align*}


    \begin{align*}
    v_t(j) &= \max_{i=1}^{N} v_{t-1}(i)\, a_{ij}\, b_j(o_t) \qquad 1 \leq j \leq N;\ 1 < t \leq T \\
    bt_t(j) &= \arg\max_{i=1}^{N} v_{t-1}(i)\, a_{ij}\, b_j(o_t) \qquad 1 \leq j \leq N;\ 1 < t \leq T
    \end{align*}


    \begin{align*}
    \text{Best score: } P^* &= v_{T+1}(q_F) = \max_{i=1}^{N} v_T(i)\, a_{iF} \\
    \text{Start of backtrace: } q_T^* &= bt_{T+1}(q_F) = \arg\max_{i=1}^{N} v_T(i)\, a_{iF}
    \end{align*}

Backtrack Path

    \begin{align*}
    & q_T^* \\
    \rightarrow\ & q_{T-1}^* = bt_{T}(q_T^*) \\
    \rightarrow\ & q_{T-2}^* = bt_{T-1}(q_{T-1}^*) \\
    & \quad \vdots
    \end{align*}
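Putting the initialization, recursion, termination, and backtrace steps together gives the following sketch. For simplicity it folds the initial probabilities a_{0j} into a vector `pi` and omits the explicit final state q_F (so the best score is just max_i v_T(i)); the function name, variable names, and the 2-state example at the bottom are all illustrative assumptions, not taken from the text.

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Most probable state sequence for an HMM (A, B) given observations obs.

    A[i, j] -- transition probability from state i to state j
    B[j, o] -- probability of emitting observation o in state j
    pi[j]   -- initial state distribution (the a_{0j} above)
    """
    N, T = A.shape[0], len(obs)
    v = np.zeros((T, N))              # v[t, j] = Viterbi path probability v_t(j)
    bt = np.zeros((T, N), dtype=int)  # bt[t, j] = backpointer bt_t(j)

    # Initialization: v_1(j) = a_{0j} b_j(o_1)
    v[0] = pi * B[:, obs[0]]

    # Recursion: v_t(j) = max_i v_{t-1}(i) a_{ij} b_j(o_t)
    for t in range(1, T):
        scores = v[t - 1][:, None] * A * B[:, obs[t]][None, :]
        v[t] = scores.max(axis=0)
        bt[t] = scores.argmax(axis=0)

    # Termination and backtrace (no explicit final state q_F here)
    best_prob = v[-1].max()
    path = [int(v[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(bt[t][path[-1]]))
    return best_prob, path[::-1]

# Hypothetical 2-state, 3-observation example; all numbers are made up.
A = np.array([[0.6, 0.4], [0.5, 0.5]])
B = np.array([[0.2, 0.4, 0.4], [0.5, 0.4, 0.1]])
pi = np.array([0.8, 0.2])
prob, path = viterbi(A, B, pi, [2, 0, 2])
```

Note that the inner loop over previous states is vectorized by broadcasting, so each time step costs O(N^2) work, matching the complexity analysis below.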

Time Complexity

The initialization step runs in O(N). The recursion step is executed T \times N times, and each maximization is over N previous states, so the recursion runs in O(TN^2). The termination step runs in O(N). The Viterbi algorithm thus runs in O(TN^2) overall, a dramatic improvement over the O(N^T) cost of enumerating every possible state sequence.
