Hidden Markov Models Likelihood Computation – The Forward Algorithm

Computing Likelihood

Given an Hidden Markov Model defined by \lambda = (A, B) and an observation sequence O, determine the likelihood that the observation sequence was generated by this model, P(O|\lambda).

For a particular hidden state sequence Q = q_1,q_2,\cdots,q_T and an observation sequence O = o_1,o_2,\cdots,o_T, the likelihood of the observation sequence is defined as

    \begin{equation*} \begin{split} & P(O|Q,\lambda) \\ =\, & P(o_1,o_2,\cdots,o_T|q_1,q_2,\cdots,q_T,\lambda) \\ =\, & P(o_1|o_2,\cdots,o_T,q_1,q_2,\cdots,q_T,\lambda)P(o_2,\cdots,o_T|q_1,q_2,\cdots,q_T,\lambda) \\ = & P(o_1|q_1,\lambda)P(o_2,\cdots,o_T|q_1,q_2,\cdots,q_T,\lambda) \\ =\, & P(o_1|q_1,\lambda)P(o_2|o_3,\cdots,o_T,q_1,q_2,\cdots,q_T,\lambda)P(o_3,\cdots,o_T|q_0,q_1,q_2,\cdots,q_T,\lambda) \\ =\, & P(o_1|q_1,\lambda)P(o_2|q_2,\lambda)P(o_3,\cdots,o_T|q_1,q_2,\cdots,q_T,\lambda) \\ \,& \vdots \\ =\, & \prod_{i=1}^{T}P(o_i|q_i,\lambda) \end{split} \end{equation*}

The likelihood of the state sequence is defined as follows. The start state is excluded for now and added at the end.

    \begin{equation*} \begin{split} & P(Q|\lambda) \\ =\, & P(q_1,q_2,\cdots,q_T|\lambda) \\ =\, & P(q_T,q_{T-1},q_{T-2},\cdots,q_1|\lambda) \\ = & P(q_T|q_{T-1},\cdots,q_0,\lambda)P(q_{T-1},q_{T-2},\cdots,q_1|\lambda) \\ =\, & P(q_T|q_{T-1},\lambda)P(q_{T-1},q_{T-2},\cdots,q_1|\lambda) \\ =\, & P(q_T|q_{T-1},\lambda)P(q_{T-1}|q_{T-2},\cdots,q_0,\lambda)P(q_{T-2},\cdots,q_1|\lambda) \\ =\, & P(q_T|q_{T-1},\lambda)P(q_{T-1}|q_{T-2},\lambda)P(q_{T-2},\cdots,q_1|\lambda) \,& \vdots \\ =\, & \prod_{i=1}^{T}P(q_i|q_{i-1},\lambda) \end{split} \end{equation*}

where q_0 is the same as the start state q_s.

Combining the two results above

    \begin{equation*} P(O,Q|\lambda) = P(O|Q, \lambda) \cdot P(Q|\lambda) = \prod_{i=1}^{T}P(o_i|q_i,\lambda) \cdot \prod_{i=1}^{T}P(q_i|q_{i-1},\lambda) \end{equation*}

The total probability of the observations can be computed by summing over all possible hidden state sequences

    \begin{equation*} P(O|\lambda) = \sum_{Q}P(O, Q|\lambda) = \sum_{Q}P(O|Q,\lambda)P(Q|\lambda) = \sum_{Q}\Bigg(\prod_{i=1}^{T}P(o_i|q_i,\lambda) \cdot \prod_{i=1}^{T}P(q_i|q_{i-1},\lambda)\Bigg) \end{equation*}

Time Complexity

A sequence Q = q_1,q_2,\cdots,q_T can have N^T combinations.

TermTime Complexity

Thus the total time complexity will be O(N^T) \times (O(T) + O(T)) = O(TN^T).

Forward Algorithm

The forward algorithm (a kind of dynamic programming algorithm) is an efficient algorithm, that uses a table to store intermediate values as it builds up the probability of the observation sequence. The forward algorithm computes the observation probability by summing over the probabilities of all possible hidden state paths that could generate the observation sequence, but it does so efficiently by implicitly folding each of these paths into a single forward trellis.

\alpha_t(j) represents the probability of being in state j after seeing the first t observations, given the Hidden Markov Model \lambda. \alpha_t(j) is computed by summing over the probabilities of every path that could lead to this state.

    \begin{equation*} \begin{split} \alpha_t(j) =\, & P(o_1,o_2,\cdots,o_t,q_t=j|\lambda) \\ =\, & \sum_{i=1}^{N}P(o_1,o_2,\cdots,o_{t-1},o_t,q_{t-1}=i,q_t=j|\lambda)  \\ =\, & \sum_{i=1}^{N}P(o_t,q_t=j|o_1,o_2…o_{t-1},q_{t-1}=i,\lambda)P(o_1,o_2,\cdots,o_{t-1},q_{t-1}=i|\lambda) \\ = & \sum_{i=1}^{N}P(o_t,q_t=j|o_1,o_2,\cdots,o_{t-1},q_{t-1}=i,\lambda)\alpha_{t-1}(i) \\ =\, & \sum_{i=1}^{N}P(o_t|q_t=j,o_1,o_2…o_{t-1},q_{t-1}=i,\lambda)P(q_t=j|o_1,o_2…o_{t-1},q_{t-1}=i,\lambda)\alpha_{t-1}(i) \\ =\, & \sum_{i=1}^{N}P(o_t|q_t=j,\lambda)P(q_t=j|q_{t-1}=i,\lambda)\alpha_{t-1}(i) \\ =\, & \sum_{i=1}^{N}b_j(o_t)a_{ij}\alpha_{t-1}(i) \\ =\, & \sum_{i=1}^{N}\alpha_{t-1}(i)a_{ij}b_j(o_t) \end{split} \end{equation*}

where \alpha_{t-1}(i) is the previous forward path probability from the previous time step.


Rendered by QuickLaTeX.com

Visualizing the computation of a single element \alpha_t(j) in the trellis by summing all the previous values \alpha_{t-1}(i), weighted by their transition probabilities a_{ij}, and multiplying by the observation probability b_j(o_{t}). Hidden states are in circles, observations in squares. Shaded nodes are included in the probability computation for \alpha_t(j).


    \begin{equation*} \alpha_1(j) = P(o_1,q_1=j,\lambda)  = P(o_1|q_1=j,\lambda)P(q_1=j,\lambda)  = a_{sj}b_j(o_1) \quad \quad 1 \leq j \leq N \end{equation*}


    \begin{equation*} \alpha_t(j) = \sum_{i=1}^{N}\alpha_{t-1}(i)a_{ij}b_j(o_t) \quad \quad 1 \leq j \leq N; 1 < t \leq T \end{equation*}


    \begin{equation*} P(O|\lambda) = \alpha_{t+1}(q_f) = \sum_{i=1}^{N}\alpha_{T}(i)a_{if} \end{equation*}

Time Complexity

Initialization step runs in O(N). Recursion step is run T \times N times and the summation is over N. Thus, the recursion step runs in O(TN^2). Termination step runs in O(N). The forward algorithm thus reduces the time complexity to O(TN^2).

Leave a Reply

Your email address will not be published. Required fields are marked *