# Hidden Markov Models Training – The Forward-Backward Algorithm

## Learning

Given an observation sequence $O = o_1, o_2, \ldots, o_T$ and the set of possible states in the HMM, learn the HMM parameters $A$ (the transition probabilities) and $B$ (the observation likelihoods).

The standard algorithm for HMM training is the forward-backward, or Baum-Welch algorithm. It is an iterative algorithm: it computes an initial estimate for the probabilities, uses those estimates to compute a better estimate, and so on, iteratively improving the probabilities that it learns.

## Backward Probability

The backward probability $\beta_t(i)$ is the probability of seeing the observations from time $t+1$ to the end, given that we are in state $i$ at time $t$ (and given the HMM $\lambda$):

$$\beta_t(i) = P(o_{t+1}, o_{t+2}, \ldots, o_T \mid q_t = i, \lambda)$$

The state at time $t+1$ can be any of the $N$ states:

$$\beta_t(i) = \sum_{j=1}^{N} P(o_{t+1}, o_{t+2}, \ldots, o_T, q_{t+1} = j \mid q_t = i, \lambda) = \sum_{j=1}^{N} a_{ij}\, P(o_{t+1}, o_{t+2}, \ldots, o_T \mid q_{t+1} = j, q_t = i, \lambda)$$

Observations $o_{t+1}, \ldots, o_T$ are independent of the state $q_t$, given the state $q_{t+1}$:

$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, P(o_{t+1}, o_{t+2}, \ldots, o_T \mid q_{t+1} = j, \lambda)$$

Observation $o_{t+1}$ is independent of the later observations $o_{t+2}, \ldots, o_T$, given the state $q_{t+1}$ that emits it:

$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, P(o_{t+2}, \ldots, o_T \mid q_{t+1} = j, \lambda) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)$$

## Visualization

The computation of $\beta_t(i)$ by summing all the successive values $\beta_{t+1}(j)$, weighted by their transition probabilities $a_{ij}$ and their observation probabilities $b_j(o_{t+1})$. Hidden states are in circles, observations in squares. Shaded nodes are included in the probability computation for $\beta_t(i)$.

## Initialization

$$\beta_T(i) = 1, \quad 1 \le i \le N$$

## Recursion

$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \quad 1 \le i \le N,\ 1 \le t < T$$

## Termination

$$P(O \mid \lambda) = \sum_{j=1}^{N} \pi_j\, b_j(o_1)\, \beta_1(j)$$
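The three steps above can be sketched directly in code. The following is a minimal pure-Python sketch, assuming a discrete-observation HMM in which `A[i][j]` holds the transition probability $a_{ij}$, `B[j][o]` the observation likelihood $b_j(o)$, and `pi[j]` the initial probability $\pi_j$; these names and the list-of-lists representation are illustrative, not from the text.

```python
def backward(obs, A, B, pi):
    N = len(A)    # number of states
    T = len(obs)  # number of observations
    beta = [[0.0] * N for _ in range(T)]
    # Initialization: beta_T(i) = 1
    for i in range(N):
        beta[T - 1][i] = 1.0
    # Recursion: beta_t(i) = sum_j a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(N))
    # Termination: P(O | lambda) = sum_j pi_j * b_j(o_1) * beta_1(j)
    prob = sum(pi[j] * B[j][obs[0]] * beta[0][j] for j in range(N))
    return beta, prob
```

A useful sanity check is that the termination value matches the forward probability of the same sequence (or a brute-force sum over all state paths, for a tiny model).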

## Probability $\xi$

Define the probability $\xi_t(i, j)$ as the probability of being in state $i$ at time $t$ and state $j$ at time $t+1$, given the observation sequence and the model:

$$\xi_t(i, j) = P(q_t = i, q_{t+1} = j \mid O, \lambda) = \frac{P(q_t = i, q_{t+1} = j, O \mid \lambda)}{P(O \mid \lambda)}$$

The observation sequence is defined as $O = o_1, \ldots, o_T$. Writing it out and pulling out $o_{t+1}$ by the chain rule, the numerator becomes

$$P(q_t = i, q_{t+1} = j, O \mid \lambda) = P(o_{t+1} \mid q_t = i, q_{t+1} = j, o_1, \ldots, o_t, o_{t+2}, \ldots, o_T, \lambda)\, P(q_t = i, q_{t+1} = j, o_1, \ldots, o_t, o_{t+2}, \ldots, o_T \mid \lambda)$$

The observation $o_{t+1}$, emitted at state $q_{t+1} = j$, will only depend on the state $j$. Simplifying, and factoring the remaining joint probability by the chain rule:

$$P(q_t = i, q_{t+1} = j, O \mid \lambda) = b_j(o_{t+1})\, P(o_{t+2}, \ldots, o_T, q_{t+1} = j \mid o_1, \ldots, o_t, q_t = i, \lambda)\, P(o_1, \ldots, o_t, q_t = i \mid \lambda)$$

The observations $o_{t+2}$ to $o_T$ and the state $q_{t+1}$ do not depend on any observation from $o_1$ to $o_t$, given the state $q_t$. Simplifying, and noting that $P(o_1, \ldots, o_t, q_t = i \mid \lambda) = \alpha_t(i)$:

$$P(q_t = i, q_{t+1} = j, O \mid \lambda) = b_j(o_{t+1})\, a_{ij}\, P(o_{t+2}, \ldots, o_T \mid q_{t+1} = j, q_t = i, \lambda)\, \alpha_t(i)$$

The observations $o_{t+2}$ to $o_T$ do not depend on the state $q_t$, given the state $q_{t+1}$. Simplifying, and recognizing the backward probability $\beta_{t+1}(j)$:

$$\xi_t(i, j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)}$$
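The final expression for $\xi_t(i, j)$ can be sketched as a small function over precomputed $\alpha$ and $\beta$ tables; the function and variable names below are illustrative assumptions, with `alpha[t][i]` and `beta[t][i]` indexed by time and state.

```python
def xi(t, i, j, alpha, beta, A, B, obs):
    """P(q_t = i, q_{t+1} = j | O, lambda)."""
    N = len(A)
    # Denominator: P(O | lambda) = sum_k alpha_t(k) * beta_t(k)
    prob_obs = sum(alpha[t][k] * beta[t][k] for k in range(N))
    # Numerator: alpha_t(i) * a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
    return alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / prob_obs
```

At any fixed $t$, the values $\xi_t(i, j)$ summed over all state pairs $(i, j)$ should add up to one, which makes a convenient check.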

## Visualization




Computation of the joint probability of being in state $i$ at time $t$ and state $j$ at time $t+1$. The figure shows the various probabilities that need to be combined to produce $\xi_t(i, j)$: the $\alpha_t(i)$ and $\beta_{t+1}(j)$ probabilities, the transition probability $a_{ij}$ and the observation probability $b_j(o_{t+1})$. Hidden states are in circles, observations in squares. Shaded nodes are included in the probability computation for $\xi_t(i, j)$.

The probability of the observation given the model is simply the forward probability of the whole utterance (or alternatively, the backward probability of the whole utterance):

$$P(O \mid \lambda) = \sum_{j=1}^{N} \alpha_T(j) = \sum_{j=1}^{N} \pi_j\, b_j(o_1)\, \beta_1(j)$$

It can also be computed at any time $t$, since the state at time $t$ can be any of the $N$ possible states:

$$P(O \mid \lambda) = \sum_{j=1}^{N} P(o_1, \ldots, o_T, q_t = j \mid \lambda) = \sum_{j=1}^{N} P(o_1, \ldots, o_t, q_t = j \mid \lambda)\, P(o_{t+1}, \ldots, o_T \mid q_t = j, o_1, \ldots, o_t, \lambda)$$

Observations $o_{t+1}, \ldots, o_T$ do not depend on observations $o_1, \ldots, o_t$, given the state $q_t$:

$$P(O \mid \lambda) = \sum_{j=1}^{N} \alpha_t(j)\, \beta_t(j)$$
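This identity can be sketched as a one-line helper over the $\alpha$ and $\beta$ tables; the function name and the `alpha[t][j]` indexing are illustrative assumptions.

```python
def observation_probability(alpha, beta, t):
    # P(O | lambda) = sum_j alpha_t(j) * beta_t(j), for any choice of t
    return sum(a * b for a, b in zip(alpha[t], beta[t]))
```

Because the result is the same for every $t$, comparing the value at two different time steps is a common check that the forward and backward passes agree.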


Define the probability $\gamma_t(j)$ as the probability of being in state $j$ at time $t$, given the observation sequence and the model:

$$\gamma_t(j) = P(q_t = j \mid O, \lambda) = \frac{P(o_1, \ldots, o_t, q_t = j \mid \lambda)\, P(o_{t+1}, \ldots, o_T \mid q_t = j, o_1, \ldots, o_t, \lambda)}{P(O \mid \lambda)}$$

Observations $o_{t+1}, \ldots, o_T$ do not depend on observations $o_1, \ldots, o_t$, given the state $q_t$:

$$\gamma_t(j) = \frac{\alpha_t(j)\, \beta_t(j)}{P(O \mid \lambda)}$$
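The final expression for $\gamma_t(j)$ can likewise be sketched from the precomputed tables; the names below are illustrative assumptions.

```python
def gamma(t, j, alpha, beta):
    """P(q_t = j | O, lambda) = alpha_t(j) * beta_t(j) / P(O | lambda)."""
    N = len(alpha[t])
    # P(O | lambda) evaluated at the same time step t
    prob_obs = sum(alpha[t][k] * beta[t][k] for k in range(N))
    return alpha[t][j] * beta[t][j] / prob_obs
```

Since $\gamma_t(j)$ is a distribution over states, it should sum to one over $j$ at every time step.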

## Estimate of the Transition Probabilities

$$\hat{a}_{ij} = \frac{\text{expected number of transitions from state } i \text{ to state } j}{\text{expected number of transitions from state } i} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \sum_{k=1}^{N} \xi_t(i, k)}$$

## Estimate of the Observation Probabilities

$$\hat{b}_j(v_k) = \frac{\text{expected number of times in state } j \text{ observing symbol } v_k}{\text{expected number of times in state } j} = \frac{\sum_{t=1 \text{ s.t. } o_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$$

The forward-backward algorithm has two steps: the expectation step, or E-step, and the maximization step, or M-step.

In the E-step, the expected state occupancy count $\gamma$ and the expected state transition count $\xi$ are computed from the earlier $\alpha$ and $\beta$ probabilities.

In the M-step, $\gamma$ and $\xi$ are used to recompute new $A$ and $B$ probabilities.
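Putting the pieces together, one E-step and M-step on a single observation sequence might be sketched as follows. This is an illustrative pure-Python sketch for discrete observations, not a definitive implementation: the helper names and toy-model layout are assumptions, and it leaves the initial distribution $\pi$ fixed.

```python
def baum_welch_step(obs, A, B, pi):
    N, T, M = len(A), len(obs), len(B[0])
    # Forward pass: alpha_t(j) = P(o_1..o_t, q_t = j | lambda)
    alpha = [[0.0] * N for _ in range(T)]
    for j in range(N):
        alpha[0][j] = pi[j] * B[j][obs[0]]
    for t in range(1, T):
        for j in range(N):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j]
                              for i in range(N)) * B[j][obs[t]]
    # Backward pass: beta_t(i) = P(o_{t+1}..o_T | q_t = i, lambda)
    beta = [[0.0] * N for _ in range(T)]
    for i in range(N):
        beta[T - 1][i] = 1.0
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(N))
    prob_obs = sum(alpha[T - 1][j] for j in range(N))
    # E-step: expected counts gamma_t(j) and xi_t(i, j)
    gamma = [[alpha[t][j] * beta[t][j] / prob_obs for j in range(N)]
             for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / prob_obs
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    # M-step: re-estimate A and B from the expected counts
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
              sum(gamma[t][j] for t in range(T))
              for k in range(M)] for j in range(N)]
    return new_A, new_B, prob_obs
```

Iterating this step until the likelihood $P(O \mid \lambda)$ stops improving is the usual stopping criterion; by the EM guarantee, each iteration should never decrease it.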