Learning
Given an observation sequence $O$ and the set of possible states in the HMM, learn the HMM parameters $A$ and $B$.
The standard algorithm for HMM training is the forward-backward, or Baum-Welch, algorithm, a special case of the Expectation-Maximization (EM) algorithm. It is an iterative algorithm: it computes an initial estimate for the probabilities, uses those estimates to compute a better estimate, and so on, iteratively improving the probabilities that it learns.
Backward Probability
The backward probability $\beta_t(i)$ is the probability of seeing the observations from time $t+1$ to the end, given that we are in state $i$ at time $t$ (and given the HMM $\lambda$):
\[
\beta_t(i) = P(o_{t+1}, o_{t+2}, \ldots, o_T \mid q_t = i, \lambda)
\]
The state at time $t+1$ can be any of the $N$ states.
Observations $o_{t+2}$ through $o_T$ are independent of $o_{t+1}$ and of the state $q_t$, given the state $q_{t+1}$.
Observation $o_{t+1}$ is independent of $q_t$, given $q_{t+1}$.
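Putting these three facts together yields a recursive expression for $\beta_t(i)$; the following derivation is a sketch using the transition probabilities $a_{ij}$ and observation probabilities $b_j(o_{t+1})$:
\[
\begin{aligned}
\beta_t(i) &= \sum_{j=1}^{N} P(o_{t+1}, o_{t+2}, \ldots, o_T, q_{t+1} = j \mid q_t = i, \lambda) \\
           &= \sum_{j=1}^{N} a_{ij}\, P(o_{t+1}, o_{t+2}, \ldots, o_T \mid q_{t+1} = j, \lambda) \\
           &= \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, P(o_{t+2}, \ldots, o_T \mid q_{t+1} = j, \lambda) \\
           &= \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)
\end{aligned}
\]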
Visualization
The computation of $\beta_t(i)$ by summing all the successive values $\beta_{t+1}(j)$ weighted by their transition probabilities $a_{ij}$ and their observation probabilities $b_j(o_{t+1})$. Hidden states are in circles, observations in squares. Shaded nodes are included in the probability computation for $\beta_t(i)$.
Initialization
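In the standard formulation (a sketch, assuming an HMM without an explicit final state), the backward probabilities at the last time step are set to 1:
\[
\beta_T(i) = 1, \qquad 1 \le i \le N
\]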
Recursion
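Each earlier value is then computed from the values one time step later, exactly as in the derivation above:
\[
\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \qquad 1 \le i \le N,\; 1 \le t < T
\]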
Termination
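The observation likelihood can then be read off at the first time step; the initial state distribution $\pi_j$ used here is assumed notation, not introduced in the text above:
\[
P(O \mid \lambda) = \sum_{j=1}^{N} \pi_j\, b_j(o_1)\, \beta_1(j)
\]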
The $\xi$ Probability
Define the probability $\xi_t(i,j)$ as the probability of being in state $i$ at time $t$ and state $j$ at time $t+1$, given the observation sequence and the model:
\[
\xi_t(i,j) = P(q_t = i, q_{t+1} = j \mid O, \lambda)
\]
The observation $O$ is defined as the sequence $o_1, o_2, \ldots, o_T$. Three independence facts allow the joint probability $P(q_t = i, q_{t+1} = j, O \mid \lambda)$ to be simplified step by step, as sketched in the derivation below:
The observation $o_{t+1}$, emitted at state $q_{t+1} = j$, depends only on the state $j$.
The observations $o_1$ to $o_t$ and the state $q_t = i$ do not depend on any observation from $o_{t+1}$ to $o_T$.
The observations $o_{t+2}$ to $o_T$ do not depend on the state $q_t = i$, given $q_{t+1} = j$.
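Applying these facts, the joint probability factors into quantities already computed by the forward and backward passes; this is a sketch, assuming the usual forward probability $\alpha_t(i)$:
\[
\begin{aligned}
P(q_t = i, q_{t+1} = j, O \mid \lambda)
  &= P(o_1, \ldots, o_t, q_t = i \mid \lambda)\; P(q_{t+1} = j \mid q_t = i)\; P(o_{t+1} \mid q_{t+1} = j)\; P(o_{t+2}, \ldots, o_T \mid q_{t+1} = j, \lambda) \\
  &= \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)
\end{aligned}
\]
Dividing by the observation likelihood gives
\[
\xi_t(i,j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)}
\]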
Visualization
\clearpage
Computation of the joint probability of being in state $i$ at time $t$ and state $j$ at time $t+1$. The figure shows the various probabilities that need to be combined to produce $\xi_t(i,j)$: the $\alpha$ and $\beta$ probabilities, the transition probability $a_{ij}$ and the observation probability $b_j(o_{t+1})$. Hidden states are in circles, observations in squares. Shaded nodes are included in the probability computation for $\xi_t(i,j)$.
The probability of the observation given the model, $P(O \mid \lambda)$, is simply the forward probability of the whole utterance (or alternatively, the backward probability of the whole utterance); it can also be obtained by combining the forward and backward probabilities at any time $t$:
The state $q_t$ can be any of the $N$ possible states.
Observations $o_{t+1}$ to $o_T$ do not depend on observations $o_1$ to $o_t$, given the state $q_t$.
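Written out (a sketch using the forward probability $\alpha_t(j)$), for any time $t$:
\[
P(O \mid \lambda) = \sum_{j=1}^{N} P(O, q_t = j \mid \lambda) = \sum_{j=1}^{N} \alpha_t(j)\, \beta_t(j)
\]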
Define the probability $\gamma_t(j)$ as the probability of being in state $j$ at time $t$, given the observation sequence and the model.
Observations $o_{t+1}$ to $o_T$ do not depend on observations $o_1$ to $o_t$, given the state $q_t$.
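As with $\xi$, the joint probability factors into a forward and a backward term (a sketch using the notation above):
\[
\gamma_t(j) = P(q_t = j \mid O, \lambda) = \frac{P(q_t = j, O \mid \lambda)}{P(O \mid \lambda)} = \frac{\alpha_t(j)\, \beta_t(j)}{P(O \mid \lambda)}
\]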
Estimate $\hat{a}_{ij}$
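The transition probability is re-estimated as a ratio of expected counts computed from $\xi$; this is the standard re-estimation formula, sketched in the notation above:
\[
\hat{a}_{ij} = \frac{\text{expected number of transitions from state } i \text{ to state } j}{\text{expected number of transitions from state } i}
             = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \sum_{k=1}^{N} \xi_t(i,k)}
\]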
Estimate $\hat{b}_j(v_k)$
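The emission probability is likewise a ratio of expected counts computed from $\gamma$; here $v_k$ denotes the $k$-th symbol of the output vocabulary (assumed notation):
\[
\hat{b}_j(v_k) = \frac{\text{expected number of times in state } j \text{ observing } v_k}{\text{expected number of times in state } j}
               = \frac{\sum_{t=1,\, o_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}
\]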
The forward-backward algorithm has two steps: the expectation step, or E-step, and the maximization step, or M-step.
In the E-step, the expected state occupancy count $\gamma$ and the expected state transition count $\xi$ are computed from the earlier $\alpha$ and $\beta$ probabilities.
In the M-step, $\gamma$ and $\xi$ are used to recompute new $A$ and $B$ probabilities.
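To make the E-step/M-step structure concrete, here is a minimal NumPy sketch of one Baum-Welch iteration for a discrete-output HMM. The function name, the explicit initial distribution pi, and the absence of scaling or smoothing are illustrative assumptions, not part of the text above.
\begin{verbatim}
import numpy as np

def baum_welch_step(obs, A, B, pi):
    """One EM iteration of Baum-Welch (illustrative sketch).

    obs : 1-D array of observation symbol indices, length T
    A   : (N, N) transition probabilities, A[i, j] = P(state j at t+1 | state i at t)
    B   : (N, V) emission probabilities,  B[j, k] = P(symbol k | state j)
    pi  : (N,) initial state distribution (assumed notation)
    """
    T, N = len(obs), A.shape[0]

    # E-step: forward (alpha) and backward (beta) probabilities.
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

    beta = np.zeros((T, N))
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

    likelihood = alpha[T - 1].sum()          # P(O | lambda)

    # Expected counts: gamma (state occupancy) and xi (state transitions).
    gamma = alpha * beta / likelihood        # gamma[t, j]
    xi = np.zeros((T - 1, N, N))             # xi[t, i, j]
    for t in range(T - 1):
        xi[t] = (alpha[t][:, None] * A * B[:, obs[t + 1]][None, :]
                 * beta[t + 1][None, :]) / likelihood

    # M-step: re-estimate A and B from the expected counts.
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for k in range(B.shape[1]):
        B_new[:, k] = gamma[obs == k].sum(axis=0)
    B_new /= gamma.sum(axis=0)[:, None]

    return A_new, B_new, likelihood
\end{verbatim}
In practice the forward and backward passes are usually computed in log space or with per-time-step scaling, since the raw probabilities underflow on long observation sequences.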