Learning
Given an observation sequence and the set of possible states in the HMM, learn the HMM parameters
and
.
The standard algorithm for HMM training is the Forward-Backward, or Baum-Welch algorithm. It is an iterative algorithm, computing an initial estimate for the probabilities, then using those estimates to computing a better estimate, and so on, iteratively improving the probabilities that it learns.
Backward Probability
The backward probability is the probability of seeing the observations from time
to the end, given that we are in state
at time
(and given the HMM
):
The state at can be any of N states.
Observations are independent of
and
.
Observation is independent of
.
Visualization
The computation of by summing all the successive values
weighted by their transition probabilities
and their observation probabilities
. Hidden states are in circles, observations in squares. Shaded nodes are included in the probability computation for
.
Initialization
Recursion
Termination
Probability 
Define the probability as the probability of being in state
at time
and state
at time
, given the observation sequence and the model.
The observation is defined as
The observation , emitted at state
, will only depend on the state
. Simplifying
The observations to
and the state
do not depend on any observation from
to
. Simplifying
The observations to
do not depend on the state
. Simplifying
Visualization
\clearpage
Computation of the joint probability of being in state at time
and state
at time
. The figure shows the various probabilities that need to be combined to produce
: the
and
probabilities, the transition probability
and the observation probability
. Hidden states are in circles, observations in squares. Shaded nodes are included in the probability computation for
.
The probability of the observation given the model is simply the forward probability of the whole utterance (or alternatively, the backward probability of the whole utterance):
The state can be any of
possible states.
Observations do not depend on observations
Define the probability as the probability of being in state
at time
.
Observations do not depend on observations
Estimate 
Estimate 
The forward-backward algorithm has two steps: the expectation step, or E-step, and the maximization step, or M-step.
In the E-step, the expected state occupancy count and the expected state transition count
from the earlier
and
probabilities are computed.
In the M-step, and
are used to recompute new
and
probabilities.