Learning
Given an observation sequence $O$ and the set of possible states in the HMM, learn the HMM parameters $A$ and $B$.
The standard algorithm for HMM training is the Forward-Backward, or Baum-Welch algorithm. It is an iterative algorithm: it computes an initial estimate for the probabilities, then uses those estimates to compute a better estimate, and so on, iteratively improving the probabilities that it learns.
Backward Probability
The backward probability $\beta_t(i)$ is the probability of seeing the observations from time $t+1$ to the end, given that we are in state $i$ at time $t$ (and given the HMM $\lambda$):

$$\beta_t(i) = P(o_{t+1}, o_{t+2}, \ldots, o_T \mid q_t = i, \lambda)$$
The recursion for $\beta_t(i)$ follows from the HMM independence assumptions. The state at $t+1$ can be any of $N$ states:

$$\beta_t(i) = \sum_{j=1}^{N} P(o_{t+1}, o_{t+2}, \ldots, o_T, q_{t+1} = j \mid q_t = i, \lambda)$$

Observations $o_{t+2}, \ldots, o_T$ are independent of $q_t$ and $o_{t+1}$ given $q_{t+1}$, and observation $o_{t+1}$ is independent of $q_t$ given $q_{t+1}$:

$$\beta_t(i) = \sum_{j=1}^{N} P(o_{t+2}, \ldots, o_T \mid q_{t+1} = j, \lambda)\; P(o_{t+1} \mid q_{t+1} = j, \lambda)\; P(q_{t+1} = j \mid q_t = i, \lambda) = \sum_{j=1}^{N} \beta_{t+1}(j)\, b_j(o_{t+1})\, a_{ij}$$
Visualization
The computation of $\beta_t(i)$ by summing all the successive values $\beta_{t+1}(j)$ weighted by their transition probabilities $a_{ij}$ and their observation probabilities $b_j(o_{t+1})$. Hidden states are in circles, observations in squares. Shaded nodes are included in the probability computation for $\beta_t(i)$.
Initialization

$$\beta_T(i) = 1, \quad 1 \le i \le N$$

Recursion

$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \quad 1 \le i \le N,\; 1 \le t < T$$

Termination

$$P(O \mid \lambda) = \sum_{j=1}^{N} \pi_j\, b_j(o_1)\, \beta_1(j)$$
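To make the backward pass concrete, here is a minimal NumPy sketch (not from the original text) that follows the initialization and recursion above. It assumes the usual array conventions: `A[i, j]` holds $a_{ij}$, `B[j, k]` holds $b_j(v_k)$, and `obs` is the observation sequence encoded as integer symbol indices.

```python
import numpy as np

def backward(A, B, obs):
    """Backward probabilities: beta[t, i] = P(o_{t+1} .. o_T | q_t = i, lambda).

    A:   (N, N) transition matrix, A[i, j] = a_ij
    B:   (N, V) emission matrix,   B[j, k] = b_j(v_k)
    obs: length-T sequence of observation indices
    """
    N, T = A.shape[0], len(obs)
    beta = np.zeros((T, N))
    beta[T - 1, :] = 1.0                    # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):          # recursion, moving backward in time
        # beta_t(i) = sum_j a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
        beta[t, :] = A @ (B[:, obs[t + 1]] * beta[t + 1, :])
    return beta
```

With an initial state distribution `pi` (an assumed name), the termination step is `np.sum(pi * B[:, obs[0]] * beta[0, :])`, which equals $P(O \mid \lambda)$.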
Probability $\xi$
Define the probability $\xi_t(i, j)$ as the probability of being in state $i$ at time $t$ and state $j$ at time $t+1$, given the observation sequence and the model:

$$\xi_t(i, j) = P(q_t = i, q_{t+1} = j \mid O, \lambda)$$

To compute it, first compute the joint probability of the two states and the observation sequence. The observation $O$ is defined as $o_1 o_2 \ldots o_T$:

$$P(q_t = i, q_{t+1} = j, O \mid \lambda) = P(q_t = i, q_{t+1} = j, o_1, o_2, \ldots, o_T \mid \lambda)$$

The observation $o_{t+1}$, emitted at state $j$, will only depend on the state $j$. Simplifying:

$$= b_j(o_{t+1})\; P(q_t = i, q_{t+1} = j, o_1, \ldots, o_t, o_{t+2}, \ldots, o_T \mid \lambda)$$

The observations $o_{t+2}$ to $o_T$ and the state $j$ do not depend on any observation from $o_1$ to $o_t$, given the state $i$ at time $t$. Simplifying:

$$= b_j(o_{t+1})\; P(o_{t+2}, \ldots, o_T, q_{t+1} = j \mid q_t = i, \lambda)\; P(q_t = i, o_1, \ldots, o_t \mid \lambda)$$

The observations $o_{t+2}$ to $o_T$ do not depend on the state $i$, given the state $j$ at time $t+1$. Simplifying:

$$P(q_t = i, q_{t+1} = j, O \mid \lambda) = \alpha_t(i)\; a_{ij}\; b_j(o_{t+1})\; \beta_{t+1}(j)$$
Visualization
Computation of the joint probability of being in state $i$ at time $t$ and state $j$ at time $t+1$. The figure shows the various probabilities that need to be combined to produce $\xi_t(i, j)$: the $\alpha_t(i)$ and $\beta_{t+1}(j)$ probabilities, the transition probability $a_{ij}$ and the observation probability $b_j(o_{t+1})$. Hidden states are in circles, observations in squares. Shaded nodes are included in the probability computation for $\xi_t(i, j)$.
The probability of the observation given the model is simply the forward probability of the whole utterance (or, alternatively, the backward probability of the whole utterance). The state at time $t$ can be any of $N$ possible states, and observations $o_{t+1}, \ldots, o_T$ do not depend on observations $o_1, \ldots, o_t$ given the state at time $t$:

$$P(O \mid \lambda) = \sum_{j=1}^{N} P(o_1, \ldots, o_t, q_t = j \mid \lambda)\, P(o_{t+1}, \ldots, o_T \mid q_t = j, \lambda) = \sum_{j=1}^{N} \alpha_t(j)\, \beta_t(j)$$

Dividing the joint probability by $P(O \mid \lambda)$ gives the final equation for $\xi$:

$$\xi_t(i, j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{\sum_{j=1}^{N} \alpha_t(j)\, \beta_t(j)}$$
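A small sketch of this computation in the same NumPy setting (illustrative names; `alpha` and `beta` are the $(T, N)$ forward and backward arrays from the passes described above):

```python
import numpy as np

def xi_probs(alpha, beta, A, B, obs):
    """xi[t, i, j] = P(q_t = i, q_{t+1} = j | O, lambda), for t = 0 .. T-2."""
    T, N = alpha.shape
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        # joint: alpha_t(i) * a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
        joint = alpha[t, :, None] * A * B[:, obs[t + 1]][None, :] * beta[t + 1, None, :]
        xi[t] = joint / joint.sum()         # normalize by P(O | lambda)
    return xi
```

The normalizer `joint.sum()` is $P(O \mid \lambda)$: summing the joint over all state pairs $(i, j)$ gives the same value as $\sum_j \alpha_t(j)\, \beta_t(j)$.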
Define the probability $\gamma_t(j)$ as the probability of being in state $j$ at time $t$, given the observation sequence and the model:

$$\gamma_t(j) = P(q_t = j \mid O, \lambda) = \frac{P(q_t = j, O \mid \lambda)}{P(O \mid \lambda)}$$

Observations $o_{t+1}, \ldots, o_T$ do not depend on observations $o_1, \ldots, o_t$ given the state at time $t$, so the numerator factors into the forward and backward probabilities:

$$\gamma_t(j) = \frac{\alpha_t(j)\, \beta_t(j)}{P(O \mid \lambda)}$$
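In code, $\gamma$ is just the elementwise product of the forward and backward arrays, renormalized at each time step (a sketch using the array names assumed above):

```python
import numpy as np

def gamma_probs(alpha, beta):
    """gamma[t, j] = P(q_t = j | O, lambda) = alpha_t(j) * beta_t(j) / P(O | lambda)."""
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)   # each row sums to 1
```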
Estimate $\hat{a}_{ij}$

The transition probability is re-estimated as the expected number of transitions from state $i$ to state $j$, divided by the expected number of transitions from state $i$:

$$\hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \sum_{k=1}^{N} \xi_t(i, k)}$$

Estimate $\hat{b}_j(v_k)$

The observation probability is re-estimated as the expected number of times in state $j$ observing symbol $v_k$, divided by the expected number of times in state $j$:

$$\hat{b}_j(v_k) = \frac{\sum_{t=1,\ \mathrm{s.t.}\ o_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$$
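A sketch of the corresponding re-estimation step, using the `xi` and `gamma` arrays from the sketches above (`num_symbols`, the size of the observation vocabulary, is an assumed parameter):

```python
import numpy as np

def m_step(xi, gamma, obs, num_symbols):
    """Re-estimate A and B from the expected counts xi and gamma."""
    # a_hat[i, j]: expected transitions i -> j / expected transitions out of i
    A_hat = xi.sum(axis=0)
    A_hat = A_hat / A_hat.sum(axis=1, keepdims=True)

    # b_hat[j, v]: expected time in state j emitting v / expected time in state j
    T, N = gamma.shape
    B_hat = np.zeros((N, num_symbols))
    for t in range(T):
        B_hat[:, obs[t]] += gamma[t, :]
    B_hat = B_hat / gamma.sum(axis=0)[:, None]
    return A_hat, B_hat
```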
The forward-backward algorithm has two steps: the expectation step, or E-step, and the maximization step, or M-step. In the E-step, the expected state occupancy count $\gamma$ and the expected state transition count $\xi$ are computed from the earlier $A$ and $B$ probabilities. In the M-step, $\gamma$ and $\xi$ are used to recompute new $A$ and $B$ probabilities.
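Putting the pieces together, a compact training loop under the assumptions above might look like the following sketch. It reuses the hypothetical `backward`, `gamma_probs`, `xi_probs`, and `m_step` helpers from the earlier sketches, adds a standard forward pass, and keeps the initial distribution `pi` fixed for simplicity:

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward probabilities: alpha[t, i] = P(o_1 .. o_t, q_t = i | lambda)."""
    T, N = len(obs), A.shape[0]
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        # alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(o_t)
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

def baum_welch(pi, A, B, obs, n_iter=20):
    """Iteratively re-estimate A and B on a single observation sequence."""
    A, B = A.copy(), B.copy()
    for _ in range(n_iter):
        # E-step: expected counts gamma and xi from the earlier A and B
        alpha = forward(pi, A, B, obs)
        beta = backward(A, B, obs)
        gamma = gamma_probs(alpha, beta)
        xi = xi_probs(alpha, beta, A, B, obs)
        # M-step: recompute A and B from gamma and xi
        A, B = m_step(xi, gamma, obs, B.shape[1])
    return A, B
```

A practical implementation would also rescale $\alpha$ and $\beta$ (or work in log space) to avoid underflow on long sequences, and would accumulate the expected counts over a whole training set rather than a single sequence.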