## Learning

Given an observation sequence and the set of possible states in the HMM, learn the HMM parameters and .

The standard algorithm for HMM training is the *Forward-Backward*, or *Baum-Welch* algorithm. It is an iterative algorithm, computing an initial estimate for the probabilities, then using those estimates to computing a better estimate, and so on, iteratively improving the probabilities that it learns.

## Backward Probability

The backward probability is the probability of seeing the observations from time to the end, given that we are in state at time (and given the HMM ):

The state at can be any of N states.

Observations are independent of and .

Observation is independent of .

## Visualization

The computation of by summing all the successive values weighted by their transition probabilities and their observation probabilities . Hidden states are in circles, observations in squares. Shaded nodes are included in the probability computation for .

## Initialization

## Recursion

## Termination

## Probability

Define the **probability** as the probability of being in state at time and state at time , given the observation sequence and the model.

The observation is defined as

The observation , emitted at state , will only depend on the state . Simplifying

The observations to and the state do not depend on any observation from to . Simplifying

The observations to do not depend on the state . Simplifying

## Visualization

\clearpage

Computation of the joint probability of being in state at time and state at time . The figure shows the various probabilities that need to be combined to produce : the and probabilities, the transition probability and the observation probability . Hidden states are in circles, observations in squares. Shaded nodes are included in the probability computation for .

The probability of the observation given the model is simply the forward probability of the whole utterance (or alternatively, the backward probability of the whole utterance):

The state can be any of possible states.

Observations do not depend on observations

Define the probability as the probability of being in state at time .

Observations do not depend on observations

## Estimate

## Estimate

The forward-backward algorithm has two steps: the *expectation step*, or **E-step**, and the *maximization step*, or **M-step**.

In the E-step, the expected state occupancy count and the expected state transition count from the earlier and probabilities are computed.

In the M-step, and are used to recompute new and probabilities.