A detailed look inside the weight matrices of the word2vec model. Both the CBOW and Skip-gram models are discussed.
Post Category → Machine Learning
N-gram Language Models
Language models are explored as probability distributions over sequences of words.
Maximum Likelihood Estimation – Finding the Best Parametric Model
An intuitive explanation of what likelihood is and how maximizing it yields the parametric model that best fits the data. A few examples are also worked out in detail.
Gradient Descent for a Single Artificial Neuron
This article explores the derivation of the gradient descent algorithm for a single artificial neuron with the logistic activation function. A few properties of the logistic function are also discussed.
Primer on Lambda Calculus
The basic concepts of Lambda Calculus are discussed, including functions, applications, the scope of free and bound variables, and the order of evaluation. A few examples are worked out.
Hidden Markov Models Training – The Forward-Backward Algorithm
This article discusses the problem of learning the HMM parameters given an observation sequence and the set of possible states. The concept of backward probability is defined and an iterative learning algorithm is presented. Derivations and diagrams are sketched out.
Hidden Markov Models Decoding – The Viterbi Algorithm
This article discusses the problem of decoding – finding the most probable sequence of states that produced a given sequence of observations. A dynamic programming approach is presented. Derivations and diagrams are sketched out, and the time complexity is analyzed.
Short Primer on Probability
A primer on probability explaining the concepts of random variables, joint probabilities, marginalization, conditional probabilities, Bayes' Rule, probabilistic inference, and conditional independence. Examples and formulae are included.
Hidden Markov Models Likelihood Computation – The Forward Algorithm
This article shows how to compute the probability that a given Hidden Markov Model produces an observation sequence. The brute-force method is discussed, followed by a dynamic programming optimization. Derivations and diagrams are sketched out, and the time complexity is analyzed.
d-Separation through examples
This article presents a procedure to check whether two nodes in a Bayesian Network are conditionally independent of each other, using the idea of active triples along the paths connecting the nodes.