A detailed look inside the weight matrices of the word2vec model. Both the CBOW and Skip-gram models are discussed.
Post Category → Machine Learning
N-gram Language Models
Language models are explored as probability distributions over sequences of words.
Maximum Likelihood Estimation – Finding the Best Parametric Model
An intuitive explanation of what likelihood is and how maximizing it yields the parametric model that best fits the data. A few examples are also worked out in detail.
Gradient Descent for a Single Artificial Neuron
This article explores the derivation of the gradient descent algorithm for a single artificial neuron with the logistic activation function. A few properties of the logistic function are also discussed.
Primer on Lambda Calculus
The basic concepts of Lambda Calculus are discussed, including functions, applications, the scope of free and bound variables, and the order of evaluation. A few examples are worked out.
Hidden Markov Models Training – The Forward-Backward Algorithm
This article discusses the problem of learning the HMM parameters given an observation sequence and the set of possible states. The concept of backward probability is defined and an iterative learning algorithm is presented. Derivations and diagrams are sketched out.
Hidden Markov Models Decoding – The Viterbi Algorithm
This article discusses the problem of decoding – finding the most probable sequence of states that produced a given sequence of observations. A dynamic programming approach is presented. Derivations and diagrams are sketched out, and the time complexity is analyzed.
Short Primer on Probability
A primer on probability explaining the concepts of random variables, joint probabilities, marginalization, conditional probabilities, Bayes' Rule, probabilistic inference, and conditional independence. Examples and formulae are included.
Hidden Markov Models Likelihood Computation – The Forward Algorithm
This article shows how to compute the probability that a given Hidden Markov Model produces an observation sequence. The brute-force method is discussed, followed by a dynamic programming optimization. Derivations and diagrams are sketched out, and the time complexity is analyzed.
d-Separation through examples
This article presents a procedure to check whether two nodes in a Bayesian Network are conditionally independent of each other, using the idea of active triples along the paths connecting the nodes.