This paper details how to estimate Gaussian mixture models with the Expectation Maximization (EM) algorithm. We include code and the derivation along with the introductory concepts.
Abstract
The EM algorithm has opened previously intractable problems in important application areas to maximum likelihood estimation. But the actual functioning of this estimation technique is difficult to master. We attempt here another intuitive introduction to estimating parameters by iterated Expectation and Maximization steps. Estimation of Gaussian mixture models with the EM algorithm is used to clarify the construction of the so-called complete-data formulations on which EM rests. The GMM re-estimation formulae are derived in detail using only basic multivariable calculus and Lagrange multipliers, which we hope makes the method more widely accessible to students, scientists, and engineers.
A Gaussian mixture model assumes the overall population is composed of several subpopulations, each of which is Gaussian distributed. Each subpopulation is specified by the parameters of a single Gaussian component. A Gaussian mixture is a very flexible model for approximating a wide class of distributions, and it provides a natural method of clustering the points of a sample. The Expectation Maximization (EM) algorithm provides numerically stable computations that allow very large mixture models to be estimated in high dimensions. A sketch of the mixture density and the EM updates is given after the list below. Mixtures are used for:
Classification: Are there multiple species here?
Clustering: Do these things fall into natural groups?
Estimation with missing data: Big data is never complete, and the EM cycle provides a natural way to estimate mixtures from incomplete data sets. This is shown in a companion paper in this blog series.
Density estimation: What is a good approximation to the Bayesian classifier?
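For reference, the mixture density and the EM updates alluded to above take the following standard form. This is the usual textbook formulation; the notation here is ours and may differ from that used in the PDF.

% Mixture density: K Gaussian components with mixing weights pi_k
p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k),
\qquad \sum_{k=1}^{K} \pi_k = 1, \quad \pi_k \ge 0

% E-step: responsibility of component k for data point x_n
\gamma_{nk} = \frac{\pi_k \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}
                   {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}

% M-step: re-estimation formulae (the constraint \sum_k \pi_k = 1 is the one
% enforced with a Lagrange multiplier in the derivation)
N_k = \sum_{n=1}^{N} \gamma_{nk}, \qquad
\pi_k = \frac{N_k}{N}, \qquad
\boldsymbol{\mu}_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma_{nk} \mathbf{x}_n, \qquad
\boldsymbol{\Sigma}_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma_{nk}
  (\mathbf{x}_n - \boldsymbol{\mu}_k)(\mathbf{x}_n - \boldsymbol{\mu}_k)^{\top}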
See the PDF for details of the derivations.
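For readers who want to experiment before opening the PDF, here is a minimal, self-contained sketch of EM for a GMM in Python with NumPy and SciPy. It illustrates the iteration summarized above; it is not the code referenced in the paper, and the function and variable names are our own.

import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=100, seed=0):
    """Minimal EM for a Gaussian mixture model.

    X : (N, D) array of data points
    K : number of Gaussian components
    Returns mixing weights pi, means mu, covariances Sigma, responsibilities gamma.
    """
    rng = np.random.default_rng(seed)
    N, D = X.shape

    # Crude initialization: random data points as means, shared sample
    # covariance for every component, equal mixing weights.
    pi = np.full(K, 1.0 / K)
    mu = X[rng.choice(N, K, replace=False)]
    Sigma = np.stack([np.cov(X, rowvar=False) + 1e-6 * np.eye(D)] * K)

    for _ in range(n_iter):
        # E-step: responsibilities gamma[n, k] = P(component k | x_n).
        dens = np.column_stack([
            pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
            for k in range(K)
        ])
        gamma = dens / dens.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights, means, and covariances.
        Nk = gamma.sum(axis=0)                      # effective counts per component
        pi = Nk / N
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
            Sigma[k] += 1e-6 * np.eye(D)            # small regularizer for stability

    return pi, mu, Sigma, gamma

As a quick check, running em_gmm with K=2 on a sample drawn from two well-separated Gaussians typically recovers the two cluster means and weights to within sampling error, which is the clustering use case in the list above.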