Gaussian Mixture Model
EM algorithm

Introduction

EM is short for Expectation-Maximization. The algorithm iteratively adjusts the centers and widths of the kernels so as to maximize the likelihood that the given data points were generated by a mixture of Gaussians. The numbers displayed next to the Gaussians give the relative importance (mixing weight, or amplitude) of each component.
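
A minimal sketch of one EM iteration for such a mixture, assuming isotropic 2-D kernels and NumPy; the names em_step, means, widths and weights are this sketch's own and do not come from the applet:

    import numpy as np

    def em_step(X, means, widths, weights):
        # E-step: responsibility of each kernel k for each sample point n.
        d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)   # (N, K) squared distances
        dens = weights * np.exp(-0.5 * d2 / widths**2) / (2 * np.pi * widths**2)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate centers, widths and amplitudes from the responsibilities.
        Nk = resp.sum(axis=0)
        means = (resp.T @ X) / Nk[:, None]
        d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        widths = np.maximum(np.sqrt((resp * d2).sum(axis=0) / (2 * Nk)), 1e-3)  # floor keeps kernels from collapsing
        weights = Nk / len(X)                                          # relative importance of each kernel
        return means, widths, weights

    # Usage: 3 clusters of sample points, 3 kernels, iterate until the parameters settle.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(c, 0.3, size=(10, 2)) for c in ([0, 0], [2, 2], [4, 0])])
    means, widths, weights = rng.uniform(0, 4, size=(3, 2)), np.ones(3), np.full(3, 1 / 3)
    for _ in range(50):
        means, widths, weights = em_step(X, means, widths, weights)
    print(weights)  # the numbers shown next to each Gaussian in the applet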

Credits

This applet was developed by S. Akaho and adapted by Olivier Michel.

Instructions

  1. You can select Gaussian mixture fitting by choosing "GaussMix", or multiple-line fitting by choosing "LineMix".
  2. Click in the image plane to add sample points.
  3. You can add 10 uniformly distributed sample points by clicking on "RandomPts".
  4. You can also add 100 uniformly distributed sample points lying on a ring by clicking on "RingPts".
  5. "ClearPts" allows to remove all sample points
  6. You can choose the number of kernels by selecting 1 to 5 with the second rightmost button.
  7. You can initialize the kernels by clicking on "InitKernels".
  8. You can start the EM algorithm by selecting "EM Run" or "EM 1 Step" (a single iteration).
  9. You can stop "EM Run" by selecting "EM Stop".

Applet

Questions

  1. Define about 30 data points located in 3 separate clusters. One of the clusters may be a little larger than the others. Set the number of kernels to 3. Switch the right-most button to 'EM 1 Step'. Click repeatedly on this button and watch how the kernels converge to the clusters.
  2. Keep the same data points and reinitialize the system. You can switch to 'EM Run', watch how it converges, and then reinitialize. Repeat this several times to find out how reliably the EM algorithm finds the 3 clusters.
  3. Use the same data points, but allow for 5 kernels. Does it still converge to a reasonable solution? Repeat the experiment several times.
  4. Add random data points by pressing the "RandomPts" button once. Restart the algorithm. Does it still find the original clusters? (Test this several times.) Add more random points by clicking "RandomPts" again and repeat the experiment.
  5. Keep all data points and reduce the number of kernels back to 3. Does the algorithm still find the clusters?
  6. Clear the data points and restart with two clusters (each 10 points) and two kernels. Add more and more points to one of the clusters and watch how the importance of the respective Gaussian component increases.
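
To see the effect in question 6 numerically outside the applet, one could use, for example, scikit-learn's GaussianMixture (not part of the applet); its weights_ attribute corresponds to the numbers shown next to the Gaussians:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    cluster_a = rng.normal([0, 0], 0.3, size=(10, 2))    # fixed cluster of 10 points
    for n_b in (10, 30, 90):                             # grow the second cluster
        cluster_b = rng.normal([3, 3], 0.3, size=(n_b, 2))
        gm = GaussianMixture(n_components=2, random_state=0).fit(np.vstack([cluster_a, cluster_b]))
        print(n_b, np.sort(gm.weights_))                 # the larger mixing weight approaches n_b / (10 + n_b)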

Bugs

If Gaussians or lines disappear during EM, reinitialize the kernels by clicking "InitKernels".
The EM algorithm sometimes stops with an applet exception error in Netscape Navigator on Windows 95 and Solaris 2.4 (Netscape on SGI is rather stable).