Gaussian Mixture Model
EM algorithm
|
Introduction
The EM algorithm is short for Expectation-Maximization algorithm. It
is based on an iterative optimization of the centers and widths of the
kernels. The aim is to optimize the likelihood that the given data points
are generated by a mixture of Gaussians. The numbers next to the Gaussians
give the relative importance (amplitude) of each component.
Credits
This applet was developed by S. Akaho and adapted by Olivier Michel.
Instructions
- You can select the Gaussian Mixture fitting by choosing "GaussMix",
or Multiple Line fitting by choosing "LineMix".
- Click in the image plane to add sample points.
- You can add 10 uniformly distributed sample points by clicking on
"RandomPts".
- You can also add 100 uniformly distributed sample points lying on a ring by clicking on "RingPts".
- "ClearPts" allows to remove all sample points
- You can choose the number of kernels by selecting 1 to 5
with the second rightmost button.
- You can initialize the kernels by clicking on "InitKernels".
- You can start the EM algorithm by selecting "EM Run"
or "EM 1 Step" (one step)
- You can stop "EM Run" by selecting "EM Stop".
Applet
Questions
- Define about 30 data points located in 3 separate clusters. One of
the cluster may be a little bit larger than the others. Set the number
of kernels to 3. Switch the right-most button to 'EM 1 Step'. Click repeatedly
on this button and watch how the kernels converge to the clusters.
- Keep the same data points and reinitialize the system. You can switch
to 'EM Run', watch how it converges, and the reinitialize again. Repeat
this several times in order to find out how reliably the EM algorithm finds
the 3 clusters.
- Use the same data points, but allow for 5 kernels. Does it still converge
to a reasonable solution? Repeat the experiment several times.
- Add random data points by pressing the random points button once. Restart
the algorithm. Does it still find the original clusters? (Test it several
times). Add more random points by clicking again on the random points button
and repeat the experiment.
- Keep all data points and reduce the number of kernels back to 3. Does
the algorithm still find the clusters?
- Clear the data points and restart with two clusters (each 10 points)
and two kernels. Add more and more points to one of the clusters and watch
how the importance of the respective Gaussian component increases.
Bugs
If Gaussians or Lines disappear by EM, initialize kernels by click "InitKernels".
EM algorithm sometimes stops by Applet exception error in Netscape Navigator
for Windows95 and Solaris2.4 (netscape for SGI is rather stable).