This applet illustrates the generalization capabilities of the multi-layer
perceptrons. It allows you to define two different sets of data: one for
training and the other for cross-validation. The two sets are necessary
to study generalization in a systematic manner.
Use the popup menu to choose whether the points you add belong to the
training set or the cross-validation set. The graph displays the error on
the training set in black and the error on the cross-validation set in
green.
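To make the two curves concrete, here is a minimal sketch of how a training
error and a cross-validation error can be recorded at every iteration. It
is not the applet's code: the function name train_mlp, the sigmoid units,
the mean squared error, the learning rate and the synthetic clusters are
all assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(x_train, y_train, x_val, y_val,
              n_hidden=4, lr=0.5, epochs=100, seed=0):
    """Train a one-hidden-layer MLP with batch gradient descent.

    Returns two lists holding the mean squared error per epoch,
    one for the training set and one for the cross-validation set.
    """
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.5, size=(x_train.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(0.0, 0.5, size=(n_hidden, 1))
    b2 = np.zeros(1)

    def forward(x):
        h = sigmoid(x @ w1 + b1)
        return h, sigmoid(h @ w2 + b2)

    train_err, val_err = [], []
    n = len(x_train)
    for _ in range(epochs):
        h, out = forward(x_train)
        # Record both errors before this epoch's weight update.
        train_err.append(float(np.mean((out - y_train) ** 2)))
        val_err.append(float(np.mean((forward(x_val)[1] - y_val) ** 2)))
        # Backpropagation (the constant factor 2 is absorbed into lr).
        d_out = (out - y_train) * out * (1.0 - out)
        d_h = (d_out @ w2.T) * h * (1.0 - h)
        # Only the training set drives the weight updates.
        w2 -= lr * (h.T @ d_out) / n
        b2 -= lr * d_out.mean(axis=0)
        w1 -= lr * (x_train.T @ d_h) / n
        b1 -= lr * d_h.mean(axis=0)
    return train_err, val_err

# Two well-separated clusters: red points labeled 1, blue points labeled 0.
rng = np.random.default_rng(1)
red = rng.normal([0.7, 0.7], 0.05, size=(20, 2))
blue = rng.normal([0.3, 0.3], 0.05, size=(20, 2))
x = np.vstack([red, blue])
y = np.vstack([np.ones((20, 1)), np.zeros((20, 1))])

# Even-indexed points are used for training, odd-indexed for cross-validation.
x_train, y_train = x[::2], y[::2]
x_val, y_val = x[1::2], y[1::2]

train_err, val_err = train_mlp(x_train, y_train, x_val, y_val)
print("training error:         first", train_err[0], "last", train_err[-1])
print("cross-validation error: first", val_err[0], "last", val_err[-1])
```

The cross-validation points never enter the weight update; they are only
used to measure the error, which is what makes the second curve a test of
generalization.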
Applet
Questions
For all questions except the last, leave the decay parameter at zero.
Easy problem: Create two simple clusters of training points, a red one
(1's) and a blue one (0's), that are linearly separable and clearly distinct.
Then add cross-validation points to each cluster. To be realistic, the
cross-validation points should have the same color as the training points
in the same cluster. Run learning for about 100 iterations and observe the
resulting error graphs. What can you say about the two errors?
More complicated problem: Use two similar simple clusters,
but place some cross-validation points slightly outside the training
clusters. Do you observe any change in the error graphs? Why?
Hard problem: Now create two linearly separable clusters that are
very close to each other. Add cross-validation points and put some
of them slightly outside the clusters, even inside the other cluster.
Run the learning and comment on the results. Did you observe that the
error graph reaches a minimum and then rises again? How would
you explain this?
Non-linearly separable problem: Place 3 blue training points
on the left-hand side of the space, 6 red training points in the middle
and 3 blue training points on the right-hand side. Add 3 cross-validation
points to the first group, 6 to the second and 3 to the last one. Change
the number of hidden units and the learning parameters if necessary so
that the error on the training set converges to zero. Do you observe
an error graph similar to the one in question 1? Why?
Getting more and more complicated: Try to solve more complicated
problems (e.g., similar to questions 2 and 3) with non-linearly separable
clusters.
General questions: How would you characterize the evolution
of the error on a cross-validation set? How should a training set be designed
in order to get the best results?
Weight elimination algorithm: As discussed in class,
the decay parameter controls an extra term in the weight update step.
Set the decay parameter to a small value such as 0.001 and use several (at
least 4) units in the hidden layers. (Don't forget to click Init
each time you change any of the network parameters.) Compare the training
results with standard backprop (decay=0.0).
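As a reminder of what an extra decay term in the weight update can look
like, here is a hedged sketch. The text above does not give the exact
penalty the applet uses, so both forms below are assumptions: plain weight
decay, and the weight-elimination penalty
decay * sum((w/w0)^2 / (1 + (w/w0)^2)). The scale w0, the function names
and the sample values are hypothetical.

```python
import numpy as np

def weight_decay_grad(w, decay):
    """Gradient of the penalty (decay / 2) * sum(w**2)."""
    return decay * w

def weight_elimination_grad(w, decay, w0=1.0):
    """Gradient of the penalty decay * sum((w/w0)**2 / (1 + (w/w0)**2))."""
    u = (w / w0) ** 2
    return decay * (2.0 * w / w0 ** 2) / (1.0 + u) ** 2

def update(w, error_grad, lr, decay, penalty_grad=weight_elimination_grad):
    """One gradient step on the error plus the penalty term."""
    return w - lr * (error_grad + penalty_grad(w, decay))

# Hypothetical weights and error gradient, for illustration only.
w = np.array([0.5, -2.0, 0.0])
error_grad = np.array([0.1, 0.1, 0.1])

# decay = 0.0 reduces to the standard backprop step.
print(update(w, error_grad, lr=0.1, decay=0.0))
# decay = 0.001 adds a pull towards zero to the same step.
print(update(w, error_grad, lr=0.1, decay=0.001))
print(update(w, error_grad, lr=0.1, decay=0.001,
             penalty_grad=weight_decay_grad))
```

Under these assumptions the two penalties differ mainly for large weights:
plain decay pulls on a weight in proportion to its size, while the
weight-elimination gradient shrinks again once |w| is well above w0.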