Artificial Neural Networks, Summer term 2000
Exercises: Mini-project
Assistant: Thomas Stroesslin

The aim of this mini-project is to give you a feel for a realistic application of artificial neural networks.

As always in supervised learning, you start from a database

$\{(x^\mu, t^\mu);\ \mu = 1, \dots, p\}$.

You will use a neural network to predict the output values $t^\mu$ from the inputs $x^\mu$.

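The project consists of the following six steps:

1. Choose a database.
2. Write a program for BackProp with regularization.
3. Split the data set into three groups.
4. Optimize the network via the regularization parameter lambda.
5. Measure the performance on the remaining data.
6. Write a short report.

These steps are described below.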

1. Choose a database:

DELVE hosts a collection of databases. Each of them has several thousand entries (p > 1000), so you can use them for realistic predictions.

An overview of the databases is available on the DELVE web site.

The general philosophy of the DELVE database is described in the DELVE documentation.

For the mini-project, choose one of the following four tasks, for which DELVE provides databases. (For each database, consult its 'detailed documentation' to find out exactly what is to be done.)

2. Write a program for BackProp with regularization.

Start from the BackProp code for a multilayer perceptron with momentum that you have developed so far.

If you wish, you can look at the C program available at http://lslwww.epfl.ch/~aperez/NN_tutorial/Bkprop.c or at the program bpsim at http://diwww.epfl.ch/mantra/tutorial/english/mlpc/index.html. You may use any programming language you like.
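As a reminder of how momentum enters the update, here is a minimal Python sketch of one common form of the BackProp step with momentum, $\Delta w(t) = -\eta\, \partial E / \partial w + \alpha\, \Delta w(t-1)$. It operates in place on a flat list of weights. The function name and the default values of `eta` and `alpha` are illustrative assumptions, not taken from the programs linked above. If you use a regularizer, `grad` is assumed to already contain the penalty gradient.

```python
def momentum_update(weights, grad, velocity, eta=0.1, alpha=0.9):
    """One BackProp step with momentum, applied in place (hypothetical helper).

    weights  : flat list of weights (thresholds included)
    grad     : dE/dw for each weight, penalty gradient already added
    velocity : previous weight changes delta_w(t-1), updated in place
    """
    for k in range(len(weights)):
        # delta_w(t) = -eta * dE/dw + alpha * delta_w(t-1)
        velocity[k] = -eta * grad[k] + alpha * velocity[k]
        weights[k] += velocity[k]
```

Keeping `velocity` outside the function lets the same helper be called once per pattern (online learning) or once per epoch (batch learning).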

In your program you should compare two of the following three regularization methods:

1. Early stopping.
2. Weight decay.
3. Weight elimination.

The penalty term for weight decay is:

$$\text{penalty} = \lambda \sum_i w_i^2$$

where the sum runs over all weights in all layers, but not over the thresholds. (Remember that the thresholds are treated as additional weights attached to a bias unit with constant activity $x_i = -1$. Thus, as far as the learning algorithm is concerned, the thresholds are simply further weights. They are, however, not included in the sum of the penalty term.)
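A minimal Python sketch of the weight-decay penalty and its gradient. It assumes a hypothetical layout: each layer is a list of rows, one row per neuron, and the last entry of each row is that neuron's threshold (the weight from the bias unit with activity -1). The gradient $2\lambda w_i$ is added to the error gradient in the BackProp update. For thresholds the gradient is zero, so they are trained but never penalized.

```python
def weight_decay_penalty(layers, lam):
    """lam * sum of squared weights, thresholds excluded.

    Assumed (hypothetical) layout: layers[l][j] is the weight row of neuron j
    in layer l, and its last entry is the threshold.
    """
    return lam * sum(w * w
                     for layer in layers
                     for row in layer
                     for w in row[:-1])   # row[:-1] drops the threshold


def weight_decay_gradient(layers, lam):
    """d(penalty)/dw: 2 * lam * w for ordinary weights, 0 for thresholds."""
    return [[[2.0 * lam * w for w in row[:-1]] + [0.0]
             for row in layer]
            for layer in layers]
```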

Similarly, the penalty term for weight elimination is

$$\text{penalty} = \lambda \sum_i \frac{w_i^2}{c + w_i^2}$$

Again, the sum does not include the thresholds. Take, for example, $c = N$, where $N$ is the number of weights.
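A Python sketch of the weight-elimination penalty and its gradient, $\partial\,\text{penalty}/\partial w_i = 2\lambda c\, w_i / (c + w_i^2)^2$. By default it uses $c = N$. The layout is a hypothetical assumption: each layer is a list of weight rows, one per neuron, with the threshold stored as the last entry of each row. Thresholds are therefore excluded both from $N$ and from the sum.

```python
def count_weights(layers):
    """N: number of weights, thresholds (last entry of each row) excluded."""
    return sum(len(row) - 1 for layer in layers for row in layer)


def weight_elimination_penalty(layers, lam, c=None):
    """lam * sum_i w_i^2 / (c + w_i^2), thresholds excluded; c defaults to N."""
    if c is None:
        c = count_weights(layers)
    return lam * sum(w * w / (c + w * w)
                     for layer in layers
                     for row in layer
                     for w in row[:-1])


def weight_elimination_gradient(layers, lam, c=None):
    """d(penalty)/dw = 2 * lam * c * w / (c + w^2)^2; zero for thresholds."""
    if c is None:
        c = count_weights(layers)
    return [[[2.0 * lam * c * w / (c + w * w) ** 2 for w in row[:-1]] + [0.0]
             for row in layer]
            for layer in layers]
```

For $w_i^2 \ll c$ each term behaves like weight decay with strength $\lambda / c$. For $w_i^2 \gg c$ it saturates at $\lambda$, so large weights are penalized far less than under weight decay, while small weights are still pushed towards zero.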

3. Split the data set into three groups:
1. Learning set
2. Validation set
3. Set for prediction and final error measure

Don't touch the data in the third group during learning. It is reserved for the final performance measure.

In the DELVE database, results are reported as a function of the number of data points used for groups 1) and 2) together: for example, 200 samples for 1) + 2) and the rest for the error measure, or 512 samples for 1) + 2) and the rest for the error measure. Check the results page for your database to see which numbers you should use. To find it, click in the column 'view results' of the DELVE summary table.
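A minimal Python sketch of setting aside group 3) before any learning takes place. It randomly selects `n_dev` samples (for example 512) for groups 1) + 2) and keeps the rest untouched for the final error measure. The function name and the use of a seeded random shuffle are assumptions for illustration. If DELVE prescribes specific subsets for your task, use those instead.

```python
import random


def hold_out_test_set(data, n_dev, seed=None):
    """Randomly pick n_dev samples for groups 1)+2); the rest form group 3)."""
    if not 0 < n_dev < len(data):
        raise ValueError("n_dev must leave some data for the final error measure")
    rng = random.Random(seed)            # seeded so the split is reproducible
    indices = list(range(len(data)))
    rng.shuffle(indices)
    dev = [data[i] for i in indices[:n_dev]]    # used for learning + validation
    test = [data[i] for i in indices[n_dev:]]   # not touched until the very end
    return dev, test
```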

4. Optimize the network via the regularization parameter lambda
Suppose you take a total of 512 data points for learning and validation (groups 1 and 2). Use cross-validation to optimize lambda: for each value of lambda, split the data points randomly into two groups 1) and 2) of comparable size, run BackProp, and record the learning and the validation error. Repeat this about ten times with different splits and take the mean. (Thus, for each value of lambda, you run several complete learning trials.)
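The repeated random-split procedure can be sketched in Python as follows. `train(learn, lam)` and `error(model, dataset)` are hypothetical callbacks that you would supply: the first is one complete BackProp run with penalty strength `lam`, and the second is your error measure. The sketch returns, for each lambda, the mean learning error and the mean validation error, which are the two curves to plot against lambda.

```python
import random
import statistics


def cross_validate_lambda(data, lambdas, train, error, n_repeats=10, seed=0):
    """Mean learning and validation error for each lambda over random half splits.

    train(learn, lam) -> model and error(model, dataset) -> float are
    hypothetical user-supplied callbacks (a full BackProp run and its error).
    """
    rng = random.Random(seed)
    results = {}
    for lam in lambdas:
        learn_errs, valid_errs = [], []
        for _ in range(n_repeats):
            shuffled = list(data)
            rng.shuffle(shuffled)
            half = len(shuffled) // 2
            learn, valid = shuffled[:half], shuffled[half:]   # groups 1) and 2)
            model = train(learn, lam)                         # complete learning trial
            learn_errs.append(error(model, learn))
            valid_errs.append(error(model, valid))
        results[lam] = (statistics.mean(learn_errs), statistics.mean(valid_errs))
    return results
```

The optimal lambda is then the one with the smallest mean validation error.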

If your database is rather small (say, 128 samples for training and validation), you may get better results with the systematic leave-one-out cross-validation technique: take each sample in turn as the validation set, train on the remaining 127 samples, and then average the 128 error measurements.
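A Python sketch of leave-one-out for a single value of lambda. It uses the same hypothetical callbacks `train(learn, lam)` and `error(model, dataset)` as you would use for the random-split version. Note the cost: with 128 samples this means 128 complete training runs per value of lambda.

```python
def leave_one_out(data, lam, train, error):
    """Average validation error when each sample in turn is held out.

    train(learn, lam) -> model and error(model, dataset) -> float are
    hypothetical user-supplied callbacks.
    """
    data = list(data)
    errs = []
    for i in range(len(data)):
        valid = [data[i]]                  # the single held-out sample
        learn = data[:i] + data[i + 1:]    # train on all the others
        model = train(learn, lam)
        errs.append(error(model, valid))
    return sum(errs) / len(errs)
```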

TASK: plot the learning error and test error as a function of lambda.

In case of early stopping:

TASK: plot the learning error and test error as a function of the learning time.
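A Python sketch of early stopping that also records the data for this plot. `epoch_step(model, learn)` (one epoch of BackProp, modifying the model in place) and `error(model, dataset)` are hypothetical callbacks. The `patience` rule, which stops once the validation error has not improved for that many epochs, is an assumption added here so that the loop terminates. The network with the lowest validation error seen so far is kept.

```python
import copy


def train_with_early_stopping(model, learn, valid, epoch_step, error,
                              max_epochs=1000, patience=50):
    """Return (best model, its epoch, history of (epoch, E_learn, E_valid)).

    epoch_step(model, learn) and error(model, dataset) are hypothetical
    user-supplied callbacks; patience is an assumed stopping rule.
    """
    history = []
    best_err = float("inf")
    best_model = copy.deepcopy(model)
    best_epoch = 0
    for epoch in range(1, max_epochs + 1):
        epoch_step(model, learn)                      # one epoch of learning
        e_learn, e_valid = error(model, learn), error(model, valid)
        history.append((epoch, e_learn, e_valid))     # data for the TASK plot
        if e_valid < best_err:
            best_err, best_epoch = e_valid, epoch
            best_model = copy.deepcopy(model)         # snapshot of the best weights
        elif epoch - best_epoch >= patience:
            break                                     # no improvement for too long
    return best_model, best_epoch, history
```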

5. Measure the performance on the remaining data
Once you have found the optimal lambda, retrain your network with this value of lambda. Restart the training process several times and retain the best solution.

Proceed similarly for early stopping: make several repetitions and retain the best solution.
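A Python sketch of "several restarts, retain the best". `train_once(seed)` is a hypothetical callback that runs one complete training from a random initialization determined by `seed`. Because part 3) must not be touched before the very end, this sketch assumes that "best" means the lowest error on the validation data.

```python
def best_of_restarts(train_once, valid, error, n_restarts=10, seed=0):
    """Train n_restarts times and keep the model with the lowest validation error.

    train_once(seed) -> model and error(model, dataset) -> float are
    hypothetical user-supplied callbacks.
    """
    best_model, best_err = None, float("inf")
    for r in range(n_restarts):
        model = train_once(seed + r)     # fresh random initialization each time
        e = error(model, valid)          # judged on validation data, never on part 3)
        if e < best_err:
            best_model, best_err = model, e
    return best_model, best_err
```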

At the very end, use the resulting network in forward mode on part 3) of the data base. This gives you the final prediction error. (Note: once you touch part 3) you are no longer allowed to change parameters or 'improve' the network.)

TASK: Report the prediction error of your network and compare it with the errors of other methods as reported in the DELVE database.

To find these results, click in the column 'view results' of the DELVE summary table.

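DELVE uses the following formulas for the final error measure:

prediction tasks: $E = \dfrac{\sum_\mu [t^\mu - x_{\mathrm{out}}(\mu)]^2}{\sum_\mu [t^\mu - \langle t \rangle]^2}$, with $\mu = 1, \dots, p$ and $\langle t \rangle$ the mean target value.

classification tasks: $E = \dfrac{\sum_\mu |t^\mu - \operatorname{sgn}[x_{\mathrm{out}}(\mu)]|}{\sum_\mu |t^\mu - t_{\max}|}$, where $t_{\max}$ is the label of the largest class.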
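The two formulas can be sketched in Python as follows. The function names are hypothetical, and so is the convention that $\operatorname{sgn}(0) = +1$. The classification version assumes two classes labelled -1 and +1 and takes $t_{\max}$ as the most frequent label. In both cases $E = 1$ means "no better than always predicting $\langle t \rangle$ (or the largest class)".

```python
from collections import Counter


def delve_prediction_error(targets, outputs):
    """E = sum (t - y)^2 / sum (t - <t>)^2 over the test patterns."""
    mean_t = sum(targets) / len(targets)
    num = sum((t - y) ** 2 for t, y in zip(targets, outputs))
    den = sum((t - mean_t) ** 2 for t in targets)
    return num / den


def sgn(y):
    # Assumption: an output of exactly 0 is assigned to class +1.
    return 1 if y >= 0 else -1


def delve_classification_error(targets, outputs):
    """E = sum |t - sgn(y)| / sum |t - t_max|, targets assumed to be -1/+1."""
    t_max = Counter(targets).most_common(1)[0][0]   # label of the largest class
    num = sum(abs(t - sgn(y)) for t, y in zip(targets, outputs))
    den = sum(abs(t - t_max) for t in targets)
    return num / den
```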

TASK: Compare the performance of the two regularization methods that you have chosen. Which one is better for your data set?

If you have time, you may also think about whether the difference is significant. How to measure significance is indicated here.

6. Write a short report (2 to 3 pages maximum).
In the report you should state:

• The database you have chosen.
• The network structure (number of nodes/layers)
• Show the graphs mentioned in step 4 for both regularization procedures.
• State your results (step 5)

Instead of writing a report you may also present your results in a short seminar talk.