Miniproject

Artificial Neural Networks, Summer term 2000
Exercises: Mini-project

Assistant: Thomas Stroesslin

The aim of this mini-project is to give you the feeling of a realistic application of artificial neural networks.

As always in supervised learning, you start from a database

{(x^mu, t^mu); mu = 1, ... p }.

You will use a neural network to predict output values t^mu.

The project consists of the following six steps:

Choose a database
Write a program for BackProp with regularization
Split the data set into 3 groups
Optimize the network via the regularization parameter lambda
Measure the performance on the remaining data
Write a short report (2 - 3 pages maximum).

These steps are described below.

Choose a database:

DELVE

p>1000

An overview of the databases is here.

The general philosophy of the DELVE database is described here.

For the mini-project, choose one of the following four tasks, for which DELVE provides databases. (For each database, look at the 'detailed documentation' to find out exactly what is to be done.)

Predict the load of a workstation from various performance measures. Specifically, you should use measures 1-9 and 19-21 to predict usr (number 22). This is the task 'cpuSmall'.
Recognize characters (images of the 26 letters of the alphabet).
Classify biological data on DNA sequences.
Predict house prices in the Boston, Massachusetts area.

Write a program for BackProp with regularization.
Use your code for BackProp/Multilayer perceptron with momentum that you have developed so far.

If you want, you can look at the C program available at http://lslwww.epfl.ch/~aperez/NN_tutorial/Bkprop.c or at the program bpsim at http://diwww.epfl.ch/mantra/tutorial/english/mlpc/index.html. You may use any programming language you want.

In your program you should compare two of the following three methods of regularization:
1. Early stopping.
2. Weight decay.
3. Weight elimination.
The penalty term for weight decay is:
penalty = lambda * Sum_i (w_i²)
where the sum runs over all weights in all layers, but not over the thresholds. (Remember that the thresholds are treated as additional weights attached to a bias unit with constant activity x_i = -1. Thus in terms of the learning algorithm the thresholds are treated as further weights. They are, however, not included in the sums of the penalty term.)

Similarly the term for weight elimination is
penalty = lambda * Sum_i (w_i² / (c + w_i²))
Again, the sum does not include the thresholds. Take for example c=N where N is the number of weights.
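The two penalty terms and their gradients can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes the weight-elimination term is w_i² / (c + w_i²), and that the array `w` passed in contains only the weights, with the thresholds left out. The function names are hypothetical. The returned gradient is what you would add to the BackProp gradient of the error.

```python
import numpy as np

def weight_decay(w, lam):
    """Return (penalty, gradient) for penalty = lam * sum_i w_i^2.

    w must contain the weights only (no thresholds)."""
    w = np.asarray(w, dtype=float)
    penalty = lam * np.sum(w ** 2)
    grad = 2.0 * lam * w
    return penalty, grad

def weight_elimination(w, lam, c):
    """Return (penalty, gradient) for penalty = lam * sum_i w_i^2 / (c + w_i^2).

    w must contain the weights only (no thresholds)."""
    w = np.asarray(w, dtype=float)
    w2 = w ** 2
    penalty = lam * np.sum(w2 / (c + w2))
    # d/dw [w^2 / (c + w^2)] = 2 c w / (c + w^2)^2
    grad = lam * 2.0 * c * w / (c + w2) ** 2
    return penalty, grad
```

For |w_i| much smaller than sqrt(c), weight elimination behaves like weight decay with lambda/c; for large weights each term saturates near lambda, so large weights are penalized roughly by their number rather than their size.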

Split the data set into 3 groups

1) Learning set
2) Validation set
3) Set for prediction and final error measure

Don't touch the data in the third group during learning. It is reserved for the final performance measure.

In the DELVE database, results are reported as a function of the number of data points used for 1) and 2) together. For example, 200 samples for 1) + 2) and the rest for error measures, or 512 samples for 1) + 2) and the rest for error measures. Check the result data page for your database to see which numbers you should use. To find it, click in the column 'view results' of the DELVE summary table.
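A possible way to produce the three groups is sketched below. It assumes a plain random permutation of the sample indices; the function name, the `seed` argument and the sizes in the example are illustrative choices, and the number of samples for 1) + 2) should follow the DELVE result page for your database.

```python
import numpy as np

def three_way_split(p, n_learn, n_valid, seed=0):
    """Split the indices 0..p-1 into learning, validation and final test groups.

    The first two groups have n_learn and n_valid samples; everything
    else is reserved for the final error measure."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(p)
    learn = perm[:n_learn]
    valid = perm[n_learn:n_learn + n_valid]
    test = perm[n_learn + n_valid:]
    return learn, valid, test

# Example: 512 samples for groups 1) + 2), the rest for group 3).
learn_idx, valid_idx, test_idx = three_way_split(p=8192, n_learn=256, n_valid=256)
```

Fixing the seed keeps group 3) identical across runs, which makes it easier to guarantee that it is never used during learning.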

Optimize the network via the regularization parameter lambda

Suppose you take a total of 512 data points for learning and testing (groups 1 and 2). Use the method of cross-validation to optimize lambda. This means: for each value of lambda, split the data points randomly into two groups 1) and 2) of comparable size, run BackProp and record the learning error and the validation error. Repeat about ten times with different splits and take the mean. (Thus, for each value of lambda, you make several complete learning trials.)

Now, if your database is rather small (let us assume 128 data samples for training and validation), you might want to use the systematic leave-one-out cross-validation technique to get better results: take each sample in turn as the validation set, then average your 128 error measurements.
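The procedure above can be sketched as a loop over lambda values and random splits. This is only an outline: `train_and_eval` is a hypothetical callback that you would implement with your own BackProp code, and it is assumed to return the pair (learning error, validation error) for one complete learning trial.

```python
import numpy as np

def cross_validate_lambda(n_samples, lambdas, train_and_eval, n_repeats=10, seed=0):
    """For each lambda, average learning and validation error over random splits.

    train_and_eval(lam, learn_idx, valid_idx) is a user-supplied function
    that trains a fresh network and returns (learning_error, validation_error)."""
    rng = np.random.default_rng(seed)
    results = {}
    for lam in lambdas:
        errors = []
        for _ in range(n_repeats):
            perm = rng.permutation(n_samples)
            half = n_samples // 2
            learn_idx, valid_idx = perm[:half], perm[half:]
            errors.append(train_and_eval(lam, learn_idx, valid_idx))
        mean_learn, mean_valid = np.mean(errors, axis=0)
        results[lam] = (float(mean_learn), float(mean_valid))
    return results

def leave_one_out_splits(n_samples):
    """Yield (learn_idx, valid_idx) with each sample in turn as validation set."""
    all_idx = np.arange(n_samples)
    for i in range(n_samples):
        yield np.delete(all_idx, i), np.array([i])
```

The dictionary returned by `cross_validate_lambda` contains exactly the two curves asked for in the plot: mean learning error and mean validation error for each lambda.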

TASK: plot the learning error and test error as a function of lambda.

In case of early stopping:

TASK: plot the learning error and test error as a function of the learning time.
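For early stopping, one common way to pick the stopping point from the recorded validation curve is sketched below. The `patience` parameter (how many epochs without improvement are tolerated) is an assumption of this sketch and is not prescribed by the exercise.

```python
def early_stopping_epoch(valid_errors, patience=5):
    """Return (best_epoch, best_error) from a sequence of validation errors.

    Scanning stops once the validation error has not improved for
    `patience` consecutive epochs after the best epoch seen so far."""
    best_epoch, best_err = 0, float("inf")
    for epoch, err in enumerate(valid_errors):
        if err < best_err:
            best_epoch, best_err = epoch, err
        elif epoch - best_epoch >= patience:
            break
    return best_epoch, best_err
```

In a real run you would save the weights whenever the validation error improves and restore them after stopping.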

Measure the performance on the remaining data

Once you have found the optimal lambda, you can retrain your network with this value of lambda. You should restart the training process several times and retain the best solution.

Proceed similarly for early stopping: make several repetitions and retain the best solution.

At the very end, use the resulting network in forward mode on part 3) of the database. This gives you the final prediction error. (Note: once you touch part 3), you are no longer allowed to change parameters or 'improve' the network.)

TASK: Report the performance error of your network and compare it with the performance error of other methods as reported in the DELVE database.

Click in the column 'view results' of the DELVE summary.

prediction tasks: E = (Sum_mu [t^mu - x^out(mu)]²) / (Sum_mu [t^mu - <t>]²), mu = 1, ..., p, where <t> is the mean of the target values.

classification tasks: E = (Sum_mu |t^mu - sgn[x^out(mu)]|) / (Sum_mu |t^mu - t_max|), where t_max is the label of the biggest class.
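The two normalized error measures can be computed as in the following sketch. It assumes targets coded as +1/-1 for the classification case and maps an output of exactly 0 to +1; both are choices of this example, not of DELVE. The function names are hypothetical.

```python
import numpy as np

def normalized_squared_error(t, x_out):
    """Squared error divided by the error of always predicting the mean target."""
    t = np.asarray(t, dtype=float)
    x_out = np.asarray(x_out, dtype=float)
    return np.sum((t - x_out) ** 2) / np.sum((t - t.mean()) ** 2)

def normalized_classification_error(t, x_out):
    """Classification error divided by the error of always predicting
    the biggest class. Targets t are assumed to be +1 or -1."""
    t = np.asarray(t, dtype=float)
    pred = np.where(np.asarray(x_out, dtype=float) >= 0.0, 1.0, -1.0)
    t_max = 1.0 if np.sum(t == 1.0) >= np.sum(t == -1.0) else -1.0
    return np.sum(np.abs(t - pred)) / np.sum(np.abs(t - t_max))
```

In both cases E = 0 means perfect prediction and E = 1 means the network does no better than the trivial baseline (mean target or biggest class).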

For more information about DELVE's error measurements, see chapter 8 of the Delve User Manual.

TASK: Compare the performance of the two regularization methods that you have chosen. Which one is better for your data set?

If you have time, you may also think about whether the difference is significant. How to measure significance is indicated here.

Write a short report (2 - 3 pages maximum).

In the report you should state:

The database you have chosen.
The network structure (number of nodes/layers).
The graphs mentioned in step 4 for both regularization procedures.
Your results (step 5).

Instead of writing a report you may also present your results in a short seminar talk.

Last updated: 10-Apr-00 by Thomas Strösslin