Adaline, Perceptron and Backpropagation
Introduction
Single-layer neural networks can be trained using various learning algorithms.
The best-known algorithms for supervised learning are Adaline, Perceptron and
Backpropagation. The first two are specific to single-layer neural networks,
while the third can be generalized to multilayer perceptrons.
Credits
The applet was written by Olivier Michel. This page was written by Alix Herrmann.
Presentation
Let's consider a single-layer neural network with b inputs and c
outputs:

W_{ij} = weight from input i to unit j in the output layer;
W_{j} = the vector of all the weights of the jth neuron
in the output layer.

I^{p} = input vector (pattern p) = (I_{1}^{p},
I_{2}^{p}, ..., I_{b}^{p}).

T^{p} = target output vector (pattern p) = (T_{1}^{p},
T_{2}^{p}, ..., T_{c}^{p}).

A^{p} = Actual output vector (pattern p) = (A_{1}^{p},
A_{2}^{p}, ..., A_{c}^{p}).

g() = sigmoid activation function: g(a) = [1 + exp(-a)]^{-1}
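Using the notation above, the network's actual output A^{p} is obtained by passing each unit's weighted input sum through the sigmoid g(). A minimal sketch in Python (the weight values below are arbitrary examples, not taken from the applet):

```python
import math

def g(a):
    # Logistic sigmoid activation: g(a) = 1 / (1 + exp(-a))
    return 1.0 / (1.0 + math.exp(-a))

def forward(W, I):
    # W[j][i] = weight from input i to output unit j.
    # Returns the actual output vector A for input pattern I.
    return [g(sum(w_ji * I_i for w_ji, I_i in zip(W_j, I)))
            for W_j in W]

# Example with b = 2 inputs and c = 1 output unit
W = [[0.5, -0.25]]           # arbitrary illustrative weights
A = forward(W, [1.0, 1.0])   # a single output between 0 and 1
```

Note that g(0) = 0.5, so a unit whose weighted input sum is zero sits exactly on the decision boundary.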
Theory
Click on each topic to learn more. Then scroll down to the applet.
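This page does not reproduce the update rules themselves, but the standard forms of the perceptron rule and the Adaline (delta/LMS) rule can be sketched as follows; `lr` denotes the learning rate, and the single-sample update shown here is an assumption about how the applet applies the rules:

```python
def perceptron_update(w, x, t, lr):
    # Perceptron rule: compare the thresholded output to the target
    # and adjust the weights only when the classification is wrong.
    a = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0
    return [wi + lr * (t - a) * xi for wi, xi in zip(w, x)]

def adaline_update(w, x, t, lr):
    # Adaline (delta/LMS) rule: gradient step on the squared error of
    # the *linear* activation, before any threshold is applied.
    a = sum(wi * xi for wi, xi in zip(w, x))
    return [wi + lr * (t - a) * xi for wi, xi in zip(w, x)]
```

The two rules differ only in whether the error term (t - a) uses the thresholded or the linear output; this difference is what drives their different behavior in the experiments below.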
Applet
This applet allows you to compare the different learning algorithms.
The network implemented here has two inputs and a single output neuron.
In this tutorial, you will train it to classify 2-dimensional data points
into two categories.
Click here to see the instructions.
You may find it helpful to open a separate browser window for the instructions,
so you can view them at the same time as the applet window.
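The applet's training loop for this two-input, one-output network can be sketched as follows, here using the perceptron rule; the bias handling and default parameters are assumptions for illustration, not the applet's actual settings:

```python
def train_perceptron(points, labels, lr=0.1, epochs=100):
    # points: list of (x, y) coordinates; labels: 0 or 1.
    # A bias input fixed at 1.0 is appended to each point,
    # so the weight vector has three components.
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for (x, y), t in zip(points, labels):
            inp = (x, y, 1.0)
            a = 1 if sum(wi * xi for wi, xi in zip(w, inp)) >= 0 else 0
            w = [wi + lr * (t - a) * xi for wi, xi in zip(w, inp)]
    return w
```

On linearly separable clusters such as those in the first question below, this loop is guaranteed to converge to a separating line.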
Questions

Ideal case: place 10 red points (class 1) and
10 blue points (class 0) in two similar, distinct, and linearly separable clusters.

Compare the speed of convergence of the four algorithms. Which one is the
fastest?
Which values of the learning rate provide the best results?

Different cluster dispersions: Place 20 red points (1) in
a very narrow cluster (strongly correlated points) and 5 blue points (0)
in a very wide cluster in such a way that the classes are linearly separable.

Compare the performance of the four algorithms on this problem. Which one
is the best?

Which values of the learning rate provide the best results?

Imperfectly separable case: Place 10 red points (1) and
10 blue points (0) in two similar, linearly separable clusters. Then,
place an additional blue point inside the red cluster.

Compare the behavior of the perceptron with the behavior of the pocket
algorithm.

Which values for the learning rate give the best results?

For which kind of problem is the Adaline algorithm the best?

For which kind of problem is the Backpropagation algorithm the best?

For which kind of problem is the Perceptron algorithm the best?

For which kind of problem is the Pocket algorithm the best?