Supervised learning in a single-layer neural network

Let's consider a single-layer neural network with b inputs and c outputs:

W_ij = weight from input i to unit j in the output layer; W_j is the vector of all the weights of the j-th neuron in the output layer.
I^p = input vector (pattern p) = (I₁^p, I₂^p, ..., I_b^p).
T^p = target output vector (pattern p) = (T₁^p, T₂^p, ..., T_c^p).
A^p = Actual output vector (pattern p) = (A₁^p, A₂^p, ..., A_c^p).
g() = sigmoid activation function: g(a) = [1 + exp(-a)]^-1
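
As a minimal sketch of this notation in Python, the actual output of unit j can be computed as A_j = g(Sum_i W_ij I_i). The helper name forward and the example weights are hypothetical, chosen only for illustration.

```python
import math

def g(a):
    """Sigmoid activation: g(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))

def forward(W, I):
    """Return the actual output vector A for the input vector I.

    W[i][j] is the weight from input i to output unit j,
    so W has b rows (inputs) and c columns (outputs).
    """
    b, c = len(W), len(W[0])
    return [g(sum(W[i][j] * I[i] for i in range(b))) for j in range(c)]

# Hypothetical example with b = 2 inputs and c = 3 outputs.
W = [[0.0, 1.0, -1.0],
     [0.0, 1.0, -1.0]]
A = forward(W, [1.0, 1.0])
print(A)
```

The first output unit has zero weights, so its output is g(0) = 0.5 whatever the input.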

Supervised learning

We have seen that different weights of a neural network produce different functions of the input. To train a network, we present sample inputs and compare the actual output with the desired (target) output. The difference between the two is called the error.

[an error term is computed and fed back]

The different learning rules tell us which way to adjust the weights to reduce this error. We say that training has converged when this error reaches some small, acceptable level.

Often the learning rule takes the following form:
W_ij (t+1) = W_ij (t) + eta . err (p)
where 0 <= eta < 1 is a parameter that controls the learning rate, and err(p) is the error when input pattern p is presented.
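
The loop below is a rough sketch of how such a rule is typically applied: present each pattern, adjust the weights, and stop once the total error falls below a tolerance. The names train, update, total_error, tol and max_epochs are assumptions made for this example, and the one-weight toy problem (target T = 2 I, with a correction equal to the error times the input) is invented for illustration.

```python
def train(W, patterns, update, total_error, eta=0.1, tol=1e-3, max_epochs=1000):
    """Generic supervised training loop (illustrative skeleton).

    update(W, I, T, eta) adjusts W in place for one pattern;
    total_error(W, patterns) returns the error over the whole set.
    Returns the number of epochs used, or None if training did not converge.
    """
    for epoch in range(1, max_epochs + 1):
        for I, T in patterns:
            update(W, I, T, eta)
        if total_error(W, patterns) <= tol:
            return epoch
    return None

# Toy problem (made up): a single weight that should learn T = 2 * I.
def update(W, I, T, eta):
    W[0] += eta * (T - W[0] * I) * I

def total_error(W, patterns):
    return sum((T - W[0] * I) ** 2 for I, T in patterns)

W = [0.0]
patterns = [(1.0, 2.0), (2.0, 4.0)]
epochs = train(W, patterns, update, total_error)
print(epochs, W)
```

Returning None when max_epochs is exhausted makes non-convergence explicit instead of silently returning unfinished weights.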

Adaline learning

ADALINE is an acronym for ADAptive LINear Element (or ADAptive LInear NEuron). It was developed by Bernard Widrow and Marcian Hoff (1960).

The adaline learning rule (also known as the least-mean-squares rule, the delta rule, and the Widrow-Hoff rule) is a training rule that minimises the output error using (approximate) gradient descent. After each training pattern I^p is presented, the correction applied to the weights is proportional to the error. The error is calculated before the thresholding step, that is, from the linear output W_j . I^p:

err_j (p) = T^p_j - W_j . I^p

Thus, the weights are adjusted by

W_ij (t+1) = W_ij (t) + eta (T^p_j - W_j . I^p) I^p_i
This corresponds to gradient descent on the quadratic error surface E_j = Sum_p [T^p_j - W_j . I^p]²
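
A minimal Python sketch of this update, assuming the error is taken on the linear output W_j . I^p and the weights are stored as W[i][j]. The function name adaline_update, the training data (logical AND with -1/+1 coding and a constant first input acting as a bias) and the values of eta and the epoch count are all choices made for this example.

```python
def adaline_update(W, I, T, eta):
    """One Widrow-Hoff step for every output unit, updating W in place.

    W[i][j] is the weight from input i to output unit j.
    """
    b, c = len(W), len(W[0])
    for j in range(c):
        net = sum(W[i][j] * I[i] for i in range(b))  # linear output W_j . I^p
        err = T[j] - net                             # error before thresholding
        for i in range(b):
            W[i][j] += eta * err * I[i]

# Illustrative data: AND with -1/+1 coding; the first input is fixed at 1 (bias).
patterns = [
    ([1.0, -1.0, -1.0], [-1.0]),
    ([1.0, -1.0,  1.0], [-1.0]),
    ([1.0,  1.0, -1.0], [-1.0]),
    ([1.0,  1.0,  1.0], [ 1.0]),
]
W = [[0.0], [0.0], [0.0]]
for epoch in range(200):
    for I, T in patterns:
        adaline_update(W, I, T, eta=0.05)

# Linear outputs after training; their signs give the classification.
outputs = [sum(W[i][0] * I[i] for i in range(3)) for I, _ in patterns]
print(W, outputs)
```

For this data the least-squares weights are (-0.5, 0.5, 0.5). With a small constant eta the online rule does not reach them exactly but keeps the weights close to that solution.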

Perceptron learning

In perceptron learning, the weights are adjusted only when a pattern is misclassified. The correction to the weights after applying the training pattern p is
W_ij (t+1) = W_ij (t) + eta (T^p_j - A^p_j) I^p_i
which is zero whenever the thresholded output A^p_j equals the target T^p_j. For outputs coded as -1 and +1, this corresponds to gradient descent on the error surface E (W_j) = Sum_{misclassified p} (W_j . I^p) A^p_j.
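
The rule can be sketched in Python for a single output unit with a hard threshold and outputs coded as -1 and +1. The names sign and perceptron_train, the stopping test (one full pass without mistakes) and the AND data are assumptions made for this example.

```python
def sign(a):
    """Hard threshold with outputs coded as -1 / +1."""
    return 1 if a >= 0 else -1

def perceptron_train(patterns, b, eta=0.5, max_epochs=100):
    """Train one output unit; W[i] is the weight from input i.

    Returns (W, epochs), where epochs is None if no error-free pass occurred.
    """
    W = [0.0] * b
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for I, T in patterns:
            A = sign(sum(w * x for w, x in zip(W, I)))
            if A != T:  # weights change only on a misclassification
                mistakes += 1
                for i in range(b):
                    W[i] += eta * (T - A) * I[i]
        if mistakes == 0:
            return W, epoch
    return W, None

# Illustrative, linearly separable data: AND with a constant bias input of 1.
patterns = [
    ([1, -1, -1], -1),
    ([1, -1,  1], -1),
    ([1,  1, -1], -1),
    ([1,  1,  1],  1),
]
W, epochs = perceptron_train(patterns, b=3)
print(W, epochs)
```

Because T - A is zero for a correctly classified pattern, the explicit test A != T only skips updates that would not change the weights anyway.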

Pocket algorithm

The perceptron learning algorithm does not terminate if the learning set is not linearly separable. In many real-world cases, however, we want to find the "best" linear separation even when the learning sets are not ideal. The pocket algorithm is a modification of the perceptron rule proposed by S. I. Gallant (1990). It stores the best weight vector found so far in a "pocket" while the perceptron rule continues to learn. The pocket weights are replaced only when a better weight vector is found.
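
The following Python sketch shows a simplified variant of the idea. The text does not say how "better" is measured, so this example assumes that a weight vector is scored by the number of training patterns it classifies correctly. The function names and the XOR data (which is not linearly separable) are chosen only for illustration.

```python
def sign(a):
    """Hard threshold with outputs coded as -1 / +1."""
    return 1 if a >= 0 else -1

def n_correct(W, patterns):
    """Number of patterns classified correctly (assumed quality measure)."""
    return sum(1 for I, T in patterns
               if sign(sum(w * x for w, x in zip(W, I))) == T)

def pocket_train(patterns, b, eta=0.5, epochs=50):
    """Perceptron updates plus a 'pocket' holding the best weights seen."""
    W = [0.0] * b
    pocket, best = list(W), n_correct(W, patterns)
    for _ in range(epochs):
        for I, T in patterns:
            A = sign(sum(w * x for w, x in zip(W, I)))
            if A != T:
                # Ordinary perceptron step on the working weights.
                for i in range(b):
                    W[i] += eta * (T - A) * I[i]
                # Replace the pocket only if the new weights score higher.
                score = n_correct(W, patterns)
                if score > best:
                    pocket, best = list(W), score
    return pocket, best

# XOR with -1/+1 coding and a constant bias input: not linearly separable.
patterns = [
    ([1, -1, -1], -1),
    ([1, -1,  1],  1),
    ([1,  1, -1],  1),
    ([1,  1,  1], -1),
]
pocket, best = pocket_train(patterns, b=3)
print(pocket, best)
```

No linear unit can classify all four XOR patterns, so the working weights keep changing. The pocket still ends up holding a vector that gets three of the four patterns right.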

Backpropagation

The backpropagation algorithm was developed for training multilayer perceptron networks. It was popularized by Rumelhart, Hinton and Williams (1986), although similar ideas had been developed previously by others (Werbos, 1974; Parker, 1985). The idea is to train a network by propagating the output errors backward through the layers. The errors are used to evaluate the derivatives of the error function with respect to the weights, which can then be adjusted. In this applet, we study how the algorithm works for a single-layer network.

The backpropagation algorithm for a single-layer network using the sum-of-squares error function consists of two phases:

Feedforward - apply an input; evaluate the activations a_j and store the error delta_j at each node j:

a_j = Sum_i (W_ij (t) I^p_i)

A^p_j = g (a_j)

delta_j = (A^p_j - T^p_j) g'(a_j), where g'(a_j) = A^p_j (1 - A^p_j) for the sigmoid

Backpropagation - compute the adjustments and update the weights. Since there is just one layer, the output layer, we compute

W_ij (t+1) = W_ij (t) - eta delta_j I^p_i
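
The two phases can be sketched in Python as follows. The sketch assumes the sum-of-squares error with sigmoid outputs, so the stored error is taken as delta_j = (A_j - T_j) A_j (1 - A_j), and each weight W_ij moves by -eta delta_j I_i. The function names, the OR training data with a constant bias input, and the values of eta and the epoch count are illustrative choices.

```python
import math

def g(a):
    """Sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-a))

def backprop_step(W, I, T, eta):
    """One feedforward + backpropagation step; W[i][j] is updated in place."""
    b, c = len(W), len(W[0])
    # Feedforward: activations, outputs, and the error term at each node.
    a = [sum(W[i][j] * I[i] for i in range(b)) for j in range(c)]
    A = [g(a_j) for a_j in a]
    delta = [(A[j] - T[j]) * A[j] * (1.0 - A[j]) for j in range(c)]
    # Backpropagation: with a single layer, update the output weights directly.
    for i in range(b):
        for j in range(c):
            W[i][j] -= eta * delta[j] * I[i]
    return A

def sum_squares(W, patterns):
    """Sum-of-squares error over all patterns and output units."""
    err = 0.0
    for I, T in patterns:
        for j in range(len(T)):
            a_j = sum(W[i][j] * I[i] for i in range(len(I)))
            err += (g(a_j) - T[j]) ** 2
    return err

# Illustrative data: OR with 0/1 coding; the first input is a constant bias.
patterns = [
    ([1.0, 0.0, 0.0], [0.0]),
    ([1.0, 0.0, 1.0], [1.0]),
    ([1.0, 1.0, 0.0], [1.0]),
    ([1.0, 1.0, 1.0], [1.0]),
]
W = [[0.0], [0.0], [0.0]]
error_before = sum_squares(W, patterns)
for epoch in range(5000):
    for I, T in patterns:
        backprop_step(W, I, T, eta=0.5)
error_after = sum_squares(W, patterns)
print(error_before, error_after)
```

With all weights at zero every output is 0.5, so the initial error on these four patterns is 1.0. Training should reduce it substantially.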
