Dr. Mark Humphrys

School of Computing. Dublin City University.


Single-layer Neural Networks (Perceptrons)

To build up towards the (useful) multi-layer Neural Networks, we will start by considering the (not really useful) single-layer Neural Network. This is called a Perceptron.


  

The Perceptron

Input is multi-dimensional (i.e. input can be a vector):

input x = ( I1, I2, .., In)

Input nodes (or units) are connected (typically fully) to a node (or multiple nodes) in the next layer.

A node in the next layer takes a weighted sum of all its inputs:

Summed input = w1I1 + w2I2 + .. + wnIn

Example

input x = ( I1, I2, I3) = ( 5, 3.2, 0.1 )

Summed input = w1I1 + w2I2 + w3I3 = 5 w1 + 3.2 w2 + 0.1 w3




The rule

The output node has a "threshold" t.
Rule: If summed input ≥ t, then it "fires" (output y = 1).
Else (summed input < t) it doesn't fire (output y = 0).
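
A minimal sketch of this rule in Python (the function name, and the weights and threshold in the example call, are made up purely for illustration):

  def perceptron_output(weights, inputs, t):
      # Weighted sum of the inputs
      summed = sum(w * i for w, i in zip(weights, inputs))
      # Fire (output 1) if the summed input reaches the threshold t
      return 1 if summed >= t else 0

  # The example input above, with made-up weights and threshold:
  print(perceptron_output([1, 1, 1], [5, 3.2, 0.1], t=2))   # 8.3 >= 2, so output is 1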




This implements a function

Obviously this implements a simple function from multi-dimensional real input to binary output. What kind of functions can be represented in this way?

We can imagine multi-layer networks, where an output node is one of the inputs into the next layer.

The Perceptron has just 2 layers of nodes (input nodes and output nodes). It is often called a single-layer network on account of having just 1 layer of links, between input and output.



Fully connected?

Note that to make an input node irrelevant to the output, we set its weight to zero. e.g. If w1=0 here, then the Summed input is the same no matter what is in the 1st dimension of the input.

Weights may also be negative (a higher positive input on that line then makes the node less likely to fire).

Some inputs may be positive, some negative (cancel each other out).



The brain

A similar kind of thing happens in neurons in the brain (if excitation greater than inhibition, send a spike of electrical activity on down the output axon), though researchers generally aren't concerned if there are differences between their models and natural ones.




Sample Perceptrons

Perceptron for AND:

2 inputs, 1 output.
w1=1,   w2=1,   t=2.

Q. This is just one example. What is the general set of inequalities for w1, w2 and t that must be satisfied for an AND perceptron?


Perceptron for OR:

2 inputs, 1 output.
w1=1,   w2=1,   t=1.

Q. This is just one example. What is the general set of inequalities that must be satisfied for an OR perceptron?
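
As a quick check of the two sets of weights above (a sketch, assuming the threshold rule given earlier), we can run all four boolean inputs through each perceptron:

  def fires(weights, inputs, t):
      # The perceptron firing rule: output 1 if the weighted sum reaches the threshold
      return 1 if sum(w * i for w, i in zip(weights, inputs)) >= t else 0

  for i1 in (0, 1):
      for i2 in (0, 1):
          print((i1, i2),
                "AND:", fires([1, 1], [i1, i2], t=2),
                "OR:",  fires([1, 1], [i1, i2], t=1))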


Question - Perceptron for NOT?

What is the general set of inequalities that must be satisfied?




What is the perceptron doing?

The perceptron is simply separating the input into 2 categories, those that cause a fire and those that don't. It does this by looking at (in the 2-dimensional case) the quantity:

w1I1 + w2I2
If this is < t, it doesn't fire; otherwise it fires. That is, it is drawing the line:
w1I1 + w2I2 = t
and looking at where the input point lies. Points on one side of the line fall into 1 category, points on the other side fall into the other category. And because the weights and thresholds can be anything, this is just any line across the 2 dimensional input space.

So what the perceptron is doing is simply drawing a line across the 2-d input space. Inputs to one side of the line are classified into one category, inputs on the other side are classified into another. e.g. the OR perceptron, w1=1,   w2=1,   t=0.5, draws the line:

I1 + I2 = 0.5
across the input space, thus separating the points (0,1),(1,0),(1,1) from the point (0,0).

As you might imagine, not every set of points can be divided by a line like this. Those that can be, are called linearly separable.

In 2 input dimensions, we draw a 1 dimensional line. In n dimensions, we are drawing the (n-1) dimensional hyperplane:

w1I1 + .. + wnIn = t
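
For instance, a small sketch (using the OR perceptron above with t=0.5) classifying 2-d points by which side of the line I1 + I2 = 0.5 they lie on; the sample points are made up:

  # Which side of the line I1 + I2 = 0.5 does each point lie on?
  w1, w2, t = 1, 1, 0.5
  for (i1, i2) in [(0, 0), (0.2, 0.1), (0.4, 0.4), (1, 0), (0.9, 0.3)]:
      category = 1 if w1 * i1 + w2 * i2 >= t else 0
      print((i1, i2), "-> category", category)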




XOR

XOR is defined as follows:

input  output
0,0    0
0,1    1
1,0    1
1,1    0



Perceptron cannot do XOR

Need:
1.w1 + 0.w2 causes a fire, i.e. >= t
0.w1 + 1.w2 >= t
0.w1 + 0.w2 doesn't fire, i.e. < t
1.w1 + 1.w2 also doesn't fire, < t

i.e.:
w1 >= t
w2 >= t
0 < t
w1 + w2 < t

But the first two give w1 + w2 >= 2t, and since t > 0 we have 2t > t, so w1 + w2 > t, contradicting the last inequality. Contradiction.

Note: We need all 4 inequalities for the contradiction. If weights could be negative, e.g. both weights = -4 and t = -5, then each weight can be greater than t and yet their sum less than t; it is the inequality t > 0 that rules this out.

A "single-layer" perceptron can't implement XOR. The reason is because the classes in XOR are not linearly separable. You cannot draw a straight line to separate the points (0,0),(1,1) from the points (0,1),(1,0).

Led to invention of multi-layer networks.


Q. Prove that a perceptron can't implement NOT(XOR).
(It requires the same separation as XOR.)




Linearly separable classifications

If the classification is linearly separable, we can handle any number of classes with a single-layer network (one output node per class).

For example, consider classifying furniture according to height and width:

Each category can be separated from the other 2 by a straight line, so we can have a network that draws 3 straight lines, and each output node fires if you are on the right side of its straight line:

3-dimensional output vector.
Problem: More than 1 output node could fire at the same time. (Draw your lines. Then find a new point on the "right" side of 2 lines; see the sketch below.)
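
A rough sketch in Python of such a network with 3 output nodes taking (height, width) as input. The categories, weights and thresholds below are entirely made up, just to show the structure and the problem that more than one node can fire:

  # Hypothetical weights and thresholds for 3 output nodes, one per category.
  nodes = [([1.0, 0.0],  1.5),    # fires for tall items
           ([0.0, 1.0],  1.5),    # fires for wide items
           ([-1.0, -1.0], -2.0)]  # fires for small items (height + width <= 2)

  def classify(height, width):
      # 3-dimensional output vector: each node fires if on the right side of its line
      return [1 if w1 * height + w2 * width >= t else 0 for (w1, w2), t in nodes]

  print(classify(2.0, 0.5))   # [1, 0, 0] - only the "tall" node fires
  print(classify(1.8, 1.8))   # [1, 1, 0] - two nodes fire at the same time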




Perceptron Learning Rule

We don't have to design these networks. We can learn the weights and thresholds, by showing the network the correct answers we want it to generate.

Let input x = ( I1, I2, .., In) where each Ii = 0 or 1.
And let output y = 0 or 1.

Algorithm is:

Repeat forever:
  1. Given input x = ( I1, I2, .., In). Perceptron produces output y.
  2. We are told correct output O.
  3. If we got the wrong answer, change the wi's and t, otherwise do nothing.

Motivation for weight change rule:

  1. If output y=0, and "correct" output O=1, then y is not large enough, so reduce the threshold and increase the wi's. This will increase the output. (Note inputs cannot be negative, so high positive weights mean a high positive summed input.)

    Q. Why not just send threshold to minus infinity?

    Q. Or send weights to plus infinity?

    Note: Only need to increase wi's along the input lines that are active, i.e. where Ii=1. If Ii=0 for this exemplar, then the weight wi had no effect on the error this time, so it is pointless to change it (it may be functioning perfectly well for other inputs).

  2. Similarly, if y=1, O=0, output y is too large, so increase threshold and reduce wi's (where Ii=1). This will decrease the output.
  3. If y=1, O=1, or y=0, O=0, no change in weights or thresholds.

Hence algorithm is:

Repeat forever:
  1. Given input x = ( I1, I2, .., In). Perceptron produces output y.
  2. We are told correct output O.
  3. For all i:
        wi := wi + C (O-y) Ii
  4. t := t - C (O-y)

where C is some (positive) learning rate.
Note the threshold is learnt as well as the weights.
If O=y there is no change in weights or thresholds.
If Ii=0 there is no change in wi.
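
A minimal sketch of one step of this rule in Python (the function name and the default learning rate value are just illustrative):

  def train_step(w, t, x, O, C=0.1):
      # One presentation of one exemplar x = (I1, .., In) with correct output O.
      y = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= t else 0
      # Change each weight only along active input lines (xi = 1); no change if O == y.
      w = [wi + C * (O - y) * xi for wi, xi in zip(w, x)]
      # The threshold is learnt too, moving in the opposite direction to the weights.
      t = t - C * (O - y)
      return w, t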



Example

The sequence of exemplars presented may go like this:

input x     network fires or not (y)   told correct answer (O)   what to do with w's   what to do with t
(0,1,0,0)   0                          1                         increase              reduce
(1,0,0,0)   0                          0                         no change             no change
(0,1,1,1)   1                          0                         reduce                increase
(1,0,1,0)   1                          0                         reduce                increase
(1,1,1,1)   0                          1                         increase              reduce
(0,1,0,0)   1                          1                         no change             no change
....        ....                       ....                      ....                  ....

Note the same input may be (indeed should be) presented multiple times.



Convergence Proof exists

Convergence Proof - Rosenblatt, Principles of Neurodynamics, 1962.
i.e. Proved that:

If the exemplars used to train the perceptron are drawn from two linearly separable classes, then the perceptron algorithm converges and positions the decision surface in the form of a hyperplane between the two classes.

e.g. In 2 dimensions: We start by drawing a random line. Some point is on the wrong side. So we shift the line. Some other point is now on the wrong side. So we shift the line again. And so on, until the line separates the points correctly.
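
For example, a rough self-contained sketch in Python of training on the (linearly separable) OR exemplars until every exemplar is classified correctly; the starting weights, threshold and learning rate are arbitrary choices:

  # Train a perceptron on the OR exemplars until it makes no errors.
  exemplars = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
  w, t, C = [0.0, 0.0], 0.0, 0.1

  while True:
      errors = 0
      for x, O in exemplars:
          y = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= t else 0
          if y != O:
              errors += 1
              w = [wi + C * (O - y) * xi for wi, xi in zip(w, x)]
              t = t - C * (O - y)
      if errors == 0:
          break

  print(w, t)   # a line w1 I1 + w2 I2 = t separating (0,0) from the other three points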





Reading

  1. "The Perceptron: A Probabilistic Model For Information Storage And Organization In The Brain". F. Rosenblatt. Psychological Review. Vol. 65, No. 6, 1958.

  
