The Neural Network data structure - motivation
From considering a computational approach to learning a function from exemplars, and considering the problems with a prediction machine based on a distance metric, we have developed a "wish list" for our data structure.
Neural Networks are a type of data structure that addresses most of this wish list.
They:
- can represent a function f from real-valued multi-dimensional input to real-valued multi-dimensional output,
- can learn a general non-linear, continuous, non-chaotic function,
- can learn which parts of the input are important and which are not (can weigh input dimensions differently),
- can remove entirely parts of the input that are not important (set weight to zero),
- can build up the representation from scratch based only on seeing exemplars,
- can do this in a fixed-size data structure that remains fixed in size as the representation improves, [Fixed memory cost] (see the sketch after this list),
- can return an answer quickly to a given input without having to examine every past exemplar (i.e. the past exemplars are effectively encoded in the current definition of f, instead of being kept in a huge list of past exemplars growing without limit), [Low computational cost]
- can return an answer in a finite time (in fact, in exactly the same time for all inputs), [Fixed computational cost]
- can return an answer for inputs never seen before (i.e. can make predictions),
- represent a behaviour-generating module from the start (i.e. we don't gather exemplars and then run a statistical analysis on them - instead we always have a definition of f, which can return an answer at any time).
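To make the wish list concrete, here is a minimal sketch of such a fixed-size structure. It is hypothetical illustration only (the class name TinyNet, the layer sizes and the learning rule are my assumptions, not any particular library's API): the weights occupy a fixed amount of memory, every query costs the same fixed amount of computation, and learning from an exemplar only changes weight values, never the size of the structure.

# Minimal sketch (assumes numpy; names such as TinyNet are hypothetical).
# A fixed-size "neural machine": the memory is the weight arrays, which never grow;
# answering a query is a fixed sequence of arithmetic, the same for every input.
import numpy as np

class TinyNet:
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed-size data structure: two weight matrices and two bias vectors.
        self.W1 = rng.normal(0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.1, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def f(self, x):
        # The learned function f: real-valued vector in, real-valued vector out.
        # Same amount of computation for every input (fixed computational cost).
        h = np.tanh(x @ self.W1 + self.b1)    # non-linear hidden layer
        return h @ self.W2 + self.b2

    def learn(self, x, target, rate=0.01):
        # Nudge the weights towards reproducing one (input, output) exemplar.
        # The exemplar itself is not stored; only the weight values change.
        h = np.tanh(x @ self.W1 + self.b1)
        y = h @ self.W2 + self.b2
        err = y - target                       # gradient of squared error w.r.t. y
        dW2 = np.outer(h, err)                 # backpropagate one layer
        dh = (self.W2 @ err) * (1 - h ** 2)
        dW1 = np.outer(x, dh)
        self.W2 -= rate * dW2
        self.b2 -= rate * err
        self.W1 -= rate * dW1
        self.b1 -= rate * dh

# Usage: feed it exemplars one at a time; it can answer for unseen inputs at any time.
net = TinyNet(n_in=2, n_hidden=8, n_out=1)
for _ in range(2000):
    x = np.random.uniform(-1, 1, 2)
    net.learn(x, np.array([x[0] * x[1]]))      # exemplars of some unknown function
print(net.f(np.array([0.5, -0.5])))            # prediction for an input never seen before

Whatever it learns, the object above stays exactly the same size: the only difference between an untrained and a well-trained network is the numbers stored in W1, b1, W2 and b2.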
You may have heard of Neural Networks because they have been advertised as "a type of program that can learn". But they are far more than that - traditional programs can learn too. What matters is the very specific set of features above, and it is those features that make Neural Networks usable for this problem.
Optical character recognition (OCR) is an example of a problem where we can get many (input, output) exemplar pairs, but where it is hard to figure out the function that relates the input to the output. OCR of printed books should be easier than recognition of handwriting.
(Image: a page of an 18th century book.)
(Image: a typically inaccurate OCR transcript of the above.)
Note the points where the OCR has trouble:
- There is nothing to show the program the box where a letter starts and ends.
- The 18th century "long s" does not help.
The neural network approach leads to thinking about the kind of "machines" constructed in the brain.
Example - How do you play chess?
- No.1. Reasoning. Applying rules. Lookahead search. Memory-free.
- No.2. Compare the current state to an explicit memory of all previous matches.
  - Computational cost - Gets worse with experience. You have to compare to all previous experiences, each time, which takes an increasing amount of time.
  - Memory cost - Grows indefinitely. You need more and more memory as you gain more experience. The data structure grows indefinitely in size.
- No.3. Run the current state through a "neural machine" that you have been building for 20 years, changing its parameters (or "weights") with every match you play. You don't remember the matches, but they have left their legacy in the current set of weights.
  - Computational cost - An input provokes an output after a limited amount of computation. (In a neural network, after the same amount of computation for all inputs.)
  - Memory cost - More limited. The machine may grow, but not by as much as when it had to remember all experiences. (In a neural network, the data structure is always the same size. Nothing changes except the values of the weights.)
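As a rough sketch of the cost difference between no.2 and no.3 (hypothetical Python; the class names are made up, and "positions" are stood in by plain feature vectors rather than real chess states): the no.2 player stores every position it has ever seen and scans all of them on every query, while the no.3 player keeps a fixed vector of weights and does the same fixed amount of work every time.

# Sketch only (hypothetical names; not a real chess engine).
import numpy as np

class MemoryPlayer:                  # "no.2": explicit memory of every past experience
    def __init__(self):
        self.seen = []               # grows with every position ever recorded

    def record(self, position, outcome):
        self.seen.append((position, outcome))   # memory cost grows without limit

    def evaluate(self, position):
        # Must scan every stored experience: computational cost grows with experience.
        if not self.seen:
            return 0.0
        dists = [np.linalg.norm(position - p) for p, _ in self.seen]
        return self.seen[int(np.argmin(dists))][1]

class WeightPlayer:                  # "no.3": fixed-size set of weights
    def __init__(self, n_features):
        self.w = np.zeros(n_features)    # fixed memory cost, set once

    def record(self, position, outcome, rate=0.01):
        # Each experience only nudges the weights; the experience itself is not stored.
        self.w += rate * (outcome - self.w @ position) * position

    def evaluate(self, position):
        # Same fixed amount of computation for every position, forever.
        return self.w @ position

After a million recorded positions, MemoryPlayer holds a million entries and must scan them all on every call to evaluate; WeightPlayer still holds one weight vector of unchanged size and does a single dot product.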
Human playing chess:
- No.1: The amateur player. "If I move there, and then he moves there ..." Can only do a limited lookahead.
- No.2: The brain can't really do no.2. It can do an extremely truncated no.2 (remember some very memorable moves).
- No.3: The expert player. "I don't think about the rules. I just 'know' the next move."
An experienced human uses no.1 and no.3: a no.3 layer generating suggestions for a "sanity check" no.1 layer to analyse. To make a move, you get 10 ideas (from where? from some mysterious subconscious layer) and run them through your no.1 layer to see which ones work.
Computer playing chess:
- No.1: State-space search. No use of memory of previous games. Can do a much longer lookahead than a human, but still limited.
- No.2: Can do a much better no.2 than a human, but still limited. There is no escape from the growing memory and computational cost.
- No.3: Can do a neural-network-type approach.
A computer chess program can do all of these, with many of the same issues as a human doing them. Deep Blue, for example, examined 200 million positions a second.
"no.3" in pattern recognition
Similar issues in pattern recognition (like vision).
Do we have explicit memories?
Maybe not.
It is easier to
recognise a familiar person (no.3)
than the no.2 version, which is draw them,
or explain to a
person what they look like
(e.g. police sketch artist).
What could a "no.3" program look like?
Imagine character recognition as a rule-based program:
IF pixel 0 = white AND pixel 1 = black AND ....
OR if pixel 0 = white AND pixel 1 = white AND ....
OR if ...
then return "3"
ELSE if pixel 0 = white AND ...
OR if ....
then return "5"
ELSE if ...
And so on.
At worst this could enumerate all possible images - maybe trillions of lines of code. Perhaps the numbers could be reduced by using OR statements to cover big groups of images, but it would still be a massive program.
The neural network will, when finished, embody a function that is similar in some ways, but in fact even less comprehensible.
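For contrast, here is a sketch (hypothetical Python; the image size and the function name recognise are my assumptions) of what the no.3 version of the same character recogniser looks like. Instead of trillions of explicit IF-rules about individual pixels, there is one small, fixed computation over all the pixels, and the "rules" are implicit in whatever values the learned weights happen to hold.

# Sketch only: a "no.3" digit recogniser as a single fixed computation over the pixels.
# The weight matrix W and bias b would be filled in by learning from exemplars;
# no human-readable rule about any individual pixel appears anywhere.
import numpy as np

N_PIXELS = 28 * 28          # e.g. a 28x28 image flattened to a vector (assumption)
N_CLASSES = 10              # digits 0-9

W = np.zeros((N_PIXELS, N_CLASSES))   # learned weights (shown untrained here)
b = np.zeros(N_CLASSES)

def recognise(pixels):
    # One weighted sum per class, then pick the class with the highest score.
    # The same few lines cover every possible image; nothing is enumerated.
    scores = pixels @ W + b
    return int(np.argmax(scores))

Reading W tells you almost nothing about why a particular image is called a "3": the knowledge is smeared across thousands of numbers rather than written down as rules, which is exactly why the finished function is even less comprehensible than the rule-based program above.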