School of Computing. Dublin City University.
But what exactly does "learning" mean?
It must be more than just memory:
Human: The capital of Ireland is Dublin.
Machine: OK.
Human: The capital of France is Paris.
Machine: OK.
Human: What is the capital of Ireland?
Machine: Dublin.
Human: Brilliant.

The machine should learn something new.
We have experience of a finite number of situations. We must generalise that experience to a new situation, where we have never been told the answer, and make a good guess, rather than simply recovering a past answer from memory.
Human: In situation x1, answer is a1.
Human: In situation x2, answer is a2.
Human: In situation x3, answer is a3.
Human: In situation x4, what is answer?
Machine: f(x4).

where the machine is "learning" the function f.
Each Input-Output pair (xi, ai) that we show to the machine
is called an exemplar.
Learning from exemplars like this is called
supervised learning.
Human: In situation x1, answer is a1.
Machine: OK. I am building a function f(x). Currently, f(x) = a1 for all x.
Human: In situation x2, answer is a2.
Machine: f is becoming a more complex mapping, that gives different results for different x. e.g. Perhaps currently f((x1+x2)/2) = (a1+a2)/2.

If we have 2 points so far, we may (so far) assume f is a straight line.
Human: In situation x3, answer is a3.
Machine: f is getting more complex.

Maybe now f is no longer a straight line.
Human: In situation x4, what is answer?
Machine: Using my current definition of f, my answer is f(x4).
If x=0, answer is 0.
If x=100, answer is 100.
If x=50, answer is 95.
If x=25, what is answer?
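One way the machine could make such a guess: store the exemplars and interpolate linearly between the nearest ones. A minimal sketch in JavaScript (the names learn and f are illustrative, not from any library):

```javascript
// Supervised learning by interpolation (sketch).
// Exemplars are (x, answer) pairs; for a new x we interpolate
// linearly between the two nearest stored exemplars.

const exemplars = [];            // list of {x, a} pairs, kept sorted by x

function learn(x, a) {
  exemplars.push({ x, a });
  exemplars.sort((p, q) => p.x - q.x);
}

function f(x) {
  // exact recall if we have seen this x;
  // otherwise interpolate between the neighbours below and above
  let lo = exemplars[0], hi = exemplars[exemplars.length - 1];
  for (const e of exemplars) {
    if (e.x === x) return e.a;
    if (e.x < x) lo = e;
    if (e.x > x) { hi = e; break; }
  }
  if (lo === hi) return lo.a;    // x outside the seen range
  const t = (x - lo.x) / (hi.x - lo.x);
  return lo.a + t * (hi.a - lo.a);
}

learn(0, 0);
learn(100, 100);
learn(50, 95);
console.log(f(25));              // → 47.5, interpolated between (0,0) and (50,95)
```

For seen inputs the machine recovers the exact answer from memory; for unseen inputs it generalises by assuming the function is locally a straight line.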
Another approach (computer science) is to build a machine (algorithm plus data structure) to represent the function.
But some functions defeat this idea. We can quite easily construct functions where nearby input does not lead to nearby output. These are called chaotic functions. These functions may defeat learning from exemplars.
We do not expect that learning from exemplars will be able to represent a chaotic function (or a function in a region where it is behaving chaotically). To represent the chaotic region, the function would have to be learnt to an impractically (perhaps infinitely) high level of granularity.
However, we do hope that our unknown function is not actually perverse enough to be chaotic, at least in the local regions we care about.
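To see why nearby input need not lead to nearby output, consider the logistic map iterated many times. This is an illustrative sketch (not a function used elsewhere in these notes): two inputs differing by one part in a billion end up far apart, so exemplars tell us almost nothing about nearby points.

```javascript
// A chaotic function (sketch): iterate the logistic map 50 times
// from starting point x. Tiny differences in x are amplified on
// every iteration, so g is useless to learn from exemplars.

function g(x) {
  for (let i = 0; i < 50; i++) x = 4 * x * (1 - x);
  return x;
}

console.log(g(0.3));
console.log(g(0.3 + 1e-9));   // typically far from g(0.3)
```

Interpolating between two exemplars of g gives an answer unrelated to the true value between them.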
f(x) = sin(x) + sin(2x) + sin(5x) + cos(x)

Here is the kind of thing we want - a learning method approximating the above function after having seen 1 million exemplars (top) and 5 million (bottom). See details later.
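For a function like this we can generate as many exemplars as we want, since we can simply call the function. A sketch (the input range [0,10] is an arbitrary choice for illustration):

```javascript
// The target function above, and a generator of random exemplars
// of it, as a supervised-learning method would see them.

function target(x) {
  return Math.sin(x) + Math.sin(2 * x) + Math.sin(5 * x) + Math.cos(x);
}

function makeExemplars(n) {
  const out = [];
  for (let i = 0; i < n; i++) {
    const x = Math.random() * 10;        // arbitrary input range [0,10]
    out.push({ x, a: target(x) });       // exemplar (x, target(x))
  }
  return out;
}

console.log(makeExemplars(5));
```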
Above we considered some non-linear function f and gave an equation for it.
It is important to note: If we have an equation for f, then we don't need supervised learning or any learning at all. We can just represent f by its equation.
Supervised learning from exemplars is for situations where we have an unknown f - we simply postulate that some mapping from our inputs to our outputs must exist.
So why do we often use an explicit f, both above and below?
To show how the algorithm will work.
With explicit f, we can see how the algorithm is doing.
Use explicit f to debug your algorithm.
Then release it on unknown f.
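A sketch of this debugging idea: with an explicit f we can score the learner exactly. The learner below is deliberately bad (it predicts the mean of all answers seen so far); it just stands in for whatever method is being debugged:

```javascript
// Debugging a learner against an explicit, known f.
// Because trueF is explicit, we can measure the learner's error exactly.

const trueF = (x) => x * x;      // an explicit f, chosen for illustration

// a deliberately bad learner: predicts the mean answer seen so far
let sum = 0, count = 0;
function learn(a) { sum += a; count++; }
function predict() { return count === 0 ? 0 : sum / count; }

// train on exemplars of the known f
for (let x = 0; x <= 10; x++) learn(trueF(x));

// score: average absolute error over the same range
let err = 0;
for (let x = 0; x <= 10; x++) err += Math.abs(predict() - trueF(x));
console.log(err / 11);           // large, showing this learner is bad
```

On an unknown f we could not compute this error; the explicit f is what lets us see how the algorithm is doing.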
One worry about unknown functions:
Some mappings of Input to Output - where there is only a disjoint, discrete mapping of certain inputs to certain outputs, and no region with any continuous (or even approximately continuous) mapping - do not lend themselves to learning, only memorisation.
e.g. Capitals:
Human: The capital of Ireland is Dublin.
Machine: OK.
Human: The capital of France is Paris.
Machine: OK.
Human: What is the capital of the UK (the country between Ireland and France)?
Machine: Emm, my guess is "Dublaris".
Human: What is the capital of Iceland?
Machine: Emm, "Dublin"?

There may be a function in that every argument has a unique value generated for it. But it is not continuous - no interpolation is possible.
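A discrete mapping like this can only be memorised, e.g. as a lookup table. A sketch:

```javascript
// A disjoint, discrete mapping is pure memorisation: a lookup table.
// There is no notion of "between Ireland and France" to interpolate over.

const capitals = new Map([
  ["Ireland", "Dublin"],
  ["France", "Paris"],
]);

console.log(capitals.get("Ireland"));  // "Dublin" - recall works
console.log(capitals.get("UK"));       // undefined - no basis for a guess
```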
Do we know a priori if our function f is continuous?
i.e. Is there anything to learn?
Another worry about unknown functions:
f is a postulated function
- a function that is assumed to exist,
but whose properties are unknown
(and possibly unknowable).
Imagine trying to predict horse races:
With jockey of weight 8 stone, our horse won the race.
With jockey of weight 8.5 stone, our horse lost the race.
With jockey of weight 9 stone, our horse won the race.
With jockey of weight 8.25 stone, will our horse win or lose?
With jockey of weight 8.75 stone, will our horse win or lose?
Are these the correct inputs? Are we missing some inputs? How do we know?
If x=0, answer is 0.
If x=100, answer is 100.
If x=0, answer is 10.
If x=100, answer is 30.
If x=100, answer is 100.

May be caused by missing inputs.
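A sketch of how a missing input produces apparently contradictory exemplars: here the true answer depends on two inputs (x, y), but the learner records only x. The function trueF is invented purely for illustration:

```javascript
// The true answer depends on (x, y), but the learner only sees x,
// so the same x appears with different answers - a "contradiction".

function trueF(x, y) { return y ? x : x / 10; }

const seen = [];                             // the learner's view: x only
seen.push({ x: 100, a: trueF(100, true) });  // exemplar (100, 100)
seen.push({ x: 100, a: trueF(100, false) }); // exemplar (100, 10)

console.log(seen);   // same input x, two different answers
```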
If input x is multi-dimensional,
how do you build up this function f?
How do you represent f?
f could be a function of the series of inputs x1,x2,...,xn that led up to the current input xn, rather than just a function of the immediate input xn.
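One simple way to handle this (an assumption, not the only design): key the machine's memory on the recent history of inputs rather than on the single current input. How much history matters is itself an assumption (3 inputs here):

```javascript
// A learner whose answer depends on the series of inputs:
// memory is keyed on the last 3 inputs (an assumed window size).

const memory = new Map();
const history = [];

function observe(x) {
  history.push(x);
  if (history.length > 3) history.shift();   // keep only the last 3
}

function key() { return history.join(","); }

function learn(a)  { memory.set(key(), a); }
function recall()  { return memory.get(key()); }

observe(1); observe(2); observe(3);
learn("answer-A");         // answer for the series 1,2,3
observe(4);                // history is now 2,3,4
console.log(recall());     // undefined: a new series, nothing learnt yet
```

The same final input can now yield different answers, depending on the series that led up to it.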