Types of learning
There are many forms of Machine Learning.
Many of the concepts we have learnt transfer, in some form,
to these other approaches.
Alternatives to Supervised Learning
Supervised Learning requires a teacher,
who actually knows what the right answer is
for a large set of exemplars.
This is not how most animal and human learning is done.
With Reinforcement Learning, the program
learns not from being explicitly told the
right answer, but from sporadic rewards and punishments.
e.g. I cannot tell the machine what signals to send to its
motors to make it walk, but I can tell when it has succeeded,
and can say "Good dog!"
Typically, the rewards are numbers.
The program uses various
trial-and-error algorithms to maximise these numeric rewards over time.
e.g.
I do not program the robot soccer player, but I give it 10 points every time it scores a goal,
and minus 10 points when the opposition scores.
It uses these points to "grade" every action in every state.
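A minimal sketch of one such trial-and-error algorithm, tabular Q-learning, in Python. The environment interface (reset/step) and the +10/-10 rewards are illustrative assumptions, not a prescribed API:

import random

# Tabular Q-learning: "grade" every action in every state.
# Q[s][a] estimates the long-run reward of taking action a in state s.

ALPHA = 0.1    # learning rate
GAMMA = 0.9    # discount: how much future rewards count
EPSILON = 0.1  # fraction of moves that are random exploration

def q_learning(env, n_states, n_actions, episodes=1000):
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = env.reset()                     # assumed environment API
        done = False
        while not done:
            # Trial and error: mostly exploit the best grade so far,
            # sometimes explore a random action.
            if random.random() < EPSILON:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda act: Q[s][act])
            s2, reward, done = env.step(a)  # e.g. +10 goal scored, -10 conceded
            # Nudge Q[s][a] towards (immediate reward + best future grade).
            Q[s][a] += ALPHA * (reward + GAMMA * max(Q[s2]) - Q[s][a])
            s = s2
    return Q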
With Unsupervised Learning, the program is given no labelled exemplars;
instead it focuses on
dividing up the input space into regions
(classification, category formation).
There must be some sense of one set of category definitions
being better than another.
The basic learning algorithm is simply to learn to
represent the world:
- Input x.
- Run through network to get output y.
- Compare y with x.
- Backpropagate the error.
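A minimal sketch of this loop in Python (assuming PyTorch; the 40-input, 7-hidden-node sizes anticipate the example discussed next):

import torch
import torch.nn as nn

# Learn to reproduce the input at the output, squeezing it
# through a small hidden layer on the way.
encoder = nn.Sequential(nn.Linear(40, 7), nn.Sigmoid())
decoder = nn.Sequential(nn.Linear(7, 40), nn.Sigmoid())

opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()),
                       lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(x):            # x: batch of input vectors
    y = decoder(encoder(x))   # run through network to get output y
    loss = loss_fn(y, x)      # compare y with x
    opt.zero_grad()
    loss.backward()           # backpropagate the error
    opt.step()
    return loss.item()

loss = train_step(torch.randn(100, 40))   # illustrative random batch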
Simply grouping together inputs is useful
- e.g. Which countries' economies are similar to which?
Should all post-communist economies adopt the same reforms?
Consider a 40-dimensional input,
which must be reconstructed as a 40-dimensional output,
but encoded in the middle by just 7 hidden nodes
(not 40 hidden nodes).
What is the encoding?
Which x's are grouped together?
Fewer hidden units to represent
and reconstruct the input
means a more efficient network.
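Continuing the sketch above, we can read out the 7-number code to see the encoding, and compare codes to see which x's end up grouped together (the random batch and nearest-pair comparison are illustrative choices):

x = torch.randn(100, 40)     # illustrative batch of 100 inputs

with torch.no_grad():
    codes = encoder(x)       # one 7-number code per 40-dimensional input

# Inputs whose codes lie close together are the ones the
# network treats as similar.
dists = torch.cdist(codes, codes)
dists.fill_diagonal_(float('inf'))   # ignore self-distances
i, j = divmod(int(dists.argmin()), len(codes))
print(f"inputs {i} and {j} got the most similar encodings")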
- Note this is using a neural network to do data compression:
store the data in a smaller size.
The output of the hidden layer is the compressed representation of the input data.
- Autoencoder neural network
- There are many fixed, pre-defined data compression algorithms you can use.
- The neural network approach to compression
will learn a method of compression adapted to the data we have.
Imagine if the network consisted of a dedicated
hidden unit for every possible input,
with all weights 1 or 0.
Reconstruction would be perfect.
But this would just be a lookup-table.
The whole idea of a network is a more efficient representation
than a lookup-table,
because we want some predictive ability.
- Facebook's
Seer ("SElf-supERvised") algorithm.
Scrapes a billion unlabelled images from Instagram, deciding for itself which objects look alike.
Sorts them into categories 1 to n (no labels).
- Later, a human can see at a glance a large number of images in category m,
and say category m is "cat".
- In many problems (e.g. 2-D image recognition)
we have the same pattern recognition problem
but it may be "shifted" across the input.
- It is wasteful and slow to learn the same pattern recognition multiple times
(once when the pattern is in the top LHS,
once when it is in the top middle,
once when it is in the top RHS, and so on).
- Convolutional neural network
- Architecture to learn the pattern recognition once
and be able to repeatedly shift that recognition across the image / across the inputs
(sketched below).
- Machine Learning Mastery
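A minimal sketch of that weight-sharing idea (assuming PyTorch; the random kernel and toy image are illustrative):

import torch
import torch.nn.functional as F

# One 3x3 detector, learned once, slid over every position of the image.
kernel = torch.randn(1, 1, 3, 3)     # in a real network these weights are learned
image = torch.zeros(1, 1, 8, 8)
image[0, 0, 1:4, 1:4] = 1.0          # a pattern in the top LHS

# padding=2 lets the detector slide over every possible position.
response = F.conv2d(image, kernel, padding=2)

# Shift the pattern to the top RHS: the response map shows the same
# peak, just shifted, with no extra weights to learn.
shifted = torch.roll(image, shifts=4, dims=3)
response2 = F.conv2d(shifted, kernel, padding=2)
print(torch.isclose(response.max(), response2.max()))   # tensor(True)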
We have up to now considered
feed-forward networks.
In a
recurrent neural network,
activity may
feed back to earlier layers.
Activity flows around closed loops.
The network may settle down into a stable state,
or may have more complex dynamics.
Much of the brain seems better described by this model.
Continuous streams of input
Some form of
recurrent network model
seems better suited for problems
where we do not have
discrete inputs (like images)
but rather have a
continuous stream of input.
Example: Speech recognition,
where:
- It is not clear where words begin and end in the audio.
- Words are easier to understand in the context of the words that come before and after.
With a recurrent network,
the state of the network encodes information about recent past events,
which may be used to modify the processing of
the current input pattern.
- e.g. Total input at time t is the sensory input I_t
plus the output from the previous step, O_{t-1}
(which itself was the result of running the network on
I_{t-1} and O_{t-2}, and so on).
- e.g. The inputs to the network are the sensory inputs
plus the output of the previous time step.
The latter has some influence over how the raw sensory inputs
are interpreted.
It is like the current "expectation" or
"emotional mood" you are in
when you see the input.
It is like an internal state.
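A minimal sketch of such a recurrent step (assuming PyTorch; the layer sizes and feedback wiring are illustrative assumptions, not a full speech recogniser):

import torch
import torch.nn as nn

n_in, n_out = 13, 8                  # illustrative sizes
# One step maps (I_t, O_{t-1}) to O_t.
step = nn.Sequential(nn.Linear(n_in + n_out, n_out), nn.Tanh())

def run(stream):                     # stream: sequence of I_t vectors
    o = torch.zeros(n_out)           # O_0: no "mood" or expectation yet
    outputs = []
    for i_t in stream:
        # The previous output acts as internal state, influencing
        # how the current raw input is interpreted.
        o = step(torch.cat([i_t, o]))
        outputs.append(o)
    return outputs

outs = run([torch.randn(n_in) for _ in range(5)])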
Speech to text programs
- How to generate a transcript of a video:
- Notta Chrome extension.
- I found this worked quite well,
though lots of editing work is still needed.