Encoding the Genotype
Recall our
rough
Genetic Algorithm
and our
definitions.
We need to fill in some items.
First, how should the genotype be encoded so we can subject it to mutation and crossover?
Binary bitstrings?
Program code?
What is the definition of a random genotype?
A random program?
Binary Chromosome
In nature,
DNA
is a "string" written in an alphabet of 4 "characters"
A, T, G and C (the 4 nucleobases).
Other mechanisms of inheritance
may have existed before DNA,
and other mechanisms may exist elsewhere in the universe.
Simplest chromosome for machine evolution: String of bits 0 or 1.
-
If chromosome is binary string of length n,
this can encode an integer from 0 to 2n-1.
-
Divide by 2n-1
to get real number x in the interval 0 to 1.
-
We can normalise previous number x to real number in any interval a to b
by taking
a + x (b-a).
-
Can decode it to any real number in range
REAL_MIN to REAL_MAX.
- Or could use standard
Floating point
encoding.
Decoding the chromosome is one-way
Decoding the chromosome is a
one-way process.
We
decode a chromosome to a number (or something else).
We never
encode a number as a chromosome.
10010011001100100101111000010100100100001010100110101010101010100101001001010011010111001
|
This genotype can be
interpreted as representing an integer in range INT_MIN to INT_MAX.
Or it can be
interpreted as representing a real number in range REAL_MIN to REAL_MAX.
Or it can be
interpreted as representing text (program code).
etc.
Decoding to minus infinity to plus infinity
Given a real number y from 0 to 1,
we can normalise this to any real number x from minus infinity to plus infinity
by taking the inverse of a very linear
sigmoid function
(which encodes plus/minus infinity to 0-1).
For example setting:
y = 1 / (1 + exp(-x/5))
and solving for x.
Q. An actual straight line is no good for this. Why?
Gray codes
Say we encode numbers as binary:
0 000
1 001
2 010
3 011
4 100
5 101
6 110
7 111
Note that to get from 3 to 4 we need a massive mutation (of all 3 bits).
This can lead to a "barrier" that it is difficult for the GA to pass over.
Gray codes
are an attempt to encode numbers such that to go to the next number
always entails only 1 bit change.
This rule works:
"Start with all 0's for 0.
For each successive number, successively flip the rightmost bit that produces
a new string."
0 000
1 001
2 011
3 010
4 110
5 111
6 101
7 100
You could say now, a single bit change will change from 7 to 0.
But we had before anyway a single bit change from 4 to 0.
And indeed 7 to 3, 8 to 0, etc.
Anyway, a small mutation leading to a large (and probably bad) change
is not such a problem as a small (desirable) continuous change
needing a large mutation.
Jumping barriers
Evolution has a way of "jumping" barriers anyway.
e.g. In original coding, 3 can't get to 4,
but it jumps to 7, and then declines to 4.
3 can get to 4 via 1-bit mutations
in 3 generations.
Or can it? What if 7 is really unfit?
i.e. Fit is 3 and 4 only.
Then the 7 mutations die immediately.
It could get stuck on local minimum, but often you have to be unlucky not to be able to jump barrier somehow.
Bitstrings v. Regular language implementation
Evolving bitstrings is popular,
but not compulsory.
Consider where a system is defined by 100 parameters:
- 20 parameters are real numbers 0 to 1.
- 20 parameters are real numbers 0 to 1000.
- 20 parameters are integers 0 to 1000.
- and so on
You can have two approaches:
- Store in regular language implementations.
- 20 real numbers. Need to define randomise method so they start in 0 to 1. Mutation means (say) plus/minus a real number between 0 and 0.1. Have to check for out of bounds.
- 20 real numbers. Need to define randomise method so they start in 0 to 1000. Mutation means (say) plus/minus a real number between 0 and 100. Have to check for out of bounds.
- 20 integers. Need to define randomise method so they start in 0 to 1000. Mutation means (say) plus/minus an integer between 0 and 100. Have to check for out of bounds.
- and so on
Mutation is flexible:
Different amounts for different parameters.
Crossover probably will not generate new parameters:
Easiest way to define
Crossover
is to take some parameters from one parent, rest of parameters from other parent.
Probably will
not define it to cut in middle of a parameter. (That would take some coding work.)
Will probably rely on mutation for new parameters.
- Store bitstrings.
All parameters, say, stored back-to-back on one bitstring.
Need method to decode a bitstring to the 100 parameters (not easy).
Will involve all the bounds-checking above.
Once we have defined that method, other methods are easy:
- Randomise method - bitstring is random 0s and 1s.
- Mutation method - flip some percentage of the bits.
- Crossover method - start of one bitstring, end of another.
(Maybe multiple cuts.)
Mutation is not flexible:
Same rate for whole genotype.
Bit flip might have more impact on one parameter than on another.
Crossover will often generate new parameters:
Crossover might cut in middle of a parameter.
Get a quite different child parameter.
|
Either approach has something to recommend it.
People evolve many different data structures.