Encoding the Genotype

Recall our rough Genetic Algorithm and our definitions. We need to fill in some items.

First, how should the genotype be encoded so we can subject it to mutation and crossover? Binary bitstrings? Program code?

What is the definition of a random genotype? A random program?

Binary Chromosome

In nature, DNA is a "string" written in an alphabet of 4 "characters" A, T, G and C (the 4 nucleobases). Other mechanisms of inheritance may have existed before DNA, and other mechanisms may exist elsewhere in the universe.

Simplest chromosome for machine evolution: String of bits 0 or 1.

If chromosome is binary string of length n, this can encode an integer from 0 to 2ⁿ-1.
Divide by 2ⁿ-1 to get real number x in the interval 0 to 1.
We can normalise previous number x to real number in any interval a to b by taking a + x (b-a).
Can decode it to any real number in range REAL_MIN to REAL_MAX.
Or could use standard Floating point encoding.

IEEE floating point encoding. (Different implementations use different numbers of bits.)
Graphics showing different IEEE float implementations

Decoding the chromosome is one-way

Decoding the chromosome is a one-way process.
We decode a chromosome to a number (or something else).
We never encode a number as a chromosome.

10010011001100100101111000010100100100001010100110101010101010100101001001010011010111001

This genotype can be interpreted as representing an integer in range INT_MIN to INT_MAX.
Or it can be interpreted as representing a real number in range REAL_MIN to REAL_MAX.
Or it can be interpreted as representing text (program code).
etc.

Linux "file" command to determine/guess file type.
IEEE floating point . (Different implementations have different max and min values.)
Floating point number max and min values in Java

Decoding to minus infinity to plus infinity

Given a real number y from 0 to 1, we can normalise this to any real number x from minus infinity to plus infinity by taking the inverse of a very linear sigmoid function (which encodes plus/minus infinity to 0-1).

For example setting:

  y = 1 / (1 + exp(-x/5))

and solving for x.

Q. An actual straight line is no good for this. Why?

Gray codes

Say we encode numbers as binary:

Note that to get from 3 to 4 we need a massive mutation (of all 3 bits). This can lead to a "barrier" that it is difficult for the GA to pass over.

Gray codes are an attempt to encode numbers such that to go to the next number always entails only 1 bit change. This rule works: "Start with all 0's for 0. For each successive number, successively flip the rightmost bit that produces a new string."

You could say now, a single bit change will change from 7 to 0. But we had before anyway a single bit change from 4 to 0. And indeed 7 to 3, 8 to 0, etc.

Anyway, a small mutation leading to a large (and probably bad) change is not such a problem as a small (desirable) continuous change needing a large mutation.

Jumping barriers

Evolution has a way of "jumping" barriers anyway. e.g. In original coding, 3 can't get to 4, but it jumps to 7, and then declines to 4.

3 can get to 4 via 1-bit mutations in 3 generations.

Or can it? What if 7 is really unfit? i.e. Fit is 3 and 4 only. Then the 7 mutations die immediately. It could get stuck on local minimum, but often you have to be unlucky not to be able to jump barrier somehow.

Bitstrings v. Regular language implementation

Evolving bitstrings is popular, but not compulsory.

Consider where a system is defined by 100 parameters:

20 parameters are real numbers 0 to 1.
20 parameters are real numbers 0 to 1000.
20 parameters are integers 0 to 1000.
and so on

You can have two approaches:

Store in regular language implementations.

20 real numbers. Need to define randomise method so they start in 0 to 1. Mutation means (say) plus/minus a real number between 0 and 0.1. Have to check for out of bounds.
20 real numbers. Need to define randomise method so they start in 0 to 1000. Mutation means (say) plus/minus a real number between 0 and 100. Have to check for out of bounds.
20 integers. Need to define randomise method so they start in 0 to 1000. Mutation means (say) plus/minus an integer between 0 and 100. Have to check for out of bounds.
and so on
Mutation is flexible: Different amounts for different parameters.
Crossover probably will not generate new parameters: Easiest way to define Crossover is to take some parameters from one parent, rest of parameters from other parent. Probably will not define it to cut in middle of a parameter. (That would take some coding work.) Will probably rely on mutation for new parameters.

Store bitstrings.
All parameters, say, stored back-to-back on one bitstring. Need method to decode a bitstring to the 100 parameters (not easy). Will involve all the bounds-checking above.
Once we have defined that method, other methods are easy:

Randomise method - bitstring is random 0s and 1s.
Mutation method - flip some percentage of the bits.
Crossover method - start of one bitstring, end of another. (Maybe multiple cuts.)
Mutation is not flexible: Same rate for whole genotype. Bit flip might have more impact on one parameter than on another.
Crossover will often generate new parameters: Crossover might cut in middle of a parameter. Get a quite different child parameter.

Either approach has something to recommend it.
People evolve many different data structures.