Error detection and correction
If we allow
any number of errors in data bits
and in check bits,
then
no error-detection (or correction) method
can
guarantee to work,
since any valid pattern can be transformed into any other valid pattern.
All methods of error-detection and correction
only work if we assume the number of bits changed by error
is below some threshold.
If all bits can be changed, no error detection method can work
even in theory.
All methods only work below a certain error rate.
Above that rate, the line is simply not usable.
General strategy:
Coding scheme so that a small no. of errors in this block
won't change one legal pattern into
another legal pattern:
-
Frame or codeword length n = m (data) + r (redundant or check bits).
- Any data section (length m) is valid
(we allow any 0,1 bitstring).
- Not every codeword (length n) is valid.
-
Make it so that:
(no. of valid codewords)
is a very small subset of
(all possible 0,1 bitstrings of length n)
so very unlikely that even large no. of errors
will transform it into a valid codeword.
-
Price to pay: Lots of extra check bits (high r).
-
Obviously this works up to some error rate
- won't work if no. of errors is large enough (e.g. = n).
- Error-check says "I will work if less than p errors in this block"
If errors still getting through:
Reduce block size, so will get less errors per block.
- Reduce m.
Reduce the amount of info you transmit before doing error-checking.
This can vastly reduce the probability of multiple errors per block.
-
e.g. Say we have average 1 error per 1000.
Transmit blocks of 10000. Average 10 errors each.
-
Transmit blocks of 10. Average 1 error per 100 blocks.
Almost never 2 errors in a block.
3.2.1 Error-correcting codes
Frame or
codeword length n = m (data) + r (redundant or check bits).
All 2
m patterns are legal.
Not all 2
n patterns are legal.
Basic idea:
If illegal pattern, find the legal pattern closest to it.
That might be the original data (before errors corrupted it).
Given two bitstrings, XOR
gives you the number of bits
that are different.
This is the
Hamming distance.
If two codewords are Hamming distance d apart,
it will take d one-bit errors to convert one into the other.
-
To detect (but not correct)
up to d errors per length n,
you need a coding scheme where
codewords are at least (d+1) apart in Hamming distance.
Then d errors can't change into another legal code,
so we know there's been an error.
- To correct
d errors, need codewords (2d+1) apart.
Then even with d errors, bitstring will be d away from original
and (d+1) away from nearest legal code.
Still closest to original.
Original can be reconstructed.
Error-detection example: Parity bit
Recall
modems.
Appended to data so that no. of 1 bits is even (or odd).
Codeword distance 2.
Can detect (but not correct) 1 error.
Can't detect 2 errors.
Error-correction example: Sparse codewords
Let's say only 4 valid codewords, 10 bits:
0000000000
0000011111
1111100000
1111111111
Minimum distance 5.
Can detect and
correct 1,2 errors.
Can detect but not
correct 3,4 errors.
Can't detect 5 errors.
Note: Can detect some (even most) 5 bit errors.
Just can't guarantee to detect all 5 bit errors.
e.g. Encode every 2 bits this way.
Need 5 times the bandwidth to send same amount of data.
But can communicate even if almost every other bit is an error
(4 out of 10).
Detect and
correct all 1 errors.
Need distance 3.
We need to have 2m legal messages.
If change 1 bit, must get illegal (and an illegal which is 1 bit away from this message,
but not 1 bit away from any other legal message).
So each of the 2m must have n illegal codewords at distance 1 from it
(systematically change each bit).
These are my illegals, can't overlap with the illegals that are 1 bit away from
other patterns.
Each message needs (n+1) patterns reserved for it.
(n+1) 2m <= 2n
(n+1) <= 2n-m
(m+r+1) <= 2r
For large r, this is always true.
i.e. This is putting a limit on how small r can be.
e.g. Let m=64. Then we need:
r+65 <= 2r
Remember powers of 2.
i.e. r >= 7
What block size?
It scales well. Let m=1000. then r=10.
Question is, will 1000 bits only have 1 error?
e.g. Could send 1 M bits, need only 20 check bits
to error-correct 1 bit error!
But if there are 2 errors the system fails.
If errors getting through:
Reduce m until almost never get more than 1 error per block.
Error-detection (and re-transmit) v. Error-correction
Example
Errors isolated. 1 in 10
6.
Block size 1000 bits.
i.e. Error on average 1 bit every 1000 blocks.
-
To do error-correction on 1000 bit block,
need
10 check bits (210=1024).
1 M of data needs overhead of 10,000 check bits.
-
To just error-detect a block with a 1 bit error,
need 1 parity bit.
1 M of data needs 1,000 check bits.
Once every 1000 blocks (1 M), 1 block needs to be re-transmitted (extra 1001).
1 M of data needs overhead 2,001 bits.
Q. For what error rate are they equal?
3.2.2 Error-detecting codes
Parity bit for 1 bit error detection
Parity bit
can detect 1 error.
Won't detect 2.
Will detect 3,
won't detect 4, etc.
If large block has 1 parity bit
and is badly damaged,
the odds of detecting this are only 0.5.
Parity bit for n bit burst error detection
Each block is considered as matrix, width n, height k.
Parity bit calculated for each column.
Last row of parity bits appended:
(parity bit for col 1, col 2, ..., col n)
Transmit n(k+1) block row by row.
-
Any burst of length up to n in the data bits
will leave at most 1 error in each col.
Can be detected.
- If burst is in the parity bits row, they will be wrong
and we will detect there has been an error.
Note: Don't know where error was (this is detection not correction). Will ask for re-transmit
(don't know that data bits were actually ok).
Q. What about burst of length up to n partly in last row of data
and partly in the parity bits row?
For longer bursts, damaging the whole block:
Prob. of any given damaged column having correct parity by chance = 1/2
Prob. of all columns having correct parity by chance = (1/2)
n
Reasonable chance we'll detect it.
(If every parity bit in last line ok, it is prob. because there was no error,
rather than by chance.)
Isolated errors
Isolated errors more difficult:
Any 2-bit error, where errors come at time t and t+n,
will
not be detected.