Previous, Math 511 Home, Contents, Next

Error Detecting Codes

When humans have to record lengthy identification numbers, the potential for typographical errors is high. A publisher doesn’t want to ship the wrong books because someone typed in one wrong digit from a long list of identification numbers. In such situations, check digits are usually added to the identification number. These check digits can be used to verify that the number is entered correctly. For example, the ISBN number of Theory and Practice of Error Control Codes by Blahut is 0-201-10102-5. In this case, the initial 0 denotes the language (English), the 201 denotes the publisher (Addison-Wesley), and the 10102 is picked by the publisher to denote the particular book. The last digit, 5, is the check digit. It is computed according to the following rule. If the first nine digits are d1d2d3d4d5d6d7d8d9, then the tenth digit is given by the formula
d10 = 1´ d1 + 2´ d2 + 3´ d3 + 4´ d4 + 5´ d5 + 6´ d6 + 7´ d7 + 8´ d8 + 9´ d9 (mod 11)
You can check that this procedure does yield 5 in the given example. You can check that an ISBN number is correct by computing
1´ d1 + 2´ d2 + 3´ d3 + 4´ d4 + 5´ d5 + 6´ d6 + 7´ d7 + 8´ d8 + 9´ d9 + 10´ d10 (mod 11).
Since 10 = -1  (mod 11), the result should be 0. If any single number is miscopied, or if any two numbers are transposed, the result will no longer be 0. When the data entry clerk enters the number, the computer carries out the computation above. If the computation doesn’t come up 0, the computer can beep and the data entry person can re-enter the ISBN number, or perhaps refer to the title of the book if the number has been miscopied onto an order form. One possible problem with this scheme is that since the computation is done mod 11, the check digit could turn out to be 10, rather than a single digit 0-9. The solution to this can be inferred from the fact that the ISBN number of Lie Algebras and Lie Groups by Serre is 0-8053-8703-X.

This is only one of many check digit schemes (though it is the most efficient scheme I know). Credit card numbers, the scanner codes on products, all sorts of identification numbers use various check digit schemes as a means of detecting errors. These schemes all take the same basic form. Carry out some algebraic computation on the digits in the identification number in some algebraic structure. Record the answer as the check digit. If the algebraic structure is well chosen, then any single change in the data will produce a change in the check digit. Of course, several errors might cancel each other out, but you can't expect a single check digit to do everything. The advantage of using some finite algebraic structure is you want the check digit to take on just one of a finite number of possibilities (like 0-9 and X) so you can record it using a single digit. You will want pick a structure where it is easy to check if a number is correct and where several different types of errors (typos, transpositions, etc.) all can be caught, as in the ISBN example.

An early place where check digits were used was in computers. Let’s start with a short review of terminology. Computers work with binary, base 2, numbers. These numbers are built up from Binary digITS, called bits. Each bit is either 0 or 1. We gather 8 bits together into a unit called a byte. 10011010 is a byte. If you change one bit, programs probably won’t work anymore, so it is vital to detect when errors have been made. The earliest approach to catching errors was the use of a parity bit in each byte.

In our (old-fashioned) byte, we will have seven data bits and one parity bit. Suppose our data bits are 1011010. If we are using even parity, we will choose the 8th bit so that there are an even number of 0’s and 1’s in the byte. Since we have four 1’s and three 0’s in our seven data bits, we will choose a 0 for the parity bit (chosen to be the bit on the left) so we end up with even numbers of both 0’s and 1’s in the final eight-bit byte, 01011010. After we receive a byte, we can check that the parity is even. If there are an odd number of 0’s and 1’s in the received byte, we will know an error has been made and can ask for the byte to be retransmitted. This approach will detect any single error in transmitting a byte. Going back to our example, suppose we receive 01111010. We count three 0’s and five 1’s in this byte. Since three and five are odd, we know an error has occurred.

To put this in the same framework as the ISBN check digit scheme, note that if we have data bits b1b2b3b4b5b6b7, then we can write the formula for the parity bit b0 (with even parity) as

b10 = b1 + b2 + b3 + b4 + b5 + b6 + b7 (mod 2)
We then test that the parity is even by computing
b0 + b1 + b2 + b3 + b4 + b5 + b6 + b7 (mod 2).
Since 1 = -1 (mod 2), the answer should be 0.

If you recall when we discussed ASCII encoding, the standard way to encode letters into binary in computer systems was to use the starting three bits 010 for capital letters and 011 for lower case letters, with the letters from A to Z listed in the last five bits from 00001 to 11010 (1 - 26 in binary). Some of you may have wondered why the highest bit was 0 in both cases. The reason is that when ASCII was developed, you only used seven data bits, and the highest order bit was the parity bit. So ASCII used the two highest order data bits to denote what type of character was denoted. Completing the list given above, 11 is primarily lower case letters, 10 is primarily upper case letters, 01 is for other printable characters (0100000 was the space for example) and 00 is for carriage control (0001000 is a tab for example). As the use of one parity bit and seven data bits was replaced by the use of eight data bits during the 80’s, ASCII changed to always having a 0 in the highest order bit for the standard characters, and using bytes starting with a 1 to encode extra symbols, like ¸ and ´.

As noted, computers rarely (if ever) use the original seven data bit with one parity bit method of encoding anymore. Memory and bandwidth (the amount of data transferred per unit time) being valuable, programmers wanted to use all 8 bits for data, and soon enough did so. PCs typically use eight bits for data and have a separate ninth parity bit. As memory circuits have gotten more reliable, it has become cheaper and easier just to stick to eight data bits and no parity bits for the internal operation of machines, which is the case for some PCs today. When transmitting data between different machines, more sophisticated error detecting and error correcting codes are now used. A simple error correcting code, based on parity, will be covered in the next lecture.


Please report any problems with this page to bennett@math.ksu.edu
©1998 Andrew G. Bennett