Math 511 Home, Contents, Next

BCH Codes

One of the important applications of algebraic structures is that we can understand other situations better by recognizing them as examples of algebraic structures. For example, after the development of the Hamming codes in the early 50’s, mathematicians knew from theoretical grounds that better coding systems existed, but they didn’t know how to construct them. In the late 50’s, mathematicians worked out that linear cyclic codes, such as the Hamming code, correspond to ideals in the polynomial ring Z2[x]/(xn + 1). Once this connection was made, it was much easier to find good codes, such as the BCH codes introduced in 1960 by Bose and Chaudhuri and independently by Hocquengham. To set the stage for the BCH codes, we will first establish several simple theorems relating factoring and roots of polynomials, as well as some basic definitions for finite fields.

Theorem: If g(x) | p(x), then every root of g(x) is also a root of p(x).

Proof: g(x) | p(x) means p(x) = g(x)q(x) for some q(x). Any root z of g(x) is by definition a value such that g(z) = 0, from which we immediately deduce p(z) = q(z)g(z) = 0 so z is also a root of p(x).

Proving this theorem also establishes the contrapositive, that if there is a root z such that g(z) = 0 but p(z) ¹ 0, the g(x) doesn’t divide p(x).

Next we establish some definitions for finite fields. We start with a theorem.

Theorem: If F is a field with n elements, and a is a non-zero element of F, then an-1 = 1.

Proof: If F is the field Zp, then this is just Fermat’s theorem, which we’ve already established. But if you go back and look at the proof of Fermat’s theorem, we never used the particular structure of Zp, we just used basic properties about the multiplication table for any field. So the same proof works in this more general case.

Definition: The order of a (non-zero) element a of a field is the smallest m > 0 so that am = 1.

Example: In the field Z7, the non-zero elements are 1, 2, 3, 4, 5, and 6. Their orders are 1, 3, 6, 3, 6, and 2 respectively.

Definition: An element of a field with n elements is called primitive if its order is n-1 (the largest it could be).

Example: In Z7, the primitive elements are 3 and 5.

Definition: Let GF(2n) be the field with 2n elements. The minimal polynomial of an element a of GF(2n) is the polynomial pa(x) in Z2[x] of smallest degree such that pa(a) = 0.

Note that while a is an element of GF(2n), the minimal polynomial is in Z2[x]. Obviously the polynomial of smallest degree with coefficients in GF(2n) that has a as a root is x - a. But since we are restricted to polynomials with coefficients in Z2, we probably need a higher degree polynomial than just a linear term.

Theorem: Let pa(x) be the minimal polynomial of a. Then pa(x) | s(x) if and only if s(a) = 0.

Proof: We proved above that pa(x) may divide s(x) only if s(a) = 0. The interesting fact is that things go the other way; the single condition s(a) = 0 implies the whole polynomial pa(x) | s(x). The proof, as has often been the case lately, depends on the division algorithm in Z2[x]. Suppose s(a) = 0. We have pa(a) = 0 by the definition of minimal polynomial. Now we can write s(x) = q(x)pa(x) + r(x) with deg(r(x)) < deg(pa(x)) and all polynomials in Z2[x]. We can solve for r(x) = s(x) - q(x)pa(x), so r(a) = s(a) - q(a)pa(a) = 0. But pa(x) is the polynomial of smallest degree with a for a root. It follows that r(x) = 0 and pa(x) | s(x).

This has been a fairly quick march across a fair bit of algebraic theory. None of this theory was introduced for coding theory. All these concepts were introduced in the 19th century or earlier. Once the connection between linear cyclic codes and ideals of polynomial rings was established, mathematicians could use these concepts to find good coding systems. We need to pick a generator g(x) for our coding system. We need g(x) | xn + 1 in order to have a linear cyclic code. We want our code to be efficient, with as many data bits in our n-bit messages as possible. This corresponds to picking g(x) of low degree, since the number of data bits we send is n - deg(g(x)). Finally, we want to correct at least one error. An error in the ith position will result in a remainder congruent to xi after division by g(x). In order to be sure we can distinguish single errors, we need xj ¹ xi  (mod g(x)) for all choices of j ¹ i. These ideas lead to the following code.

Example: Given a length 2n - 1, let a be a primitive element of GF(2n). Then the minimal polynomial pa(x) generates an efficient code of length 2n - 1 that corrects single errors.

We check this produces a linear cyclic single-error-correcting code. Since a2^n - 1 + 1 = 0 (by Fermat’s theorem), pa(x) | x2^n - 1 + 1, so pa(x) is the generator of a linear cyclic code. To show that it corrects single errors, we need to show xj ¹ xi  (mod pa(x)) for all j ¹ i, or equivalently that pa(x) doesn’t divide xj + xi (recall + and - are the same since our coefficients are chosen over Z2). Now a is a root of pa(x) but aj + ai = a(aj-i + 1) ¹ 0 by the definition of a primitive element, so a is not a root of xj + xi and hence pa(x) doesn’t divide xj + xi and our code corrects single errors. Finally, since pa(x) is the polynomial of minimal degree that has a for a root, this code should be fairly efficient. In fact, this approach generates the Hamming codes.

So far it doesn’t look like we have gotten anywhere. We had the Hamming codes when we started. But now that we fully understand the algebraic structure that leads to the Hamming codes, we can play similar games with to build linear cyclic codes that correct several errors at a time. These ideas give us the BCH codes. If a is a primitive element, then pa(x)pa^3(x) generates a double-error-correcting code. Similarly, pa(x)pa^3(x)pa^5(x) generates a triple-error-correcting code and so on. We can check these assertions just by using similar (but lengthier) calculations involving roots and factors as we had for the Hamming code. I’m not going to go through all that, though you are welcome to stop by my office sometime and I’ll be glad to show you the details (it takes about 15 minutes).

In this second part of the course, we have seen several examples of "found" algebraic structures. Matrices and differential operators weren’t invented to be examples of non-commutative rings, they were introduced when trying to solve particular problems. But once we understood their underlying algebraic structure, we could use that structure to help solve those problems, with inverse and pseudo-inverse matrices and by understanding which differential commute (basically the constant coefficient operators). In the same way, linear cyclic codes were introduced as a trick of parity, but once the underlying algebraic structure was understood it was possible to construct much more efficient codes. This structure was based on domains and especially Euclidean domains, where the division algorithm gives us the tools we needed. In addition, the ideas needed to develop good codes are an understanding of factoring and roots of polynomials, perhaps the major topic of secondary algebra. When your class asks what factoring is good for, you can answer, in addition to the usual applications, that CDs encode and decode data based on polynomial factoring. We will go over some of the details of the "cross-interleaved Reed-Solomon" code CDs use in the next handout.


Please report any problems with this page to bennett@math.ksu.edu
©1998 Andrew G. Bennett