
Chapter 1: Introduction to Information Theory

The first problem we tackle is how to communicate perfectly over imperfect communication channels.

The running example is this. We have some message $\mathbf{s}$, which is a sequence of bits. We want to write this data to a noisy hard disk that transmits each bit correctly with probability $(1-f)$ and incorrectly with probability $f$. This is called the binary symmetric channel (symmetric because the probability of flipping from $0$ to $1$ is the same as from $1$ to $0$).

Suppose $f = 0.1$. This is clearly unacceptable. We want a useful disk drive to flip no bits in, say, 10 years of writing a gigabyte a day, which means we want an error probability of around $10^{-15}$ or smaller. We solve this problem by introducing communication systems to detect and correct the errors.

The flow is as follows:

  • Start with message $\mathbf{s}$
  • $\mathbf{s}$ gets encoded into $\mathbf{t}$, which is some intermediate representation we design
  • Some noise $\mathbf{n}$ gets added to $\mathbf{t}$, which represents noise from the imperfect hard disk
  • This results in $\mathbf{r} = \mathbf{t} + \mathbf{n}$, which is the received vector
    • The addition is in modulo-2 arithmetic, i.e. $1 + 1 = 0$ and $0 + 1 = 1$
  • Finally we decode using some algorithm to get $\hat{\mathbf{s}}$ from $\mathbf{r}$ (see the sketch after this list)
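To make the flow concrete, here is a minimal Python sketch (my own illustration, not from the book); the function name bsc is an assumption, and the "encoder" is just the identity for now, so errors pass straight through.

```python
import random

def bsc(t, f):
    """Binary symmetric channel: r = t + n (mod 2), each noise bit is 1 with probability f."""
    n = [1 if random.random() < f else 0 for _ in t]      # noise vector
    return [(ti + ni) % 2 for ti, ni in zip(t, n)]        # modulo-2 addition

f = 0.1
s = [1, 0, 1, 1, 0, 0, 1, 0]   # message
t = list(s)                    # encoder: identity (no redundancy yet)
r = bsc(t, f)                  # received vector
s_hat = list(r)                # decoder: identity
print("sent:    ", s)
print("received:", r)
```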

The Repetition Code

A simple communication system is to repeat each bit three times; this is the repetition code $R_3$. After noise is added, we decode each bit in the received vector by taking the majority vote over its three copies. The idea is that when $f$ is small, the probability of multiple bits failing independently becomes small. So the majority vote will correct some errors.
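A minimal sketch of $R_3$ (again my own illustration; encode_r3 and decode_r3 are names I made up):

```python
import random

def encode_r3(s):
    """Repeat each source bit three times."""
    return [b for bit in s for b in (bit, bit, bit)]

def decode_r3(r):
    """Majority vote over each block of three received bits."""
    return [1 if sum(r[i:i + 3]) >= 2 else 0 for i in range(0, len(r), 3)]

f = 0.1
s = [random.randint(0, 1) for _ in range(100_000)]
t = encode_r3(s)
r = [(b + (1 if random.random() < f else 0)) % 2 for b in t]   # binary symmetric channel
s_hat = decode_r3(r)
print("uncoded error rate:", f)
print("R3 bit error rate: ", sum(a != b for a, b in zip(s, s_hat)) / len(s))
```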

Theorem. Majority-vote decoding is optimal (i.e. it has the lowest probability of error).

Proof. Since each bit is independent, we just need to consider the optimal decision for one bit. Suppose we received the bits $r_1 r_2 r_3$. The objective is to recover the original bit $s$, which could have been $0$ or $1$. Thus, we want:

$$\hat{s} = \arg\max_{s} P(s \mid r_1 r_2 r_3)$$

$$\hat{s} = \arg\max_{s} \frac{P(r_1 r_2 r_3 \mid s)\, P(s)}{P(r_1 r_2 r_3)}$$

Note that:

  • In the first line, $s$ is the random variable representing the unknown true bit
  • Inside the $\arg\max$, $s$ is a candidate value that we are choosing (some notation overloading, as $s$ is also used to represent the true bit above)
  • $P(r_1 r_2 r_3)$ is treated as a constant, and does not affect the choice of $s$

Now, if we denote by $d$ the number of bit flips between $r_1 r_2 r_3$ and $sss$ (i.e. the number of positions where $r_n \ne s$), we have:

$$P(r_1 r_2 r_3 \mid s) = f^{\,d}\,(1-f)^{\,3-d}$$

Notice that we are computing the probability for this specific received sequence $r_1 r_2 r_3$, hence we do not need to multiply by the number of combinations.

With this formulation, we see that $P(r_1 r_2 r_3 \mid s)$ is maximized when the choice of $s$ minimizes the number of bit flips $d$ (provided $f < 0.5$). This is because a scenario with few bit flips is more likely than a scenario with many bit flips. Since the two values of $s$ are equally likely a priori, maximizing this likelihood also maximizes the posterior. Hence, the majority vote is optimal, as it minimizes the number of bit flips.
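As a quick worked check with $f = 0.1$ (my own numbers, not from the text): suppose we receive $r_1 r_2 r_3 = 011$. Then

$$P(011 \mid s{=}1) = f\,(1-f)^2 = 0.081, \qquad P(011 \mid s{=}0) = f^2\,(1-f) = 0.009,$$

so the likelihood ratio is $9$ in favour of $s = 1$, exactly the majority-vote answer.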

Exercise 1.2. Show that the error probability is reduced by the use of $R_3$ by computing the error probability of this code for a binary symmetric channel with noise level $f = 0.1$.

Solution. The naive (uncoded) error probability per bit is simply $f$. With $R_3$, a bit is decoded wrongly only when two or more of its three copies are flipped:

$$p_b = 3f^2(1-f) + f^3 = 3f^2 - 2f^3$$

With $f = 0.1$, the error probability is $3(0.1)^2(0.9) + (0.1)^3 = 0.028$, compared with $0.1$ uncoded.

Now we want to show that whenever $f < 1/2$, $R_3$ has a better error probability:

$$3f^2 - 2f^3 < f \iff 3f - 2f^2 < 1 \iff 2f^2 - 3f + 1 > 0 \iff (2f-1)(f-1) > 0,$$

which holds for $0 < f < 1/2$ since both factors are then negative. Hence we showed that $R_3$ has a better error probability.

Even though $R_3$ has a better error probability, the problem is that our rate of information transfer has fallen by a factor of 3. To improve our error probability further, we can continue to increase the repetition at the cost of a decreased rate.

Exercise 1.3. Find the probability of error of $R_N$, the repetition code with $N$ repetitions, for odd $N$:

$$p_b = \sum_{n=(N+1)/2}^{N} \binom{N}{n} f^{\,n} (1-f)^{\,N-n}$$

Assuming $f = 0.1$, which term in this sum is the biggest? How much bigger is it than the second-largest term?

The largest term is the one with the smallest number of flips that still causes an error, $n = (N+1)/2$:

$$\binom{N}{(N+1)/2} f^{\,(N+1)/2} (1-f)^{\,(N-1)/2}$$

The second-largest term (the one with one more flipped bit) is roughly $(1-f)/f = 9$ times smaller, so the largest term dominates.

Use Stirling's approximation to approximate the largest term and find the probability of error.

Using Stirling's approximation:

$$\binom{N}{(N+1)/2} \approx \binom{N}{N/2} \approx 2^N \sqrt{\frac{2}{\pi N}}$$

Hence we have the probability of error as approximately:

$$p_b \approx 2^N \sqrt{\tfrac{2}{\pi N}}\; f^{\,(N+1)/2} (1-f)^{\,(N-1)/2} \approx \left[4f(1-f)\right]^{N/2}$$

(dropping the polynomial factor and the residual $\sqrt{f/(1-f)}$, since the exponential term dominates).

To get an error probability less than $10^{-15}$, we write out the inequality and manipulate to find the lower bound for $N$:

$$\left[4f(1-f)\right]^{N/2} \le 10^{-15} \;\implies\; N \ge \frac{30}{\log_{10}\frac{1}{4f(1-f)}} \approx 68 \quad (f = 0.1)$$
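A quick numerical check (my own sketch): the exact sum can be evaluated directly to find the smallest odd $N$ reaching the target. It comes out somewhat below the crude bound above, because of the factors we dropped.

```python
from math import comb

def p_error(N, f):
    """Exact probability that a majority of the N copies are flipped (N odd)."""
    return sum(comb(N, k) * f**k * (1 - f)**(N - k)
               for k in range((N + 1) // 2, N + 1))

f, target = 0.1, 1e-15
N = 1
while p_error(N, f) > target:
    N += 2                      # only odd N, so the majority vote is never tied
print(N, p_error(N, f))         # smallest odd N meeting the target, and its error probability
```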

The Hamming (7, 4) Code

Thus far, we have been trying to encode each bit independently. What if we encode blocks of bits together? Can we get more efficient codes?

A block code converts a sequence of source bits $\mathbf{s}$, of length $K$, into a transmitted sequence $\mathbf{t}$ of length $N$ bits.

  • We add some redundancy by making $N > K$. The extra $N - K$ bits (called parity-check bits) are used to store redundant information about the original bits
  • This is usually implemented as some linear function of the original bits
  • In the (7, 4) Hamming code, we transmit $N = 7$ bits for every $K = 4$ source bits

The encoding can be shown with an example. Suppose $\mathbf{s} = 1000$.

  • The first 4 bits of the transmitted vector are $t_1 t_2 t_3 t_4 = s_1 s_2 s_3 s_4$, i.e. just copy the source sequence
  • The next 3 bits $t_5 t_6 t_7$ are parity-check bits
    • $t_5$ is set such that $s_1 + s_2 + s_3 + t_5$ is even
    • $t_6$ is set such that $s_2 + s_3 + s_4 + t_6$ is even
    • $t_7$ is set such that $s_1 + s_3 + s_4 + t_7$ is even
  • For the example of $\mathbf{s} = 1000$: $\mathbf{t} = 1000101$ (a sketch of this encoder follows the list)
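Here is a small Python sketch of this encoder (my own illustration of the parity rules above; the name hamming74_encode is an assumption):

```python
def hamming74_encode(s):
    """Encode 4 source bits into 7 transmitted bits using the parity rules above."""
    s1, s2, s3, s4 = s
    t5 = (s1 + s2 + s3) % 2     # makes s1 + s2 + s3 + t5 even
    t6 = (s2 + s3 + s4) % 2     # makes s2 + s3 + s4 + t6 even
    t7 = (s1 + s3 + s4) % 2     # makes s1 + s3 + s4 + t7 even
    return [s1, s2, s3, s4, t5, t6, t7]

print(hamming74_encode([1, 0, 0, 0]))   # -> [1, 0, 0, 0, 1, 0, 1], i.e. 1000101
```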

We can see that the Hamming code is a linear code, since the encoding can be written compactly as a matrix-vector multiplication (using modulo-2 arithmetic). Specifically, the transmitted code-vector $\mathbf{t}$ may be obtained from the source vector $\mathbf{s}$ using a matrix multiplication:

$$\mathbf{t} = \mathbf{G}^{\mathsf{T}} \mathbf{s}$$

Where $\mathbf{G}^{\mathsf{T}}$ is the generator matrix of the code:

$$\mathbf{G}^{\mathsf{T}} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \end{bmatrix}$$

The columns of the generator matrix $\mathbf{G}^{\mathsf{T}}$ may be viewed as defining four basis vectors in a seven-dimensional binary space. The sixteen codewords are obtained by taking all possible linear combinations of these vectors (in modulo-2 arithmetic). This is a useful perspective:

  • With 7 dimensions, we have 2^7 = 128 possible vectors, out of which only 2^4 = 16 are valid codewords. This means that if we receive an invalid vector, we know that it needs to be corrected
  • The 16 valid codewords are well spread out in the space (every pair of codewords differs in at least 3 positions), enabling correction (see the check after this list)
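A short sketch (my own check) that enumerates all sixteen codewords from $\mathbf{G}^{\mathsf{T}}$ and confirms that every pair differs in at least 3 positions:

```python
import itertools
import numpy as np

# Generator matrix G^T (7x4): identity on top, parity rows below.
GT = np.array([[1, 0, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 1, 0],
               [0, 0, 0, 1],
               [1, 1, 1, 0],
               [0, 1, 1, 1],
               [1, 0, 1, 1]])

# All 16 codewords: t = G^T s (mod 2) for every 4-bit source vector s.
codewords = [GT @ np.array(s) % 2 for s in itertools.product([0, 1], repeat=4)]

# Minimum Hamming distance between distinct codewords.
min_dist = min(int(np.sum(a != b)) for a, b in itertools.combinations(codewords, 2))
print(len(codewords), min_dist)   # expect 16 codewords and minimum distance 3
```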

Decoding Hamming Code

Decoding the Hamming code follows the same logic as before. Assuming that the channel is a binary symmetric channel and all source vectors are equiprobable, we want to choose a source vector whose encoding differs from the received vector $\mathbf{r}$ in as few bits as possible. Recall that the theorem we proved earlier shows that we want to maximize $P(\mathbf{r} \mid \mathbf{t})$, and (for $f < 0.5$) this is done by minimizing the number of bit flips.

Naively, we can enumerate all 16 valid Hamming codewords (each of length 7), compare $\mathbf{r}$ against each, and find the closest. This becomes inefficient as our codes get longer.

The better way is syndrome decoding. A syndrome is defined as the pattern of violations of the parity bits. In other words, it is the vector of "unhappy" parity bits.

Syndrome example. Suppose we transmit t = 1000101 and noise flips the second bit, giving us r = 1100101. The syndrome would be z = 110, because the first two parity checks are unhappy and the third is happy.

The syndrome decoding task is thus to find the unique bit that lies inside all the unhappy circles (the parity checks, drawn as circles in the usual Venn-diagram picture of this code) and outside the happy circles. If we can find such a bit, flipping it accounts for the observed syndrome.

Matrix Version of Hamming Decoding

We can describe the decoding operation in terms of matrices. Let us define the parity part of the generator matrix:

$$\mathbf{P} = \begin{bmatrix} 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \end{bmatrix}, \qquad \text{so that } \mathbf{G}^{\mathsf{T}} = \begin{bmatrix} \mathbf{I}_4 \\ \mathbf{P} \end{bmatrix}$$

Let $\mathbf{H} = \begin{bmatrix} -\mathbf{P} & \mathbf{I}_3 \end{bmatrix}$. Then the syndrome vector is:

$$\mathbf{z} = \mathbf{H}\mathbf{r}$$

The $-\mathbf{P}$ part computes the expected parity bits from the first four received bits and subtracts them from the actual received parity bits (picked out by the identity part). But because in modulo-2 arithmetic $-1 \equiv 1$, we have $-\mathbf{P} = \mathbf{P}$, so:

$$\mathbf{H} = \begin{bmatrix} \mathbf{P} & \mathbf{I}_3 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 1 & 0 & 0 & 1 \end{bmatrix}$$

It should be clear that for valid transmitted vectors of the form $\mathbf{t} = \mathbf{G}^{\mathsf{T}}\mathbf{s}$, the syndrome $\mathbf{z} = \mathbf{H}\mathbf{t}$ will be the zero vector.

Exercise 1.4. Prove that this is so.

We want to show that $\mathbf{H}\mathbf{G}^{\mathsf{T}}\mathbf{s}$ is the zero vector for all $\mathbf{s}$. Consider $\mathbf{H}\mathbf{G}^{\mathsf{T}}$. We have:

$$\mathbf{H}\mathbf{G}^{\mathsf{T}} = \begin{bmatrix} \mathbf{P} & \mathbf{I}_3 \end{bmatrix} \begin{bmatrix} \mathbf{I}_4 \\ \mathbf{P} \end{bmatrix}$$

For compatible matrix blocks (i.e. $\mathbf{P}\mathbf{I}_4$ is defined, etc.), we have:

$$\mathbf{H}\mathbf{G}^{\mathsf{T}} = \mathbf{P}\mathbf{I}_4 + \mathbf{I}_3\mathbf{P} = \mathbf{P} + \mathbf{P} = \mathbf{0} \pmod 2$$

Hence $\mathbf{H}\mathbf{G}^{\mathsf{T}}\mathbf{s}$ will always be the zero vector.
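A quick numerical check of this fact (my own sketch):

```python
import numpy as np

P = np.array([[1, 1, 1, 0],
              [0, 1, 1, 1],
              [1, 0, 1, 1]])
GT = np.vstack([np.eye(4, dtype=int), P])     # G^T = [I4; P]
H = np.hstack([P, np.eye(3, dtype=int)])      # H   = [P  I3]
print((H @ GT) % 2)                           # expect the 3x4 zero matrix
```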


We close this section on Hamming decoding by noting that the received vector $\mathbf{r}$ is obtained by:

$$\mathbf{r} = \mathbf{G}^{\mathsf{T}}\mathbf{s} + \mathbf{n}$$

And the syndrome vector is:

$$\mathbf{z} = \mathbf{H}\mathbf{r} = \mathbf{H}(\mathbf{G}^{\mathsf{T}}\mathbf{s} + \mathbf{n}) = \mathbf{H}\mathbf{G}^{\mathsf{T}}\mathbf{s} + \mathbf{H}\mathbf{n} = \mathbf{H}\mathbf{n}$$

Hence, the syndrome decoding problem is to find the most probable noise vector $\mathbf{n}$ that satisfies $\mathbf{H}\mathbf{n} = \mathbf{z}$. Such a decoding algorithm is called a maximum likelihood decoder.
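A small Python sketch of this decoder (my own illustration; the name hamming74_decode is an assumption). It maps each non-zero syndrome to the single-bit flip that produces it, which is the most probable noise vector when $f$ is small:

```python
import numpy as np

P = np.array([[1, 1, 1, 0],
              [0, 1, 1, 1],
              [1, 0, 1, 1]])
H = np.hstack([P, np.eye(3, dtype=int)])   # parity-check matrix [P I3]

# Map each possible syndrome to the single-bit noise vector that produces it.
SYNDROME_TO_FLIP = {}
for i in range(7):
    n = np.zeros(7, dtype=int)
    n[i] = 1
    SYNDROME_TO_FLIP[tuple(H @ n % 2)] = i

def hamming74_decode(r):
    """Syndrome-decode a received 7-bit vector, returning the 4 estimated source bits."""
    r = np.array(r).copy()
    z = tuple(H @ r % 2)
    if any(z):                              # non-zero syndrome: flip the implicated bit
        r[SYNDROME_TO_FLIP[z]] ^= 1
    return list(r[:4])
```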

Exercise 1.5. Refer to the (7, 4) Hamming code. Decode these received strings:

r = 1101011: we have z = 011, so flip $r_4$, giving $\hat{\mathbf{s}} = 1100$.

r = 0110110: we have z = 111, so flip $r_3$, giving $\hat{\mathbf{s}} = 0100$.

r = 0100111: we have z = 001, so flip $r_7$, giving $\hat{\mathbf{s}} = 0100$.

r = 1111111: we have z = 000, so flip nothing, giving $\hat{\mathbf{s}} = 1111$.
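These answers can be checked against the decoder sketch above (this snippet assumes hamming74_decode from the previous block is in scope):

```python
for r in ["1101011", "0110110", "0100111", "1111111"]:
    bits = [int(c) for c in r]
    print(r, "->", "".join(map(str, hamming74_decode(bits))))
# expected: 1100, 0100, 0100, 1111
```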

Exercise 1.6. Calculate the probability of block error $p_B$ of the (7, 4) Hamming code as a function of the noise level $f$ and show that to leading order it goes as $21f^2$.

The block error probability $p_B$ is the probability that one or more of the decoded bits in one block fail to match the corresponding source bits.

Assuming that whenever 2 or more bits are flipped in a block of 7 bits we get a block decoding error, we can derive this from a binomial distribution:

$$p_B = \sum_{k=2}^{7} \binom{7}{k} f^{\,k} (1-f)^{\,7-k} = 1 - (1-f)^7 - 7f(1-f)^6$$

Recall the binomial (Taylor) expansion:

$$(1-f)^n = 1 - nf + \binom{n}{2}f^2 - \cdots$$

So for $p_B$:

$$p_B = 1 - \left[1 - 7f + 21f^2 - \cdots\right] - \left[7f - 42f^2 + \cdots\right] = 21f^2 + O(f^3)$$

The leading term is $21f^2$.
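A quick numerical sanity check (my own sketch) that the exact expression approaches $21f^2$ as $f \to 0$:

```python
def p_block(f):
    """Exact block-error probability under the 2-or-more-flips assumption."""
    return 1 - (1 - f)**7 - 7 * f * (1 - f)**6

for f in [0.1, 0.01, 0.001]:
    print(f, p_block(f), p_block(f) / f**2)   # the ratio should approach 21
```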

Show that to leading order the probability of bit error $p_b$ goes as $9f^2$.