Information Theory and Coding Exercises

Problem 1: Football and Handball Teams

In an institute with equal numbers of students, there are football teams (11 players) and handball teams (7 players).

  1. Suppose it is known that in a randomly chosen team, no two students were born on the same day of the week. How much uncertainty exists about the sport played by the team? Assume there are exactly 52 weeks in a year.
  2. Repeat the above if it is known that the team consists of students who were born on the same day of the week.

Solution

  1. None. In the football team, there must necessarily be players who were born on the same weekday.
  2. Let E be the event “students who were born on the same day of the week” and let X be the type of team. Then it is easily calculated that…

Problem 2: Discrete Memoryless Source

Suppose the alphabet of a discrete memoryless source consists of the 27 entries of a dictionary. Assume a uniform probability mass function for the source and that we encode the source symbols one by one.

  1. Determine the efficiency of a compact binary encoder.
  2. Determine the efficiency of a compact ternary encoder.

Solution

  1. A compact binary encoding of a uniform source produces codewords whose lengths differ by at most one unit (see problem 10, newsletter 2). In this case, there are 5 codewords of length 4 and 22 codewords of length 5. Therefore, the average codeword length is L = (4 × 5 / 27) + (5 × 22 / 27) ≈ 4.81. The encoding efficiency is η = H2(X) / L ≈ 98.7%. Note that it is necessary to construct the Huffman coding tree.
  2. The efficiency is 1 since the probabilities of the source symbols are integer powers of 3 (the number of elements in the coding alphabet).

Problem 3: Source Alphabet Size

What is the size of the alphabet of a source that generates all symbols with equal probability if, to reliably transmit its messages through an ideal channel with capacity 3 bits, the symbol generation rate cannot exceed the channel transmission rate?

Solution

We must have log2n ≤ 3. Then n ≤ 8.

Problem 4: Binary Code

Let C be the binary code defined by the parity check matrix H:

[Insert Parity Check Matrix H here]

  1. Determine the length, size, and distance of C.
  2. Write a generator matrix for C.
  3. What is the largest integer t such that the code can correct any pattern of up to t errors?
  4. How would you decode the received vector 101010101?

Solution

  1. This is a [9, 3, 3] code. The 6 × 9 matrix H has all its rows linearly independent. It is, in fact, the repetition code (xxx, yyy, zzz) where x, y, z ∈ {0, 1}, and it follows easily that the distance is 3.
  2. For example, G: [Insert Generator Matrix G here]
  3. t = 1. The error-correcting capability of the code is 1.
  4. The nearest codeword is 111000111, which differs from the received vector in one symbol within each triplet of bits. No other codeword has a smaller distance, and the minimum distance is 3, so this would be the decoder’s output.

Problem 5: Binary Parity Code

Consider the simple binary parity code (n, n-1) in which the redundancy is such that the number of non-zero symbols in all codewords is even.

  1. Show that the code is cyclic and find its generator polynomial.
  2. Calculate the probability that no error is detected when transmitting over a binary symmetric channel without memory.

Solution

  1. First, let’s see that the code is linear, which is true because the code consists of all vectors of n bits with even weight, which includes the zero vector and is closed under addition. To see that it is cyclic, it suffices to note that cyclically rotating any codeword preserves its weight (even) and results in another codeword. The generator polynomial is x + 1.
  2. The probability of not detecting an error is: [Insert Formula Here] where A2i = (n choose 2i) is the number of codewords of weight 2i.

Problem 6: ARQ Protocol

Consider a link with the following characteristics: C = 1 Mbit/s, bit error rate ε = 5 × 10-6, and negligible propagation delay. If we use a stop-and-wait ARQ protocol with timeout m = 100, what range of values of n ensures that the effective throughput does not exceed 500 Kbits/s?

Solution

Let the effective throughput (or efficiency) of the protocol be denoted by η(n). The problem poses a simple geometric question: we need to find the solutions, if any, of the equation (1 – k(n + m))(n / (n + m)) = 1/2. If the equation has no real solutions or has only one, then for any value of n, the effective throughput exceeds 500 Kbits/s. But if the equation has two distinct real solutions, say n1 and n2, such that 100 ≤ n1 < n2, then the effective throughput does not exceed 500 Kbits/s for n ∈ [n1, n2].

In this case, the solutions are approximately n1 ≈ 100.5 and n2 ≈ 99,799.5. Therefore, the effective throughput does not exceed 500 Kbits/s for n ∈ [101, 99799].