Sep 14, 2020

Encoding CRC-32

CRC, cyclic redundant check, is a method for error detection. In this post, I'm focusing on different implementations of the CRC calulation and more specific the CRC-32 calculation and its variations. Finally, I will investigate a zip file and its CRC calculation. I'm using Visual Studio 2019 Community, HxD editor and WinRar.

Before going into the implementation, I will provide a short overview of the CRC. I have no intention to explain the theory in detail, I leave that to other resources on the web. I'm focusing on CRC-32 since it's used in the zip file.

CRC is a type of cyclic block codes. Cyclic block codes are a type of linear block codes. The CRC itself is a number, typically 8- 16- or 32- bits in size. It is calculated from the message data and a polynom. Typically the CRC bits together with the message data forms a code word, which is modulated and sent over a channel.

Regarding the calculation, it can be described from a mathematical point of view using long division. It can also be described from a LFSR (Linear-feedback shift register) point of view and that is what is described here.

The LFSR can be described as a fixed number of cascaded flip-flops. Each clock tick will shift around the content (0 or 1) of the flip-flops according to the polynom. The polynom specify how many flip-flops, how many XOR gates and where they are in the circuit. We will have a look at the CRC-32 circuit in a moment.

A commonly used CRC-32 polynom, used in Ethernet, PKZIP and other applications, is defined as: 

G(x) = x³² + x²⁶ + x²³ + x²² + x¹⁶ + x¹² + x¹¹ + x¹⁰ + x⁸ + x⁷ + x⁵ + x⁴ + x² + x + 1

The polynom can be written in two forms, forward and reversed. In addition, the MSB can be omitted, since it is always 1.

Forward form:    x²⁶ + x²³ + x²² + x¹⁶ + x¹² + x¹¹ + x¹⁰ + x⁸ + x⁷ + x⁵ + x⁴ + x² + x + 1
Reversed form: 1 + x + x² + x⁴ + x⁵ + x⁷ +  x⁸ + x¹⁰ + x¹¹ + x¹² + x¹⁶ + x²² +  x²³ + x²⁶

For implementation purpose, we can translate the polynom into a binary/hexadecimal number.

Forward form:    00000100110000010001110110110111 (0x04C11DB7)
Reversed form:   11101101101110001000001100100000 (0xEDB88320)

The CRC-32 circuit can be represented as:

CRC-32 circuit

In the circuit above, there are 32 flip-flops and 14 XOR gates. For each clock tick, each flip-flop shift out its content. This means the XOR gate in the top right corner will produce a new LSB value based on the MSB and the incoming message bit.

Let's start the discussion about the CRC-32 implementation by showing an example that simulates the CRC-32 circuit.
uint32_t CalcCrc32_LeftShift(const std::vector<uint8_t>& bytes)
{
  uint32_t crcRegister = 0xFFFFFFFF;

  for (auto byte : bytes) {
    for (int j = 0; j < 8; j++) {
      uint32_t result = ((byte << 24) ^ crcRegister) & 0x80000000;
      crcRegister <<= 1;
      if (result) crcRegister ^= 0x04C11DB7;
      byte <<= 1;
    }
  }

  return ~crcRegister;
}
What's going on here?
Here is the short story. The function takes a vector of bytes (message data) and produce a CRC by doing a MSB shift (left shift) for each bit in the message data and applying the normal CRC-32 polynom when needed.

Here is the long story. The variable uint32_t crcRegister is used to represent the 32 bit LFSR. The crcRegister simulates the cascaded flip-flops, where the MSB of the crcRegister corresponds to flip-flop 31 in the CRC-32 circuit. For each byte, we loop 8 times, which simulates 8 clock ticks, i.e. 8 shifts. We have to shift 8 times since one byte is 8 bits. For each clock tick, the content of crcRegister will be updated. From the CRC-32 circuit, we can observe that we only need to apply the CRC-32 polynom if the output from the XOR gate in the top right corner is 1. I will explain in moment why. By "apply" the forward CRC-32 polynom, I mean perform each XOR operation in the CRC-32 circuit. The output from the XOR gate in the top right corner, is actually represented by the MSB in the variable uint32_t result. So let's say that the MSB in result is 1, then we know for sure that one of the operands in each XOR operations will have the value 1. The other operand in each XOR operation is the value shifted out from the flip-flop, which can be either 0 or 1. So the result of performing each XOR operation may either be 0 or 1. On the other hand if the MSB in result is 0, then we know for sure that one of the operands in each XOR operation will have the value 0. The other operand in each XOR operations is the value shifted out from the flip-flop, which can either be 0 or 1. So the result of performing each XOR operation can always be considered as the value shifted out from the flip-flop, meaning no XOR operation is neeed.

Below is a program, which is calcules the CRC-32 code (CRC-32 BZIP2) from a message data.
int main() {
  std::vector<uint8_t> messageData = {0xDE, 0xAD, 0xBE, 0xEF};
  auto crc32_BZIP2 = CalcCrc32_LeftShift(messageData);
  std::cout << std::hex << crc32_BZIP2 << std::endl;
  return 0;
}
CRC-32 BZIP2 = 0x7e25e5e7 for message data = 0xDEADBEEF

There are details in the function CalcCrc32_LeftShift that I've not discussed yet. These details are related to the name CRC-32 BZIP2, which I will desribe in a moment. Except from the polynom, there are additional parameters that can be variated to obtain slightly differerent CRC-32 codes. I will discuss them below.

The initialization value of the crcRegister is commonly set to 0x00000000 or 0xFFFFFFFF. From an LFSR point of view, the flip-flops are set to 0 or 1.

We may choose to perform a bitwise inversion of the final result. Note that this can be achieved by applying an XOR operation with 0xFFFFFFFF of the final result as well.

There are two more possible parameters. For each byte, we may (or not) reflect the input, i.e the bits in the message data. Further, we may (or not) reflect the output, i.e. the bits in final result after all iterations.

Take a look at the previous program again, which calls the function CalcCrc32_LeftShift. First we have the polynom, which is set to 0x04C11DB7. There is no reflection of the input. The initialization value is set to 0xFFFFFFFF. The MSB in the crcRegister is shifted out and processed together with the MSB in the message bytes. The output is inverted. There is no reflection of the output. All these parameters together gives the specific CRC-32 code called CRC-32 BZIP2 code. 

For further variations of the parameters and its specific CRC-32 code, I recommend the Online CRC calculator.

Let's have a look at another program where the input and output are reflected. The name of this specific CRC-32 code, is simply CRC-32. The bits in each byte must be reflected as well as the final result. In other words, the MSB in the crcRegister is shifted out and processed together with the LSB in the message bytes. Below is an implementation of reflecting data.
template <class T>
T Reflect(T in)
{
  T x = in;
  T out = 0;
  auto n = sizeof(T) * 8;
  for (auto i = 0; i < n; i++) {
    out = out << 1 | (x & 1);
    x >>= 1;
  }

  return out;
}

std::vector<uint8_t> Reflect(const std::vector<uint8_t>& data)
{
  std::vector<uint8_t> result(data.size());
  std::transform(data.begin(), data.end(), result.begin(), 
                 [](auto byte) { return Reflect(byte); });

  return result;
}

int main() {
  std::vector<uint8_t> messageData = {0xDE, 0xAD, 0xBE, 0xEF};
  auto crc32 = Reflect(CalcCrc32_LeftShift(Reflect(messageData)));
  std::cout <<  std::hex << crc32 << std::endl;
  return 0;
}
CRC-32 = 0x7c9ca35a for message data = 0xDEADBEEF

There is a more efficient way to calculate the CRC-32 above by using a right shift implementation instead. In this implemenation the reversed polynom come in handy.
uint32_t CalcCrc32_RightShift(const std::vector<uint8_t>& bytes)
{
  uint32_t crcRegister = 0xFFFFFFFF;

  for (auto byte : bytes) {
    for (int j = 0; j < 8; j++) {
      uint32_t result = (byte ^ crcRegister) & 1;
      crcRegister >>= 1;
      if (result) crcRegister ^= 0xEDB88320;
      byte >>= 1;
    }
  }

  return ~crcRegister;
}

int main() {
  std::vector<uint8_t> messageData = {0xDE, 0xAD, 0xBE, 0xEF};
  auto crc32 = CalcCrc32_RightShift(messageData);
  std::cout <<  std::hex << crc32 << std::endl;
  return 0;
}

CRC-32 = 0x7c9ca35a for message data = 0xDEADBEEF

We can understand the simularity between the reversed input/output left shift version and right shift version. In both cases, we process the MSB in the crcRegister together with LSB in the message bytes.

From above, we can conclude:
CRC-32 BZIP2 = CalcCrc32_LeftShift(messageData) = Reverse(CalcCrc32_RightShift(Reverse(messageData)) 

CRC-32 = CalcCrc32_RightShift(messageData) = Reverse(CalcCrc32_LeftShift(Reverse(messageData))

Before ending this post, I will have a quick look at a compressed file and its CRC code.

I'm using HxD to create a simple file, containing 0xDEADBEEF.

HxD - Uncompressed file

I compress it, using WinRar,  and look into the compressed file using HxD.

HxD - Compressed file

According to Wikipedia - Zip,  we can see that the CRC code is at offset 14 in the file header. This code is what we've recently calculated, i.e. the CRC-32 code.

You are welcome to leave comments, complaints or questions!