Jul 20, 2016

Representation of data types in memory - Part 2

This is the continuation of the series "Representation of data types in memory". In this part, I will investigate how signed numerical integers are stored in memory. I'm using Windows Vista 32 bit with Microsoft Visual C++ 2010 Express (in Debug Mode) and WinDbg. Note that I'm not considering C++11.

Well, in my previous post, we saw how the unsigned numerical integers were saved in little-endian format in a block of 0xCC. This is of course true for signed numerical integers as well, but signed integers need to take the sign into account as well.

I will start this post in a similar way like my previous one, by using the same simple program, but with signed numerical integers.
int main()
{
   int a = -1;
   int b = -2;
   int c = -3;
   int d = -4;

   return 0;
}
When starting WinDbg and execute all the initializations statements, we see the result below.

WinDbg - Memory view after initializations
As discussed before, in the Memory view, we can see the blue rectangle which indicates the inserted block of 0xCC, which typically is done in Debug mode. The blue arrow shows the offset of the Memory view, which is where the insertion of 0xCC starts. Each integer is also "guarded" by four bytes 0xCC. Now we will focus on how the signed numerical integers are saved in memory. This is done by using the two's-complement representation. What is two's-complement? You can read all about it on the net, but here is a short explanation cited from wikipedia.
"Two's complement is a mathematical operation on binary numbers, as well as a binary signed number representation based on this operation. Its wide use in computing makes it the most important example of a radix complement."
There are a lot of in-depth information on the net how two's-complement conversion is done, here is my short version;

Invert all bits of the (positive) number and add one (+1).

Let's use an actual value from our simple program as an example, for instance;
int a = -1;
Positive number: 110
It's an int, i.e. four bytes in size -> 110 = 0000000116.
Then invert all bits; 0000000116 -> FFFFFFFE16
and then add one (+1); FFFFFFFE16 -> FFFFFFFF16

In other words, the number FFFFFFFF16, represent -1 in two's complement representation.

We can see this number in the Disassembly view, more specific, we can see this instruction;
mov     dword ptr [ebp-8],0FFFFFFFFh
It means that -1 is represented as 0xFFFFFFFF and will be stored at memory address ebp-0x8. Taking little-endian into account, each byte will be saved according to below;

ebp-0x8: 0xFF
ebp-0x7: 0xFF
ebp-0x6: 0xFF
ebp-0x5: 0xFF

Now let's continue with a similar program as in the first part in this series, this time with signed numerical integers.
int main()
{
   char a = -1;
   short int b = -2;
   int c = -3;
   long int d = -4;

   return 0;
}
When starting WinDbg and execute all the initializations statements, we see the result below.
WinDbg - Memory view after initializations

Like the first simple program, the blue arrow and blue rectangle, shows the inserted 0xCC done by the rep stos instruction.
Let's investigate how each type is stored. We know they are stored in two's-complement little-endian format.

Below, I've used following shorthand to denote the two's complement conversion for each type;

Positive number (using correct size) -> Invert all bits -> Added 1

char a = -1;
0116 -> FE16 -> FF16

ebp-0x5: 0xFF
short int b = -2;
000216 -> FFFD16 -> FFFE16

ebp-0x14: 0xFE
ebp-0x13: 0xFF
int c = -3;
0000000316 -> FFFFFFFC16 -> FFFFFFFD16

ebp-0x20: 0xFD
ebp-0x1F: 0xFF
ebp-0x1E: 0xFF
ebp-0x1D: 0xFF
long int d = -4;
0000000416 -> FFFFFFFB16 -> FFFFFFFC16

ebp-0x2C: 0xFC
ebp-0x2B: 0xFF
ebp-0x2A: 0xFF
ebp-0x29: 0xFF

You are welcome to leave comments, complaints or questions!

No comments:

Post a Comment