Jan 19, 2015

Pass-by-value vs. pass-by-const-reference - Part 1

You probably already know the difference between pass-by-value and pass-by-const-reference. Before proceeding, here is a brief recap anyway.

Pass-by-value results in a copy operation, since the function uses a copy of the original argument. This can be a very expensive operation, especially for user-defined classes with a lot of members (and base classes).

A function that uses pass-by-const-reference makes no copy. It works on the original argument (but cannot modify it, since the parameter is declared const).

This post is the first of two in which I compare the disassembly of pass-by-value code with the disassembly of pass-by-const-reference code. In this post, the comparison is done with built-in types.

In the next post, I do the same comparison for user-defined classes.

I'm using Visual Studio 2010 Express and building in Debug mode. Why? Because the examples are very simple, and in Release mode the compiler would optimize most of the code away. Further, I'm using WinDbg to view the disassembly.

Are you familiar with the cdecl calling convention? It is the default calling convention for ordinary (non-member) functions in a 32-bit Visual Studio project. What you need to know in this context is that cdecl pushes the arguments on the stack in reverse order (right to left). It will become clearer when we step through the examples below.

Okay, let's check out the example. Below is a very simple program which adds two integers. The addition is done in the function add.

int main()
{
   int a = 3;
   int b = 7;
   int c = add(a, b);

   return 0;
}
 
As mentioned above, I'm going to compare the pass-by-value case with the pass-by-const-reference.

First up is the pass-by-value case. Below is the C++ code and the disassembly of the main function with its stack.

int add(int a, int b)
{
   return a+b;
}

WinDbg - disassembly of main (about to execute the call instruction) and its stack
Okay, let's see what's going on above. The first instruction of interest is the instruction in the red frame. Here we set variable a=3. The next instruction (within the green frame) sets the variable b=7.

Then we are ready to execute the add function, i.e. the call instruction. But before making the call, we must put the arguments on the stack. Remember that the add function uses the cdecl calling convention, which means we push the arguments on the stack in reverse order, i.e. first push 7 (b), and then push 3 (a).

Also note what happens on the stack. The red and green frames are the original values of a and b. The blue and purple frames are the result of the copying, i.e. the push instructions.

Now let's continue with the disassembly of the add function.


WinDbg - disassembly of add function (about to execute the mov eax instruction) and its stack
Above, follow the blue and purple lines, and you will see that the add function uses the pushed values (i.e. the copies of the original values) to do the calculation.

Next up is the pass-by-const-reference case. Below is the C++ code and the disassembly of the main function with its stack.

int add(const int &a, const int &b)
{
   return a+b;
}


WinDbg - Disassembly of main function (about to execute the call instruction) and its stack

Above, as in the pass-by-value case, the instructions in the red and green frames set the variables a=3 and b=7. But after the variables are initialized, there is a difference. We still push two values on the stack, but they are not copies of the values. They are the addresses of the values.

The first lea instruction stores the address of b (7) in eax, which is then pushed on the stack. The second lea instruction stores the address of a (3) in ecx, which is then pushed on the stack.

Then we are ready to execute the add function, i.e. the call instruction.

Also check out the stack. The red and green frames are the original values of a and b (exactly as in the pass-by-value case). The blue and purple frames contain the addresses of b and a (instead of copies as in the pass-by-value case).

Now let's continue with the disassembly of the add function.


WinDbg - Disassembly of add function (about to execute the mov eax instruction) and its stack
Above, follow the blue and purple lines, and you will see that the add function uses the addresses of b and a to do the calculation.

The mov eax, dword ptr [ebp+8] instruction sets eax to the address of a. The next instruction sets eax to the contents of the address that eax points to, i.e. the value 3.

The same procedure is done for b (using ecx), and its value is then added to 3 (a) by the instruction add eax, dword ptr [ecx].

To summarize, in the pass-by-value case, copies of the values are pushed on the stack. In the pass-by-const-reference case, the addresses of the values are pushed on the stack.

In the next part of comparing pass-by-value and pass-by-const-reference, I will have a look at user-defined classes.

You are welcome to leave comments, complaints or questions!

Jan 7, 2015

SOLID

I bet you have encountered some code smells during your programming career.

Maybe you have made a change (corrected a bug) in a module, which caused a change (introduced a new bug) in another, unforeseen module?

Maybe you have experienced that a very simple change in your module forces you to make changes in multiple modules?

Even worse, code duplicates you weren't aware of (you fixed a bug in a function in a module, but the bug was not fully solved due to an exact duplicate of the function in another module)?

There are more code smells out there, you tell me!

However, using SOLID will help you prevent some of these code smells. SOLID is a set of five basic OOP principles that will help you write quality software.

The SOLID acronym:

S : Single responsibility principle (SRP)
   There should never be more than one reason for a class to change

O : Open/closed principle (OCP)
   Software entities should be open for extension, but closed for modification

L : Liskov substitution principle (LSP)
   Functions that use pointers or references to base classes must be able to use objects of derived classes without knowing it

I : Interface segregation principle (ISP)
   Clients should not be forced to depend upon interfaces that they do not use

D : Dependency inversion principle (DIP)
   A. High level modules should not depend upon low level modules. Both should depend upon abstractions.
   B. Abstractions should not depend upon details. Details should depend upon abstractions


There are a lot of good articles about the principles on the net, and I have no intention of explaining them here. This post is more of a reminder that there are basic principles to follow.

My personal experience is that you need some training to actually understand when to use them. Further, you need some time to make it a habit to always consider SOLID when you are writing software.

Generally, I think it is easier to use SOLID when writing new software. Applying SOLID to existing code demands more effort (and time). Of course, it depends on the code you start from.

You are welcome to leave comments, complaints or questions!

Jan 5, 2015

PE file with empty main()

If you build an empty console program, how many bytes are needed for the Portable Executable (PE)? And what's inside the PE file?

In this post I'm experimenting with an empty console program. I'm reducing the PE file, so it just contains the headers and the binary machine code. I'm using Visual Studio 2010 Express and building the PE files in Release mode. Further, I'm using PE Insider from Cerbero and PEBrowserPro from Smidgeonsoft.

Let's compile and link the empty program below.
int main()
{
   return 0;
}

A program doing nothing. As simple as it can be. Now, let's check out the size of this program, using file properties.

File properties - size of file


Alright, 6144 bytes of binary data are needed for this empty program. Why is this size needed, and what kind of binary data is in there? Let's fire up PE Insider, and first check out the header size, and then the section table.

PE Insider - size of headers

PE Insider - section table

Above, SizeOfHeaders gives the size of the headers (in hex), and SizeOfRawData gives the size (in hex) that each section needs on disk.

Let's try to understand the number 6144. This is actually the sum of the size of the headers and the sizes of the sections. Let's add them up to verify this:

0x400+0x800+0x600+0x200+0x200+0x200 = 0x1800 (DEC: 6144)

Okay, now we understand why the size is 6144. But what's inside all this binary data?

First let's check out the .text section, i.e. the code. This section needs 0x800 bytes. However, my empty program is doing nothing!

PE Insider - .text section

Above, the .text section starts at offset 0x400 (in the file on disk), and there are a lot of things going on.

Again, the main function is doing nothing. But there is a lot of other code in the .text section. It's code from the C Runtime library. For instance, the main function is not the first function called when the program is executed. The first function called is the mainCRTStartup function. This is a function in the C Runtime library (and part of the PE file). How do I know this? Well, each program has an entry point, which is specified in one of the headers in the PE file. Let's check it out.

PE Insider - Entry point
 
PEBrowserPro - disassemble view of entry point

Okay, the entry point is at Relative Virtual Address (RVA) 0x12A0 (within the .text section, of course). Thanks to the disassemble view from PEBrowserPro, we can see what's going on there. The mainCRTStartup function is located at this RVA.

Next test: let's tell Visual Studio to use another entry point (i.e. not mainCRTStartup). We can do this in the Property Pages dialog.


Visual Studio - specifying my own entry point

After compiling and linking, let's check the file properties again.

File properties - size of file

Wow! The file size is reduced! From 6144 bytes to 3072 bytes. Let's check out the section table again.

PE Insider - section table

Compared to the screenshots above, the .text section is reduced from 0x800 to 0x200, the .rdata section from 0x600 to 0x200, and the .data section from 0x200 to 0x000. The .rsrc and .reloc sections remain the same size.

So what does the .text section look like now that we have specified our own entry point?

PE Insider - .text section (main entry point)


PEBrowserPro - disassemble view - .text section (main entry point)

The only thing going on in the .text section is the return 0 statement. We have managed to get rid of the C Runtime Library code.

Note that there are a lot of 0's in the .text section. This is just zero padding, so that each section can start at a multiple of 0x200.

Now let's continue with the other sections in the PE file. The purpose of the .reloc section is to help the loader do relocations if the executable is not loaded at its preferred load address. If we remove the DYNAMICBASE switch, the .reloc section will be removed. This means that the executable will always be loaded at its preferred load address, and does not need any relocation information. Let's remove the DYNAMICBASE switch.


Visual Studio - Remove dynamicbase

Next, compile and link again and check out the file properties and the section table again.

File properties - size of file
PE Insider - section table

Voila! The file size is decreased from 3072 bytes to 2560 bytes. The .reloc section is gone.

Let's continue with the other sections. What's going on in the .rdata section and the .rsrc section?

PE Insider - .rdata section
 
PE Insider - .rsrc section

Thanks to the ASCII view, we can figure out that Visual Studio embeds a manifest in the .rsrc section. We can also see that there is a PDB path in the .rdata section.

If we want to get rid of these, we just go to the Property Pages and remove the manifest as well as the debug information.


Visual Studio - Removing manifest



Visual Studio - Removing debugging information

Again, compile and link and check out the file size, the section table, and the .rdata section.

File properties - size of file
PE Insider - section table
PE Insider - .rdata section, remaining data

Not much left. The file size is reduced from 2560 bytes to 2048 bytes. The .rsrc section is gone. The .text section contains a couple of bytes of binary machine code and the .data section is empty. Some data remains in the .rdata section. I'm not completely sure what this data is. If you know, please tell me. However, it can be removed by turning off Whole Program Optimization.


Visual Studio - Whole Program Optimization

Compile and link the PE file and let's have a final look at the file properties and the section table.

File properties - size of file


PE Insider - section table

Well, that's about it! The file size is reduced from 2048 bytes to 1024 bytes. The PE file now contains just the headers and the .text section with a couple of bytes of machine code.

You are welcome to leave comments, complaints or questions!

Jan 2, 2015

Base relocation table

A Portable Executable (PE) file usually contains some headers and some sections. One of the sections that may exist is the .reloc section, and within this section is the base relocation table. The base relocation table is needed to fix up virtual addresses in the PE file if the PE file is not loaded at its preferred load address.

In this post, I will have a look at a base relocation table and see what's going on there. Further, I will discuss how to read the entries in the base relocation table, and what they are doing in the code. I'm using PE Insider from Cerbero, PEBrowserPro from Smidgeonsoft and WinDbg.

Before checking out a .reloc section in a PE file, let's discuss the .reloc section briefly.

The .reloc section contains a series of blocks. There is one block for each 4 KB page that contains virtual addresses in need of fix ups. Each block contains an IMAGE_BASE_RELOCATION struct followed by a series of entries.

The IMAGE_BASE_RELOCATION is defined as below, according to the header file winnt.h.
typedef struct _IMAGE_BASE_RELOCATION {
    DWORD   VirtualAddress;
    DWORD   SizeOfBlock;
//  WORD    TypeOffset[1];
} IMAGE_BASE_RELOCATION;

The VirtualAddress member holds the Relative Virtual Address (RVA) of the 4 KB page that the relocations apply to. The SizeOfBlock member holds the size of the block in bytes (including the size of the IMAGE_BASE_RELOCATION struct).

Each entry is a 2-byte value and represents a location (an offset within the 4 KB page pointed out by the VirtualAddress member in the IMAGE_BASE_RELOCATION struct) that needs to be fixed up if the PE file is not loaded at its preferred load address.

That was some theory. Now let's take a look at a .reloc section from a simple "Hello world!" program. I've compiled and linked it in Visual Studio 2010 Express in Release mode.

#include <cstdio>

int main()
{
   std::printf("Hello world!");
}

Before proceeding, just a quick note. The program above lacks the return 0 statement. This does not cause any trouble, since the compiler adds it implicitly for main. This can be seen in the disassembly views that follow below (xor eax, eax, ret).

Let's open the "Hello world!" program in PE Insider.

First, let's check out the preferred load address.

PE Insider - Preferred load address
The preferred load address is the ImageBase value, i.e. 0x00400000.

Next, check out the section table.

PE Insider - section table
Apparently there are five sections needed for my simple "Hello world!" program. We are only interested in the .reloc section, so let's check it out.

PE Insider - .reloc section
The .reloc section contains two blocks (within the red border). Since there are 2 blocks, there are two 4 KB pages that contain virtual addresses in need of fix ups.

The first member of the IMAGE_BASE_RELOCATION struct in the first block is the VirtualAddress member, with the value 0x00001000 (green underscored value). The second member is the SizeOfBlock member, with the value 0x00000128 (black underscored value). In order to calculate the number of entries, we must subtract the size of the IMAGE_BASE_RELOCATION struct from the SizeOfBlock member, i.e. 0x128-0x8 = 0x120 bytes. Further, we know that each entry is a 2-byte value, which gives 0x120/2 = 0x90 (DEC: 144) entries.

The VirtualAddress value in the second block is 0x00002000 (green underscored value) and the SizeOfBlock value is 0x00000020 (black underscored value). Applying the same idea as for the first block, the number of entries is (0x20-0x8)/2 = 0xC (DEC: 12).

So what is the meaning of the information above?

Regarding the first block, the loader understands that it must do some fix ups in the 4 KB page that starts at RVA 0x1000. As you can see in the section table above, 0x1000 is where the .text section starts. In other words, the loader must fix up 0x90 (DEC: 144) virtual addresses in the .text section, i.e. in the code. Note that the whole .text section fits in a single 4 KB page.

Regarding the second block, the loader understands that it must do some fix ups in the 4 KB page that starts at RVA 0x2000. As you can see in the section table above, this is the .rdata section. In other words, the loader must fix up 0xC (DEC: 12) virtual addresses in the .rdata section. Note that the whole .rdata section fits in a single 4 KB page.

So how does the loader know how to fix up a virtual address, and where it should do it?

To know how to fix up an address, the loader computes the delta between the actual load address and the preferred load address (actual minus preferred). The delta value is then added to the virtual address.

To know where to do the fix ups, the loader checks each entry in the blocks in the .reloc section. As mentioned above, an entry is a 2-byte value. The 4 most significant bits of the value represent the type of relocation. In the x86 world, the type is practically always 3 (HIGHLOW), which means that we add the complete 32-bit delta value to the virtual address that needs to be fixed up. (The other type you may run into is 0 (ABSOLUTE), a padding entry that the loader skips.) The remaining 12 bits of the entry tell us the offset in the 4 KB page (pointed out by the VirtualAddress member in the IMAGE_BASE_RELOCATION struct), i.e. where to do the fix up in the page.

Let's clarify this with an example. As mentioned above, the .text section has 0x90 fix ups. Let's go into detail for the first 2 fix ups.

According to the .reloc section above, the first entry is 0x3001. This means that the type is 3 and the offset, in the page, is 0x001. The second entry is 0x3007, which means that the type is 3 and the offset, in the page, is 0x007.

Thanks to these offset values, the loader now knows that it must fix up the virtual address at RVA 0x1000+0x1 = 0x1001, and RVA 0x1000+0x7 = 0x1007.

Below is the .text section in the file, with the affected RVAs within the red border. I've also marked them as 1 and 2, because I will deal with them separately in my explanation. I also use the marks as references between the screenshots.
PE Insider - .text section - HEXVIEW
So above, we see the binary machine code. This does not really tell us much, so let's disassemble this part using PEBrowserPro.


PEBrowserPro - .text section - disassembled

Thanks to the disassembled view, we can easily see what is located at RVA 0x1001 and RVA 0x1007. Note that the view uses virtual addresses (with the preferred load address 0x00400000).

Let's start with case 1, i.e. RVA 0x1001.

At this position, we can see the bytes F4 20 40 00. This is a virtual address (0x004020F4, remember little-endian), which needs to be fixed up according to the .reloc section. It is a virtual address within the .rdata section, where the string literal "Hello world!" is stored. In the disassemble view above, we push the address of the string literal on the stack. If the PE file is not loaded at its preferred load address, this value needs to be fixed up.

Now let's move on to case 2, i.e. the RVA 0x1007.

At this position, we can see the bytes A0 20 40 00. This is a virtual address (0x004020A0, remember little-endian), which needs to be fixed up according to the .reloc section. This is a virtual address within the .rdata section as well. More specifically, the virtual address 0x004020A0 is part of the Import Address Table (IAT). This can be verified by checking the Directories.

PE Insider - Directories

Each entry in the IAT contains the virtual address of a function in another module. In this case, the IAT entry at the virtual address 0x004020A0 contains the virtual address of the function printf in the msvcr100.dll module.

So when executing the CALL instruction (see the disassemble view), the processor finds printf's virtual address by reading the IAT entry at virtual address 0x004020A0. Before load time, we don't know the virtual address of printf yet, because the DLL (msvcr100.dll) has not been loaded into virtual memory.

As you can understand, if the loader does not load the "Hello world!" program at its preferred load address, the .rdata section will not be located at virtual address 0x00402000, and the address of the IAT entry that tells us where to find the printf function is invalid. Therefore the loader must add the delta value to 0x004020A0.

Finally, I will just show what's going on in the .text section when the "Hello world!" program is executed (loaded in memory). For this, I'm using WinDbg. Note that the "Hello world!" program is linked with the DYNAMICBASE switch, i.e. it's Address Space Layout Randomization (ASLR) compatible.

First, let's find out the load address.

WinDbg - Load address

Remember that according to the PE file, the preferred load address is 0x00400000. The actual load address is 0x00250000, which is 0x00400000-0x00250000 = 0x1B0000 below the preferred load address, i.e. the delta value is negative.

Second, let's check out the .text section.


WinDbg - .text section - Disassembled view

As you see above, the virtual addresses differ from the disassembled view in PEBrowserPro, because the "Hello world!" program is loaded at 0x00250000.

For instance, according to PEBrowserPro, the push instruction pushes the virtual address 0x004020F4 on the stack. That is perfectly OK when the program is loaded at its preferred load address 0x00400000.
But in this case, the program is loaded at 0x00250000. The load address is 0x1B0000 lower than the preferred one (calculated above), so the delta is negative. In other words, the loader must fix up the virtual address 0x004020F4 by reducing it by 0x1B0000. This gives 0x004020F4-0x1B0000 = 0x002520F4. This is exactly what happened in the screenshot above.

You are welcome to leave comments, complaints or questions!