Dec 21, 2014

Do you know where the stack begins?

There is a lot of information on the net about the concept of a stack data structure. I bet you know how a stack works. I also bet you are familiar with the registers ebp and esp. You have probably seen them hold some funny values in your favorite debugger. But do you know where the stack begins?

In this post, I will show how to find the start address of the user-mode stack with WinDbg. The example is a single-threaded Win32 console program, "Hello world!", built with Visual Studio 2010 Express and run on Windows Vista 32-bit.

So, let's check out my "Hello world" program.
#include <cstdio>

int main()
{
   std::printf("Hello world!");

   return 0;
}
No surprise. It is a classic "Hello world!" program, and it is single-threaded.

In case you didn't know, a Windows thread that runs user-mode code has two stacks: a user-mode stack and a kernel-mode stack. Threads that run only in kernel mode have just a kernel-mode stack. In this post, we only focus on the user-mode stack. The "Hello world!" program has only one thread, i.e. one user-mode stack. Each thread also has a Thread Environment Block (TEB), which contains information about the thread. Its first part is the Thread Information Block (TIB). A multithreaded program therefore has several TEBs. The TEBs live in the user-mode part of the process's virtual address space. The information in a thread's TEB tells us where that thread's stack begins.

Well, let's not go into the details of the TEB. We just want to check out the information in it, and that's very easy in WinDbg.

Let's fire up WinDbg and open the "Hello world!" program (note that it is built in Release mode). At the prompt, I've used the !teb command, which displays the TEB of the current thread. As mentioned before, there is only one thread running in this program.

WinDbg - debug break after loading modules


Above, StackBase tells us that the stack begins at virtual address 0x00230000. You can also see that esp is 0x0022f694 at this first debug break. Let's take the difference between these two values, remembering that the stack grows toward lower addresses: 0x00230000 - 0x0022f694 = 0x96C (decimal 2412). So there are already 2412 bytes on the stack at the first debug break.
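The subtraction above can be written as a tiny C helper. This is a sketch under two assumptions: a 32-bit address space, and StackBase being the first address above the stack, so the used part is the range from esp up to StackBase. The function name is made up for illustration.

```c
#include <stdint.h>

/* Number of bytes in use on a 32-bit, downward-growing stack.
   stack_base: the StackBase value shown by !teb (first address above the stack).
   sp:         the current value of esp.
   The used part of the stack is the address range [sp, stack_base). */
uint32_t stack_bytes_used(uint32_t stack_base, uint32_t sp)
{
    return stack_base - sp;
}
```

With the values from the first debug break, stack_bytes_used(0x00230000, 0x0022f694) returns 0x96C, i.e. 2412.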

As you can see in the screenshot above, I've also set a breakpoint in the main function, so we can see the state of the stack before printf is executed.

Let's continue the execution of the program.

By entering g at the WinDbg prompt, we continue the execution. Below is a set of screenshots showing what is going on when we break in the main() function.

WinDbg - disassembly view - before executing push instruction


WinDbg - Register view - before executing push instruction


Above, in the Disassembly view, we are about to execute the push instruction. It pushes the address of the string literal "Hello world!" (virtual address 0x00e520f4 in this case), i.e. a 4-byte value. The string literal itself is located in the .rdata section of the PE file (helloworld.exe).
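To make "push a 4-byte value" concrete, here is a toy model in C. The names are hypothetical and the addresses are simulated; this is not how the CPU or Windows implement the stack. The push first decrements esp by 4 and then stores the value at the new esp, least significant byte first (x86 is little-endian). The base address is borrowed from the first debug session purely for illustration.

```c
#include <stdint.h>
#include <stddef.h>

#define TOY_STACK_SIZE 64

/* Toy 32-bit stack: mem[i] models the simulated address
   base - TOY_STACK_SIZE + i, i.e. the memory just below base. */
struct toy_stack {
    uint32_t base;                 /* like the TEB's StackBase */
    uint32_t esp;                  /* equal to base when the stack is empty */
    uint8_t  mem[TOY_STACK_SIZE];
};

void toy_init(struct toy_stack *s, uint32_t base)
{
    s->base = base;
    s->esp = base;
}

/* Models "push imm32": decrement esp by 4, then store the value
   little-endian at the new esp. Returns 0 if the toy stack is full. */
int toy_push(struct toy_stack *s, uint32_t value)
{
    if (s->base - s->esp + 4 > TOY_STACK_SIZE)
        return 0;
    s->esp -= 4;
    size_t i = s->esp - (s->base - TOY_STACK_SIZE);
    s->mem[i]     = (uint8_t)(value);
    s->mem[i + 1] = (uint8_t)(value >> 8);
    s->mem[i + 2] = (uint8_t)(value >> 16);
    s->mem[i + 3] = (uint8_t)(value >> 24);
    return 1;
}
```

After toy_init(&s, 0x00230000) and toy_push(&s, 0x00e520f4), esp is 0x0022fffc and the four bytes at that simulated address are f4 20 e5 00.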

As you can see in the Register view, esp is 0x22f3e4. Taking the difference from StackBase, we get 0x00230000 - 0x0022f3e4 = 0xC1C (decimal 3100). So when main() is entered, there are already 3100 bytes on the stack.

Before ending this post, let's check out another execution of the "Hello world!" program via WinDbg.

WinDbg - debug break after loading modules (another execution)
This time, StackBase has a different value, 0x00160000. It is worth mentioning that the "Hello world!" program is ASLR compatible. I guess that's why we get a different StackBase, but that may be a discussion for another post.

It is probably not very useful to know the start address of the stack in your daily programming, but it is always fun to learn something new.

Dec 13, 2014

INT in notepad.exe

The Import Name Table (INT) contains the information the loader needs to fill the Import Address Table (IAT) at load time. The IAT receives the virtual addresses of functions in other modules, which are still within the same virtual address space. I will explain where to find the INT in the file on disk (not in memory) and what kind of information it contains. I'm using notepad.exe as an example, on Windows Vista 32-bit.

So where do we find the INT?

To find the INT, we must start with the headers of the Portable Executable (PE) file (notepad.exe). In one of them, the Optional Header, you will find a Directory Table. In the case of notepad.exe, it contains a directory called Import Directory, which we will inspect in more detail. Using PE Insider from Cerbero, we can easily see the Directory Table.

PE Insider - Directory Table

Above you see three directories. The Export Directory is the first entry in the table, followed by the Import Directory and the Resource Directory. There are more directories in the PE file, but explaining them is beyond the scope of this post.

There is a lot of good information about the structures in the Import Directory on the net, so I will only discuss them briefly here.

The Import Directory holds a Relative Virtual Address (RVA), which points to an array of IMAGE_IMPORT_DESCRIPTORs. Above, you can find this RVA in the Import Directory RVA row, in the rightmost column: 0x00008BF4.

The IMAGE_IMPORT_DESCRIPTOR is defined as below, according to the header file winnt.h.
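typedef struct _IMAGE_IMPORT_DESCRIPTOR {
    union {
        DWORD   Characteristics;            // 0 for the terminating NULL descriptor
        DWORD   OriginalFirstThunk;         // RVA of the INT (array of IMAGE_THUNK_DATA)
    } DUMMYUNIONNAME;
    DWORD   TimeDateStamp;                  // 0 if not bound; -1 or the DLL's timestamp if bound
    DWORD   ForwarderChain;                 // -1 if no forwarders
    DWORD   Name;                           // RVA of the DLL name (ASCII string)
    DWORD   FirstThunk;                     // RVA of the IAT (holds real addresses if bound)
} IMAGE_IMPORT_DESCRIPTOR;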

The OriginalFirstThunk holds an RVA, which points to an array of IMAGE_THUNK_DATAs. So does FirstThunk.

It is also worth mentioning that the Name member of the IMAGE_IMPORT_DESCRIPTOR holds an RVA. It points to an ASCII string containing the name of the DLL, e.g. kernel32.dll.

The IMAGE_THUNK_DATA is defined as below, according to the header file winnt.h.

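typedef struct _IMAGE_THUNK_DATA32 {
    union {
        DWORD ForwarderString;      // PBYTE
        DWORD Function;             // resolved function address (IAT, after loading)
        DWORD Ordinal;              // import by ordinal when the most significant bit is set
        DWORD AddressOfData;        // RVA of an IMAGE_IMPORT_BY_NAME (import by name)
    } u1;
} IMAGE_THUNK_DATA32;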

The IMAGE_THUNK_DATA struct represents a function imported by the PE file. In the file on disk, an IMAGE_THUNK_DATA holds either an Ordinal value or an AddressOfData value. The latter is an RVA, which points to an IMAGE_IMPORT_BY_NAME struct. How does the loader know which of the two it is? It checks the most significant bit of the value. If the bit is 0, the value is an AddressOfData value. If the bit is 1, it is an Ordinal value, stored in the low 16 bits.
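The most-significant-bit check can be sketched in C as below. ORDINAL_FLAG32 is my own name; it has the same value as IMAGE_ORDINAL_FLAG32 in winnt.h. The helper functions are hypothetical and only decode a single 32-bit INT entry.

```c
#include <stdint.h>

#define ORDINAL_FLAG32 0x80000000u  /* same value as IMAGE_ORDINAL_FLAG32 in winnt.h */

/* 1 if the 32-bit INT entry imports by ordinal (MSB set), 0 if by name. */
int thunk_is_ordinal(uint32_t thunk)
{
    return (thunk & ORDINAL_FLAG32) != 0;
}

/* For an ordinal import, the ordinal is in the low 16 bits. */
uint16_t thunk_ordinal(uint32_t thunk)
{
    return (uint16_t)(thunk & 0xFFFFu);
}

/* For a name import, the value is the RVA of an IMAGE_IMPORT_BY_NAME. */
uint32_t thunk_name_rva(uint32_t thunk)
{
    return thunk & ~ORDINAL_FLAG32;
}
```

For example, the value 0x00009138 found later in notepad.exe's INT has the top bit cleared, so thunk_is_ordinal returns 0 and thunk_name_rva returns 0x9138.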

The IMAGE_IMPORT_BY_NAME is defined as below, according to the header file winnt.h.
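typedef struct _IMAGE_IMPORT_BY_NAME {
    WORD    Hint;       // index into the DLL's export name table, tried first
    BYTE    Name[1];    // NUL-terminated ASCII name (really variable length)
} IMAGE_IMPORT_BY_NAME, *PIMAGE_IMPORT_BY_NAME;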

We are now ready to find the INT in the file on disk, but first we must find the array of IMAGE_IMPORT_DESCRIPTORs. We can't use the RVA 0x00008BF4 directly, because an RVA is only meaningful once the PE file is loaded into memory. We must somehow translate the RVA into an offset in the PE file (notepad.exe). I will present a procedure here that I call RVA-to-Offset translation.

First we check where the sections will be loaded in memory. More specifically, let's check the section table of the PE file (notepad.exe). See below.

PE Insider - Section Table

We can see that the .text section will be loaded at RVA 0x00001000 with a size of 0x00008F40 bytes. The next section (.data) will be loaded at RVA 0x0000A000. The array of IMAGE_IMPORT_DESCRIPTORs starts at RVA 0x8BF4, i.e. within the .text section, 0x00008BF4 - 0x00001000 = 0x00007BF4 bytes from its start. On disk, the .text section starts at offset 0x400 according to the PointerToRawData column. So the array can be found at file offset 0x400 + 0x00007BF4 = 0x7FF4.
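The RVA-to-Offset translation can be sketched in C as below. struct section is a hypothetical, cut-down stand-in for winnt.h's IMAGE_SECTION_HEADER, keeping only the three columns the translation needs. This sketch checks containment against the virtual size only. A stricter version would also reject RVAs beyond the section's SizeOfRawData, which have no bytes in the file.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical cut-down section-table entry. */
struct section {
    uint32_t virtual_address;      /* RVA where the section is loaded */
    uint32_t virtual_size;         /* size of the section in memory */
    uint32_t pointer_to_raw_data;  /* file offset of the section's data */
};

/* Translates an RVA into a file offset. Returns 1 and stores the result
   in *offset if some section contains the RVA, otherwise returns 0. */
int rva_to_offset(const struct section *sections, size_t count,
                  uint32_t rva, uint32_t *offset)
{
    for (size_t i = 0; i < count; i++) {
        const struct section *s = &sections[i];
        if (rva >= s->virtual_address &&
            rva - s->virtual_address < s->virtual_size) {
            *offset = s->pointer_to_raw_data + (rva - s->virtual_address);
            return 1;
        }
    }
    return 0;
}
```

With only notepad.exe's .text row from the table above, { 0x1000, 0x8F40, 0x400 }, the RVA 0x8BF4 translates to 0x7FF4, matching the calculation by hand.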

Now that we have translated the RVA 0x00008BF4 to offset 0x7FF4 in the file on disk, let's check out what is going on at this location.

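Below is a HEXVIEW of the IMAGE_IMPORT_DESCRIPTOR array, the name strings (DLL names) and the IMAGE_THUNK_DATA arrays. We are going to interpret the HEXVIEW byte by byte, so remember that x86 is a little-endian architecture. The least significant byte of a multi-byte value is stored first, so the bytes B0 8D 00 00 represent 0x00008DB0.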

PE Insider - HEXVIEW

The array of IMAGE_IMPORT_DESCRIPTORs (inside the black border) starts at offset 0x7FF4, as calculated above. The first member of the first struct is the OriginalFirstThunk, with the value 0x00008DB0 (the first underlined value).

For an easier overview, I have underlined the OriginalFirstThunk member of each IMAGE_IMPORT_DESCRIPTOR above. Counting the elements in the array gives us 14 descriptors. Note that the last one is a NULL descriptor that marks the end of the array. Each of the other descriptors represents a DLL and its imported functions, so notepad.exe imports from 13 DLLs.

For instance, let's check the Name member of the first descriptor, which is the fourth member of the IMAGE_IMPORT_DESCRIPTOR struct. There we find the value 0x00008D0C, an RVA pointing to the name string. Using the RVA-to-Offset translation, the RVA 0x00008D0C becomes offset 0x810C. This offset is located directly after the IMAGE_IMPORT_DESCRIPTOR array above. It contains the ASCII string advapi32.dll, followed by 2 NUL bytes.
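Counting descriptors up to the NULL descriptor can be sketched in C as below. This is an illustration with hypothetical helper names. It assumes the buffer holds raw file bytes starting at the array (offset 0x7FF4 in notepad.exe), and it treats only an all-zero descriptor as the terminator. Each descriptor is five DWORDs, so Name sits at byte offset 12.

```c
#include <stdint.h>
#include <stddef.h>

/* One IMAGE_IMPORT_DESCRIPTOR: OriginalFirstThunk, TimeDateStamp,
   ForwarderChain, Name, FirstThunk (five DWORDs). */
#define DESCRIPTOR_SIZE 20
#define NAME_FIELD      12   /* byte offset of Name within a descriptor */

/* Reads a little-endian 32-bit value (low byte first, as on x86). */
static uint32_t read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the number of descriptors before the all-zero NULL descriptor,
   i.e. the number of imported DLLs, or -1 if none fits in len bytes. */
long count_import_descriptors(const uint8_t *buf, size_t len)
{
    for (size_t off = 0; off + DESCRIPTOR_SIZE <= len; off += DESCRIPTOR_SIZE) {
        int all_zero = 1;
        for (size_t i = 0; i < DESCRIPTOR_SIZE; i += 4)
            if (read_u32_le(buf + off + i) != 0)
                all_zero = 0;
        if (all_zero)
            return (long)(off / DESCRIPTOR_SIZE);
    }
    return -1;
}

/* The Name RVA stored in descriptor k. */
uint32_t descriptor_name_rva(const uint8_t *buf, size_t k)
{
    return read_u32_le(buf + k * DESCRIPTOR_SIZE + NAME_FIELD);
}
```

Run over notepad.exe's array, count_import_descriptors should return 13, and descriptor_name_rva(buf, 0) should return 0x00008D0C.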

Now, back to the OriginalFirstThunk. As mentioned above, the first OriginalFirstThunk has the value 0x00008DB0, an RVA pointing to an array of IMAGE_THUNK_DATAs. Using the RVA-to-Offset translation, it becomes offset 0x81B0. At this location (see the HEXVIEW above), we find an array inside the red border. This array is the Import Name Table (INT) that we were looking for in the first place. The INT for the first DLL contains 6 elements, and the last one is a NULL element. In other words, notepad.exe imports 5 functions from advapi32.dll.
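Counting the entries of one INT works the same way, except that the terminator is a single zero DWORD. Below is a minimal C sketch with a hypothetical function name. It assumes the buffer holds raw file bytes starting at the INT's offset (0x81B0 for advapi32.dll in notepad.exe).

```c
#include <stdint.h>
#include <stddef.h>

/* Reads a little-endian 32-bit value (low byte first, as on x86). */
static uint32_t read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Counts the IMAGE_THUNK_DATA32 entries of one INT before its zero
   terminator, i.e. the number of functions imported from one DLL.
   Returns -1 if no terminator is found within len bytes. */
long count_int_entries(const uint8_t *buf, size_t len)
{
    for (size_t off = 0; off + 4 <= len; off += 4)
        if (read_u32_le(buf + off) == 0)
            return (long)(off / 4);
    return -1;
}
```

For advapi32.dll in notepad.exe, the count should be 5: six DWORDs in total, the last one being zero.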

As you can see in the HEXVIEW above, the first element of the IMAGE_THUNK_DATA array contains the value 0x00009138. The loader must know how to interpret this value, so it checks the most significant bit. Here the bit is 0, so the value is the AddressOfData member, an RVA pointing to an IMAGE_IMPORT_BY_NAME struct. Using the RVA-to-Offset translation, the RVA 0x00009138 corresponds to offset 0x8538 on disk. Let's check out what is going on there.

PE Insider - HEXVIEW

In the HEXVIEW at offset 0x8538, we find the word 0x0268 (underlined), which is the Hint member. Next comes the ASCII name of the imported function, in this case RegQueryValueExW, followed by two NUL bytes. After that comes another IMAGE_IMPORT_BY_NAME struct, with a hint and an ASCII string, and so on.
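Decoding one IMAGE_IMPORT_BY_NAME from raw bytes can be sketched in C as below. The function name is hypothetical. It reads the little-endian Hint word and returns a pointer to the NUL-terminated name that follows it, refusing buffers where no NUL is found.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Decodes an IMAGE_IMPORT_BY_NAME at p (len bytes available): a
   little-endian WORD Hint followed by a NUL-terminated ASCII name.
   On success stores the hint and a pointer to the name and returns 1. */
int parse_import_by_name(const uint8_t *p, size_t len,
                         uint16_t *hint, const char **name)
{
    if (len < 3)                              /* two hint bytes plus at least a NUL */
        return 0;
    if (memchr(p + 2, 0, len - 2) == NULL)    /* the name must be terminated */
        return 0;
    *hint = (uint16_t)(p[0] | (p[1] << 8));
    *name = (const char *)(p + 2);
    return 1;
}
```

Applied to the bytes at offset 0x8538, this should yield hint 0x0268 and the name "RegQueryValueExW".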

So finally we know where the INT is located and how to read it. The loader uses the INT to fill the IAT. For instance, the loader sees that notepad.exe imports 5 functions from advapi32.dll, by name rather than by ordinal. It looks up their virtual addresses and writes them to notepad.exe's IAT.

You are welcome to leave comments, complaints or questions!

Dec 7, 2014

IAT in notepad.exe

A couple of weeks ago, I was comparing the .text section of a PE file (notepad.exe) with the corresponding .text section loaded in memory. I realized that the first part of the .text section was not even close to equal (file on disk vs. file in memory). I googled a lot about this and finally learned that part of the .text section may consist of the Import Address Table (IAT).

In this post I'm going to show you what I learned and what is going on in the IAT of notepad.exe. As usual, I'm using Windows Vista 32-bit.

In case you don't know, the IAT is a lookup table. When the executable is loaded into memory, the loader fills this table with the virtual addresses of functions in other modules (here, functions outside notepad.exe). I will describe it in more detail later in this post. Do not confuse the IAT with the Import Name Table (INT). The INT is also part of the PE file and is related to the IAT, but it is a separate table.

Let's start where I started. It was a sunny day and I was in a very good mood for some coding. Well, more seriously, I was inspecting the .text section using PE Insider from Cerbero. This is what I saw in the HEX view. Note that the .text section starts at offset 0x400 in the file.

PE Insider - Content of PE file (file on disk)

A few weeks back, when I was inspecting notepad.exe in PE Insider, I thought that the .text section contained only binary machine code. Binary machine code is not supposed to change when the file is loaded into memory, so I expected the contents to be more or less the same in memory. Let's check out the memory while notepad.exe is running. To do this I'm using WinDbg.


WinDbg - Content of memory (file loaded in memory)

The .text section starts at 0x00721000. Windows decided to place notepad.exe at address 0x00720000 this time (remember that Windows Vista uses ASLR and that notepad.exe is ASLR compatible). According to the PE file, the Relative Virtual Address (RVA) of the .text section is 0x1000.

It really puzzled me the first time I compared the file content with the memory content. There were a lot of differences. At first I thought WinDbg was injecting some code to handle the debug break, or that there was some unwanted code in memory. I started to compare the contents (PE Insider vs. WinDbg) manually, byte by byte. I realized that from a certain memory location onward, the contents were the same. See below.


PE Insider - Content of PE file (file on disk)
WinDbg - Content of memory (file loaded in memory)


Somewhere around the red marking, the contents start to be equal. Why did only the first part differ? I thought that maybe PE Insider was not showing the correct file content, so I downloaded another PE viewer, PEBrowser Pro from SmidgeonSoft. I started PEBrowser Pro, opened notepad.exe and clicked the .text section. This opened a new window with the contents below.

PEBrowser Pro - .text section disassembled

PEBrowser Pro is a little more sophisticated than PE Insider. Among other things, it can disassemble the .text section, which is what we see above. But why did it start disassembling at 0x100138D and not at 0x1001000? Note that PEBrowser Pro assumes notepad.exe is loaded at its preferred load address. According to the ImageBase field in the Optional Header of notepad.exe, that address is 0x1000000, so the .text section starts at 0x1001000. Apparently PEBrowser Pro realizes that the first bytes of the .text section are not machine code but something else. The tree view of notepad.exe in PEBrowser Pro gave me some hints.

PEBrowser Pro - Treeview of file contents

So the .text section may contain more than just binary machine code. Among other things, it contains the IAT.

But how does PEBrowser Pro know that the machine code starts at 0x100138D? I wasn't sure, but I guessed that the headers of the PE file (notepad.exe) might give a clue. Back in PE Insider, I viewed the Data Directories in the Optional Header and saw this.

PE Insider - Data Directories


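The Data Directories, more specifically the Import Address Table Directory, tell us that the table starts at RVA 0x1000. That is exactly where the .text section starts! Its size is 0x388 bytes, so it ends just before RVA 0x1388. At the preferred load address, that is just before 0x1001388, a few bytes before the point where PEBrowser Pro starts disassembling. At this point, things began to become clear to me.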
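The address arithmetic in this post can be sketched in C as below. A virtual address is the image base plus the RVA. Assuming 32-bit pointers, an IAT of 0x388 bytes holds 0x388 / 4 = 226 slots. Those slots typically include the zero entry that ends each DLL's group, so this is not the number of imported functions. The function names are hypothetical.

```c
#include <stdint.h>

/* Virtual address of an RVA once the image is mapped at image_base. */
uint32_t rva_to_va(uint32_t image_base, uint32_t rva)
{
    return image_base + rva;
}

/* Number of 4-byte pointer slots in a 32-bit IAT of iat_size bytes,
   including the zero terminator of each DLL's group. */
uint32_t iat_slot_count(uint32_t iat_size)
{
    return iat_size / 4;
}
```

For example, rva_to_va(0x00720000, 0x1000) gives 0x00721000, the start of .text in the WinDbg session above. rva_to_va(0x01000000, 0x1000 + 0x388) gives 0x01001388, the end of the IAT at the preferred load address.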

The next thing to understand was how to interpret the bytes in the IAT.

After some googling, I realized that when the PE file (notepad.exe) is loaded into memory, the IAT is filled with virtual addresses. These point into other modules loaded within the same virtual address space as notepad.exe.

Let's take an example and investigate the first address in the IAT in memory, the one at RVA 0x1000. Below we can see that it is 0x77ae765e. This virtual address lies in another module, advapi32.dll, and it is the address of the function RegQueryValueExW.


WinDbg - IAT (file loaded in memory)

Below we see part of the disassembled function.

WinDbg - Disassembling RegQueryValueExW


Now we know the meaning of the IAT bytes when the PE file is loaded in memory. But why do they differ from the bytes on disk?

As mentioned above, the IAT is filled with addresses at load time. It can't really be filled before that, since we don't know where in virtual memory the DLLs will be loaded (Windows Vista 32-bit uses ASLR). Normally, the IAT in the file on disk contains the information needed to find the virtual address of each function in other DLLs, i.e. how to fill itself with proper addresses at load time. The Import Name Table (INT) contains the necessary information for this task. Usually the INT and the IAT contain the same information in the file on disk.
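The idea of filling the IAT from the INT can be shown with a toy sketch in C. Everything here is hypothetical. struct toy_export stands in for a loaded DLL's exports, and the real loader walks the DLL's export directory (using the hint) instead of a list of pairs. The only real address used is 0x77ae765e, observed for RegQueryValueExW in one WinDbg session above.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a loaded DLL's exports: name -> virtual address. */
struct toy_export {
    const char *name;
    uint32_t address;
};

/* Linear search in the toy export list; returns 0 if the name is missing. */
static uint32_t toy_lookup(const struct toy_export *exports, size_t n,
                           const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(exports[i].name, name) == 0)
            return exports[i].address;
    return 0;
}

/* For each name the INT refers to, writes the resolved address into the
   matching IAT slot. Returns how many names could not be resolved. */
size_t toy_fill_iat(const char *const *int_names, size_t count,
                    const struct toy_export *exports, size_t n_exports,
                    uint32_t *iat)
{
    size_t missing = 0;
    for (size_t i = 0; i < count; i++) {
        iat[i] = toy_lookup(exports, n_exports, int_names[i]);
        if (iat[i] == 0)
            missing++;
    }
    return missing;
}
```

The INT stays unchanged and only the IAT receives the addresses. That is why the IAT bytes differ between disk and memory while the INT bytes do not.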

However, in the case of notepad.exe, the IAT in the file on disk is already filled with virtual addresses! This is because notepad.exe is bound to several DLLs. Using dumpbin (dumpbin /IMPORTS notepad.exe), we can see which DLLs notepad.exe is bound to.


dumpbin of notepad's import

I will describe binding briefly here, and will probably explain it in more detail in another post.

Binding an EXE to a DLL shortens the startup time of the application. Why? Because the virtual addresses of functions outside the module are already in the EXE on disk, so the loader does not need to resolve them. However, this only works if some constraints are fulfilled. The addresses in the EXE's IAT on disk assume that the DLL will be loaded at its preferred load address, i.e. the one in the ImageBase field of the DLL's Optional Header. Further, the loaded DLL must be the version that the EXE expects (correct timestamp). Above, notepad.exe expects all DLLs to be from Jan 19 2008.
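The two constraints above can be condensed into one check, sketched in C below. This is a simplified model with a hypothetical function name, not the actual Windows loader logic. The real loader also consults the bound import information in the PE headers and handles forwarded functions.

```c
#include <stdint.h>

/* Simplified model: a bound (pre-filled) IAT can be used as-is only if the
   DLL has the timestamp recorded at bind time AND was loaded at the base
   address that was assumed at bind time (its preferred ImageBase). */
int bound_iat_is_valid(uint32_t bound_timestamp, uint32_t dll_timestamp,
                       uint32_t preferred_base, uint32_t actual_base)
{
    return bound_timestamp == dll_timestamp && preferred_base == actual_base;
}
```

If either condition fails, the loader has to fall back to the INT and resolve the imports again, which is the situation described next.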

Since Windows Vista 32-bit uses ASLR and the system DLLs are ASLR compatible, the DLLs will probably not be loaded at their preferred load addresses. Further, most DLLs have probably been updated since 2008, so they have new versions and timestamps. Because of this, the addresses already in the IAT on disk are incorrect. The loader is therefore forced to use the INT information anyway.

So finally, the answer to my introductory question: why does the first part of the .text section differ so much between the file on disk and the file in memory? Because the first part of the .text section is the IAT, which contains virtual addresses. Due to binding, the file on disk contains addresses of functions in a certain (old) version of each DLL. In memory, the loader updates these addresses if needed.

You are welcome to leave comments, complaints or questions!