In this post I'm going to show you what I learned about the Import Address Table (IAT) of notepad.exe and what is going on inside it. As usual, I'm using Windows Vista 32-bit.
If you don't know it, the IAT is a lookup table. When the executable is loaded into memory, this table is filled with the virtual addresses of functions in other modules (in this case, functions outside notepad.exe). I will describe it in more detail later in this post. Do not confuse the IAT with the Import Name Table (INT). The INT is also part of the PE file, but it is not the same as the IAT, although the two are closely related.
Let's start where I started. It was a sunny day and I was in a very good mood for some coding. Well, more seriously, I was inspecting the .text section using PE Insider from Cerbero. This is what I saw in the HEX view. Note that the .text section starts at offset 0x400 in the file.
[Figure: PE Insider - Content of PE file (file on disk)]
A few weeks back, when I was inspecting notepad.exe in PE Insider, I thought that the .text section only contained binary machine code. And the binary machine code is not supposed to change when the file is loaded into memory. So I was expecting the contents to be more or less the same in memory. Let's check out the memory when notepad.exe is executed. To do this I'm using WinDbg.
[Figure: WinDbg - Content of memory (file loaded in memory)]
The .text section starts at 0x00721000, since Windows decided to place notepad.exe at memory location 0x00720000 this time (remember that Windows Vista uses ASLR and that notepad is ASLR compatible), and the Relative Virtual Address (RVA) of the .text section (according to the PE file, i.e. notepad.exe) is 0x1000.
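The address arithmetic above can be written out as a small sketch. The helper names `rva_to_va` and `rva_to_file_offset` are made up for this illustration; the numbers are the ones from this post (image base 0x00720000, .text at RVA 0x1000 and at file offset 0x400).

```python
def rva_to_va(image_base, rva):
    """Virtual address = address where the image was loaded + RVA."""
    return image_base + rva


def rva_to_file_offset(rva, section_rva, section_file_offset):
    """Map an RVA inside a section to its offset in the file on disk."""
    return rva - section_rva + section_file_offset


# Values taken from this post: notepad.exe loaded at 0x00720000,
# .text section at RVA 0x1000 and at file offset 0x400.
print(hex(rva_to_va(0x00720000, 0x1000)))              # 0x721000
print(hex(rva_to_file_offset(0x1000, 0x1000, 0x400)))  # 0x400
```

This is why the same bytes show up at offset 0x400 in PE Insider and at 0x00721000 in WinDbg.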
It really puzzled me the first time I compared the file content with the memory content. There were a lot of differences. At first I thought WinDbg was injecting some code to handle the debug break, or maybe that there was some unwanted code in memory. I started to compare the contents (PE Insider vs. WinDbg) manually, byte by byte, and realized that from a certain memory location onward the contents were the same. See below.
[Figure: PE Insider - Content of PE file (file on disk)]
[Figure: WinDbg - Content of memory (file loaded in memory)]
[Figure: PEBrowser Pro - .text section disassembled]
PEBrowser Pro is a little more sophisticated than PE Insider; among other things, it can disassemble the .text section, which is what we see above. But why did it start disassembling at 0x100138D and not at 0x1001000? Note that PEBrowser Pro assumes notepad.exe is loaded at its preferred load address, which is 0x1000000 according to the ImageBase field in the Optional Header of notepad.exe (which means the .text section starts at 0x1001000). Apparently PEBrowser Pro realizes that the first bytes in the .text section are not actually binary machine code, but something else. A look at the tree view of notepad.exe in PEBrowser Pro gave me some hints.
[Figure: PEBrowser Pro - Treeview of file contents]
Yes, apparently the .text section may consist of more than just binary machine code. It contains the IAT among other things.
But how does PEBrowser Pro know that the binary machine code starts at 0x100138D? I wasn't sure about this, but I guessed that the header of the PE file (notepad.exe) might give a clue. Back in PE Insider, viewing the Data Directories in the Optional Header, I saw this.
[Figure: PE Insider - Data Directories]
The Data Directories, more specifically the Import Address Table Directory, tell us that the table starts at RVA 0x1000, i.e. exactly where the .text section starts! Its size is 0x388 bytes. At this point, things became clearer to me.
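What PE Insider shows here can also be read straight from the file. The following is a minimal sketch for 32-bit (PE32) images only; the function name `read_iat_directory` is my own, while the offsets (e_lfanew at 0x3C, 20-byte COFF header, data directories 96 bytes into the optional header, IAT at index 12) follow the PE format.

```python
import struct

IMAGE_DIRECTORY_ENTRY_IAT = 12  # index of the IAT entry in the data directory array


def read_iat_directory(image):
    """Return (rva, size) of the IAT data directory of a PE32 image."""
    # e_lfanew at offset 0x3C of the DOS header points to the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file")
    # The optional header follows the 4-byte signature and the 20-byte COFF header.
    optional_header = pe_offset + 4 + 20
    (magic,) = struct.unpack_from("<H", image, optional_header)
    if magic != 0x10B:
        raise ValueError("only PE32 (32-bit) images are handled in this sketch")
    # In PE32 the data directories start 96 bytes into the optional header;
    # each entry is two DWORDs: RVA and size.
    entry = optional_header + 96 + 8 * IMAGE_DIRECTORY_ENTRY_IAT
    return struct.unpack_from("<II", image, entry)


# Usage (the path is an assumption; adjust it to your system):
# with open(r"C:\Windows\System32\notepad.exe", "rb") as f:
#     rva, size = read_iat_directory(f.read())
#     print(hex(rva), hex(size))
```

With the values from PE Insider (RVA 0x1000, size 0x388), the table ends at RVA 0x1388 and holds 0x388 / 4 = 226 four-byte slots.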
The next thing to understand was how to interpret the bytes in the IAT.
After some googling, I realized that when the PE file (notepad.exe) is loaded into memory, the IAT is filled with virtual addresses. These are the virtual addresses of functions in other modules loaded within the same virtual address space as notepad.exe.
So let's take an example. Let's investigate the first address in the IAT (in memory), the address at RVA 0x1000. Below we can see that the first address in the IAT is 0x77ae765e. This virtual address is in another module, more specifically in advapi32.dll, and it is the address of the function "RegQueryValueExW".
[Figure: WinDbg - IAT (file loaded in memory)]
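When you look at the same memory as raw bytes, keep in mind that x86 is little-endian, so each 4-byte IAT slot is stored least significant byte first. A small sketch of decoding such slots (the function name `decode_iat` is made up; the sample bytes are simply 0x77ae765e written in memory order):

```python
import struct


def decode_iat(raw):
    """Interpret raw IAT bytes as a list of little-endian 32-bit addresses."""
    count = len(raw) // 4
    return list(struct.unpack_from(f"<{count}I", raw))


# The first IAT slot from this post, as it appears byte by byte in memory.
first_slot = bytes.fromhex("5e76ae77")
print([hex(value) for value in decode_iat(first_slot)])  # ['0x77ae765e']
```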
Below we see some part of the disassembled function.
[Figure: WinDbg - Disassembling RegQueryValueExW]
Well, now we know the meaning of the IAT bytes when the PE file is loaded in memory, but why do they differ from the bytes on disk?
As mentioned above, the IAT is filled with addresses at load time. It can't really be filled before load time, since we have no idea where in virtual memory the DLLs will be loaded (Windows Vista 32-bit uses ASLR). Normally the IAT (file on disk) contains information on how to find the virtual address of each function in the other DLLs (i.e. how to fill itself with proper addresses at load time). The Import Name Table (INT) contains the necessary information for this mission. Usually the INT and the IAT contain the same information in the file on disk.
However, in the case of notepad.exe, the IAT (file on disk) is already filled with virtual addresses! The reason is that notepad.exe is bound to several DLLs. Using dumpbin (dumpbin /IMPORTS notepad.exe) we can see which DLLs notepad.exe is bound to.
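As a rough illustration of what that lookup information looks like: in an unbound PE32 file, each 4-byte entry of the INT (and of the on-disk IAT) is either an import by ordinal, marked by the top bit, or the RVA of a hint/name entry holding the function name; a zero entry ends the list for one DLL. The function name `describe_thunk` and the sample values are made up for this sketch; only the top-bit convention (IMAGE_ORDINAL_FLAG32) comes from the PE format.

```python
IMAGE_ORDINAL_FLAG32 = 0x80000000  # top bit set means "import by ordinal"


def describe_thunk(value):
    """Describe one 32-bit import thunk as found in an unbound file on disk."""
    if value == 0:
        return "end of list"
    if value & IMAGE_ORDINAL_FLAG32:
        return f"import by ordinal {value & 0xFFFF}"
    return f"import by name, hint/name entry at RVA {value:#x}"


# Hypothetical thunk values, not taken from notepad.exe.
for thunk in (0x2A3C, 0x80000010, 0):
    print(describe_thunk(thunk))
```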
[Figure: dumpbin of notepad's imports]
I will describe binding here briefly, but I will probably have a more detailed explanation in another post.
Binding an EXE to a DLL shortens the startup time when launching the application. Why? Because the virtual addresses of the functions outside the module are already in the EXE (file on disk), so the loader does not need to resolve them. However, for the loader to be able to skip this work, some constraints need to be fulfilled. The virtual addresses in the IAT of the EXE (file on disk) assume that the DLL will be loaded at its preferred load address, i.e. according to the ImageBase field in the Optional Header of the DLL. Further, the loaded DLL must be the version that the EXE expects (correct timestamp). In the dumpbin output above, notepad.exe expects all DLLs to be the versions from Jan 19 2008.
Since Windows Vista 32-bit uses ASLR, and notepad is ASLR compatible, the DLLs will probably not be loaded at their preferred load addresses. Further, most DLLs have probably been updated since 2008, i.e. they have new versions/timestamps. These issues force the loader to use the INT information even though the IAT is already filled with virtual addresses (since the virtual addresses on disk are incorrect in this case).
So finally, the answer to my introductory question: why does the first part of the .text section differ so much between the file on disk and the file in memory? The first part of the .text section is the IAT, which contains virtual addresses. The file on disk contains virtual addresses of functions in DLLs of a certain (old) version (due to binding), but in memory these virtual addresses are updated by the loader (if needed).
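The two constraints can be summed up in a few lines. This is only a simplified model of the conditions described above, not the loader's actual code; the function name and all sample values are made up for illustration.

```python
def bound_imports_usable(expected_timestamp, actual_timestamp,
                         preferred_base, actual_base):
    """Return True if pre-bound IAT addresses for one DLL can be kept as-is."""
    same_version = expected_timestamp == actual_timestamp
    same_address = preferred_base == actual_base
    return same_version and same_address


# Hypothetical case: the DLL is the expected version, but ASLR moved it.
print(bound_imports_usable(
    expected_timestamp=0x4791A76D, actual_timestamp=0x4791A76D,
    preferred_base=0x77C00000, actual_base=0x77AE0000,
))  # False: the loader must fall back to the INT
```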
You are welcome to leave comments, complaints or questions!