Showing posts with label manifest. Show all posts

Feb 7, 2016

DLL loading and initializing

A couple of years ago, I was developing a Windows application using Borland's IDE, a Rapid Application Development (RAD) tool. It makes it very easy to create a Windows application: you just click the controls you want and add code for the events you want to handle.

I realized that my Windows application looked very boring compared to other applications. I learned that my application was using the old Standard theme, while the others were using Visual Styles. After some research, I was able to take advantage of Visual Styles as well by adding a manifest to my application. The secret was to load version 6 (or higher) of comctl32.dll (and not version 5).

A couple of weeks ago, I started to dig into the details of the loaded comctl32.dll and the role of the manifest. What is inside version 6 of comctl32 that is not present in version 5? During my research, I learned how and when this DLL is loaded into the address space of the process, and further, how and when DLLs in general are loaded into the address space and initialized. That is the subject of this post. In another post, I will expand the discussion and go into the details of comctl32.dll, Visual Styles, atom tables and Activation Contexts.

Let's start by looking at a very simple Windows application. It is built in Borland's IDE in Debug mode, using only Win32 DLLs (not Borland's dynamic libraries).

A simple application in the IDE

A simple application
This application is clearly not using Visual Styles. It is using the old Standard theme, with its old boring 3D style.

Now let's check which DLLs are loaded into the address space of the process for this simple application. We can easily see them using one of my favorite tools, Process Explorer.

A simple application with its DLLs in Process Explorer
Amazing, so many DLLs needed just for a simple window with a button.

When I saw the loaded DLLs for the first time, the first question that came to my mind was: why is comctl32.dll loaded twice, with different versions? Within this process, comctl32.dll was loaded as both version 5.82 and version 6.10.

To simplify this post, I will only deal with the 5.82 version of comctl32.dll, which is responsible for the old boring Standard theme. As far as I can tell at the moment, the 6.10 version of comctl32.dll is needed for the non-client area of my simple application. This can be proven by removing the border from my simple application. Just set the BorderStyle property to bsNone in the Borland IDE.

Non-client area of my simple application (border)


BorderStyle property in the IDE


A simple application without border (only client area)
A simple application without border with its DLLs in Process Explorer
Above, we see that only the 5.82 version of comctl32 is loaded. As mentioned before, the 5.82 version of comctl32.dll is responsible for the old boring 3D style. Further, we can see that shlwapi.dll is also gone. To conclude, my simple application without a border needs a total of 19 DLLs for various reasons.

From now on, I will refer to my simple application without a border as just "button.exe".

Alright, now we have concluded that button.exe needs 19 DLLs loaded into the address space of the process. Let's look at some properties of these DLLs.

First, I will look into which DLLs are implicitly linked. We can do that by waking up my good old friend Dependency Walker, which shows us which DLLs are implicitly linked to button.exe.


Implicitly linked DLLs (and their dependencies) in Dependency Walker
Above are the 7 DLLs which are implicitly linked, i.e. they are listed in the Import Name Table (INT) of button.exe. One of my previous posts discusses the INT for notepad.exe; you may want to check it out here.

We can conclude that none of the 7 DLLs is delay-loaded (from button.exe's point of view). However, some of the seven DLLs have dependencies on other DLLs, which are either implicitly linked or delay-loaded. For instance, user32.dll has three delay-loaded DLLs, indicated by the hourglass (msimg32, powrprof, winsta). For information about delay-loaded DLLs, you may want to check out this link.

The next property I'm going to investigate is whether the DLL has an embedded manifest.
I have not used a manifest for button.exe. However, any of the DLLs loaded into the address space of the process may have one. One way to view an embedded manifest is to use the sigcheck tool from Sysinternals.

Sigcheck from Sysinternals can be used to view embedded manifest. Advapi32.dll has no embedded manifest.

Below is a summary of what we have concluded so far about the DLLs within the address space of the button.exe process. The first table shows the implicitly linked DLLs, and the second one shows the DLLs loaded at run-time for various reasons.

DLL name        Version       Path       Embedded manifest
advapi32.dll    6.0.6002      System32   No
kernel32.dll    6.0.6002      System32   No
version.dll     6.0.6002      System32   No
comctl32.dll    5.82.6002     WinSxS     No
gdi32.dll       6.0.6002      System32   No
user32.dll      6.0.6002      System32   No
oleaut32.dll    6.0.6002      System32   No

DLL name        Version        Path       Embedded manifest
clbcatq.dll     2001.12.6931   System32   No
fshook32.dll    -              -          No
imm32.dll       6.0.6002       System32   No
lpk.dll         6.0.6002       System32   No
msctf.dll       6.0.6002       System32   No
msvcrt.dll      7.0.6002       System32   No
ntdll.dll       6.0.6002       System32   No
ole32.dll       6.0.6002       System32   No
psapi.dll       6.0.6000       System32   No
rpcrt4.dll      6.0.6002       System32   No
usp10.dll       1.626.6002     System32   No
uxtheme.dll     6.0.6001       System32   Yes


I've chosen not to investigate fshook32.dll further, since it is not really a Windows DLL; it is just a DLL which is hooked into the process. As a matter of fact, fshook32.dll depends on psapi.dll, which is the reason why psapi.dll is loaded into the address space.

So now we know which DLLs are loaded into the address space of the process. The next thing to investigate is when they are loaded.

This can easily be seen in WinDbg. When firing up WinDbg with button.exe, we see that all 19 DLLs are loaded into the address space of the process.


WinDbg console output when running my simple application
First we can see that button.exe is loaded into the address space. Then follows a set of loaded DLLs (before the first chance exception). These are the implicitly linked DLLs, including the DLLs they depend on. For instance, rpcrt4.dll is not implicitly linked to button.exe, but advapi32.dll is. Since advapi32.dll depends on rpcrt4.dll, rpcrt4.dll is loaded as well.

After the first chance exception, a new set of DLLs is loaded into the address space. These are DLLs which are explicitly loaded, including the DLLs they depend on.

I will now summarize in a timeline when each DLL is loaded, and when its entry point function is executed. To find the exact order in which the DLLs are loaded and initialized, I will set a breakpoint in ldrpmapdll in ntdll.dll. When this function is executed, the DLL is loaded into the address space of the process. Once I know the DLL is loaded, I'm able to set a new breakpoint in the entry point function of the DLL to find out when it is initialized.

Instead of setting a breakpoint in ldrpmapdll, I could have set one in LoadLibrary. However, LoadLibrary both loads the DLL into the address space and initializes it, so a breakpoint there would not let me tell the two events apart.

Note that ldrpmapdll is only executed if the DLL is not already loaded into the address space. The program can call LoadLibrary several times with the same DLL, but if the DLL is already loaded, ldrpmapdll will not be executed. In other words, ldrpmapdll is only executed once for each DLL. In my research below, I only take note of when the DLL is loaded for the first (and only) time.
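
The "mapped once, requested many times" behavior can be sketched with a small toy model in C. This is only an illustration and not the real ntdll loader: the names ToyLoader, ToyModule and toy_load_library are made up for this sketch, and the map_calls counter merely stands in for a breakpoint hit in ldrpmapdll. It also compares names with a case-sensitive strcmp, which is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: hypothetical types, NOT the real Windows loader. */
#define MAX_MODULES 32

typedef struct {
    const char *name;   /* module name, e.g. "imm32.dll" */
    int ref_count;      /* how many times the module was requested */
} ToyModule;

typedef struct {
    ToyModule modules[MAX_MODULES];
    size_t count;       /* number of modules currently mapped */
    int map_calls;      /* how often the "map" step actually ran */
} ToyLoader;

/* Returns the module entry, mapping it only on the first request. */
ToyModule *toy_load_library(ToyLoader *ld, const char *name)
{
    for (size_t i = 0; i < ld->count; i++) {
        if (strcmp(ld->modules[i].name, name) == 0) {
            ld->modules[i].ref_count++;   /* already mapped: only count it */
            return &ld->modules[i];
        }
    }
    assert(ld->count < MAX_MODULES);
    ld->map_calls++;                      /* stands in for ldrpmapdll */
    ld->modules[ld->count].name = name;
    ld->modules[ld->count].ref_count = 1;
    return &ld->modules[ld->count++];
}
```

Requesting "imm32.dll" twice from a zero-initialized ToyLoader returns the same entry both times and increments map_calls only once, which is why a breakpoint in the mapping step fires once per DLL.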

I will also set breakpoints in button.exe's entry point function as well as in WinMain. However, I'm not able to use symbols in WinDbg for my Borland executable, so I insert a breakpoint in the code instead.

Added debugbreak in button's WinMain code
Alright, that is what the procedure will look like. It is now time to actually do it.

Let's start and open button.exe in WinDbg and set a breakpoint in each loaded DLL's entry point after the first chance exception.

Using !dlls in WinDbg

When using the !dlls command, we can see the entry point for each loaded DLL. At this moment, only the implicitly linked DLLs (and the DLLs they depend on) have been loaded. In the screenshot above, we also see the loaded button.exe, whose entry point is 0x00401374.

We can see that ntdll.dll has an entry point at address 0x00000000, which seems strange. I don't really know what this means, but my guess is that ntdll does not have a regular entry point. If you know, please let me know.

So now let's set a breakpoint in each entry point function as well as in the ldrpmapdll function.

Breakpoints in WinDbg
Remember that I've set a breakpoint (in code) in button.exe.

Now it is time to start stepping through the execution.

In the beginning, nothing in particular happens: rpcrt4, advapi32, msvcrt and version are initialized. Then it is time for user32 to be initialized, and its entry point function UserClientDLLInitialize is called. During this initialization code, a lot of stuff is happening.

First, the debugger breaks in the function ldrpmapdll. Apparently, the user32.dll entry point function calls a function named _InitializeImmEntryTable, which in turn calls the LoadLibraryW function.
When stepping further, we can see that imm32.dll is loaded into the address space. Immediately after the DLL is loaded, I use the !dlls command again to find out the entry point and set a breakpoint. This is shown below.


Callstack in WinDbg for ldrpmapdll
 

imm32.dll is loaded into the address space




Setting a breakpoint when imm32.dll is loaded into the address space
When stepping further again, another break appears in the ldrpmapdll function. When stepping further, we will see that msctf.dll is loaded into the address space. As I did with imm32.dll, I will immediately set a breakpoint in msctf's entry point function.

This time, a DLL implicitly linked to imm32.dll is loaded into the address space


msctf is loaded into the address space



Setting a breakpoint when msctf.dll is loaded into the address space
At this moment, two additional DLLs have been loaded into the address space: imm32.dll and msctf.dll. Msctf.dll was loaded because imm32.dll depends on it.

When stepping further, the following two breakpoints are hit.

Entry point for msctf.dll is executed

Entry point for imm32.dll is executed

We can see that the two newly loaded DLLs are initialized in reverse order. Msctf.dll was loaded after imm32.dll, but msctf.dll was initialized before imm32.dll.
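
One way to picture this ordering is as a small dependency graph: modules are mapped from the top down, and then initialized from the bottom up, so a DLL's dependencies have run their entry points before the DLL itself does. The following C sketch is a toy model under that assumption. The Mod and Trace types and the toy_* functions are hypothetical names for this illustration, not anything from the real loader.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPS 4
#define MAX_MODS 16

/* Toy model only: hypothetical types, NOT the real Windows loader. */
typedef struct {
    const char *name;
    int deps[MAX_DEPS];   /* indices of the modules this one imports from */
    int dep_count;
    int mapped;           /* set once the module is "loaded" */
    int initialized;      /* set once its entry point has "run" */
} Mod;

typedef struct {
    const char *load_order[MAX_MODS];
    const char *init_order[MAX_MODS];
    int loaded_count;
    int init_count;
} Trace;

/* Phase 1: map the module first, then everything it depends on. */
void toy_map(Mod *mods, int idx, Trace *t)
{
    Mod *m = &mods[idx];
    if (m->mapped)
        return;                       /* already in the address space */
    m->mapped = 1;
    t->load_order[t->loaded_count++] = m->name;
    for (int i = 0; i < m->dep_count; i++)
        toy_map(mods, m->deps[i], t);
}

/* Phase 2: run entry points, dependencies before dependents. */
void toy_init(Mod *mods, int idx, Trace *t)
{
    Mod *m = &mods[idx];
    if (m->initialized)
        return;
    m->initialized = 1;               /* mark first so cycles terminate */
    for (int i = 0; i < m->dep_count; i++)
        toy_init(mods, m->deps[i], t);
    t->init_order[t->init_count++] = m->name;
}

/* Map the whole dependency tree, then initialize it. */
void toy_load_library(Mod *mods, int idx, Trace *t)
{
    toy_map(mods, idx, t);
    toy_init(mods, idx, t);
}
```

With imm32.dll at index 0 depending on msctf.dll at index 1, the recorded load order is imm32, msctf, while the recorded initialization order is msctf, imm32, matching what the breakpoints showed.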

When stepping further, we will see that lpk.dll is loaded into the address space, as well as the DLLs it depends on (in this case lpk.dll depends on usp10.dll). This is the same procedure as was done for imm32.dll and msctf.dll, so I will not go into detail.

We are about to load lpk.dll

As I did for imm32.dll and msctf.dll, I've set breakpoints in their entry point functions. I can conclude that lpk.dll and usp10.dll are loaded and initialized in the same manner as imm32.dll and msctf.dll.

Finally, we are finished with the entry point for user32.dll.

When stepping further, WinDbg breaks in the initialization functions of gdi32, comctl32, ole32 and oleaut32.

The next break appears in our button entry point function.

Break in button.exe entry point
Once I've entered the entry point function, I remove the breakpoint. For unknown reasons, I've encountered some strange behavior when keeping this breakpoint: the call stack ended up in the kernel part of the address space.

Removing the breakpoint in button.exe entry point function
Stepping further, we will see that two additional DLLs are loaded into the address space. These are fshook32.dll and psapi.dll. They are loaded and initialized the same way as imm32.dll and msctf.dll.

When fshook32.dll and psapi.dll are loaded and initialized, another break appears in ldrpmapdll. This time it is uxtheme.dll. It is loaded and initialized.

Stepping further again, we have finally reached the WinMain function in button.exe. Below, we can see the code I've manually added before.

Breakpoint in WinMain in button.exe

When WinMain is executed, another break appears in ldrpmapdll.

Clbcatq.dll is delay-loaded

Above, I am not showing the complete callstack, but we can understand that this DLL is delay-loaded, thanks to the __delayLoadHelper2 function in the callstack. When stepping further, we see that clbcatq.dll is loaded into the address space.
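
The idea behind delay loading can be sketched as a function pointer that initially points at a small helper. The first call goes through the helper, which loads the DLL, patches the pointer, and forwards the call. Later calls go straight to the real function. The C code below is a toy model of that idea only, with made-up names (imported_add, add_thunk, toy_dll_loads). It does not reproduce what __delayLoadHelper2 really does internally.

```c
#include <assert.h>

/* Toy model of delay loading; NOT the real __delayLoadHelper2. */
typedef int (*AddFn)(int, int);

int toy_dll_loads = 0;        /* counts how often the "DLL" was loaded */

/* The function that would live in the delay-loaded DLL. */
int real_add(int a, int b)
{
    return a + b;
}

int add_thunk(int a, int b);

/* Import slot: starts out pointing at the thunk, not the real function. */
AddFn imported_add = add_thunk;

/* The first call lands here: "load" the DLL, patch the slot, forward. */
int add_thunk(int a, int b)
{
    toy_dll_loads++;          /* stands in for loading and resolving */
    imported_add = real_add;  /* later calls bypass the thunk */
    return imported_add(a, b);
}
```

Calling imported_add(2, 3) triggers the simulated load once. A second call such as imported_add(4, 5) no longer touches the thunk, which is why the DLL only shows up in the address space when one of its functions is first needed.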

The fact that clbcatq.dll is delay-loaded can be confirmed by opening ole32.dll in Dependency Walker.

Clbcatq.dll is a delay-loaded DLL.
Clbcatq.dll was the final DLL to be loaded, so now our application has finally started, and we can see the window on the screen.



button.exe has finally created the window visible on the screen.

To conclude the flow of loaded/initialized DLLs, I've drawn a simple flow diagram.
Flow of the loaded/initialized DLLs
The timeline starts at button.exe (the green cylinder), and ends at clbcatq (the red rectangle). I was not able to draw a straight timeline, since there were too many events, so I had to draw a timeline with some turns.

As we can see, we start by loading button.exe into the address space, then follows a set of DLLs loaded into the address space. Rpcrt4.dll is the first DLL which will be initialized (execution of the entry point function). When user32.dll is initialized, a lot of stuff is happening, so that's why I've drawn a red border around some blocks, to show that all these DLLs are processed within the user32 initialization code.

Between the button.exe entry point and WinMain, some additional DLLs are loaded and initialized. And finally, during WinMain execution, a delay-loaded DLL is loaded into the address space.

Another interesting note: while stepping through button.exe in WinDbg, the debugger broke in each entry point except the one for kernel32.dll. I guess this is because kernel32.dll is initialized before we have a chance to set a breakpoint. Ntdll.dll and kernel32.dll are probably guaranteed to be loaded and initialized before anything else is loaded.

You are welcome to leave comments, complaints or questions!

Jan 5, 2015

PE file with empty main()

If you build an empty console program, how many bytes are needed for the Portable Executable (PE) file? And what's inside the PE file?

In this post I'm experimenting with an empty console program. I'm reducing the PE file, so it just contains the headers and the binary machine code. I'm using Visual Studio 2010 Express and building the PE files in Release mode. Further, I'm using PE Insider from Cerbero and PEBrowserPro from Smidgeonsoft.

Let's compile and link the following empty program below.
int main()
{
   return 0;
}

A program doing nothing. As simple as it can be. Now, let's check the size of this program, using the file properties.

File properties - size of file


Alright, 6144 bytes of binary data is needed for this empty program. Why is this size needed, and what kind of binary data is in there? Let's fire up PE Insider, and first check out the header size, and then the section table.

PE Insider - size of headers

PE Insider - section table

Above, SizeOfHeaders gives the size of the headers (in hex), and SizeOfRawData gives the size each section needs on disk (in hex).

Let's try to understand the number 6144. This is actually the sum of the size of the headers and the size of the sections. So let's sum it to verify this:

0x400+0x800+0x600+0x200+0x200+0x200 = 0x1800 (DEC: 6144)
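
The same sum can be written as a few lines of C. This is just a sketch of the arithmetic above: pe_file_size is a hypothetical helper name, and the calculation assumes there is no extra data appended after the last section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* On-disk size = SizeOfHeaders + the SizeOfRawData of every section.
   Assumes nothing is appended after the last section. */
uint32_t pe_file_size(uint32_t size_of_headers,
                      const uint32_t *raw_sizes, size_t section_count)
{
    uint32_t total = size_of_headers;
    for (size_t i = 0; i < section_count; i++)
        total += raw_sizes[i];
    return total;
}
```

Passing 0x400 for the headers and the five section sizes 0x800, 0x600, 0x200, 0x200 and 0x200 gives 0x1800, i.e. the 6144 bytes reported in the file properties.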

Okay, now we understand why the size is 6144. But what's inside all this binary data?

First let's check out the .text section, i.e. the code. This section needs 0x800 bytes. However, my empty program is doing nothing!

PE Insider - .text section

Above, the .text section starts at offset 0x400 (in the file on disk), and there are a lot of things going on.

Again, the main function is doing nothing. But there is a lot of other code around in the .text section. It's code from the C Runtime library. For instance, the main function is not the first function called when executing the program, the first function called is the mainCRTStartup function. This is a function in the C Runtime library (and part of the PE file). How do I know this? Well, each program has an entry point, which is specified in one of the headers in the PE file. Let's check it out.

PE Insider - Entry point
 
PEBrowserPro - disassemble view of entry point

Okay, the entry point is at Relative Virtual Address (RVA) 0x12A0 (within the .text section, of course). Thanks to the disassembly view in PEBrowserPro, we can see what's going on there: mainCRTStartup is located at this RVA.
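
For readers who want to find the entry point without a GUI tool, here is a minimal C sketch that reads AddressOfEntryPoint from the headers of an image held in memory and converts an RVA to a file offset. The function names are made up for this example. The conversion needs the section's VirtualAddress, which the screenshots would show. I assume the common value 0x1000 for .text in the usage note, which is an assumption and not something stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit value from a byte buffer. */
uint32_t read_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 and stores AddressOfEntryPoint in *rva, or returns 0 if the
   buffer does not look like a PE image. */
int pe_entry_point_rva(const uint8_t *img, size_t len, uint32_t *rva)
{
    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z')
        return 0;
    uint32_t pe = read_u32le(img + 0x3C);        /* e_lfanew */
    if (pe > len || len - pe < 44)
        return 0;
    if (img[pe] != 'P' || img[pe + 1] != 'E' ||
        img[pe + 2] != 0 || img[pe + 3] != 0)
        return 0;
    /* 4-byte signature + 20-byte file header + 16 bytes into the
       optional header = offset 40 from the start of the signature. */
    *rva = read_u32le(img + pe + 40);
    return 1;
}

/* Convert an RVA inside a section to an offset in the file on disk. */
uint32_t rva_to_file_offset(uint32_t rva, uint32_t section_va,
                            uint32_t section_raw_ptr)
{
    assert(rva >= section_va);
    return rva - section_va + section_raw_ptr;
}
```

If .text has VirtualAddress 0x1000 (assumed) and starts at file offset 0x400 (as seen above), rva_to_file_offset(0x12A0, 0x1000, 0x400) yields 0x6A0, which is where the first byte of mainCRTStartup would sit in the file.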

For the next test, let's tell Visual Studio to use another entry point (i.e. not mainCRTStartup). We can do this in the Property Pages dialog.


Visual Studio - specifying my own entry point

After compiling and linking, let's check the file properties again.

File properties - size of file

Wow! The file size is reduced! From 6144 bytes to 3072 bytes. Let's check out the section table again.

PE Insider - section table

Compared to the screenshots above, the .text section is reduced from 0x800 to 0x200, the .rdata section from 0x600 to 0x200, and the .data section from 0x200 to 0x000. The .rsrc and .reloc sections remain the same size.

So what does the .text section look like now that we have specified our own entry point?

PE Insider - .text section (main entry point)


PEBrowserPro - disassemble view - .text section (main entry point)

The only thing going on in the .text section is the return 0 statement. We have managed to get rid of the C Runtime Library code.

Note that there are a lot of 0's in the .text section. This is just zero padding, so that each section can start at a multiple of 0x200.
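
The rounding behind this padding is a simple align-up operation. A minimal C sketch follows, where align_up is a hypothetical helper name and the alignment is assumed to be a power of two such as 0x200.

```c
#include <assert.h>
#include <stdint.h>

/* Round size up to the next multiple of alignment.
   The alignment must be a power of two, e.g. 0x200. */
uint32_t align_up(uint32_t size, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}
```

So even if the code for return 0 only takes a few bytes (say 3, as an assumed example), align_up(3, 0x200) is 0x200, and the rest of the section on disk is filled with zeros.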

Now let's continue with the other sections in the PE file. The purpose of the .reloc section is to help the loader perform relocations if the executable is not loaded at its preferred load address. If we remove the DYNAMICBASE switch, the .reloc section will be removed. This means that the executable will always be loaded at its preferred load address, and does not need any relocation information. Let's remove the DYNAMICBASE switch.
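
To give a feeling for what a relocation means, here is a minimal C sketch of the adjustment applied to one absolute 32-bit address stored in the image. The function name apply_relocation and the example addresses are assumptions for illustration only. The real .reloc section is a list of locations that need this kind of fix-up.

```c
#include <assert.h>
#include <stdint.h>

/* Adjust an absolute 32-bit address stored in the image when the image
   ends up at actual_base instead of preferred_base. */
uint32_t apply_relocation(uint32_t stored_address,
                          uint32_t preferred_base, uint32_t actual_base)
{
    /* Unsigned arithmetic wraps, so a lower actual base works too. */
    uint32_t delta = actual_base - preferred_base;
    return stored_address + delta;
}
```

For example, with an assumed preferred base of 0x00400000 and an actual base of 0x00A50000, a stored address of 0x00403000 becomes 0x00A53000. If both bases are equal, the delta is zero and nothing changes, which is why an image that always loads at its preferred address can do without this information.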


Visual Studio - Remove dynamicbase

Next, compile and link again, and check the file properties and the section table once more.

File properties - size of file
PE Insider - section table

Voila! The file size is decreased from 3072 bytes to 2560 bytes, i.e. by the 0x200 bytes of the .reloc section, which is now gone.

Let's continue with the other sections. What's going on in the .rdata section and the .rsrc section?

PE Insider - .rdata section
 
PE Insider - .rsrc section

Thanks to the ASCII view, we can figure out that Visual Studio embeds a manifest in the .rsrc section. We can also see that there is a PDB path in the .rdata section.

If we want to get rid of this, we just go to the Property Pages and remove the manifest as well as the debug information.


Visual Studio - Removing manifest



Visual Studio - Removing debugging information

Again, compile and link and check out the file size, the section table, and the .rdata section.

File properties - size of file
PE Insider - section table
PE Insider - .rdata section, remaining data

Not much left. The file size is reduced from 2560 bytes to 2048 bytes. The .rsrc section is gone. The .text section contains a couple of bytes of machine code and the .data section is empty. Some data remains in the .rdata section. I'm not completely sure what this data is. If you know, please tell me. However, it can be removed by turning off Whole Program Optimization.


Visual Studio - Whole Program Optimization

Compile and link the PE file and let's have a final look at the file properties and the section table.

File properties - size of file


PE Insider - section table

Well, that's about it! The file size is reduced from 2048 bytes to 1024 bytes. The PE file now contains just the headers and the .text section with a couple of bytes of machine code.

You are welcome to leave comments, complaints or questions!