Feb 7, 2016

DLL loading and initializing

A couple of years ago, I was developing a Windows application using Borland's IDE. This is a Rapid Application Development (RAD) tool. It makes it very easy to create a Windows application: you just click the controls you want and add the code for the events you want to handle.

I realized that my Windows application looked very boring compared to other applications. I learned that my application was using the old Standard theme, while the others were using Visual Styles. After some research, I was able to take advantage of Visual Styles as well, by adding a manifest to my application. The secret was to load version 6 (or higher) of comctl32.dll (and not version 5).

A couple of weeks ago, I started to dig into the details of the loaded comctl32.dll and the role of the manifest. What was inside version 6 of comctl32 that was not present in version 5? During my research, I learned how and when this DLL was loaded into the address space of the process, and further, how and when DLLs in general are loaded into the address space and initialized. That is the subject of this post. In another post, I will expand the discussion and go into details about comctl32.dll, Visual Styles, atom tables and Activation Context.

Let's start by checking out a very simple Windows application. It is built in Borland's IDE in Debug mode, using only Win32 DLLs (not Borland's dynamic libraries).

A simple application in the IDE

A simple application
This application is clearly not using Visual Styles. It is using the old Standard theme, with its old boring 3D style.

Now let's check out which DLLs are loaded into the address space of the process for this simple application. We can easily see them using one of my favorite tools, Process Explorer.

A simple application with its DLLs in Process Explorer
Amazing, so many DLLs needed just for a simple window with a button.

When I saw the loaded DLLs for the first time, the first question that came to my mind was: why is comctl32.dll loaded twice, with different versions? Within this process, comctl32.dll was loaded as both version 5.82 and version 6.10.

To simplify this post, I will only deal with the 5.82 version of comctl32.dll, which is responsible for the old boring Standard theme. As far as I can tell at the moment, the 6.10 version of comctl32.dll is needed for the non-client area of my simple application. This can be shown by removing the border from my simple application: just set the BorderStyle property to bsNone in the Borland IDE.

Non-client area of my simple application (border)


BorderStyle property in the IDE


A simple application without border (only client area)
A simple application without border with its DLLs in Process Explorer
Above, we see that only the 5.82 version of comctl32 is loaded. As mentioned before, the 5.82 version of comctl32.dll is responsible for the old boring 3D style. Further, we can see that shlwapi.dll is no longer loaded either. To conclude, my simple application without a border needs a total of 19 DLLs, for various reasons.

From now on, I will refer to my simple application without a border as just "button.exe".

Alright, now we have concluded that button.exe needs 19 DLLs loaded into the address space of the process. Let's look at some properties of these DLLs.

First, I will look into which DLLs are implicitly linked. We can do that by waking up my good old friend Dependency Walker, which will show us which DLLs are implicitly linked to button.exe.


Implicitly linked DLLs (and their dependencies) in Dependency Walker
Above are the 7 DLLs that are implicitly linked, i.e. they are defined in the Import Name Table (INT) of button.exe. One of my previous posts discusses the INT for notepad.exe; you may want to check it out here.

We can conclude that none of the 7 DLLs are delay loaded (from button.exe's point of view). However, some of the seven DLLs have dependencies on other DLLs, which are either implicitly linked or delay loaded. For instance, user32.dll has three delay loaded DLLs (msimg32, powrprof, winsta), which is indicated by the hourglass. For information about delay loaded DLLs, you may want to check out this link.

The next property I'm going to investigate is whether each DLL has an embedded manifest.
I have not used a manifest for button.exe. However, any of the DLLs loaded into the address space of the process may have one. One way to view an embedded manifest is to use the sigcheck tool from Sysinternals.

Sigcheck from Sysinternals can be used to view an embedded manifest. Advapi32.dll has no embedded manifest.

Below is a summary of what we have concluded so far about the DLLs within the address space of the button.exe process. The first table shows the implicitly linked DLLs, and the second one the DLLs loaded at run-time for various reasons.

DLL name       Version        Path       Embedded manifest
advapi32.dll   6.0.6002       System32   No
kernel32.dll   6.0.6002       System32   No
version.dll    6.0.6002       System32   No
comctl32.dll   5.82.6002      WinSxS     No
gdi32.dll      6.0.6002       System32   No
user32.dll     6.0.6002       System32   No
oleaut32.dll   6.0.6002       System32   No

DLL name       Version        Path       Embedded manifest
clbcatq.dll    2001.12.6931   System32   No
fshook32.dll   -              -          No
imm32.dll      6.0.6002       System32   No
lpk.dll        6.0.6002       System32   No
msctf.dll      6.0.6002       System32   No
msvcrt.dll     7.0.6002       System32   No
ntdll.dll      6.0.6002       System32   No
ole32.dll      6.0.6002       System32   No
psapi.dll      6.0.6000       System32   No
rpcrt4.dll     6.0.6002       System32   No
usp10.dll      1.626.6002     System32   No
uxtheme.dll    6.0.6001       System32   Yes


I've chosen not to investigate fshook32.dll further, since it is not really a Windows DLL; it is just a DLL which is hooked into the process. As a matter of fact, fshook32.dll is dependent on psapi.dll, and that is the reason why psapi.dll is loaded into the address space.

So now we know which DLLs are loaded into the address space of the process. The next thing to investigate is when they are loaded.

This can easily be seen in WinDbg. When firing up WinDbg with button.exe, we see that all 19 DLLs are loaded into the address space of the process.


WinDbg console output when running my simple application
First we can see that button.exe is loaded into the address space. Then follows a set of loaded DLLs (before the first chance exception). These are the implicitly linked DLLs, including the DLLs they depend on. For instance, rpcrt4.dll is not implicitly linked to button.exe, but advapi32.dll is. Since advapi32.dll is dependent on rpcrt4.dll, rpcrt4.dll is loaded as well.
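To make the idea of "implicitly linked DLLs, including the DLLs they depend on" a bit more concrete, here is a minimal Python sketch that walks an import graph and collects everything reachable from button.exe. The graph is an assumption on my part: it only contains the edges mentioned in this post (the 7 imports of button.exe and the advapi32 to rpcrt4 dependency), and the function name is made up for illustration. The real loader follows the import tables of each PE file, and its order may differ from the one produced here.

```python
from collections import deque

# Partial, illustrative import graph: only the edges named in this post.
# Real modules import far more than this.
IMPORTS = {
    "button.exe": [
        "advapi32.dll", "kernel32.dll", "version.dll", "comctl32.dll",
        "gdi32.dll", "user32.dll", "oleaut32.dll",
    ],
    "advapi32.dll": ["rpcrt4.dll"],
}


def modules_mapped_at_startup(root, imports):
    """Return every module reachable from root via implicit imports.

    The result is in breadth-first discovery order, which is a choice of
    this sketch and not necessarily the order the Windows loader uses.
    """
    seen = [root]
    queue = deque([root])
    while queue:
        module = queue.popleft()
        for dep in imports.get(module, []):
            if dep not in seen:
                seen.append(dep)
                queue.append(dep)
    return seen


if __name__ == "__main__":
    for name in modules_mapped_at_startup("button.exe", IMPORTS):
        print(name)
```

Note that rpcrt4.dll ends up in the result even though button.exe never imports it directly, which is the same thing we see in the WinDbg output.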

After the first chance exception, a new set of DLLs is loaded into the address space. These are DLLs which are explicitly loaded, including the DLLs they depend on.

I will now summarize in a timeline when each DLL is loaded, and when its entry point function is executed. To find out the exact order in which the DLLs are loaded and initialized, I will set a breakpoint in ldrpmapdll in ntdll.dll. When this function is executed, the DLL is loaded into the address space of the process. Once I know the DLL is loaded, I'm able to set a new breakpoint in the entry point function of the DLL to find out when it is initialized.

Instead of setting a breakpoint in ldrpmapdll, I could have set one in LoadLibrary. However, LoadLibrary both loads the DLL into the address space and initializes it, so a breakpoint there does not let me tell the two steps apart.

Note that ldrpmapdll is only executed if the DLL is not already loaded into the address space. The program can call LoadLibrary several times for the same DLL, but if the DLL is already loaded, ldrpmapdll will not be executed. In other words, ldrpmapdll is only executed once for each DLL. In my research below, I only take note of when a DLL is loaded for the first (and only) time.
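The "mapped once, requested many times" behavior can be sketched with a small toy model in Python. This is not how ntdll is implemented; the class and attribute names are hypothetical, and the only things the model captures are that the mapping step happens on the first request and that later requests just bump a reference count.

```python
class LoaderModel:
    """Toy model: map a module on the first request, count references after that."""

    def __init__(self):
        self.ref_counts = {}   # module name -> number of load requests
        self.map_calls = []    # one entry per simulated mapping step

    def load_library(self, name):
        key = name.lower()     # module names are compared case-insensitively
        if key not in self.ref_counts:
            # First request: this is where the real loader would map the image.
            self.map_calls.append(key)
            self.ref_counts[key] = 0
        self.ref_counts[key] += 1
        return key


if __name__ == "__main__":
    loader = LoaderModel()
    for _ in range(3):
        loader.load_library("imm32.dll")
    print(loader.map_calls, loader.ref_counts)
```

In this model, a breakpoint on the mapping step would be hit once for imm32.dll, while a breakpoint on load_library would be hit three times. That is why I chose the mapping function for my breakpoints.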

I will also set breakpoints in button.exe's entry point function and in WinMain. However, I'm not able to use symbols in WinDbg for my Borland executable, so for WinMain I insert a breakpoint in the code instead.

Added debugbreak in button's WinMain code
Alright, that is what the procedure will look like. It is now time to actually do it.

Let's start by opening button.exe in WinDbg and setting a breakpoint in each loaded DLL's entry point after the first chance exception.

Using !dlls in WinDbg

When using the !dlls command, we can see the entry point for each loaded DLL. At this moment, only implicitly linked DLLs (and the DLLs they depend on) have been loaded. In the screenshot above, we also see the loaded button.exe, whose entry point is 0x00401374.

We can see that ntdll.dll has an entry point at address 0x00000000, which seems strange. I don't really know what this means, but my guess is that ntdll does not have a regular entry point. If you know, please let me know.

So now let's set a breakpoint in each entry point function as well as in the ldrpmapdll function.

Breakpoints in WinDbg
Remember that I've set a breakpoint (in code) in button.exe.

Now it is time to start stepping through the execution.

In the beginning, nothing in particular happens: rpcrt4, advapi32, msvcrt and version are initialized. Then it is time for user32 to be initialized, and its entry point function UserClientDLLInitialize is called. During this initialization code, a lot of stuff happens.

First, the debugger breaks in the function ldrpmapdll. Apparently, user32.dll's entry point function calls a function named _InitializeImmEntryTable, which in turn calls the LoadLibraryW function.
When stepping further, we can see that imm32.dll is loaded into the address space. Immediately after the DLL is loaded, I use the !dlls command again to find out the entry point and set a breakpoint. This is shown below.


Callstack in WinDbg for ldrpmapdll
 

imm32.dll is loaded into the address space




Setting a breakpoint when imm32.dll is loaded into the address space
When stepping further again, another break appears in the ldrpmapdll function, and this time we will see that msctf.dll is loaded into the address space. As I did with imm32.dll, I will immediately set a breakpoint in msctf's entry point function.

This time we have loaded a DLL that is implicitly linked to imm32.dll into the address space


msctf is loaded into the address space



Setting a breakpoint when msctf.dll is loaded into the address space
At this moment, two additional DLLs have been loaded into the address space: imm32.dll and msctf.dll. Msctf.dll was loaded because imm32.dll is dependent on it.

When stepping further, the following two breakpoints are hit.

Entry point for msctf.dll is executed

Entry point for imm32.dll is executed

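We can see that the two newly loaded DLLs are initialized in reverse order. Msctf.dll was loaded after imm32.dll, but msctf.dll was initialized before imm32.dll. This makes sense: imm32.dll depends on msctf.dll, so msctf.dll has to be ready before imm32.dll's entry point runs.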
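The "loaded first, initialized last" pattern is what you get from a depth-first walk where a module is mapped on the way down and initialized on the way back up. Here is a minimal Python sketch of that idea. It is a simplification and an assumption on my part, not the actual loader algorithm, and the function name and the tiny import graph are only there for illustration.

```python
def load_and_init_order(root, imports, already_loaded=()):
    """Return (load order, init order) for root and its not-yet-loaded imports.

    A module is recorded as loaded when it is first visited, and recorded as
    initialized only after all of its imports have been handled.
    """
    loaded = []
    initialized = []
    known = set(already_loaded)

    def visit(module):
        if module in known:
            return                     # already in the address space
        known.add(module)
        loaded.append(module)          # map the module first
        for dep in imports.get(module, []):
            visit(dep)                 # then handle everything it imports
        initialized.append(module)     # finally run its own entry point

    visit(root)
    return loaded, initialized


if __name__ == "__main__":
    # Only the edge observed in this post; user32.dll is already loaded.
    graph = {"imm32.dll": ["msctf.dll", "user32.dll"]}
    print(load_and_init_order("imm32.dll", graph, already_loaded=["user32.dll"]))
```

With the single imm32 to msctf edge, the sketch reports imm32.dll before msctf.dll in the load order and msctf.dll before imm32.dll in the init order, which matches what the breakpoints showed.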

When stepping further, we will see that lpk.dll is loaded into the address space, as well as the DLLs it depends on (in this case lpk.dll depends on usp10.dll). This is the same procedure as was done for imm32.dll and msctf.dll, so I will not go into detail.

We are about to load lpk.dll

As I did for imm32.dll and msctf.dll, I've set breakpoints in their entry point functions. I can conclude that lpk.dll and usp10.dll are loaded and initialized in the same manner as imm32.dll and msctf.dll.

Finally, we are finished with the entry point for user32.dll.

When stepping further, WinDbg breaks in the initialization functions of gdi32, comctl32, ole32 and oleaut32.

The next break appears in our button entry point function.

Break in button.exe entry point
Once I've entered the entry point function, I remove the breakpoint. For unknown reasons, I've encountered some strange behavior when keeping this breakpoint: the callstack has ended up in the kernel part of the memory space.

Removing the breakpoint in button.exe entry point function
Stepping further, we will see that two additional DLLs are loaded into the address space. These are fshook32.dll and psapi.dll. They are loaded and initialized the same way as imm32.dll and msctf.dll.

Once fshook32.dll and psapi.dll are loaded and initialized, another break appears in ldrpmapdll. This time it is for uxtheme.dll, which is loaded and initialized.

Stepping further again, we have finally reached the WinMain function in button.exe. Below, we can see the code I've manually added before.

Breakpoint in WinMain in button.exe

While WinMain is executing, another break appears in ldrpmapdll.

Clbcatq.dll is delay-loaded

Above, I am not showing the complete callstack, but we can tell that this DLL is delay-loaded, thanks to the __delayLoadHelper2 function in the callstack. When stepping further, we see that clbcatq.dll is loaded into the address space.
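The idea behind delay loading can be illustrated with a small Python sketch: the import is a stub that does nothing until the first call, and only then loads the DLL and resolves the real function. The class below and the fake load function are hypothetical names of mine. In a real delay-loaded import, the helper uses LoadLibrary and GetProcAddress and then patches the import slot, so this is only a model of the behavior.

```python
class DelayLoadedImport:
    """Toy model of a delay-load stub: the DLL is loaded on the first call."""

    def __init__(self, dll_name, load_library):
        self.dll_name = dll_name
        self._load_library = load_library   # callable: dll name -> function
        self._resolved = None

    def __call__(self, *args):
        if self._resolved is None:
            # First call: load the DLL and remember the resolved function.
            self._resolved = self._load_library(self.dll_name)
        return self._resolved(*args)


if __name__ == "__main__":
    load_log = []

    def fake_load_library(name):
        # Stand-in for the real load; records that the DLL was requested.
        load_log.append(name)
        return lambda value: value * 2      # hypothetical exported function

    stub = DelayLoadedImport("clbcatq.dll", fake_load_library)
    print(load_log)          # nothing loaded yet
    stub(1)
    stub(2)
    print(load_log)          # loaded once, on the first call
```

This is why clbcatq.dll shows up so late in the timeline: nothing asks for it until code in WinMain ends up calling into it for the first time.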

The fact that clbcatq.dll is delay-loaded can be confirmed by opening ole32.dll in Dependency Walker.

Clbcatq.dll is a delay-loaded DLL.
Clbcatq.dll was the final DLL to be loaded, so now our application has finally started, and we can see the window on the screen.



button.exe has finally created the window visible on the screen.

To sum up the flow of loaded/initialized DLLs, I've drawn a simple flow diagram.
Flow of the loaded/initialized DLLs
The timeline starts at button.exe (the green cylinder), and ends at clbcatq (the red rectangle). I was not able to create a straight timeline, since there were too many events, so I had to draw a timeline with some turns.

As we can see, we start by loading button.exe into the address space, followed by a set of DLLs loaded into the address space. Rpcrt4.dll is the first DLL to be initialized (execution of the entry point function). When user32.dll is initialized, a lot of stuff happens, which is why I've drawn a red border around some blocks, to show that all these DLLs are processed within the user32 initialization code.

Between button.exe's entry point and WinMain, some additional DLLs are loaded and initialized. And finally, during WinMain execution, a delay-loaded DLL is loaded into the address space.

Another interesting note: while stepping through button.exe in WinDbg, a break was made in each entry point except kernel32.dll's. I guess this is because kernel32.dll is initialized before we have a chance to set a breakpoint. Ntdll.dll and kernel32.dll are probably guaranteed to be loaded and initialized before anything else is loaded.

You are welcome to leave comments, complaints or questions!