Oct 27, 2014

Graphical memory layout for notepad process

I've seen several sites on the internet that describe what the virtual memory layout looks like for a Win32 application. But these sites seldom show a virtual memory layout with real addresses; they tend to show only conceptual views. So I made a graphical layout of my own, based on real virtual addresses.

Before proceeding, just a short recap about virtual addresses. For a 32-bit application, each process gets its own private address space, which goes from 0 to 2^32 - 1 (DEC: 4294967295, HEX: FFFFFFFF), where each address refers to a single byte. We normally say that a process has 4 GB of address space. However, only the first 2 GB is available as user-mode memory. The remaining 2 GB is reserved for kernel-mode memory. (It is possible to extend the user-mode memory from 2 GB to 3 GB, but that is not considered in this post.)
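The numbers in this recap can be double-checked with a few lines of C++. This is only a minimal sketch: the constant and function names are made up for the illustration, and it assumes the default 2 GB / 2 GB split described above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names, chosen for this illustration only.
constexpr std::uint64_t kAddressSpaceSize = 1ULL << 32;   // 2^32 bytes = 4 GB
constexpr std::uint32_t kHighestAddress   = 0xFFFFFFFFu;  // 2^32 - 1
constexpr std::uint32_t kKernelStart      = 0x80000000u;  // assumes the default 2 GB split

static_assert(kHighestAddress == kAddressSpaceSize - 1, "highest address is 2^32 - 1");
static_assert(kHighestAddress == 4294967295u, "decimal form of 0xFFFFFFFF");
static_assert(kKernelStart == kAddressSpaceSize / 2, "kernel half starts at the midpoint");
static_assert(kKernelStart == 2147483648u, "decimal form of 0x80000000");

// Returns true if the address lies in the user-mode half (0 .. 0x7FFFFFFF).
constexpr bool is_user_mode(std::uint32_t address) {
    return address < kKernelStart;
}
```

All the checks are done at compile time, so if the file compiles, the arithmetic holds.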

I'm using Process Explorer from Sysinternals to find the load addresses and image sizes. My platform is 32-bit Windows Vista, which can take advantage of the Address Space Layout Randomization (ASLR) feature. ASLR, among other things, randomly chooses a load address for the EXE and the DLLs. I may write a separate post about ASLR in the future. In my case ASLR is active, which can be verified from the screenshot below.


Screenshot of Process Explorer, showing some of the loaded images

Note that notepad.exe is mapped to the address 0xCF0000 and occupies 0x28000 bytes. Since ASLR is active, the load address will probably differ in another boot session. Yes, that's right: notepad.exe gets a new load address each time you reboot your computer. But the load address stays the same if you only restart notepad during the same boot session.
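To see which addresses the image actually covers, the base address and the size can be combined into a range. The sketch below uses the two values quoted above for notepad.exe; the ImageRange type and its member names are hypothetical, invented only for this example.

```cpp
#include <cassert>
#include <cstdint>

// A loaded image described by its base address and size
// (hypothetical helper type, not part of any Windows API).
struct ImageRange {
    std::uint32_t base;  // load address
    std::uint32_t size;  // image size in bytes

    // First address past the image (exclusive end).
    constexpr std::uint32_t end() const { return base + size; }

    // True if the address falls inside the mapped image.
    constexpr bool contains(std::uint32_t address) const {
        return address >= base && address < end();
    }
};

// Values for notepad.exe as quoted in the text for this boot session.
constexpr ImageRange kNotepad{0x00CF0000u, 0x00028000u};

static_assert(kNotepad.end() == 0x00D18000u, "image spans 0xCF0000 .. 0xD17FFF");
static_assert(kNotepad.size / 1024 == 160u, "0x28000 bytes is 160 KB");
```

So the image spans 0xCF0000 to 0xD17FFF, which is 160 KB out of the 2 GB of user-mode address space.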

Since the load address and the size of each image are known, I simply entered this information into Excel and plotted a diagram. The diagram below was made from another boot session.



I've colored the upper 2 GB (kernel-mode memory) black. Process Explorer does not really tell us what's going on there. As you can see, the images do not occupy that much space. There is a lot of space left for the heap, stacks and so on.

Unfortunately I was not able to use hex numbers on the Y-axis. But 2 147 483 648 is equal to 0x80000000, which is the midpoint of the 32-bit address space.

Oct 22, 2014

Punctuators and digraphs

Have you ever seen C++ code like the one below?
int main(int argc, char* argv<::>)
<%
 return 0;
%>
The code above is perfectly valid. It compiles fine in Borland Developer Studio 2006. But I would normally write it like this instead.
int main(int argc, char* argv[])
{
 return 0;
}
In the first block of code, the punctuators are written as digraphs. Yes, some punctuators can be written as digraphs. A digraph is a sequence of two characters that is treated as a single token. In C++, there are six punctuators that can be written this way (the last one, ##, is simply the digraph for # written twice).
(Note that trigraphs also exist for some punctuators. A trigraph is a sequence of three characters. However, Borland Developer Studio 2006 does not seem to support trigraphs.)

Below you can see the punctuator to the left and the corresponding digraph to the right.

{   <%
}   %>
[   <:
]   :>
#   %:
##   %:%:
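The table can be tried out in one small snippet that uses all six digraphs, including the two preprocessor ones. This is a sketch, assuming a compiler with standard digraph support (GCC and Clang accept them with default settings); the macro and function names are invented for the example.

```cpp
%:include <cassert>
%:include <string>

// %:%: is the digraph spelling of ##, the token-pasting operator.
%:define PASTE(a, b) a %:%: b
// %: is also the digraph spelling of #, the stringizing operator.
%:define STRINGIZE(x) %:x

// Same as: int sum_first_two(const int values[]) { ... }
inline int sum_first_two(const int values<::>)
<%
    // PASTE(to, tal) expands to the identifier `total`.
    int PASTE(to, tal) = values<:0:> + values<:1:>;
    return total;
%>

// Same as: std::string name_of_token() { return "answer"; }
inline std::string name_of_token()
<%
    return STRINGIZE(answer);
%>
```

After the digraphs are translated, the compiler sees exactly the same tokens as if {, }, [, ], # and ## had been typed directly.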

But what is a punctuator?

The following is quoted from MSDN:

Punctuators in C++ have syntactic and semantic meaning to the compiler but do not, of themselves, specify an operation that yields a value. Some punctuators, either alone or in combination, can also be C++ operators or be significant to the preprocessor.
Any of the following characters are considered punctuators:

! % ^ & * ( ) - + = { } | ~
[ ] \ ; ' : " < > ? , . / #

Why do digraphs exist for some punctuators?

Because the C++ language must be writable in all versions of the ISO 646 character set. ISO 646 is a very old 7-bit character set standard. Nowadays you don't normally use 7-bit sets; you more likely use an 8-bit set or Unicode. However, back in the old days ISO 646 was used. The set exists in several national versions, and not all of these versions include every punctuator.

For instance, the Germans used their own version of the ISO 646 character set, called ISO 646 DE. Since they needed characters like ä and ü, these characters replaced { and } in the set. You may want to read more about it on Wikipedia.
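To get a feeling for the problem, the sketch below shows how a line of C++ would appear on a terminal using the German variant. It assumes the replacement table of DIN 66003 (the German ISO 646 version) and returns UTF-8 text; the function names are made up for this example.

```cpp
#include <cassert>
#include <string>

// Returns the glyph that the German ISO 646 variant (DIN 66003, assumed here)
// shows for a given 7-bit code. The result is UTF-8 encoded; codes that are
// not listed look the same as in US-ASCII.
inline std::string din66003_glyph(char code) {
    switch (code) {
        case '@':  return "\xC2\xA7";  // §
        case '[':  return "\xC3\x84";  // Ä
        case '\\': return "\xC3\x96";  // Ö
        case ']':  return "\xC3\x9C";  // Ü
        case '{':  return "\xC3\xA4";  // ä
        case '|':  return "\xC3\xB6";  // ö
        case '}':  return "\xC3\xBC";  // ü
        case '~':  return "\xC3\x9F";  // ß
        default:   return std::string(1, code);
    }
}

// Renders a whole source line as it would be displayed in that character set.
inline std::string render_din66003(const std::string& source) {
    std::string out;
    for (char c : source) {
        out += din66003_glyph(c);
    }
    return out;
}
```

With this table, a body written as { return 0; } shows up as ä return 0; ü, while the digraph version <% return 0; %> is displayed unchanged.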

Apparently only six punctuators exist as digraphs (see above). I've tested them all in Borland Developer Studio 2006 and they work fine. There are actually other punctuators, such as \, ~, ^, and |, which may not exist in all versions of the ISO 646 character set. These do not exist as digraphs, only as trigraphs.

This subject is not really of concern today, but it thrilled me when I realized that you can write common punctuators as digraphs :).