Oct 22, 2014

Punctuators and digraphs

Have you ever seen C++ code like the one below?
int main(int argc, char* argv<::>)
<%
 return 0;
%>
The code above is perfectly valid, and it compiles fine in Borland Developer Studio 2006. Normally, though, I would write it like this instead:
int main(int argc, char* argv[])
{
 return 0;
}
In the first block of code, the punctuators are written as digraphs. Yes, some punctuators can also be written as digraphs. A digraph is a sequence of two characters that is treated as a single character. In C++, six punctuators can be written this way (the last one, ##, is simply two digraphs in a row).
(Note that trigraphs also exist for some punctuators. A trigraph is a sequence of three characters. However, Borland Developer Studio 2006 does not seem to support trigraphs.)

Below you can see each punctuator on the left and the corresponding digraph on the right.

{   <%
}   %>
[   <:
]   :>
#   %:
##   %:%:
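
To see all six digraphs working together, here is a small sketch that uses each of them at least once. The names sum, digraph_demo, MAKE_NAME and myData are made up for this illustration only. I have not tried this exact snippet in Borland Developer Studio 2006, so treat it as an example of what a standard-conforming compiler should accept.

```cpp
// Every punctuator that has a digraph form is written as a digraph here.
%:include <cstddef>   // %: stands for #

// %:%: stands for ##, the token-pasting operator of the preprocessor.
%:define MAKE_NAME(prefix, suffix) prefix %:%: suffix

// Sums the first `count` elements of `values`.
// <: and :> stand for [ and ]; <% and %> stand for { and }.
int sum(const int values<::>, std::size_t count)
<%
    int total = 0;
    for (std::size_t i = 0; i < count; ++i)
    <%
        total += values<:i:>;
    %>
    return total;
%>

// Declares "int myData[3] = { 1, 2, 3 };" via the macro and sums it.
int digraph_demo()
<%
    int MAKE_NAME(my, Data)<:3:> = <% 1, 2, 3 %>;
    return sum(myData, 3);
%>
```

Calling digraph_demo() adds up the three array elements. Replace every digraph with its punctuator from the table above and you get ordinary-looking C++ that behaves the same.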

But what is a punctuator?

This is quoted from MSDN:

Punctuators in C++ have syntactic and semantic meaning to the compiler but do not, of themselves, specify an operation that yields a value. Some punctuators, either alone or in combination, can also be C++ operators or be significant to the preprocessor.
Any of the following characters are considered punctuators:

! % ^ & * ( ) - + = { } | ~
[ ] \ ; ' : " < > ? , . / #

Why do digraphs exist for some punctuators?

Because the C++ language must be writable in all versions of the ISO 646 character set. ISO 646 is a very old 7-bit character set standard. Nowadays you don't normally use 7-bit sets; an 8-bit set or Unicode is more likely. Back in the old days, however, ISO 646 was in use. The set exists in several national versions, and not all of these versions include all punctuators.

For instance, the Germans used their own version of the ISO 646 character set, called ISO 646 DE. Since they needed characters like ä and ü, these characters replaced { and } in the set. You may want to read more about it on Wikipedia.

Apparently only six punctuators exist as digraphs (see above). I've tested them all in Borland Developer Studio 2006 and they work fine. There are other punctuators, such as \, ~, ^, and |, which may not exist in all versions of the ISO 646 character set; these do not exist as digraphs, only as trigraphs.
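
For reference, the nine trigraphs all start with two question marks: ??= is #, ??/ is \, ??' is ^, ??( is [, ??) is ], ??! is |, ??< is {, ??> is } and ??- is ~. Since many compilers ignore trigraphs by default (and C++17 removed them from the language), the sketch below does not rely on the compiler at all. It is just a hypothetical helper, with made-up names trigraph_target and replace_trigraphs, that performs the same substitution on a string so you can see the mapping in action.

```cpp
#include <string>

// Returns the character that two question marks followed by `third`
// stand for, or '\0' if that three-character sequence is not a trigraph.
char trigraph_target(char third)
{
    switch (third) {
    case '=':  return '#';
    case '/':  return '\\';
    case '\'': return '^';
    case '(':  return '[';
    case ')':  return ']';
    case '!':  return '|';
    case '<':  return '{';
    case '>':  return '}';
    case '-':  return '~';
    default:   return '\0';
    }
}

// Replaces every trigraph in `source` with its single-character equivalent.
// Illustration only: this is not how any particular compiler implements it.
std::string replace_trigraphs(const std::string& source)
{
    std::string result;
    std::string::size_type i = 0;
    while (i < source.size()) {
        if (i + 2 < source.size() && source[i] == '?' && source[i + 1] == '?') {
            const char target = trigraph_target(source[i + 2]);
            if (target != '\0') {
                result += target;   // emit the punctuator
                i += 3;             // skip the whole trigraph
                continue;
            }
        }
        result += source[i];        // ordinary character, copy as-is
        ++i;
    }
    return result;
}
```

If you try this yourself, write the question marks in string literals as ?\? so that a compiler which does honour trigraphs will not replace them before your function ever sees them.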

This subject is not really of concern today, but it thrilled me when I realized that you can write common punctuators as digraphs :).
