C++ in 2005: Can It Become A Java Beater?
by
Dare Obasanjo
Recently, in an
interview with Linux World, Bjarne Stroustrup described his
concerns for the C++ language as well as a wish list of libraries
he would like added to the standard when it is next up for review
in 2005. The following article is an analysis of Bjarne's
wishlist as it relates to C++ regaining the ground it has lost due
to Java's emerging popularity on the server. I'll also
discuss a few libraries I'd like to see added that were not
discussed by Bjarne.
NOTE: Bjarne Stroustrup does not mention Java in this
interview. Java vs. C++ is simply the direction I have decided to
take the results of the interview so that there is something to
compare the wishlist against.
Concurrency
Bjarne: I'd like to see a library supporting threads and a
related library supporting concurrency without shared memory.
Multithreaded programming is very important because it allows for
unique advantages including exploiting parallelism on machines with
multiple processors, making programs appear faster by parallelizing
disk bound I/O and allowing for better modularization of code.
Currently C++ developers who want to use threads in a
standards-compliant manner must use the
PThreads library, which uses the Mutual Exclusion model to
handle concurrency. Mutual Exclusion primarily involves using
Mutex variables and
Condition variables to handle thread synchronization and access
to shared data. The main problem with using PThreads in C++ is that
the POSIX thread standard is not designed for object oriented
programming, and one constantly slams into limitations that hamper
design decisions, such as being unable to run an object's member
function directly as a thread.
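To make that limitation concrete, here is a minimal sketch of mutual exclusion with PThreads from C++. The Counter class and the run_counter_demo() function are hypothetical names invented for this illustration. Note the static trampoline: pthread_create() only accepts a plain function pointer, so a member function cannot be handed to it directly.

```cpp
#include <cstddef>
#include <pthread.h>

// Hypothetical example class: a counter shared between threads.
class Counter {
public:
    Counter() : value_(0) { pthread_mutex_init(&lock_, NULL); }
    ~Counter() { pthread_mutex_destroy(&lock_); }

    // The mutex has to be locked and unlocked by hand around shared data.
    void increment() {
        pthread_mutex_lock(&lock_);
        ++value_;
        pthread_mutex_unlock(&lock_);
    }

    int value() {
        pthread_mutex_lock(&lock_);
        int v = value_;
        pthread_mutex_unlock(&lock_);
        return v;
    }

    // pthread_create() wants a free-function style entry point, so a
    // static trampoline is needed to get back to the object.
    static void* thread_entry(void* arg) {
        Counter* self = static_cast<Counter*>(arg);
        for (int i = 0; i < 1000; ++i) self->increment();
        return NULL;
    }

private:
    Counter(const Counter&);            // not copyable
    Counter& operator=(const Counter&); // not assignable
    pthread_mutex_t lock_;
    int value_;
};

// Runs two threads against one Counter and returns the final count.
int run_counter_demo() {
    Counter counter;
    pthread_t a, b;
    pthread_create(&a, NULL, Counter::thread_entry, &counter);
    pthread_create(&b, NULL, Counter::thread_entry, &counter);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter.value();
}
```

Every piece of the synchronization is the programmer's responsibility here: forgetting a single unlock call is enough to deadlock the program.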
There are a few alternatives to using PThreads if standards
compliance is not important.
RogueWave Software has what is probably the most popular C++
threads library called
Threads.h++ 2.0. There are also The Adaptive Communication
Environment (
ACE) and
ZThread libraries which both contain a number of wrappers to
both POSIX and Win32 threads and enable Object Oriented thread
programming in C++. Some people have been known to eschew threads
and instead use multiple processes and
shared memory to achieve their goals.
One of the greatest boons of Java is that it provides a
concurrency API which uses the Monitor model. In the monitor model,
access to certain shared data is done only through certain
synchronized functions and the object is locked while a
synchronized function is executing which effectively regulates
access to shared data. In the Monitor model there is no need for
explicit mutexes or condition variables; all that is needed are wait(),
signal() (spelled notify() in Java) and variations thereof.
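One way to approximate a monitor in C++ is to hide the lock inside a class so that every public member function acquires it. The sketch below assumes PThreads underneath; Mailbox, produce() and run_mailbox_demo() are hypothetical names, and this is only an illustration of the idea, not a proposed standard interface.

```cpp
#include <cstddef>
#include <pthread.h>
#include <queue>

// Hypothetical monitor-style mailbox: the lock lives inside the object and
// every public member function acquires it, so callers never see a mutex.
class Mailbox {
public:
    Mailbox() {
        pthread_mutex_init(&lock_, NULL);
        pthread_cond_init(&not_empty_, NULL);
    }
    ~Mailbox() {
        pthread_cond_destroy(&not_empty_);
        pthread_mutex_destroy(&lock_);
    }

    // "Synchronized" method: deposit a message and wake one waiter.
    void put(int message) {
        pthread_mutex_lock(&lock_);
        messages_.push(message);
        pthread_cond_signal(&not_empty_);            // plays the role of signal()
        pthread_mutex_unlock(&lock_);
    }

    // "Synchronized" method: block until a message is available.
    int take() {
        pthread_mutex_lock(&lock_);
        while (messages_.empty())
            pthread_cond_wait(&not_empty_, &lock_);  // plays the role of wait()
        int message = messages_.front();
        messages_.pop();
        pthread_mutex_unlock(&lock_);
        return message;
    }

private:
    Mailbox(const Mailbox&);            // not copyable
    Mailbox& operator=(const Mailbox&); // not assignable
    pthread_mutex_t lock_;
    pthread_cond_t not_empty_;
    std::queue<int> messages_;
};

// Producer thread body: sends 1..100 into the Mailbox passed as arg.
void* produce(void* arg) {
    Mailbox* box = static_cast<Mailbox*>(arg);
    for (int i = 1; i <= 100; ++i) box->put(i);
    return NULL;
}

// Starts a producer and sums what the calling thread receives.
int run_mailbox_demo() {
    Mailbox box;
    pthread_t producer;
    pthread_create(&producer, NULL, produce, &box);
    int sum = 0;
    for (int i = 0; i < 100; ++i) sum += box.take();
    pthread_join(producer, NULL);
    return sum;
}
```

The mutex and condition variable still exist, but only the author of Mailbox has to get them right; code that calls put() and take() cannot forget to lock.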
Hopefully a thread library will be added that uses the Monitor
model or even the more user-friendly Serializer model which differs
from the Monitor model in that signalling of threads is done
automatically and shared resources can be accessed from outside the
Serializer even though concurrency is maintained.
Reflection
Bjarne: I'd like to see something like that supported
through a library defining the interface to extended type
information.
Currently both Java and C++ support Run-Time Type Identification
(RTTI),
which is a mechanism that allows one to determine the specific
class of an Object and is rather useful when dealing with
collections of various derived classes that are referenced via a
pointer to a single base class.
Many feel that RTTI is an essential part of Object Oriented
Programming; Doug
Lea's Usenix paper on RTTI lists several situations where
RTTI provides the best solution to certain problems. On the other
hand there is also a certain camp of OO purists who argue that
using RTTI is usually a sign of bad object oriented design. Scott Meyers is noted as having
stated "Anytime you find yourself writing code of the form
'If the object is of type T1 do something and if it is of type
T2 do something else,' slap yourself". Meyers is referring
to the fact that with the judicious use of polymorphism and
encapsulation, the type of an object shouldn't be an issue
because each object can perform operations on itself based on an
interface specified in a base class. Personally I believe that in
the general case Meyers is right, but there are certain situations
(especially when modules are being written by different developers
or interaction is done across different modules) where RTTI is
preferable to using proper OO.
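The two styles being contrasted can be sketched side by side. The Shape, Circle and Square classes below are hypothetical; the first function asks the object what it is using dynamic_cast and typeid, while the second simply calls a virtual function.

```cpp
#include <string>
#include <typeinfo>

struct Shape {
    virtual ~Shape() {}
    virtual std::string name() const = 0;   // polymorphic interface
};
struct Circle : Shape { std::string name() const { return "circle"; } };
struct Square : Shape { std::string name() const { return "square"; } };

// RTTI style: ask what the object is, then branch on the answer.
std::string describe_with_rtti(const Shape& s) {
    if (dynamic_cast<const Circle*>(&s)) return "circle";
    if (typeid(s) == typeid(Square)) return "square";
    return "unknown";
}

// Polymorphic style: let the object answer for itself.
std::string describe_with_virtual(const Shape& s) {
    return s.name();
}
```

Adding a Triangle class means editing describe_with_rtti(), whereas describe_with_virtual() keeps working untouched, which is the heart of the purists' objection.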
Reflection is the logical next step to RTTI. Reflection enables
one to discover the fields, methods and constructors of a class at
runtime and manipulate them in various ways including invoking
methods dynamically at runtime and creating new instances of these
unknown objects. Reflection is primarily useful for developers who
create tools such as debuggers, class browsers, interpreters, and a
host of others that need to be able to extract information on
arbitrary objects and execute code within these objects at runtime.
I for one would like to see Reflection added to the C++ standard
because it will make writing class browsers in various IDEs a whole
lot easier.
Persistence
Bjarne: I'd like to see some support in the Standard
Library, probably in connection with the extended type information,
but I don't currently have any concrete suggestions.
Object Persistence, also known as
Serialization, is the ability to read and write objects via a
stream such as a file or network socket. Object Persistence is
useful in situations where the state of an object must be retained
across invocations of a program. Usually in such cases simply
storing data in a flat file is insufficient yet using a Database
Management System (
DBMS) is overkill.
There are many subtleties that make creating an object
persistence library a non-trivial problem. Chief among them is the
fact that a reflection library or other similar mechanism is needed
to be able to dynamically obtain all the fields in a class and read
or write them from or to a stream. Secondly, obtaining the fields
in an object and persisting them to disk is a relatively easy task
when the fields are made up of simple types (int, float, char, etc.)
but is problematic once the fields are actually objects which may
also contain objects, ad infinitum. Finally, an object persistence
format needs to be designed. In this regard I am torn between
proposing the use of XML, so as to create a human readable,
extensible and easily validated format, and a binary format to
reduce bloat and increase speed of reads and writes.
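Without reflection, each class has to enumerate its own fields by hand. The following sketch shows what that looks like for a class whose fields are themselves objects. Point, Line and round_trip() are hypothetical names, and the whitespace-separated text format is just a stand-in for the sake of a short example, not a suggestion for the XML or binary formats discussed above.

```cpp
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical class with only simple fields: easy to persist.
struct Point {
    int x, y;
    void save(std::ostream& out) const { out << x << ' ' << y << ' '; }
    void load(std::istream& in) { in >> x >> y; }
};

// Hypothetical class whose fields are objects: persisting it means
// recursing into every member, which the class must spell out itself.
struct Line {
    Point from, to;
    void save(std::ostream& out) const { from.save(out); to.save(out); }
    void load(std::istream& in) { from.load(in); to.load(in); }
};

// Writes a Line to an in-memory stream and reads it back into a new object.
Line round_trip(const Line& original) {
    std::stringstream stream;
    original.save(stream);
    Line copy;
    copy.load(stream);
    return copy;
}
```

Every new field means touching both save() and load(), which is exactly the bookkeeping a reflection-based library would take over.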
Although object persistence is interesting I'm not sure it is
something that needs to be explicitly addressed by being in the
standard but instead should be allowed to be solved by C++
developers as they see fit.
Hash tables
Bjarne: Of course, some variant of the popular hash_map will be
included.
This is a no-brainer. The current C++ standard has a sorted
associative container declared in <map>
(typically implemented as a balanced tree rather than a hash table)
but does not have a
facility for developers who want a hash table without the overhead
of sorting. SGI's hash_map is
commonly used and is expected to make it into the standard at the
soonest opportunity.
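The ordering behaviour of the container in <map> is easy to observe. In this small sketch (the function name and the sample keys are made up for illustration) three keys are inserted out of order and come back sorted when the map is traversed.

```cpp
#include <map>
#include <string>

// Returns the keys of a std::map in iteration order, joined by commas.
std::string keys_in_iteration_order() {
    std::map<std::string, int> ages;
    ages["zoe"] = 31;
    ages["adam"] = 27;
    ages["mike"] = 45;

    std::string out;
    for (std::map<std::string, int>::const_iterator it = ages.begin();
         it != ages.end(); ++it) {
        if (!out.empty()) out += ",";
        out += it->first;
    }
    return out;   // keys come back sorted, not in insertion order
}
```

Keeping the keys ordered is what costs logarithmic time per lookup; a hash table gives up the ordering in exchange for lookups that are constant time on average.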
Constraints for template arguments
Bjarne: This can be simply, generally, and elegantly expressed
in C++ as is.
Templates are a C++ language facility that enable generic
programming via
parametric polymorphism. The principal idea behind generic
programming is that many functions and procedures can be abstracted
away from the particular data structures on which they operate and
thus can operate on any type.
In practice, the fact that templates can work on any type of
object can lead to unforeseen and hard to detect errors in a
program. It turns out that although most people like the fact that
template functions can work on many types without the data having
to be related via inheritance (unlike Java), there is a clamor for a
way to specialize these functions so that they only accept or deny
a certain range of types.
The most common practice for constraining template arguments is to
have a constraints()
function that tries to assign an
object of the template argument class to a specified base
class's pointer. If the compilation fails then the template
argument did not meet the requirements. Of course, if you are going
to do this you might as well forego using templates and simply use
pointers to the base class and polymorphism thus avoiding the
cryptic compiler error messages usually associated with using
templates as well as code bloat associated with templates. Bjarne
Stroustrup has proposed adding constraints()
to the
standard. Here are links to code that shows how to use
template argument constraints and
constrain template arguments regarding built in types.
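As a rough illustration of the constraints() idiom described above, here is a sketch of a check that a template argument derives from a given base class. The Derived_from name follows the style Stroustrup has used for such checks, but the surrounding Shape, Square and total_area() are hypothetical and exist only to exercise it.

```cpp
// Hypothetical base class and one derived class used by the example.
struct Shape {
    virtual ~Shape() {}
    virtual double area() const = 0;
};
struct Square : Shape {
    double side;
    explicit Square(double s) : side(s) {}
    double area() const { return side * side; }
};

template <class T, class Base>
struct Derived_from {
    // Compiles only if a T* converts implicitly to a Base*.
    static void constraints(T* p) { Base* b = p; (void)b; }
    // Taking the address forces constraints() to be instantiated.
    Derived_from() { void (*check)(T*) = constraints; (void)check; }
};

// Sums the areas of 'count' objects; T must derive from Shape.
template <class T>
double total_area(const T* items, int count) {
    Derived_from<T, Shape>();   // compile-time check, no runtime cost intended
    double sum = 0;
    for (int i = 0; i < count; ++i) sum += items[i].area();
    return sum;
}
```

Calling total_area() with an array of int fails to compile inside constraints(), at the line that attempts the pointer conversion, which at least points the reader at the violated requirement.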
It should be noted that although Java™ currently doesn't
support generic programming, there are extensions of Java that do
such as
Pizza and
GenericJava. Also a
proposal to add generic types to Java has been submitted to the
Java Community Process and it seems generic programming is
scheduled to be added to the Java standard soon; some
people expect it to make it into version 1.4.
Assertions
Bjarne: Many of the most useful assertions [a means of code
verification and error handling] can be expressed as templates.
Some such should be added to the Standard Library.
Assertions are a useful debugging technique where a predicate is
evaluated and if false causes the program to terminate while
printing the location of the failed assertion and the condition
that caused it to fail. Assertions are usually used during
development and removed from the code before the software actually
ships.
Errors which the programmer never expects to happen (e.g. age <
0) are prime candidates for using assertions. Typical locations for
assertions include: internal invariants within a
function, which are usually dealt with via nested if statements
where the last else is a catchall that handles "can't
happen" values; control-flow invariants, such as
when the default case in a switch
statement should never be reached; function preconditions
(e.g. argv != NULL); function postconditions (e.g.
after x_squared = x * x, assert x_squared >= 0); or simply
verifying that the state of a class is valid (e.g.
AVLTree.isbalanced()).
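Here is a small sketch that combines the C assert macro with a template-based assertion in the spirit of Bjarne's remark. The Assert template, the Bad_age exception type and the two functions are hypothetical names for illustration only, not a proposed library interface.

```cpp
#include <cassert>

// Hypothetical exception type thrown when an age check fails.
struct Bad_age {};

// Template assertion: throws an exception of type X when the
// condition is false, instead of terminating the program.
template <class X, class A>
inline void Assert(A condition) {
    if (!condition) throw X();
}

// Precondition via the C macro: the caller promises a non-null pointer.
// Sanity check via the template: an age is never negative.
int checked_age(const int* age) {
    assert(age != 0);
    Assert<Bad_age>(*age >= 0);
    return *age;
}

// Reports whether checked_age() refuses the given value.
bool age_is_rejected(int age) {
    try {
        checked_age(&age);
    } catch (const Bad_age&) {
        return true;
    }
    return false;
}
```

Because the template version throws rather than aborts, the caller can choose how to react, and the exception type itself documents which check failed.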
Currently the only way to use assertions in a portable manner in
C++ is to use the ANSI C assert
macro located in assert.h.
There is also the useful
static_assert library available at the BOOST site which is
probably what the assertion library that will be proposed to the
standards committee will be based on. Assertions are quite useful
for debugging and there should be little difficulty in adding a
more powerful version of assert to the Standard Library.
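For readers who have not seen a compile-time assertion, the general trick can be sketched in a few lines. This is not the BOOST implementation, just the same idea under hypothetical names (Compile_time_check, STATIC_CHECK and pack() are all invented here): only the true case of a template is defined, so a false condition stops the compiler.

```cpp
// Only the 'true' specialization is defined, so applying sizeof to the
// 'false' case is a compile-time error.
template <bool Condition> struct Compile_time_check;
template <> struct Compile_time_check<true> {};

#define STATIC_CHECK(expr) \
    typedef char static_check_type[sizeof(Compile_time_check<(expr)>)]

// Checked while compiling: pack() below needs at least 16 bits.
STATIC_CHECK(sizeof(unsigned) >= 2);

// Hypothetical helper that packs two bytes into one unsigned value.
unsigned pack(unsigned char hi, unsigned char lo) {
    return (static_cast<unsigned>(hi) << 8) | lo;
}
```

Changing the condition to something false, such as sizeof(unsigned) >= 1000, makes the build fail at the STATIC_CHECK line rather than at runtime.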
Regular expression matching
Bjarne: I'd like to see a pattern-matching library in the
standard.
Regular
Expressions are a powerful method of describing text patterns
and are the major reason that
Perl is now the hacker's language of choice for creating
programs that search or process text. Until quite recently C++
programmers had to use the C library functions regcmp and regex
located in
libgen.h on *nix or the
RegExp Object via COM on Windows if they wanted to do any
sophisticated text processing with regular expressions. With the
advent of Dr. John Maddock's Regex++ this
is no longer the case.
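To give a feel for what calling a C regex interface from C++ involves, here is a minimal sketch. Note that it uses the POSIX regcomp() and regexec() functions from regex.h rather than the libgen.h functions mentioned above, since the POSIX pair is the more widely available of the two; the matches() wrapper is a hypothetical helper.

```cpp
#include <cstddef>
#include <regex.h>
#include <string>

// Returns true if 'text' contains a match for the POSIX extended
// regular expression 'pattern'. An invalid pattern counts as no match.
bool matches(const std::string& pattern, const std::string& text) {
    regex_t compiled;
    if (regcomp(&compiled, pattern.c_str(), REG_EXTENDED | REG_NOSUB) != 0)
        return false;                       // pattern failed to compile
    int status = regexec(&compiled, text.c_str(), 0, NULL, 0);
    regfree(&compiled);                     // the C API needs manual cleanup
    return status == 0;
}
```

The manual compile, execute and free steps are the kind of housekeeping a C++ regex class can fold into its constructor and destructor.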
As for actually adding regexes to the standard, I think this is a
case of unnecessarily bloating the standard. Java has done fine
without having regexes in the standard and there are a slew of Java
regex libraries including
OROMatcher, pat, and GNU Regexp.
Garbage collection
Bjarne: I'd like to see the C++ standard explicitly
acknowledge that it is an acceptable implementation technique for
C++, specifying that "concealed pointers" can be ignored
and what happens to destructors for collected garbage. (See section
C.4.1 of The C++ Programming Language for details.)
I have covered this in a
previous article on Garbage Collection and C++ and will thus
simply provide a pared-down version of that article with minor
modifications:
Hans
Boehm's site on garbage collection has a well written page
that dissects the advantages and
disadvantages of garbage collection in C++.
Basically the advantages of Garbage Collection are:
- 30 to 40 percent faster development time.
- Less restrictive interfaces and more reusable code.
- Easier implementation of sophisticated data structures.
- Eliminates some premature deallocation errors.
- Uses equivalent or less CPU-time than a program that uses
explicit memory deallocation.
While the disadvantages are:
- More interaction with the hard disk (virtual memory/paging)
due to examining all pointers in the application.
- May not work if programmers use various tricks and hacks
while coding (e.g. casting pointers to ints and back)
- May utilize more space than a program that uses explicit
memory deallocation.
- Unpredictable nature of collector runs may create unexpected
latency and time lags in the system.
Reference
counting, although popular, is not the only garbage collection
algorithm and in fact is considered by language purists as
unsatisfactory since it can't handle circular links. There are
many more algorithms including
Mark-Sweep Garbage Collection,
Mark-Compact Garbage Collection, Copying
Garbage Collection,
Generational Garbage Collection, and
Incremental and
Concurrent Garbage Collection. Hans Boehm's site
discusses mark-sweep garbage collection and not reference counting.
This doesn't mean that Mark-Sweep doesn't have its problems
(making two passes across memory and storing the
marks are both expensive) but these may be remedied by using Generational
garbage collection.
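The circular link problem with reference counting is easy to reproduce. The sketch below uses a hand-rolled, hypothetical Node type with an intrusive count; it links two nodes either in a chain or in a cycle, drops the caller's references, and reports how many nodes were never freed.

```cpp
#include <cstddef>

// Hypothetical intrusive reference-counted node.
struct Node {
    int refs;
    Node* next;
    static int live;   // number of Node objects currently allocated
    Node() : refs(1), next(NULL) { ++live; }
    ~Node() { --live; }
};
int Node::live = 0;

void retain(Node* n) { if (n) ++n->refs; }

// Drops one reference; at zero, releases the node's link and frees it.
void release(Node* n) {
    if (n && --n->refs == 0) {
        release(n->next);
        delete n;
    }
}

// Links a -> b (and b -> a when 'circular'), drops the caller's
// references, and returns how many nodes reference counting missed.
int nodes_leaked(bool circular) {
    int before = Node::live;
    Node* a = new Node;
    Node* b = new Node;
    a->next = b; retain(b);
    if (circular) { b->next = a; retain(a); }
    release(b);
    release(a);
    int leaked = Node::live - before;
    // In the circular case both nodes are still alive, each holding the
    // other; free them by hand so the demonstration itself does not leak.
    if (circular) { delete a; delete b; }
    return leaked;
}
```

In the chain case releasing a cascades into b and everything is freed. In the cycle each node keeps the other's count at one forever, which is the case a tracing collector such as mark-sweep handles without trouble.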
Bjarne Stroustrup is very keen on having GC added to C++ but also
wants to make sure that the principle of "Only pay for it if
you use it" which has been the hallmark of C++ for years is
preserved. In my opinion, the advantages of using garbage
collection in C++ greatly outweigh any disadvantages.
GUI
Bjarne: It would be nice to have a standard GUI framework, but
I don't see how that could be politically feasible.
Not worth discussing because it isn't going to happen for a
variety of reasons:
- The committee will never agree on a UI design model (MVC or
UI delegate: which will it be?)
- Will require too much work for library writers.
- More mature native toolkits will always be ahead of the
game.
Platform-independent system facilities
Bjarne: I'd like to see the Standard Library provide a
broader range of standard interfaces to common system resources
(where available), such as directories and sockets.
Again I must rhapsodize over Java and the way that its
Socket classes abstract away completely from the native socket
calls. I am all for creating more standard interfaces to system
calls beyond file I/O. Then one doesn't need to rewrite code
when moving a program from *nix to Windows simply because it opens
a socket or reads from a directory.
Conclusion
Wow, that was longer than I expected. I've decided to skip
describing the libraries I'd like to see added (I would like an
interface keyword) and just go straight to the question.
© 2001 Dare Obasanjo