Wednesday, May 5, 2010

Destructor

Last month we saw, among other things, how we can give a struct well-defined values by using constructors, and how C++ exceptions aid in error handling. This month we'll look at classes and take a more careful look at object lifetime, especially in the light of exceptions. The stack example from last month will be improved a fair bit too.
Destructor
Just as you can control construction of an object by writing constructors, you can control destruction by writing a destructor. A destructor is executed when an object dies, either by going out of scope or by being removed from the heap with the "delete" operator. A destructor has the same name as the class, prefixed with the ~ character, and it never accepts any parameters. We can use this to write a simple trace class that helps us find out the lifetime of objects.
#include <iostream>
using std::cout;
using std::endl;
class Tracer
{
public:
Tracer(const char* tracestring = "too lazy, eh?");
~Tracer(); // destructor
private:
const char* string;
};
Tracer::Tracer(const char* tracestring)
: string(tracestring)
{
cout << "+ " << string << endl;
}
Tracer::~Tracer()
{
cout << "- " << string << endl;
}
This simple class writes its parameter string, prefixed with a "+" character, when constructed, and the same string, prefixed with a "-" character, when destroyed. Let's toy with it!
int main(void)
{
Tracer t1("t1");
Tracer t2("t2");
Tracer t3;
for (unsigned u = 0; u < 3; ++u)
{
Tracer inLoop("inLoop");
}
Tracer* tp = 0;
{
Tracer t1("Local t1");
Tracer* t2 = new Tracer("leaky");
tp = new Tracer("on heap");
}
delete tp;
return 0;
}
When run, I get this behaviour (and so should you, unless you have a buggy compiler):
[d:\cppintro\lesson2]tracer.exe
+ t1
+ t2
+ too lazy, eh?
+ inLoop
- inLoop
+ inLoop
- inLoop
+ inLoop
- inLoop
+ Local t1
+ leaky
+ on heap
- Local t1
- on heap
- too lazy, eh?
- t2
- t1
What conclusions can be drawn from this? With one exception, the objects on the heap, objects are destroyed in the reverse order of creation (have a careful look: it's true, and it's always true). The object created with the string "on heap" is destroyed only when it is explicitly deleted, and the object instantiated with the string "leaky" is never destroyed at all, since it is never deleted.
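To check these ordering claims without reading console output, the same experiment can record its events in a vector instead. This is a minimal sketch; LoggedTracer, eventLog and runScopes are hypothetical names, with LoggedTracer playing the role of Tracer.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for Tracer: appends "+name" on construction and
// "-name" on destruction to a global log instead of writing to cout.
std::vector<std::string> eventLog;

class LoggedTracer
{
public:
  explicit LoggedTracer(const std::string& name) : name_(name)
  {
    eventLog.push_back("+" + name_);
  }
  ~LoggedTracer()
  {
    eventLog.push_back("-" + name_);
  }
private:
  std::string name_;
};

// Same shape as the article's main(): automatic objects in nested
// scopes, plus one heap object that is deleted explicitly.
void runScopes()
{
  LoggedTracer outer("outer");
  LoggedTracer* pHeap = 0;
  {
    LoggedTracer first("first");
    pHeap = new LoggedTracer("heap");
    LoggedTracer second("second");
  }               // second, then first, die here; the heap object lives on
  delete pHeap;   // the heap object dies only when deleted
}                 // outer dies here
```

After a call to runScopes() the log reads +outer, +first, +heap, +second, -second, -first, -heap, -outer: the automatic objects die in reverse order of creation, and the heap object dies exactly when it is deleted.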
What happens with classes containing classes then? Must be tried, right?
class SuperTracer
{
public:
SuperTracer(const char* tracestring);
~SuperTracer();
private:
Tracer t;
};
SuperTracer::SuperTracer(const char* tracestring)
: t(tracestring)
{
cout << "SuperTracer(" << tracestring << ")" << endl;
}
SuperTracer::~SuperTracer()
{
cout << "~SuperTracer" << endl;
}
int main(void)
{
SuperTracer t1("t1");
SuperTracer t2("t2");
return 0;
}
What's your guess?
[d:\cppintro\lesson2]stracer.exe
+ t1
SuperTracer(t1)
+ t2
SuperTracer(t2)
~SuperTracer
- t2
~SuperTracer
- t1
This means that the contained object ("Tracer") within "SuperTracer" is constructed before the body of the "SuperTracer" constructor runs. This is perhaps not very surprising, looking at how the constructor is written, with a call to the "Tracer" class constructor in the initialiser list. Perhaps a bit more surprising is the fact that the "SuperTracer" object's destructor is called before that of the contained "Tracer", but there is a good reason for this. Superficially, the reason might appear to be symmetry, destruction always happening in the reverse order of construction, but it goes a bit deeper than that. It's not unlikely that the member data is useful in some way to the destructor, and what if the member data were already destroyed when the destructor starts running? At best the destructor would then be totally worthless, but more likely we'd have serious problems properly destroying our no longer needed objects.
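The point about the destructor needing its member data can be shown directly. In this minimal sketch (Part, Whole and partLog are hypothetical names), the destructor body of Whole reads its member part, which is safe because members are destroyed only after that body has finished.

```cpp
#include <string>
#include <vector>

std::vector<std::string> partLog;  // records destruction events, in order

// Hypothetical member type that logs its own destruction.
class Part
{
public:
  explicit Part(const std::string& name) : name_(name) {}
  ~Part() { partLog.push_back("~Part " + name_); }
  const std::string& name() const { return name_; }
private:
  std::string name_;
};

// Hypothetical owner whose destructor body uses the member 'part'.
class Whole
{
public:
  explicit Whole(const std::string& name) : part(name) {}
  ~Whole() { partLog.push_back("~Whole sees " + part.name()); }
private:
  Part part;  // still alive while ~Whole's body runs
};
```

Destroying a Whole created with "engine" logs "~Whole sees engine" followed by "~Part engine".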
So, the curious reader wonders, what about C++ exceptions? Now here we get into an interesting subject indeed! Let's look at two alternatives, one where the constructor of "SuperTracer" throws, and one where the destructor throws. We'll control this with a new first parameter: zero for throwing in the constructor, and non-zero for throwing in the destructor. Here's the new "SuperTracer" along with an interesting "main" function.
class SuperTracer
{
public:
SuperTracer(int i, const char* tracestring)
throw (const char*);
~SuperTracer() throw (const char*);
private:
Tracer t;
int destructorThrow;
};
SuperTracer::SuperTracer(int i, const char* tracestring)
throw (const char*)
: t(tracestring),
destructorThrow(i)
{
cout << "SuperTracer(" << tracestring << ")" << endl;
if (!destructorThrow)
throw (const char*)"SuperTracer::SuperTracer";
}
SuperTracer::~SuperTracer() throw (const char*)
{
cout << "~SuperTracer" << endl;
if (destructorThrow)
throw (const char*)"SuperTracer::~SuperTracer";
}
int main(void)
{
try {
SuperTracer t1(0, "throw in constructor");
}
catch (const char* p)
{
cout << "Caught " << p << endl;
}
try {
SuperTracer t1(1, "throw in destructor");
}
catch (const char* p)
{
cout << "Caught " << p << endl;
}
try {
cout << "Let the fun begin" << endl;
SuperTracer t1(1, "throw in destructor");
SuperTracer t2(0, "throw in constructor");
}
catch (const char* p)
{
cout << "Caught " << p << endl;
}
return 0;
}
Here we can study different bugs in different compilers. Both GCC and VisualAge C++ have theirs. What bugs does your compiler have? Here's the result when running with GCC; comments about the bug it reveals follow the output:
[d:\cppintro\lesson2]s2tracer.exe
+ throw in constructor
SuperTracer(throw in constructor)
- throw in constructor
Caught SuperTracer::SuperTracer
+ throw in destructor
SuperTracer(throw in destructor)
~SuperTracer
Caught SuperTracer::~SuperTracer
Let the fun begin
+ throw in destructor
SuperTracer(throw in destructor)
+ throw in constructor
SuperTracer(throw in constructor)
- throw in constructor
~SuperTracer
Abnormal program termination
core dumped
The first four lines show that when an exception is thrown in a constructor, all member variables constructed so far are destroyed, through calls to their destructors, but the destructor for the object itself is never run. Why? Well, how do you destroy something that was never constructed? The next four lines reveal the GCC bug. As can be seen, the exception is thrown in the destructor; however, the member Tracer variable is not destroyed as it should be (VisualAge C++ handles this one correctly). Next we see the interesting case. Here an object is created that throws on destruction, and then an object is created that throws at once. This means that the first object will be destroyed while an exception is already in the air, and when destroyed it will throw another one. The correct result can be seen in the execution above: program execution must stop at once, and this is done by a call to the function "terminate". The bug in VisualAge C++ is that it destroys the contained Tracer object before calling terminate.
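The constructor rule can also be observed in isolation. In this minimal sketch, Piece, Assembly and ctorLog are hypothetical names, and std::runtime_error stands in for the article's const char* exception. The second member's constructor throws, so the already constructed first member is destroyed, while neither the Assembly constructor body nor its destructor ever runs.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

std::vector<std::string> ctorLog;  // records construction/destruction events

// Hypothetical member type that can be told to fail in its constructor.
class Piece
{
public:
  Piece(const std::string& name, bool fail) : name_(name)
  {
    if (fail)
      throw std::runtime_error("Piece " + name_ + " failed");
    ctorLog.push_back("+" + name_);
  }
  ~Piece() { ctorLog.push_back("-" + name_); }
private:
  std::string name_;
};

class Assembly
{
public:
  // 'first' is fully constructed before 'second' is attempted.
  explicit Assembly(bool secondFails)
    : first("first", false), second("second", secondFails)
  {
    ctorLog.push_back("Assembly body");
  }
  ~Assembly() { ctorLog.push_back("~Assembly"); }
private:
  Piece first;
  Piece second;
};
```

After catching the exception from Assembly a(true), the log contains only +first and -first.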
What's the lesson learned from this? First, that it's difficult to find a compiler that correctly handles exceptions thrown in destructors. More important, though: think *very* carefully before allowing a destructor to throw exceptions. After all, if you throw an exception while an exception is already in the air, your program will terminate very quickly. If you have a bleeding-edge compiler, you can control this by calling the function "uncaught_exception()" (which tells whether an exception is in the air) and from there decide what to do, but think carefully about the consequences.
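A note for current compilers: since C++11 a destructor without an exception specification is normally implicitly noexcept, so an exception escaping it calls terminate unless the destructor is declared noexcept(false). Also, uncaught_exception() was deprecated in C++17 and removed in C++20 in favour of std::uncaught_exceptions() from <exception>, which returns the number of exceptions currently in flight. A minimal sketch of the "decide what to do" idea compares that count at construction and at destruction; UnwindAware and unwindLog are hypothetical names, and the destructor only records what it would do rather than actually throwing.

```cpp
#include <exception>
#include <string>
#include <vector>

std::vector<std::string> unwindLog;  // what each destructor decided

// Hypothetical guard: remembers how many exceptions were in flight when it
// was created; more at destruction time means we are being unwound.
class UnwindAware
{
public:
  UnwindAware() : exceptionsAtStart(std::uncaught_exceptions()) {}
  ~UnwindAware()
  {
    if (std::uncaught_exceptions() > exceptionsAtStart)
      unwindLog.push_back("unwinding: must not throw");
    else
      unwindLog.push_back("normal exit");
  }
private:
  int exceptionsAtStart;
};
```

Comparing against the count saved at construction, rather than just asking whether any exception is in flight, also gives the right answer for objects created and destroyed inside a destructor that is itself running during unwinding.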

class

The class is the C++ construct for encapsulation. Encapsulation means publishing an interface through which you make things happen, and hiding the implementation and data necessary to do the job. A class is used to hide data, and publish operations on the data, at the same time. Let's look at the "Range" example from last month, but this time make it a class. The only operation that we allowed on the range last month was that of construction, and we left the data visible for anyone to use or abuse. What operations do we want to allow for a Range class? I decide that 4 operations are desirable:
Construction (same as last month).
Find the lower bound.
Find the upper bound.
Ask if a value is within the range.
The second thing to ask when wishing for a function (the first being what it's supposed to do) is in what ways things can go wrong when calling it, and what to do when that happens. For the three query functions, I don't see how anything can go wrong, so it's easy: we promise that they will not throw C++ exceptions by writing an empty exception specifier.
I'll explain this class by simply writing the public interface of it:
struct BoundsError {};
class Range
{
public:
Range(int upper_bound = 0, int lower_bound = 0)
throw (BoundsError);
// Precondition: upper_bound >= lower_bound
// Postconditions:
// lower == lower_bound
// upper == upper_bound
int lowerBound() throw ();
int upperBound() throw ();
int includes(int aValue) throw ();
private:
// implementation details.
};
This means that a class named "Range" is declared to have a constructor, behaving exactly like the constructor for the "Range" struct from last month, and three member functions (also often called methods) called "lowerBound", "upperBound" and "includes". The keyword "public", on the fourth line from the top, states that the constructor and the three member functions are reachable by anyone using instances of the Range class. The keyword "private", on the third line from the bottom, says that whatever comes after it is a secret to anyone but the "Range" class itself. We'll soon see more of that, but first an example (ignoring error handling) of how to use the "Range" class:
int main(void)
{
Range r(5);
cout << "r is a range from " << r.lowerBound() << " to "
<< r.upperBound() << endl;
int i;
for (;;)
{
cout << "Enter a value (0 to stop) :";
cin >> i;
if (i == 0)
break;
cout << endl << i << " is " << "with"
<< (r.includes(i) ? "in" : "out") << " the range"
<< endl;
}
return 0;
}
A test drive might look like this:
[d:\cppintro\lesson2]rexample.exe
r is a range from 0 to 5
Enter a value (0 to stop) :5
5 is within the range
Enter a value (0 to stop) :7
7 is without the range
Enter a value (0 to stop) :3
3 is within the range
Enter a value (0 to stop) :2
2 is within the range
Enter a value (0 to stop) :1
1 is within the range
Enter a value (0 to stop) :0
Does this seem understandable? The member functions "lowerBound", "upperBound" and "includes" are functions, and behave just like functions, except that they are tied to instances of the class Range. You refer to them just as you refer to member variables in a struct, but since they're functions, you call them (using what C++ lingo calls the function call operator, "()").
Now to look at the magic making this happen by filling in the private part, and writing the implementation:
struct BoundsError {};
class Range
{
public:
Range(int upper_bound = 0, int lower_bound = 0)
throw (BoundsError);
// Precondition: upper_bound >= lower_bound
// Postconditions:
// lower == lower_bound
// upper == upper_bound
int lowerBound() throw ();
int upperBound() throw ();
int includes(int aValue) throw ();
private:
int lower;
int upper;
};
Range::Range(int upper_bound, int lower_bound)
throw (BoundsError)
: lower(lower_bound), /***/
upper(upper_bound) /***/
{
// Preconditions.
if (upper_bound < lower_bound) throw BoundsError();
// Postconditions.
if (lower != lower_bound) throw BoundsError();
if (upper != upper_bound) throw BoundsError();
}
int Range::lowerBound() throw ()
{
return lower; /***/
}
int Range::upperBound() throw ()
{
return upper; /***/
}
int Range::includes(int aValue) throw ()
{
return aValue >= lower && aValue <= upper; /***/
}
First, you see that the constructor is identical to that of the struct from last month. This is no coincidence. It does the same thing and constructors are constructors. You also see that "lowerBound", "upperBound" and "includes", look just like normal functions, except for the "Range::" thing. It's the "Range::" that ties the function to the class called Range, just like it is for the constructor.
The lines marked /***/ are a bit special. They make use of the member variables "lower" and "upper". How does this work? To begin with, the member functions are tied to instances of the class: you cannot call any of these member functions without having an instance to call them on, and the member functions use the member variables of that instance. Say for example we use two Range instances, like this:
Range r1(5,2);
Range r2(20,10);
Then r1.lowerBound() is 2, r1.upperBound() is 5, r2.lowerBound() is 10 and r2.upperBound() is 20.
So how come the member functions are allowed to use the member data, when it's declared private? Private, in C++, means secret to anyone except whatever belongs to the class itself. In this case, it means the data is secret to anyone using the class, but the member functions belong to the class, so they can use it.
So, where is the advantage of doing this, compared to the struct from last month? Hiding data is always a good thing. For example, if we for whatever reason find out that it's cleverer to represent ranges as the lower bound plus the distance from the lower bound to the upper bound, we can do so without anyone knowing or suffering from it. All we have to do is change the private section of the class to:
private:
int lower;
int nrOfValues;
And the implementation of the constructor to:
Range::Range(int upper_bound, int lower_bound)
throw (BoundsError)
: lower(lower_bound), /***/
nrOfValues(upper_bound-lower_bound) /***/
...
And finally the implementations of "upperBound" and "includes" to:
int Range::upperBound() throw ()
{
return lower+nrOfValues;
}
int Range::includes(int aValue) throw ()
{
return aValue >= lower && aValue <= lower+nrOfValues;
}
We also have another, and usually more important, benefit: a promise of integrity. Already with the struct, there was a promise that the member variable "upper" would have a value greater than or equal to that of the member variable "lower". How much was that promise worth with the struct? This much:
Range r(5, 2);
r.lower = 25; // Oops! Now r.lower > r.upper!!!
Try this with the class. It won't compile, since "lower" is private. The only code allowed to change the member variables is the functions belonging to the class, and those we can control.
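Here is a minimal sketch of the class with the alternative representation filled in, compilable with a current compiler. Assumptions: the private members are named lower and nrOfValues, matching what the constructor and member functions use; the dynamic exception specifications are left out because C++17 removed specifications such as throw (BoundsError); and the postconditions are checked through lowerBound() and upperBound(), since there is no longer a member called upper.

```cpp
struct BoundsError {};

class Range
{
public:
  Range(int upper_bound = 0, int lower_bound = 0);
  // Precondition: upper_bound >= lower_bound
  int lowerBound();
  int upperBound();
  int includes(int aValue);
private:
  int lower;       // the lower bound
  int nrOfValues;  // distance from the lower bound to the upper bound
};

Range::Range(int upper_bound, int lower_bound)
  : lower(lower_bound),
    nrOfValues(upper_bound - lower_bound)
{
  // Preconditions.
  if (upper_bound < lower_bound) throw BoundsError();
  // Postconditions, expressed through the public interface.
  if (lowerBound() != lower_bound) throw BoundsError();
  if (upperBound() != upper_bound) throw BoundsError();
}

int Range::lowerBound() { return lower; }

int Range::upperBound() { return lower + nrOfValues; }

int Range::includes(int aValue)
{
  return aValue >= lower && aValue <= lower + nrOfValues;
}
```

Code using the class, such as Range r1(5,2); followed by r1.upperBound(), behaves exactly as before, while r.lower = 25; is rejected by the compiler because lower is private.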

Building a stack

OK, now we have enough small pieces of C++ as a better C to do something real. Let's build a stack of integers, using constructors and dynamic memory allocation.
#include <iostream>
using std::cout;
using std::endl;
// declare our stack element with constructor.
struct StackElement {
StackElement(int aValue, StackElement* pTail);
// aValue and pTail can have any value!
int value;
StackElement* pNext;
// StackElement is not yet completed, but
// we can have pointers to incomplete types.
};
StackElement::StackElement(int aValue,
StackElement* pTail)
: value(aValue),
pNext(pTail)
{
// nothing to do in here as it's all taken care of in
// the initialiser list.
}
// Struct thrown if preconditions are violated.
struct Precond
{
Precond(const char* p);
const char* msg;
};
Precond::Precond(const char* p) : msg(p) {}
// Use function overloading to print
// stack elements, strings and integers.
void print(StackElement* pElem) throw (Precond)
{
if (pElem == 0) // **1**
throw Precond("0 pointer sent to print(StackElement*)");
cout << pElem->value << endl;
}
void print(const char* string = "") throw (Precond)
{ // just a new line if default parameter value used.
if (string == 0)
throw Precond("0 pointer sent to print(const char*)");
cout << string << endl;
}
void print(int i)
{
cout << i << endl;
}
int main(void)
{
try {
print("Simple stack example");
print(); // just a new line
StackElement* pStackTop = 0;
print("Phase one, pushing objects on the stack");
print();
{
for (unsigned count = 0; count < 20; ++count)
{
// Create new element first on the stack by
// setting the "next" pointer of the created
// element to the current top of the stack.
pStackTop = new StackElement(count, pStackTop);
if (pStackTop == 0) //**2**
{
cout << "Memory exhausted. Won't add more"
<< endl;
break;
}
print(count);
}
}
print("Phase two, popping objects from the stack");
print();
while (pStackTop != 0)
{
print(pStackTop);
StackElement* pOldTop = pStackTop;
pStackTop = pStackTop->pNext;
delete pOldTop;
}
return 0;
}
catch (Precond& p) {
cout << "Precondition violation: " << p.msg << endl;
return 1;
}
catch (...) {
cout << "Unknown error: Probably out of memory"
<< endl;
return 2;
}
}
At //**1** you see something that may be unexpected: the pointer is compared with 0, not with "NULL" (and "pStackTop" is likewise initialised to 0). The reason, odd as it may seem, is that C++ is much pickier than C about type correctness. In C, "NULL" is typically defined as "(void*)0", but C++ never allows implicit conversion of a "void*" to any other pointer type, so that definition is not good enough. The integer constant 0, however, is implicitly convertible to any pointer type, so that is what we use. (C++ headers do define "NULL", but as a null pointer constant such as 0, never as "(void*)0"; C++11 later added the keyword "nullptr" for exactly this purpose.)
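//**2** is an unpleasant little thing. Depending on how new your compiler is, the "new" operator will either return 0 (old compilers) or throw an instance of "bad_alloc" (compilers following the standard) when memory is exhausted. With a standard compiler the check at //**2** is therefore never true; the exception instead ends up in the "catch (...)" handler in "main".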
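Standard C++ also offers a form of new that reports exhaustion the old way: new (std::nothrow), declared in <new>, returns a null pointer instead of throwing std::bad_alloc. This minimal sketch, with hypothetical push and pop helpers and the StackElement struct from above (constructor written inline), shows a check like //**2** done safely. Note that the result is stored in a temporary first: assigning it straight to the top pointer, as the example above does, would overwrite the only pointer to the existing stack with 0 and leak it.

```cpp
#include <new>

// StackElement as in the example above, with its constructor written inline.
struct StackElement {
  StackElement(int aValue, StackElement* pTail) : value(aValue), pNext(pTail) {}
  int value;
  StackElement* pNext;
};

// Hypothetical helper: pushes aValue, or returns false and leaves the
// stack untouched if memory is exhausted. new (std::nothrow) yields 0
// instead of throwing std::bad_alloc, so the check is meaningful.
bool push(StackElement*& pTop, int aValue)
{
  StackElement* pNew = new (std::nothrow) StackElement(aValue, pTop);
  if (pNew == 0)
    return false;   // pTop still points at the intact stack
  pTop = pNew;
  return true;
}

// Hypothetical helper. Precondition: pTop != 0.
int pop(StackElement*& pTop)
{
  StackElement* pOldTop = pTop;
  int aValue = pOldTop->value;
  pTop = pOldTop->pNext;
  delete pOldTop;
  return aValue;
}
```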
So, what do you think of this small stack example? Is it good? Is it better than the C alternative, with several different function names to remember and the need to be careful to allocate objects of the correct size and to initialise the struct members correctly? I think it is. We don't have to burden our memory with many names, we don't have to worry about the size of the object to allocate, and we don't need to cast from an incompatible pointer type (compare with "malloc"). Initialisation of the member variables is localised to something that belongs to the struct: its constructor. We check our preconditions (no postconditions are used here) with C++ exceptions.

Dynamic memory allocation

The way dynamic memory allocation and deallocation is done in C++ differs from C in a way that on the surface may seem rather unimportant, but in fact the difference is enormous and very valuable.
Here's a small demo of dynamic memory allocation and deallocation (ignoring error handling).
int main(void)
{
int* pint = new int; // 1
*pint = 1;
int* pint2 = new int(2); // 2
Range* pr = new Range(10,1); // 3
// Range from the previous example.
delete pint; // 4
delete pint2; // 4
delete pr; // 4
return 0;
}
Dynamic memory is allocated with the "new" operator. It's a built-in operator that guarantees that a large enough block of memory is allocated for the requested type (compare with "malloc", where it's your job to tell how large the block should be, and what you get is raw memory), runs the constructor if there is one, and returns a pointer of the requested type. At //1 you see an "int" being allocated on the heap, and the pointer variable "pint" being initialised with its address (a plain "new int" leaves the value itself uninitialised, which is why it is assigned on the next line). At //2 and //3 you see another interesting thing about the "new" operator: it understands constructors. At //2 an "int" is allocated on the heap and initialised with the value 2. At //3 a "Range" struct is allocated on the heap and initialised by a call to the constructor defined in the previous example. As you can see at //4, dynamic memory is deallocated with the built-in "delete" operator.
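The difference from "malloc" and "free" becomes visible with a class that logs its constructor and destructor. In this minimal sketch (Noisy, heapLog and makeAndDestroy are hypothetical names), "new" both allocates and constructs, and "delete" both destroys and releases; "malloc" and "free" would do only the memory part.

```cpp
#include <string>
#include <vector>

std::vector<std::string> heapLog;  // records constructor/destructor calls

// Hypothetical class that logs when its constructor and destructor run.
struct Noisy
{
  explicit Noisy(int v) : value(v) { heapLog.push_back("constructed"); }
  ~Noisy() { heapLog.push_back("destroyed"); }
  int value;
};

// new allocates memory for a Noisy *and* runs its constructor;
// delete runs the destructor *and* releases the memory.
int makeAndDestroy()
{
  Noisy* p = new Noisy(42);
  int seen = p->value;
  delete p;
  return seen;
}
```

A call to makeAndDestroy() returns 42 and leaves "constructed" followed by "destroyed" in the log.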

struct

You've seen one subtle difference between structs in C and structs in C++. There are quite a few very visible differences too. Let's have a look at one of them: constructors. A constructor is a well-defined way of initialising an object, in this case an instance of a struct. When you create an instance of a struct in C, you define your variable and then give the components their values. Not so in C++. You declare your struct with constructors, and define your struct instances with well-known initial values. Here's how you can do it.
struct BoundsError {};
struct Range
{
Range(int upper_bound = 0, int lower_bound = 0)
throw (BoundsError);
// Precondition: upper_bound >= lower_bound
// Postconditions:
// lower == lower_bound
// upper == upper_bound
int lower;
int upper;
};
"BoundsError" is a struct with no data, used entirely for error reporting; this example needs no data for it. The struct "Range" is known to have two components, "lower" and "upper" (usually referred to as member variables), and a constructor. You recognise a constructor as something that looks like a function prototype, declared inside the struct, with the same name as the struct. Since C++ allows function overloading on parameter types, it is possible to specify multiple different constructors. Here you also see something new: default parameter values. C++ allows functions to have default parameter values, which are used if you don't provide the corresponding arguments when calling the function. In this case, it appears as if there were three constructors: one with no parameters, initialising both "upper" and "lower" to 0; one with one parameter, initialising "lower" to 0; and one with two parameters. The restriction on default parameter values is that you can only add them from the right of the parameter list. Here are a few examples:
void print(const char*p = "default value")
{
cout << p << endl;
}
void print(int a, int b=3)
{
cout << a << "," << b << endl;
}
void print(unsigned a=0, int b) // ERROR!!! Default parameters
{ // may only be added from the right.
cout << a << "," << b << endl;
}
int main(void)
{
print(); // prints "default value"
print("something"); // calls print(const char*)
print(5); // prints 5,3
print(5,5); // prints 5,5
return 0;
}
So far we have just said that the constructor exists, not what it does. Here comes that part:
Range::Range(int upper_bound, int lower_bound)
throw (BoundsError)
: lower(lower_bound), //*1
upper(upper_bound) //*1
{
// Preconditions.
if (upper_bound < lower_bound) throw BoundsError(); //*2
// Postconditions.
if (lower != lower_bound) throw BoundsError();
if (upper != upper_bound) throw BoundsError();
}
Quite a handful of new things for so few lines. Let's break them apart in three pieces:
Range::Range(int upper_bound, int lower_bound)
throw (BoundsError)
This is the constructor declarator. The first "Range" says we're dealing with the struct called "Range". The "::" is the scope operator (the same as in //**9** in the example for variable scope at the beginning of this article), and means that we're defining something that belongs to the struct. The rest of the line is the same as in the declaration, except that the default parameter values are not listed here; they're an interface issue only. This might seem like redundant information, but it is not. If you forget the "::", what you have is a function called "Range(int, int)" that returns a "Range". If you forget either "Range", you have a syntax error.
Now to the second piece, the one with //*1 comments.
In a constructor you can add what's called an initialiser list between the declarator and the function body. The initialiser list is a comma-separated list of member variables, each given an initial value. This can of course be done in the function body as well, but if you can give the member variables a value in the initialiser list, you should. The reason is that the member variables are initialised whether or not you list them, so if you set them in the function body only, they are in effect initialised twice: first by default, then again by assignment. One thing that is important to remember about initialiser lists is that the order of initialisation is the order in which the member variables appear in the struct declaration. Do not try to change the order by reorganising the initialiser list, because it will not work. [Some compilers will just rearrange the order of the list internally to be the right one and tell you they've done so; but don't rely on this -- Ed]
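The rule about initialisation order is easy to check with members that log their own construction. In this minimal sketch (Logged, Holder and initLog are hypothetical names), the initialiser list names second before first, yet first is initialised first because it is declared first. GCC warns about such a mismatch with -Wreorder, which -Wall enables.

```cpp
#include <string>
#include <vector>

std::vector<std::string> initLog;  // records the order members are initialised

// Hypothetical member type that logs its own construction.
struct Logged
{
  explicit Logged(const char* name) { initLog.push_back(name); }
};

struct Holder
{
  // The list order below is ignored: members are initialised in
  // declaration order, so 'first' is constructed before 'second'.
  Holder() : second("second"), first("first") {}
  Logged first;
  Logged second;
};
```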
Last is the function body:
{
// Preconditions.
if (upper_bound < lower_bound) throw BoundsError(); //*2
// Postconditions.
if (lower != lower_bound) throw BoundsError();
if (upper != upper_bound) throw BoundsError();
}
This looks just like a normal function body. Member variables that for some reason could not be initialised in the initialiser list can be taken care of here. In this case all were initialised before entering the function body, so nothing more is needed. Instead we check that the range is valid and that the components were initialised as intended, and throw an exception if not. "BoundsError()" at //*2 means a nameless instance of struct "BoundsError". Note that even if a constructor's function body is empty, the body (an empty pair of braces) is still needed.
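Put together, the Range struct can be tried out with a current compiler as below. This is the article's code with two adjustments, both assumptions of this sketch: the dynamic exception specification throw (BoundsError) is dropped, since C++17 removed such specifications, and the first postcondition comment is written as lower == lower_bound, which is what the constructor actually checks.

```cpp
struct BoundsError {};

struct Range
{
  Range(int upper_bound = 0, int lower_bound = 0);
  // Precondition: upper_bound >= lower_bound
  // Postconditions:
  // lower == lower_bound
  // upper == upper_bound
  int lower;
  int upper;
};

Range::Range(int upper_bound, int lower_bound)
  : lower(lower_bound),
    upper(upper_bound)
{
  // Preconditions.
  if (upper_bound < lower_bound) throw BoundsError();
  // Postconditions.
  if (lower != lower_bound) throw BoundsError();
  if (upper != upper_bound) throw BoundsError();
}
```

Thanks to the default parameter values, Range r0; gives the range 0 to 0, Range r1(5); gives 0 to 5, and Range r2(5, 2); gives 2 to 5, while Range bad(2, 5); throws BoundsError because its upper bound is below its lower bound.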

Programming by contract

While I haven't mentioned it, I've touched upon one of the most important ideas for improving software quality. As you may have noticed in the function prototypes, I did not only add an exception specifier; there was also a comment mentioning a precondition. That precondition is a contract between the caller of the function and the function itself. For example, the precondition for "divide" is "divisor != 0". This is a contract saying, "I'll do my work if, and only if, you promise not to give me a zero divisor." This is important, because it clarifies who is responsible for what. In this case, it means that it's the caller's job to ensure that the divisor isn't 0. If the divisor is 0, the "divide" function is free to do whatever it wants. Throwing an exception, however, is a good thing to do, because it gives the caller a chance. Another very important part of programming by contract, which I have not mentioned even briefly, is the postcondition. While the precondition states what must be true when entering a function, the postcondition states what must be true when leaving it (unless it is left by throwing an exception). The functions used above do not use postconditions, which is very bad. Postconditions check that the function has done its job properly and, used right, also serve as documentation of what the function does. Take for example the "divide" function. A postcondition should say something about its result, in a way that explains what the function does. It could, for example, say:
Postconditions:
divisor*result <= dividend
divisor*(result+1) > dividend
This states two things: it *is* a division function, not something else that just happens to have that name, and the result (for positive operands) is rounded down and stays within a promised interval.
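The contract can also be checked in code. Below is a minimal sketch of a hypothetical divide(dividend, divisor); the parameter order and the use of std::invalid_argument and std::logic_error are assumptions, not the article's own divide. To keep the postconditions exact, it requires dividend >= 0 and divisor > 0.

```cpp
#include <stdexcept>

// Hypothetical division function with its contract checked at run time.
// Restricted to dividend >= 0 and divisor > 0, where the postconditions
// below describe truncating integer division exactly.
int divide(int dividend, int divisor)
{
  // Preconditions: the caller's side of the contract.
  if (divisor <= 0 || dividend < 0)
    throw std::invalid_argument("divide: precondition violated");

  int result = dividend / divisor;

  // Postconditions: the function's side of the contract. The products are
  // computed in long long so that the checks themselves cannot overflow.
  const long long d = divisor;
  const long long q = result;
  if (!(d * q <= dividend && d * (q + 1) > dividend))
    throw std::logic_error("divide: postcondition violated");

  return result;
}
```

A careful caller still makes sure the divisor is valid before calling divide(); the exceptions are there for when that care fails.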
To begin with, scrupulously using pre- and postconditions forces you to think about how your functions should behave, and that alone makes errors less likely to slip past. They make your testing much easier, since you have stated clearly what every function is supposed to do. (If you haven't stated what a function is supposed to do, how can you be sure it's doing the right thing?) Enforcing the pre- and postconditions makes the errors that do slip by easier to find.
When using "Programming by Contract", exceptions are a safety measure, much like the safety belt in a car. The contract is similar to the traffic regulations. If everybody always followed the rules, and when in doubt used common sense and behaved carefully, no traffic accidents would ever happen. As we know, however, people do break the rules, both knowingly and by mistake, and they don't always use common sense either. When accidents do happen, the safety belt can save your life, just as exceptions can. Note that this means that exceptions are *not* a control flow mechanism to be actively used by your program, any more than the safety belt is (someone who actively used the safety belt to control the car would be considered pretty dangerous by most people, don't you think?). They're there to save you when things go seriously wrong.
OK, so, in light of the above, have you found the flaw in my test program yet? There's something in it that poorly matches what I've just described. Have another look; I'm sure you have the time.
Found it? Where in the program do I check that I pass the functions valid parameters? I don't. The whole program trusts the exception handling to do the job. Bad idea. Don't do that, at least not for pre- and postconditions. Pre- and postconditions are *never* to be violated, ever. Violating them is inexcusable; that's what they're for, right? When a function has a precondition, always make sure you conform to it, rather than trusting the error handling mechanism. The error handling mechanism is there to protect you when you make mistakes, but you must always do your best yourself.

Error handling

Unlike C, C++ has a built-in mechanism for dealing with the unexpected, and it's called exception handling. (Note: if you're experienced in OS/2 programming in other languages, you might have used OS/2 exception handlers; this is not the same thing. C++ exception handling is a language construct, not an operating system construct.) Exceptions are a function's means of telling its caller that it cannot do what it's supposed to do. The classic C way of doing this is to use error codes as return values, but return values can easily be ignored, whereas exceptions cannot. Exceptions also allow us to write pure functions that either succeed in doing the job, thus returning a valid value, or fail and terminate through an exception. This last sentence is paramount to any kind of error handling. For a function there are only two alternatives: it succeeds in doing its job, or it does not. There's no "sort of succeeded." When a function succeeds, it returns as usual, and when it fails, it terminates through an exception. The C++ lingo for this termination is to "throw an exception." You can see this as an incredibly proud and loyal servant that either does what you tell it to, or commits suicide. When committing suicide, however, it always leaves a note explaining why. In C++, the note is an instance of some kind of data, any kind of data, and, being the animated type, the function "throws" the data towards its caller rather than just leaving it neatly behind. Let's look at an example of exception handling, here throwing a character string:
#include <iostream>
using namespace std;
int divide(int divisor, int dividend) throw (const char*); //**1
// Divides divisor with dividend.
// Precondition: dividend != 0
int main(void)
{
try { //**2
int result = divide(50,2);
cout << "divide(" << 50 << ", " << 2
<< ") yields " << result << endl;
result = divide(50,0);
cout << "divide(" << 50 << ", " << 0
<< ") yields " << result << endl;
}
catch (const char* msg) { //**3
cout << "Oops, caught: " << msg << endl;
}
return 0;
}
int divide(int divisor, int dividend) throw (const char*)
{
if (dividend == 0)
throw (const char*)"Division by zero attempted"; //**4
// Here we don't have to worry about dividend being zero
return divisor/dividend;
}
This mini program shows the mechanics of C++ exception handling. The function prototype for "divide" at //**1 adds an exception specification, "throw (const char*)". Exceptions are typed, and a function may throw exceptions of different types. The exception specification is a comma-separated list of the types of exceptions the function may throw. In this case, the only thing this function can throw is a character string, specified by "const char*".
Any attempt to do something whose success you want to check must be enclosed in a "try/catch" block. At //**2 we see the "try" block. A "try" block is *always* followed by one or several "catch" blocks (//**3). If something inside the "try" block (in this case, a call to "divide") throws an exception, execution immediately leaves the "try" block and enters the "catch" block whose type matches the exception thrown. Here, when "divide" is called with a dividend of 0, a "const char*" is thrown, the "try" block is left and the "catch" block entered. If no "catch" block matches the type of exception thrown, execution leaves the function, and a matching "catch" block (if any) of its caller is entered, and so on up the chain of callers. If no matching "catch" block is found, even in "main", the program terminates.
When a function finds that it cannot do whatever it is asked to do, it throws an exception, as shown at //**4. If the exception thrown does not match the exception specification of the function, the program terminates.
Compile and test the program:
[d:\cppintro\lesson1]gcc -fhandle-exceptions excdemo.cpp -lstdcpp
[d:\cppintro\lesson1]excdemo.exe
divide(50, 2) yields 25
Oops, caught: Division by zero attempted
(If you're using VisualAge C++, you don't need any special compiler flags to enable exception handling, and for Watcom C++, use /xs.)
This exception handling can be improved, though. As I mentioned above, exceptions are typed. This is a fact that can, and should, be exploited. In the program above, all we learn is that something has gone wrong; what went wrong can only be read from the string. The program, however, cannot do much about the error other than print a message, and the message itself is not very informative either, since we don't know where the error originated. If we instead create a struct type holding more information, we can do much better. If we create different struct types for different kinds of errors, we can catch the different types (in separate "catch" blocks) and take the corresponding action. Here's an attempt at improving the situation:
#include <iostream>
using namespace std;
// Simple math program, demonstrating different kinds of
// exception types.
// First 3 math error structs, overflow, underflow and
// zero_divide.
struct overflow // Unlike the case for C, in C++
{ // you don't need to "typedef" the
const char* msg; // struct to be able to access it
const char* function; // without the "struct" keyword.
const char* file;
unsigned long line;
};
struct underflow
{
const char* msg;
const char* function;
const char* file;
unsigned long line;
};
struct zero_divide
{
const char* msg;
const char* function;
const char* file;
unsigned long line;
};
unsigned add(unsigned a, unsigned b) throw (overflow);
// precondition a+b representable as unsigned.
unsigned sub(unsigned a, unsigned b) throw (underflow);
// precondition a-b representable as unsigned.
unsigned divide(unsigned a, unsigned b)
throw (zero_divide);
// Precondition b != 0
unsigned mul(unsigned a, unsigned b) throw (overflow);
// Precondition a*b representable as unsigned.
unsigned calc(unsigned a, unsigned b, unsigned c)
throw (overflow, underflow, zero_divide);
// Calculates a*(b-c)/b with functions above.
void printError(const char* func,
const char* file,
unsigned line,
const char* msg);
// Print an error message.
int main(void)
{
cout << "Will calculate (a*(b-c))/b" << endl;
int v1;
cout << "Enter a:";
cin >> v1;
int v2;
cout << "Enter b:";
cin >> v2;
cout << "Enter c:";
int v3;
cin >> v3;
for (;;)
{
try {
int result = calc(v1, v2, v3);
cout << "The result is " << result << endl;
return 0;
}
catch (zero_divide& z)// zero_divide& z means
{ // "z is a reference to a
// zero_divide". A reference, in
// this case, has the semantics
// of a pointer, that is, it
// refers to a variable located
// somewhere else, but
// syntactically it's like we
// were dealing with a local
// variable. More about
// references later.
cout << "Division by zero in:" << endl;
printError(z.function,
z.file,
z.line,
z.msg);
cout << endl;
cout << "Enter a new value for b:";
cin >> v2;
}
catch (underflow& u) //** reference
{
cout << "Underflow in:" << endl;
printError(u.function,
u.file,
u.line,
u.msg);
cout << endl;
cout << "Enter a new value for c:";
cin >> v3;
}
catch (overflow& o) //** reference
{
cout << "Overflow in:" << endl;
printError(o.function,
o.file,
o.line,
o.msg);
cout << endl;
cout << "Enter a new value for a:";
cin >> v1;
}
catch (...) // The ellipsis (...) matches any type.
{ // The disadvantage is that you cannot
// find out what type it was you caught.
cout << "Severe error: Caught unknown exception"
<< endl;
return 1;
}
}
}
unsigned add(unsigned a, unsigned b) throw (overflow)
{
unsigned c = a+b;
if (c < a || c < b) { // If c is less than either a or
overflow of; // b, the value "wrapped"
of.function = "add";
of.file = __FILE__; // standard macro containing the
// name of the C++ file.
of.line = __LINE__; // standard macro containing the
// line number.
of.msg = "Overflow in addition";
throw of;
}
return c;
}
unsigned sub(unsigned a, unsigned b) throw (underflow)
{
unsigned c = a-b;
if (c > a) { // If c is greater than a
// the value "wrapped"
underflow uf;
uf.function ="sub";
uf.file = __FILE__;
uf.line = __LINE__;
uf.msg = "Underflow in subtraction";
throw uf;
}
return c;
}
unsigned divide(unsigned a, unsigned b)
throw (zero_divide)
{
if (b == 0) {
zero_divide zd;
zd.function = "divide";
zd.file = __FILE__;
zd.line = __LINE__;
zd.msg = "Division by zero";
throw zd;
}
return a/b;
}
unsigned mul(unsigned a, unsigned b) throw (overflow)
{
unsigned c = a*b;
// If a is not 0 and c/a differs from b, the true
// product did not fit in an unsigned: the value "wrapped".
if (a != 0 && c / a != b)
{
overflow of;
of.function = "mul";
of.file = __FILE__;
of.line = __LINE__;
of.msg = "Overflow in mul";
throw of;
}
return c;
}
unsigned calc(unsigned a, unsigned b, unsigned c)
throw (overflow, underflow, zero_divide)
{
// Calculates a*(b-c)/b with functions above.
try {
// We can only hope...
unsigned result = divide(mul(a,sub(b,c)),b);
return result;
}
catch (zero_divide& zd) { //** reference
zd.function = "calc"; // We can alter the struct to
// allow better traceability of
// the error.
throw; // An empty "throw" means "throw the exception
// just caught". Use it only while an exception is
// being handled, e.g. in a catch block; otherwise
// it calls terminate().
}
catch (underflow& uf) {
uf.function = "calc";
throw;
}
catch (overflow& of) {
of.function = "calc";
throw;
}
}
void printError(const char* func,
const char* file,
unsigned line,
const char* msg)
{
cout << " " << func << endl
<< " " << file << '(' << line << ')'
<< endl << " \"" << msg << "\"" << endl;
}
If compiled and run:
[d:\cppintro\lesson1]icc /Q excdemo2.cpp
[d:\cppintro\lesson1]excdemo2.exe
Will calculate (a*(b-c))/b
Enter a:23
Enter b:34
Enter c:45
Underflow in:
calc
excdemo2.cpp(122)
"Underflow in subtraction"
Enter a new value for c:21
The result is 8
What do you think of this? Do you think the code is messy? How would you have implemented the same functionality without exception handling? The code is a bit messy, but part of the mess will go away as you learn more about C++, and the other part is due to handling the error situations. It's frightening how common the lack of error handling is, but as I read in a calendar, "Unpleasant facts [errors] don't cease to exist just because you chose to ignore them." Also, errors are easier to handle if you take them into account from the beginning, instead of adding error handling afterwards, as I've seen done far too often. There are problems with the code above, as will be mentioned soon, but one definite advantage gained by using exceptions is that the error handling code is reasonably well separated from the parts that do the actual work. This separation will become even clearer as you learn more C++.

Function overloading

C++ allows you to define several different functions with the same name, provided their parameter lists differ. Here's a small example:
#include <iostream>
using namespace std;
void print(const char* string)
{
cout << "print(\"" << string << "\")" << endl;
}
void print(int i)
{
cout << "print(" << i << ")" << endl;
}
int main(void)
{
print("string printing"); // calls print(const char*)
print(5); // calls print(int)
return 0;
}
Compiling and running yields:
[d:\cppintro\lesson1]icc /Q+ funcover.cpp
[d:\cppintro\lesson1]funcover.exe
print("string printing")
print(5)
Handled right, this can greatly reduce the number of names you have to remember. To do something similar in C, you'd have to give the functions different names, like "print_int" and "print_string". Of course, handled wrong, it can cause a mess. Only overload functions when they actually do the same thing, in this case printing their parameter. Had the functions been doing different things, you would soon forget which of them did what. Function overloading is powerful, and it will be used a lot throughout this course, but everything with power is dangerous, so be careful.
To differentiate between overloaded functions, the parameter list is often included when mentioning their name. In the example above, I'd say we have the two functions "print(int)" and "print(const char*)", and not just two functions called "print". [Compilers generally do something similar when reporting error messages, so if you have used function overloading in your program, look closely at which of the functions the compiler is concerned about. -- Ed]

Loop variable scope and defining variables

Now let's add some things to this program. For example, let it print the parameters it was started with.
#include <iostream>
using namespace std;
int main(int argc, char* argv[])
{
{
for (int count = 0; count < argc; ++count)
{
cout << count << " : " << argv[count]
<< endl;
}
}
return 0;
}
You can compile and run:
[d:\cppintro\lesson1]gcc parprint.cpp -lstdcpp
[d:\cppintro\lesson1]parprint tripp trapp trull
0 : D:\TMP\PARPRINT.EXE
1 : tripp
2 : trapp
3 : trull
The result is probably not surprising, but if you know C, this does look interesting, does it not? The loop variable used in the "for" loop is defined in the loop itself. One of the minor, yet very useful, differences from C is that you can postpone defining a variable until you need it. In this case, the variable "count" is not needed until the loop, so we don't define it until the loop. The loop is in an extra block (a { ... } pair is called a block) for compatibility between older and newer C++ compilers. For older C++ compilers, such as VisualAge C++, the variable "count" is defined from the "for" loop until the "}" before "return". For newer C++ compilers, the variable is defined within the loop only.
Here's a "for" loop convention that's useful when you want your programs to work both with new and old C++ compilers:
If you define a loop variable, and want it valid within the loop only, enclose the loop in an extra block:
{
for (unsigned var = 0; var < max; ++var)
{
...
}
}
If you define a loop variable and want it valid after the loop body, define the variable outside the loop:
unsigned var = 0;
for (; var < max; ++var )
{
...
}
The former guarantees that the loop variable will not be defined after the last "}", be it on a new or old compiler. The latter guarantees that the loop variable *will* be defined after the loop for both new and old compilers.
Usually you don't want to use the loop variable after the loop, so the latter construction is not used frequently.
Let's have a closer look at the rules for when a variable is available and when it is not.
#include <iostream>
using namespace std;
int i = 5; // **1** global variable.
int main(int argc, char**)
{
cout << i << endl; // **2** the global i.
int i = 2; // **3**
cout << i << endl; // **4** i from prev. line
if (argc > 1)
{
cout << i << endl; // **5** prints 2.
int i = 10; // **6**
cout << i << endl; // **7** Prints 10.
}
cout << i << endl; // **8** Prints 2.
cout << ::i << endl; // **9** prints 5
return 0;
}
First we can see two new things. C++ has two kinds of comments: the C style "/*", "*/" pair, and the one-line "//" comment. For the latter, everything following "//" on a line is a comment [to the end of line -- Ed].
The other new thing is the parameter list of "main". The second parameter is nameless. C++ allows you to omit the name of a parameter you are not going to use. Normally, of course, you do not have a parameter in the list that you're not going to use, but the "main" function is supposed to have these parameters (or void), so there's not much choice here. By not giving the parameter a name, we save ourselves from a compiler warning like 'Parameter "argv" of function "main" declared but never referenced'.
So then, let's look at the variables.
At **1** a global variable called "i" is defined, and initialised with the value 5.
At **2** this global variable is printed. Yes, there is a variable called "i" in "main" but it has not yet been defined, thus it is the global one that is printed. [Note in the example how the variable "i" in "main" is defined the line after the output at **2**; not just given a value there, but actually declared as a variable there -- Ed]
At **3** the auto variable "i" of "main" is defined. From this point on, trickery is needed (see **9**) to reach the global "i" from within "main," since this "i" hides the name.
This is why, at **4**, the variable defined at **3** is printed, and not the global alternative. The same thing happens at **5**.
At **6** yet another variable named "i" is defined, and this one hides the variable defined at **3** and the global alternative from **1**. The global "i" can still be reached (see **9**), but the one from **3** is now unreachable.
As expected, the variable printed at **7** is the one defined at **6**.
Between **7** and **8** however, the "i" defined at **6** goes out of scope. It ceases to exist, it dies. Thus, at **8** we can again reach the "i" defined at **3**.
At **9** a cheat is used to reach the global "i" from within main. The "::" operator, called the scope operator, tells C++ to look for the name in the global scope, which makes it reach the variable from **1**.
[Note: a good C++ compiler will warn you if you declare a variable with the same name as one already accessible to your function (e.g. at **6**), to alert you to the possibility that you are referring to the wrong variable. This is often described as one variable "shadowing" the other. -- Ed]

Basic I/O

All intro courses in programming begin with a "Hello World" program [except those that don't -- Ed], and so does this one.
#include <iostream>
using namespace std;
int main(void)
{
cout << "Hello EDM/2" << endl;
return 0;
}
Line 1 includes the header <iostream>, which is needed for the input/output operations. In C++, writing something on standard output is done by:
cout << whatever;
Here "whatever" can be anything that is printable: a string literal such as "Hello EDM/2", a variable, or an expression. If you want to print several things, you can do so in one statement with:
cout << expr1 << expr2 << expr3 << ...;
Again, expr1, expr2 and expr3 represent things that are printable.
In the "Hello EDM/2" program above, the last expression printed is "endl", which is a special expression (called a manipulator; manipulators, and details about I/O, will be covered in a complete part sometime this fall) that moves the cursor to the next line and flushes the output. It's legal to cascade more expressions to print after "endl"; doing so means those values will be printed on the next line.
As usual, string escapes can be used when printing:
cout << "\"\t-\\\"\t" << endl;
gives the result:
" -\" "
The spaces in the printed result are tabs (\t.)
Reading values from standard input is done with:
cin >> lvalue;
Here "lvalue" must be something that can be assigned a value, for example a variable. Just as with printing, several things can be read at the same time by cascading the reads:
cin >> lvalue1 >> lvalue2 >> lvalue3 >> ...;

Introduction

C++ is a programming language substantially different from C. Many see C++ as "a better C than C," or as C with some add-ons. I believe that to be wrong, and I intend to teach C++ in a way that makes use of what the language can offer. C++ shares the same low-level constructs as C, however, and I will assume some knowledge of C in this course. You might want to have a look at the C introduction course to get up to speed on that language.