3.7.1 Garbage Collection

One problem with all programming languages that support the notion of pointers is the management of the memory associated with a pointer. A large computer program is usually comprised of several libraries. A library presents an application programming interface (API) to the user and hides the implementation details (e.g., internal data structures, memory management, ...). Such an API consists of a set of function and struct declarations for the case of the C programming language, or a set of concrete and interface classes for the case of an object-oriented designed library written in the C++ programming language. In any case, if a library performs some sort of memory management, two important questions arise

In short, the lifetime of the object (for object-oriented languages) must be determined. It is quite obvious that the library itself cannot determine the lifetime of the objects it creates as soon as the API contains any pointers to structs (because pointers to objects are passed to the application). On the other hand, the application may only take over responsibility of the memory if the library does not keep internal pointers or references to that object itself. The user of a library must have a clear understanding (and a good documentation) of at least some of the implementation details. Programming languages like JAVA or LISP obviously avoid the lifetime problem by the lack of pointers. Instead, references are used, and the memory associated with such references is controlled by a so-called garbage collector. A garbage collector keeps track of all references to an object and deletes the object if no references point to it.

C++ lacks a built in garbage collector. Therefore, a simple reference counting algorithm suggested in [2] was implemented in the WAFER-STATE-SERVER. The implementation uses the template polymorphism mechanism to provide type independence. This reference counting technique is referred to as the handle class throughout this work. A handle consists of two pointers. One points to the data to manage (the representation pointer), the other is a pointer to an integer variable holding a reference counter. Fig. 3.19

Figure 3.19: Concept of the handle class. Four stages in the lifetime of a handled object are depicted. From left to right: (1) A new point object is instantiated. The pointer returned from the new operator of the point class is passed to the handle constructor. The counter variable (cnt) is initialized to $ 1$. (2) A second handle is created by the copy constructor. The pointers to the point object and the counter variable are copied into the new handle. The counter is incremented. (3) A third handle to the point object is created. The counter is now set to $ 3$. (4) The third handle is destructed, the counter is decremented and holds a value of $ 2$.
\begin{figure}\psfig{file=pics/handle, width=\linewidth}\par\end{figure}

depicts an example of a handle for a Point class. When the first handle to a Point is created the handle class constructor receives a pointer to the Point object. The representation pointer is initialized to point to the memory location holding the Point object. Memory for the reference counter is allocated and the counter is set to $ 1$. For each new handle that is created from an existing one, the reference counter is incremented by $ 1$. In case a handle is deleted the reference counter is decremented by $ 1$. The last handle (reference counter is $ 1$) is responsible for freeing the memory of the Point and the integer variable. A special case is the assignment operator for handles. If one handle is to be assigned to another the counter of the left hand side handle is decremented (and the handled object is deleted if the reference count is 0) -- the handle detaches itself from the object it previously managed. Then the pointers (to the reference counter variable and the object) of the right hand side handle are taken over into the left hand side and the counter variable is incremented. Fig. 3.20
Figure 3.20: Assignment of handles. The left part depicts the situation before the assignment. Two objects (Point1 and Point2) are managed by two handles (Handle1 and Handle2). Handle2 is disassociated from the object. In this example Handle2 held the only pointer to that object, thus the object is deleted.
\begin{figure}\psfig{file=pics/handle-assignment, width=\linewidth}\par\end{figure}

depicts this operation.

Handles are always created on the runtime stack of the program never on the free store. Thus, when the handle loses its scope the destructor automatically disassociates the handle from the object and deallocates memory occupied by the object if necessary (last handle). This makes an explicit deallocation of the handled object unnecessary in most cases. A notable exception is the implementation of the WSS Writer, where the destructors of the handled objects are used to finish (close) a section. In this case an explicit assignment of a handle to the null pointer is used to force a disassociation and a call of the appropriate destructor.

The handle class overloads the operators *, ->, < and > to ensure full syntax compatibility to a real pointer. Access to the representation pointer is checked and an exception will be raised in case of a detected null pointer dereferencing. Note, that a handle imposes the memory overhead of one pointer and one integer variable compared to a non-handled counterpart. Therefore, the use of handles for variables that consume only small memory amounts like integers or doubles is not advised.

The performance overhead introduced by a handle depends on the compilation of the program. There is a switch to turn run time access checks of the representation pointer on or off. For optimized compiled programs that are compiled with the GNU compiler, all methods of the handle template are inlined and the access checks to the pointer are automatically turned off. The overhead remaining in that case is the call of the new operator for the integer variable and the overhead for the memory management of the underlying operating system. Parts of that overhead are eliminated by providing a specialized new operator for integers. This operator also reduces the memory overhead of the reference counter from a minimum (given by the operating system) of $ 8$ bytes to $ \approx 4$ bytes on a $ 32$ bit architecture. The operator is implemented using a memory pool similar to the one described in [2]. Note that all handle operations (assignment, copying, ...) must be guaranteed to be atomic or the result is undefined. Therefore, special care must be taken in multi threaded applications to ensure thread safety.

2003-03-27