ooc1011.html

10
Delegates Callback Functions


10.1Callbacks
An object points to its class description. The class description points to all dynami- cally linked methods applicable to the object. It looks as though we should be able to ask an object if it can respond to a particular method. In a way this is a safe- guard measure: given a dubious object we can check at run time if we are really allowed to apply a specific method to it. If we do not check, the method’s selector will certainly check and crash our program if the object cannot respond, i.e., if the object’s class description does not contain the method.
Why would we really want to know? We are out of luck if a method must be applied unconditionally to an object which does not know about it; therefore, there is no need to check. However, if it makes no difference to our own algorithm whether or not the method is applied, being able to ask makes for a more forgiving interface.
The situation arises in the context of callback functions. For example, if we are managing a window on a display, some inhabitants of the window might want to be informed when they are about to be covered up, displayed again, changed in size, or destroyed. We can inform our client by calling a function on which we both have agreed: either the client has given us the name of a function to be called for a par- ticular event, or we have agreed on a specific function name.
Registering a callback function, the first technique, looks like the more flexible approach. A client registers functions only for those events which are important from its point of view. Different clients may use different sets of callback func- tions, and there is no need to observe a common name space. ANSI-C actually uses some callback functions: bsearch() and qsort() receive the comparison func- tion relative to which they search and sort and atexit() registers functions to be called just before a program terminates.
Agreeing on specific function names looks even easier: a recognizer generated by lex will call a function yywrap() at the end of an input file and it will continue pro- cessing if this function does not return zero. Of course, this is impractical if we need more than one such function in a program. If bsearch() assumed its com- parison function to be called cmp, it would be much less flexible.

10.2Abstract Base Classes
Once we look at dynamically linked methods, agreeing on specific method names for callback purposes does not seem to be as limiting. A method is called for a par- ticular object, i.e., which code is executed for a callback depends on an object in addition to a specific method name.


Methods, however, can only be declared for a class. If we want to communi- cate with a client in the style of a callback function, we have to postulate an abstract base class with the necessary communication methods and the client object must belong to a subclass to implement these methods. For example:
% OrderedClass: Class  OrderedArray: Object {
%—
int cmp (const _self, int a, int b); void swap (_self, int a, int b);
%}
A sorting algorithm can use cmp() to check on two array elements by index, and it can use swap() to rearrange them if they are out of order. The sorting algorithm can be applied to any subclass of OrderedArray which implements these methods. OrderedArray itself is called an abstract base class because it serves only to declare the methods; this class should have no objects if the methods are not defined.
Abstract base classes are quite elegant to encapsulate calling conventions. For example, in an operating system there could be an abstract base class for a certain variety of device drivers. The operating system communicates with each driver using the methods of the base class and each driver is expected to implement all of these methods to communicate with the actual device.
The catch is that all methods of an abstract base class must be implemented for the client because they will be called. For a device driver this is perhaps obvi- ous, but a device driver is not exactly a representative scenario for callback func- tions. A window is much more typical: some clients have to worry about expo- sures and others could not care less — why should they all have to implement all methods?
An abstract base class restricts the architecture of a class hierarchy. Without multiple inheritance a client must belong to a particular part of the class tree headed by the abstract base class, regardless of its actual role within an application. As an example, consider a client of a window managing a list of graphical objects. The elegant solution is to let the client belong to a subclass of List but the implementa- tion of a window forces the client to be something like a WindowHandler. As we discussed in section 4.9 we can make an aggregate and let the client contain a List object, but then our class hierarchy evolves according to the dictate of the system rather than according to the needs of our application problems.
Finally, an abstract base class defining callback functions tends to define no private data components for its objects, i.e., the class declares but does not define methods and the objects have no private state. While this is not ruled out by the concept of a class it is certainly not typical and it does suggest that the abstract base class is really just a collection of functions rather than of objects and methods.

10.3Delegates

10.3 Delegates

113

Having made a case against abstract base classes we need to look for a better idea. It takes two to callback: the client object wants to be called and the host does the calling. Clearly, the client object must identify itself to the host, if it wants the host to send it a message, but this is all that is required if the host can ask the client what callbacks it is willing to accept, i.e., what methods it can respond to.
It is significant that our viewpoint has shifted: an object is now part of the call- back scenario. We call such an object a delegate. As soon as a delegate announces itself to the host, the host checks what callbacks the delegate can handle and later the host makes precisely those calls which the delegate expects.
As an example we implement a simple framework for a text filter, i.e., a pro- gram which reads lines from standard input or from files specified as arguments, manipulates them, and writes the results to standard output. As one application we look at a program to count lines and characters in a text file. Here is the main pro- gram which can be specified as part of the implementation file Wc.dc:
int main (int argc, char * argv [])
{	void * filter = new(Filter(), new(Wc()));
return mainLoop(filter, argv);
}
We create a general object filter and give it as a delegate an application-specific Wc object to count lines and characters. filter receives the arguments of our program and runs the mainLoop() with callbacks to the Wc object.
% WcClass: Class Wc: Object {
unsigned lines;	// lines in current file unsigned allLines;	// lines in previous files unsigned chars;	// bytes in current file unsigned allChars;	// bytes in previous files unsigned files;	// files completed
%—
int wc (_self, const Object @ filter,	\ const char * fnm, char * buf);
int printFile (_self, const Object @ filter,	\ const char * fnm);
int printTotal (_self, const Object @ filter);
%}
The methods in Wc do nothing but line and character counting and reporting the results. wc() is called with a buffer containing one line:
% Wc wc {	// (self, filter, fnm, buf)
%casts
++ self —> lines;
self —> chars += strlen(buf); return 0;
}
Once a single file has been processed, printFile() reports the statistics and adds them to the running total:


% Wc printFile {	// (self, filter, fnm)
%casts
if (fnm && strcmp(fnm, "—")) printf("%7u %7u %s\n",
self —> lines, self —> chars, fnm);
else
printf("%7u %7u\n", self —> lines, self —> chars);
self —> allLines += self —> lines, self —> lines = 0; self —> allChars += self —> chars, self —> chars = 0;
++ self —> files; return 0;
}
fnm is an argument with the current filename. It can be a null pointer or a minus sign; in this case we do not show a filename in the output.
Finally, printTotal() reports the running total if printFile() has been called more than once:
% Wc printTotal {	// (self, filter)
%casts
if (self —> files > 1)
printf("%7u %7u in %u files\n",
self —> allLines, self —> allChars, self —> files); return 0;
}
Wc only deals with counting. It does not worry about command line argu- ments, opening or reading files, etc. Filenames are only used to label the output, they have no further significance.

10.4 An Application Framework — Filter
Processing a command line is a general problem common to all filter programs. We have to pick off bundled or separated flags and option values, we must recognize two minus signs  as the end of the option list and a single minus sign  addition- ally as standard input, and we may need to read standard input or each file argu- ment. Every filter program contains more or less the same code for this purpose, and macros such as MAIN [Sch87, chapter 15] or functions such as getopt(3) help to maintain standards, but why regurgitate the code in the first place?
The class Filter is designed as a uniform implementation of command line pro- cessing for all filter programs. It can be called an application framework because it establishes the ground rules and basic structure for a large family of applications. The method mainLoop() contains command line processing once and for all and uses callback functions to let a client deal with the extracted arguments:
% mainLoop {	// (self, argv)
%casts
self —> progname = * argv ++;

10.4An Application Framework — ‘‘Filter’’	115

while (* argv && ** argv == ’—’)
{	switch (* ++ * argv) {
case 0:	// single —
—— * argv;	// ... is a filename
break;	// ... and ends options case ’—’:
if (! (* argv)[1])  // two ——
{	++ argv;	// ... are ignored break;	// ... and end options
}
default:	// rest are bundled flags do
if (self —> flag)
{	self —> argv = argv;
self —> flag(self —> delegate,
self, ** argv);
argv = self —> argv;
}
else
{	fprintf(stderr,
"%s: —%c: no flags allowed\n", self —> progname, ** argv);
return 1;
}
while (* ++ * argv);
++ argv; continue;
}
break;
}
The outer loop processes arguments until we reach the null pointer terminating the array argv[] or until an argument does not start with a minus sign. One or two minus signs terminate the outer loop with break statements.
The inner loop passes each character of one argument to the flag-function pro- vided by the delegate. If the delegate decides that a flag introduces an option with a value, the method argval() provides a callback from the delegate to the filter to retrieve the option value:
% argval {	// (self)
const char * result;
%casts
assert(self —> argv && * self —> argv);
if ((* self —> argv)[1])	// —fvalue result = ++ * self —> argv;
else if (self —> argv[1])	// —f value result = * ++ self —> argv;
else	// no more argument
result = NULL;


while ((* self —> argv)[1])	// skip text
++ * self —> argv;
return result;
}
The option value is either the rest of the flag argument or the next argument if any.
self > argv is advanced so that the inner loop of mainLoop() terminates.
Once the options have been picked off the command line, the filename argu- ments remain. If there are none, a filter program works with standard input. main- Loop() continues as follows:
if (* argv) do


else

result = doit(self, * argv); while (! result && * ++ argv);

result = doit(self, NULL);

if (self —> quit)
result = self —> quit(self —> delegate, self); return result;
}
We let a method doit() take care of a single filename argument. A null pointer represents the situation that there are no arguments. doit() produces an exit code: only if it is zero do we process more arguments.
% doit {	// (self, arg) FILE * fp;
int result = 0;
%casts
if (self —> name)
return self —> name(self —> delegate, self, arg);
if (! arg || strcmp(arg, "—") == 0) fp = stdin, clearerr(fp);
else if (! * arg)
{	fprintf(stderr, "%s: null filename\n",
self —> progname);
return 1;
}
else if (! (fp = fopen(arg, "r")))
{	perror(arg); return 1;
}
The client may supply a function to process the filename argument. Otherwise, doit() connects to stdin for a null pointer or a minus sign as an argument; other filenames are opened for reading. Once the file is opened the client can take over with yet another callback function or doit() allocates a dynamic buffer and starts reading lines:

10.5The ‘‘respondsTo’’ Method

117


if (self —> file)
result = self —> file(self —> delegate, self, arg, fp);
else
{	if (! self —> buf)
{	self —> blen = BUFSIZ;
self —> buf = malloc(self —> blen); assert(self —> buf);
}
while (fgets(self —> buf, self —> blen, fp)) if (self —> line && (result =
self —> line(self —> delegate, self, arg,
self —> buf)))
break;
if (self —> wrap)
result = self —> wrap(self —> delegate, self, arg);
}
if (fp != stdin) fclose(fp);
if (fflush(stdout), ferror(stdout))
{	fprintf(stderr, "%s: output error\n", self —> progname); result = 1;
}
return result;
}
With two more callback functions the client can receive each text line and perform cleanup actions once the file is complete, respectively. These are the functions that wc uses. doit() recycles the file pointer and checks that the output has been suc- cessfully written.
If a client class implements line-oriented callbacks from the Filter class, it should be aware of the fact that it deals with text lines. fgets() reads input until its buffer overflows or until a newline character is found. Additional code in doit() extends the dynamic buffer as required, but it only passes the buffer to the client, not a buffer length. fgets() does not return the number of characters read, i.e., if there is a null byte in the input, the client has no way to get past it because the null byte might actually mark the end of the last buffer of a file with no terminating new- line.

10.5The respondsTo Method
How does an object reach its delegate? When a Filter object is constructed it receives the delegate object as an argument. The class description Filter.d defines function types for the possible callback functions and object components to hold the pointers:
typedef void (* flagM) (void *, void *, char);
typedef int (* nameM) (void *, const void *, const char *); typedef int (* fileM) (void *, const void *, const char *,
FILE *);


typedef int (* lineM) (void *, const void *, const char *,
char *); typedef int (* wrapM) (void *, const void *, const char *); typedef int (* quitM) (void *, const void *);
% Class Filter: Object { Object @ delegate;
flagM flag;	// process a flag
nameM name;	// process a filename argument
fileM file;	// process an opened file
lineM line;	// process a line buffer
wrapM wrap;	// done with a file
quitM quit;	// done with all files
const char * progname; // argv[0]
char ** argv;	// current argument and byte
char * buf;	// dynamic line buffer
unsigned blen;	// current maximum length
%
int mainLoop (_self, char ** argv); const char * argval (_self);
const char * progname (const _self); int doit (_self, const char * arg);
%}
Unfortunately, ANSI-C does not permit a typedef to be used to define a function header, but a client class like Wc can still use the function type to make sure its callback function matches the expectations of Filter:
#include "Filter.h"
% Wc wc {	// (self, filter, fnm, buf)
%casts
assert((lineM) wc == wc);
...
The assertion is trivially true but a good ANSI-C compiler will complain about a type mismatch if lineM does not match the type of wc():
In function `Wc_wc’:
warning: comparison of distinct pointer types lacks a cast
We still have not seen why our filter knows to call wc() to process an input line. Filter_ctor() receives the delegate object as an argument and it can set the interesting components for filter:
% Filter ctor {
struct Filter * self = super_ctor(Filter(), _self, app); self —> delegate = va_arg(* app, void *);
self —> flag = (flagM) respondsTo(self —> delegate, "flag");
...
self —> quit = (quitM) respondsTo(self —> delegate, "quit");
return self;
}


The trick is a new statically linked method respondsTo() which may be applied to any Object. It takes an object and a search argument and returns a suitable function pointer if the object has a dynamically linked method corresponding to the search argument.
The returned function pointer could be a selector or the method itself. If we opt for the method, we avoid the selector call when the callback function is called; however, we also avoid the parameter checking which the selector performs. It is better to be safe than to be sorry; therefore, respondsTo() returns a selector.
Designing the search argument is more difficult. Because respondsTo() is a general method for all types of methods we cannot perform type checking at com- pile time, but we have already shown how the delegate can protect itself. Regard- less of type checking we could still let respondsTo() look for the selector it is sup- posed to return, i.e., the search argument could be the desired selector. Selector names, however, are part of the global name space of a program, i.e., if we look for a selector name we are implicitly restricted to subclasses of the class where the selector was introduced. However, the idea was not to be restricted by inheritance aspects. Therefore, respondsTo() uses a string as the search argument.
We are left with the problem of associating a string with a dynamically linked method. Logically this can be done in one of two places: when the method is declared in the class description file or when it is implemented in the implementa- tion file. Either way it is a job for ooc because the association between the string tag and the method name must be stored in the class description so that respondsTo() can find it there. The class description, however, is constructed by ooc. We use a simple syntax extension:
% WcClass: Class Wc: Object {
...
%—
line:	int wc (_self, const Object @ filter,	\ const char * fnm, char * buf);
wrap:	int printFile (_self, const Object @ filter,	\ const char * fnm);
quit:	int printTotal (_self, const Object @ filter);
%}
In a class description file like Wc.d a tag may be specified as a label preceding a dynamically linked method. By default, the method name would be used as a tag. An empty label suppresses a tag altogether — in this case respondsTo() cannot find the method. Tags apply to dynamically linked methods, i.e., they are inherited. To make things more flexible, a tag can also be specified as a label in a method header in the implementation file. Such a tag is valid only for the current class.

10.6Implementation
respondsTo() must search the class description for a tag and return the corresponding selector. Thus far, the class description only contains pointers to the methods. Clearly, the method entry in a class description must be extended:


typedef void (* Method) ();	// for respondsTo()
%prot
struct Method {
const char * tag;	// for respondsTo()
Method selector;	// returned by respondsTo()
Method method;	// accessed by the selector
};
% Class Object {
...
Method respondsTo (const _self, const char * tag);
Method is a simple function type defined in the interface file for Object. Each method is recorded in a class description as a component of type struct Method which contains pointers to the tag, the selector, and the actual method. respondsTo() returns a Method. ANSI-C compilers will gripe about implicit casts from and to this type.
Given this design, a few more changes are required. In Object.dc we need to change the static initialization of the class descriptions Object and Class to use struct Method:
static const struct Class _Object = {
{ MAGIC, & _Class },
"Object", & _Object, sizeof(struct Object),
{ "",	(Method) 0,	(Method) Object_ctor },
{ "",	(Method) 0,	(Method) Object_dtor },
{ "differ", (Method) differ,(Method) Object_differ },
...
};
The -r report in r.rep uses the link report in va.rep to generate an entry in the class description for the class representation file. The new version of the link report is very simple:
% link	// component of metaclass structure struct Method `method ;
Finally, the init report in c.rep and c-R.rep uses the meta-ctor-loop in etc.rep to
generate the loop which dynamically fills the class description. Here we also have to work with the new types:
% meta—ctor—loop	// selector/tag/method tuples for `class
`t while ((selector  = va_arg(ap,  Method))) `n
`t { `t	const char * tag = va_arg(ap,  ` \
const char *); `n
`t `t Method method = va_arg(ap,  Method); `n `n
`{%— `%link—it `}
`t } `n


% link—it	// check and insert one selector/method pair
`t `t if (selector  == (Method) `method ) `n
`t	`t	{ `t	if (tag) `n
`t	`t	`t `t self —> `method .tag = tag, `n
`t	`t	`t `t self —> `method .selector  = selector; `n
`t	`t	`t self —> `method .method = method; `n
`t	`t	`t continue; `n
`t	`t	} `n
Rather than selector/method pairs we now specify selector/tag/method tuples as arguments to the metaclass constructor. This must be built into the init report in c.rep. Here is the initialization function for Wc generated by ooc:
static const void * _Wc;
const void * Wc (void) { return _Wc ? _Wc :
(_Wc = new(WcClass(),
"Wc", Object(), sizeof(struct Wc), wc, "line", Wc_wc,
printFile, "wrap", Wc_printFile, printTotal, "quit", Wc_printTotal, (void *) 0));
}
Given the selector/tag/method tuples in a class description, respondsTo() is easy to write. Thanks to the class hierarchy, we can compute how many methods a class description contains and we can implement respondsTo() entirely in the Object class, even though it handles arbitrary classes:
% respondsTo {
if (tag && * tag) {
const struct Class * class = classOf(_self);
const struct Method * p = & class —> ctor; // first int nmeth =
(sizeOf(class) — offsetof(struct Class, ctor))
/ sizeof(struct Method);	// # of Methods
do
if (p —> tag && strcmp(p —> tag, tag) == 0) return p —> method ? p —> selector : 0;
while (++ p, —— nmeth);
}
return 0;
}
The only drawback is that respondsTo() explicitly contains the first method name ever, ctor, in order to calculate the number of methods from the size of the class description. While ooc could obtain this name from the class description of Object, it would be quite messy to construct a report for ooc to generate respondsTo() in a general fashion.


10.7Another application — sort
Let us implement a small text sorting program to check if Filter really is reusable, to see how command line options are handled, and to appreciate that a delegate can belong to an arbitrary class.
A sort filter must collect all text lines, sort the complete set, and finally write them out. Section 7.7 introduced a List based on a dynamic ring buffer which we can use to collect the lines as long as we add a sorting method. In section 2.5 we implemented a simple String class; if we integrate it with our class hierarchy we can use it to store each line in the List.
Let us start with the main program which merely creates the filter with its delegate:
int main (int argc, char * argv [])
{	void * filter = new(Filter(), new(Sort(), 0));
return mainLoop(filter, argv);
}
Because we can attach the callback methods to any class, we can create the delegate directly in a subclass of List:
% SortClass: ListClass Sort: List { char rflag;
%—
void flags (_self, Object @ filter, char flag);
int line (_self, const Object @ filter, const char * fnm, \
char * buf);
int quit (_self, const Object @ filter);
%}
To demonstrate option handling we recognize r as a request to sort in reverse order. All other flags are rejected by the flags() method which has flag as a tag for respondsTo():
% flag: Sort flags {
%casts
assert((flagM) flags == flags);
if (flag == ’r’)
self —> rflag = 1;
else
fprintf(stderr, "usage: %s [—r] [file...]\n",
progname(filter)),
exit(1);
}
Given String and List, collecting lines is trivial:
% Sort line {
%casts
assert((lineM) line == line);
addLast(self, new(String(), buf)); return 0;
}

10.8Summary

123


Once all lines are in, the quit callback takes care of sorting and writing. If there are any lines at all, we let a new method sort() worry about sorting the list, and then we remove each line in turn and let the String object display itself. We can sort in reverse order simply by removing the lines from the back of the list:
% Sort quit {
%casts
assert((quitM) quit == quit);
if (count(self))
{	sort(self); do
puto(self —> rflag ? takeLast(self)
: takeFirst(self), stdout); while (count(self));
}
return 0;
}
What about sort()? ANSI-C defines the library function qsort() for sorting arbitrary arrays based on a comparison function. Luckily, List is implemented as a ring buffer in an array, i.e., if we implement sort() as a method of List we should have very little trouble:
static int cmp (const void * a, const void * b)
{
return differ(* (void **) a, * (void **) b);
}
% List sort {
%casts
if (self —> count)
{	while (self —> begin + self —> count > self —> dim) addFirst(self, takeLast(self));
qsort(self —> buf + self —> begin, self —> count,
sizeof self —> buf[0], cmp);
}
}
If there are any list elements, we rotate the list until it is a single region of the buffer and then pass the list to qsort(). The comparison function sends differ() to the list elements themselves — String_differ was based on strcmp() and can, therefore, be (ab-)used as a comparison function.

10.8Summary
An object points to its class description and the class description points to all the dynamically linked methods for the object. Therefore, an object can be asked if it will respond to a particular method. respondsTo() is a statically linked method for Object. It takes an object and a string tag as search argument and returns the appropriate selector if the tag matches a method for the object.
Tags can be specified to ooc as labels on the prototypes of dynamically linked methods in the class definition file, and as labels on a method header in the imple-


mentation file; the latter have precedence. By default, the method name is used as a tag. Empty tags cannot be found. For the implementation of respondsTo() a method is passed to a metaclass constructor as a triple selector/tag/method.
Given respondsTo(), we can implement delegates: a client object announces itself as a delegate object to a host object. The host queries the client with respondsTo() if it can answer certain method calls. If it does, the host will use these methods to inform the client of some state changes.
Delegates are preferable to registering callback functions and to abstract base classes for defining the communication between a host and a client. A callback function cannot be a method because the host does not have an object to call the method with. An abstract base class imposes unnecessary restrictions on application-oriented development of the class hierarchy. Similar to callback func- tions, we can implement for delegates just those methods which are interesting for a particular situation. The set of possible methods can be much larger.
An application framework consists of one or more objects which provide the typical structure of an application. If it is well designed, it can save a great deal of routine coding. Delegates are a very convenient technique to let the application framework interact with the problem-specific code.

10.9Exercises
Filter implements a standard command line where options precede filename argu- ments, where flags can be bundled, and where option values can be bundled or specified as separate arguments. Unfortunately, pr(1) is a commonly available pro- gram that does not fit this pattern. Is there a general solution? Can a flag introduce two or more argument values which all appear as separate arguments?
The line callback should be modified so that binary files can be handled correctly. Does it make sense to provide a byte callback? What is an alternative?
A much more efficient, although not portable, implementation would try to map a file into memory if possible. The callback interface does not necessarily have to be modified but a modification would make it more robust.
respondsTo() has to know the name of the first struct Method component of every class description. The reports -r in r-R.rep or rather init in c-R.rep can be modified to define a structure to circumvent this problem.
The init report can be modified to generate a puto() method for Class which uses the same technique as respondsTo() to display all method tags and addresses.
Piping the output of our sort program into the official sort(1) for checking may produce a surprise:
$ sort —r Sort.d | /usr/bin/sort —c —r
sort: disorder: int quit (_self, const Object @ filter);
There are more efficient ways for List_sort() to compact the list in the ring buffer before passing it to qsort(). Are we really correct in rotating it?

125

11
Class Methods Plugging Memory Leaks

Modern workstations have lots of memory. If a program looses track of a byte here and there it will probably not make a whole lot of difference. However, memory leaks are usually indicative of algorithmic errors — either the program reacts in unexpected ways to strange input or, worse, the program was inadvertently designed to break connections to dynamically allocated memory. In this chapter we will look at a general technology available with object-oriented programming which can be used, among other things, to combat memory leaks.

11.1An Example
All resources acquired by a program should be properly recycled. Dynamic memory is a resource and production programs should certainly be checked for memory leaks. As an example, consider what happens when we make a syntax error while using the calculator developed in the third and fifth chapter:
$ value
(3 * 4) — —
bad factor: ’’ 0x0
The recursive descent algorithm tries to build an expression tree. If something goes wrong, the error() function uses longjmp() to eliminate whatever is on the stack and continue processing in the main program. The stack, however, contains the pieces of the expression tree built thus far. If there is a syntax error, these pieces are lost: we have a memory leak. This is, of course, a standard problem in constructing interpreters.
NeXTSTEP provides a simple application MallocDebug which can be used to locate at least some of the more serious problems. If we link value with lMalloc- Debug, the standard versions of malloc() and related functions are replaced by a module that can communicate with the MallocDebug application. We start Malloc- Debug after value, connect the two, and push a button Leaks once we have received the first error message. Unfortunately, the output is simply:
No nodes.
MallocDebug uses a fairly naive method to check for leaks: it has a list of all allo- cated areas and scans the words in the client task to see if they point to allocated areas. Only areas to which no word in the client task points are considered to be memory leaks. For the input
(3 * 4) — —
sum() will have the first subtree built by product() before factor() runs into the end of the input line. However, when error() clips the stack from factor() back to main(), the address of the root of this subtree is still in the local variable result of sum() and, by chance, does not get overwritten in the longjmp(). The remaining


nodes are connected to the root, i.e., from the point of view of MallocDebug, all nodes can still be reached. However, if we enter another expression the old stack is overwritten and MallocDebug will find the leak.
value:
$ value
(3 * 4) — —
bad factor: ’’ 0x0 1 + 3
4
MallocDebug:
Zone:	Address:	Size:	Function:
default 0x050ec35c	12	mkBin, new, product, sum,
factor, product, sum, stmt
If value is compiled with debugging information, we can start a debugger in a second window and investigate the leak:
$ gdb value
GDB is free software ... (gdb) attach 746
Attaching program `value’, pid 746 0x5007be2 in read ()
(gdb) print * (struct Bin *) 0x050ec35c Reading in symbols for mathlib.c...done.
$1 = {
type = 0x8024, left = 0x50ec334, right = 0x50ec348
}
(gdb) print process(0x050ec35c) Reading in symbols for value.c...done.
$3 = void (gdb)
The GNU debugger can be attached to a running process. With print we can display the contents of the leaky node if we copy the address from the MallocDebug win- dow and supply the proper type: mkBin() was the original caller of malloc(), i.e., we must have obtained a struct Bin. As the output shows, print can even call a method like process() in value and display the result. The output from process() appears in the window where value is running:
$ value
(3 * 4) — —
bad factor: ’’ 0x0 1 + 3
4
12
The memory leak is alive and well.

11.2Class Methods

11.2 Class Methods

127

How do we plug this specific memory leak? The leak has occurred by the time error() returns to the main loop. Either we collect and release all expression pieces before longjmp() is executed, or we need a different way to reclaim the allocated nodes.
Collecting the pieces is a lost cause because they are held by various activa- tions of the functions involved in the recursive descent algorithm. Only each activa- tion knows what must be released, i.e., in place of a longjmp() we would have to cope with error returns in every function. This is likely to be botched once the pro- gram is later extended.
Designing a reclamation mechanism is a much more systematic approach for solving this problem. If we know what nodes are currently allocated for the expres- sion tree we can easily release them and reclaim the memory in case of an error. What we need are versions of new() and delete() which maintain a linear list of allocated nodes which a function like reclaim() can traverse to free memory. In short, for expression tree nodes we should overwrite what new() and delete() do.
delete() is sent to objects, i.e., it is a method that can be given dynamic linkage so that it may be overwritten for a subtree of the class hierarchy. new(), however, is sent to a class description. If we want to give new() dynamic linkage, we must add its pointer to the class description of the class description object, to which we want to send new():


aNode

Node

NodeClass


struct NodeClass

With this arrangement we can give new() dynamic linkage for the call
new(Node, ...)
However, we create a problem for the descriptions of class descriptions, i.e., at the right edge of this picture. If we start to introduce new method components in metaclass descriptions such as NodeClass, we can no longer use struct Class to store them, i.e., our diagram must be extended at least one more level to the right before we might be able to tie it to the original Class.


Why did we decide to store methods in class descriptions? We assume that we have many objects and few classes. Storing methods in class descriptions rather than in the objects themselves costs one level of indirection, i.e., the dere- ferencing of the pointer from the object to its class description, but it avoids the high memory requirement of letting each object contain all method pointers directly.
There are fewer class descriptions than other objects; therefore, the expense of storing a method address directly in the class description to which the method is applied is not as prohibitive as it would be for other objects. We call such methods class methods — they are applied to the class description in which they are stored rather than to the objects sharing this class description.
A typical class method is new() which would be overwritten to manipulate memory allocation: provide statistics or a reclamation mechanism; allocate objects in memory zones to improve the paging behavior of a program; share memory between objects, etc. Other class methods can be introduced, for example, if we want to circumvent the convention that new() always calls the constructor ctor().

11.3 Implementing Class Methods
The internal difference between a class method and another dynamically linked method is encapsulated in the selector. Consider exec(), a dynamically linked method to evaluate a node. The selector applies classOf() to get the class descrip- tion and looks for .exec in there:
double exec (const void * _self) { const struct NodeClass * class =
cast(NodeClass(), classOf(_self));
assert(class —> exec.method);
return ((double (*) ()) class —> exec.method)(_self);
}
In contradistinction, consider new(), a class method which is applied to a class description. In this case self refers to the class description itself and the selector looks for .new as a component of *self:
struct Object * new (const void * _self, ...) { struct Object * result;
va_list ap;
const struct Class * self = cast(Class(), _self);
assert(self —> new.method); va_start(ap, _self);
result = ((struct Object * (*) ()) self —> new.method)
(_self, & ap);
va_end(ap); return result;
}

11.3 Implementing Class Methods

Here is a picture describing the linkage of exec() and new():

129


aNode

Node

NodeClass


struct NodeClass

Both, class methods and dynamically linked methods, employ the same super- class selector because it receives the class description pointer as an explicit argu- ment.
struct Object * super_new (const void * _class,
const void * _self, va_list * app) { const struct Class * superclass = super(_class);
assert(superclass —> new.method); return
((struct Object * (*) ()) superclass —> new.method)
(_self, app);
}
Selectors are generated by ooc under control of the selectors report in etc.rep. Because the selectors differ for class methods and dynamically linked methods, ooc needs to know the method linkage. Therefore, class methods are specified in the class description file following the dynamically linked methods and the separator
%+. Here is an excerpt from Object.d:
% Class Object {
...
%
const Class @ classOf (const _self);	// object’s class
...
%—
void * ctor (_self, va_list * app);	// constructor
...
void delete (_self);	// reclaim instance
%+
Object @ new (const _self, ...);	// create instance
%}


delete() is moved to the dynamically linked methods and new() is introduced as a class method.
% Class Class: Object {
...
%
Object @ allocate (const _self);	// memory for instance
...
%}
Once we remove new() as a statically linked method for Class, we package the memory allocation part as a new statically linked method allocate().
Given % and %+ as separators, ooc knows the linkage of every method and the report selectors can be extended to generate the selectors shown above. Other reports generate selector declarations for the interface file, superclass selec- tor declarations and the layout of the metaclass description for the representation file, the loop in the metaclass constructor which recognizes selector/tag/method tri- ples and enters them into the class description, and, finally, the initialization func- tions for class and metaclass descriptions. All of these reports need to be extended. For example, in the report h in h.rep the declarations of dynamically linked methods are generated with
`{%— `%header ; `n `}n
A new loop adds the class method declarations:
`{%+ `%header ; `n `}n
`{+ is a loop over the class methods of the current class.
Once we access new() and delete() through selectors, we have to implement them for Object in Object.dc:
% Object new {
%casts
return ctor(allocate(self), app);
}
new() creates the area for the new object and calls the appropriate constructor to initialize it. allocate() contains most of the old code of new(). It obtains dynamic memory and installs the class description pointer so that the dynamic linkage of ctor() in new() works correctly:
% allocate {
struct Object * object;
%casts
assert(self —> size);
object = calloc(1, self —> size); assert(object);
object —> magic = MAGIC; object —> class = self; return object;
}


delete() calls the destructor dtor() as before and passes the result to free():
% Object delete {
%casts
free(dtor(self));
}
Whenever we add new methods to Object which are accessed through selectors, we must not forget to install them by hand in the class descriptions in Object.dc. As an example here is _Object:
static const struct Class _Object = {
{ MAGIC, & _Class },
"Object", & _Object, sizeof(struct Object),
{ "",	(Method) 0,	(Method) Object_ctor },
...
{ "delete", (Method) delete,(Method) Object_delete },
...
{ "",	(Method) 0,	(Method) Object_new },
};

11.4Programming Savvy — A Classy Calculator
With the technology for plugging memory leaks in place, we can now engineer our calculator to take advantage of the class hierarchy. First we need to add the descriptions from chapter 5 to the hierarchy.
Node
The basic building block for the expression tree is Node, an abstract base class. A
Number is a Node which contains a floating point constant:
// new(Number(), value)
% NodeClass Number: Node { double value;
%}
Our tree can grow if we have nodes with subtrees. A Monad has just one subtree, a Dyad has two:
% NodeClass Monad: Node { void * down;
%}
%prot
#define down(x)	(((struct Monad *)(x)) —> down)
% NodeClass Dyad: Node { void * left;
void * right;
%}
%prot
#define left(x)	(((struct Dyad *)(x)) —> left) #define right(x)	(((struct Dyad *)(x)) —> right)


Technically, .down, .left and .right should only be filled by the constructors for these nodes, but if we plan to copy a tree, a subclass may need to modify the pointers.
We use single subtrees to build two entirely different things. Val is used to get the value from a symbol in the symbol table and Unary represents an operator such as a minus sign:
% NodeClass Val: Monad {
%}
// new(Minus(), subtree)
% NodeClass Unary: Monad {
%}
% NodeClass Minus: Unary {
%}
One kind of Val is a Global which points to a Var or Const symbol and obtains its value from there. If we implement user defined functions we use a Parm to fetch the value of a single parameter.
// new(Global(), constant—or—variable)
// new(Parm(), function)
% NodeClass Global: Val {
%}
% NodeClass Parm: Val {
%}
We will derive symbol table entries from a base class Symbol which is independent of Node. Therefore, we need Val and its subclasses because we can no longer let an expression tree point directly to a Symbol which would not understand the exec() method.
There are many nodes with two subtrees. Add, Sub, Mult, and Div combine the values of their subtrees; we can simplify things by inserting Binary as a com- mon base class for these:
// new(Add(), left—subtree, right—subtree)
...
% NodeClass Binary: Dyad {
%}
% NodeClass Add: Binary {
%}
...
Just as Val is used to access symbol values, Ref is used to combine a symbol and an expression tree: Assign points to a Var and stores the value of its other subtree there; Builtin points to a Math symbol which computes the value of a library func- tion for the value of Builtin’s right subtree as an argument; User, finally, points to a Fun symbol which computes the value of a user defined function for the value of User’s other subtree as an argument.


// new(Assign(), var, right—subtree)
// new(Builtin(), math, arg—subtree)
// new(User(), fun, arg—subtree)
% NodeClass Ref: Dyad {
%}
% NodeClass Assign: Ref {
%}
% NodeClass Builtin: Ref {
%}
% NodeClass User: Ref {
%}
For the most part, the methods for Node subclasses can be copied from the solu- tion in chapter 5. Very little adapting is required. The following table shows how the various methods are linked into Node and its subclasses:

CLASS	DATA	METHODS
Node		see below
Number	value	ctor, exec
Monad	down	ctor
Val		exec
Global Parm		
Unary			dtor
Minus			exec
Dyad	left,	right	ctor
Ref			dtor
Assign			exec
Builtin			exec
User			exec
Binary			dtor
Add			exec
Sub			exec
Mult			exec
Div			exec
While we are violating the principle that constructors and destructors should be bal- anced, we do so for a reason: the destructors send delete() to their subtrees. This is acceptable as long as we delete an expression subtree, but we clearly should not send delete() into the symbol table. Val and Ref were introduced exactly to factor the destruction process.
At this point it looks as if we need not distinguish Global and Parm. However, depending on the representation of their symbols, we may have to implement dif- ferent exec() methods for each. Introducing the subclasses keeps our options open.


Symbol
Looking at possible expression trees we have discovered the necessary nodes. In turn, once we design the nodes we find most of the symbols which we need. Symbol is the abstract base class for all symbols that can be entered into a symbol table and located by name. A Reserved is a reserved word:
// new(Reserved(), "name", lex)
% Class Reserved: Symbol {
%}
A Var is a symbol with a floating point value. Global will point to a Var symbol and use value() to obtain the current value; Assign similarly uses setvalue() to deposit a new value:
// new(Var(), "name", VAR)
% Class Var: Symbol { double value;
%
double value (const _self);
double setvalue (_self, double value);
%}
A Const is a Var with a different constructor:
// new(Const(), "name", CONST, value)
% Class Const: Var {
%}
If we make Const a subclass of Var we avoid the glitches that setvalue() would have to access .value in the base class and that we would have to initialize a Var during construction. We will syntactically protect Const from being the target of an Assign.
A Math represents a library function. Builtin uses mathvalue() to pass an argument in and receive the function value as a result:
// new(Math(), "name", MATH, function—name) typedef double (* function) (double);
% Class Math: Symbol { function fun;
%
double mathvalue (const _self, double value);
%}
Finally, a Fun represents a user defined function with a single parameter. This sym- bol points to an expression tree which can be originally set or later replaced with setfun() and evaluated by a User node with funvalue():


// new(Fun(), "name", FUN)
% Class Fun: Var { void * fun;
%
void setfun (_self, Node @ fun); double funvalue (_self, double value);
%}
Ignoring recursion problems, we define Fun as a subclass of Var so that we can store the argument value with setvalue() and build a Parm node into the expres- sion wherever the value of the parameter is required. Here is the class hierarchy for Symbol:

CLASS	DATA		METHODS
Symbol	name,	lex	see below
Reserved			delete
Var	value		% value, setvalue
Const			ctor, delete
Fun	fun		% setfun, funvalue
Math	fun		ctor, delete
			% mathvalue
Again, almost all the code can be copied from chapter 5 and requires little adapting to the class hierarchy. Const and Math should never be deleted; therefore, we can add dummy methods to protect them:
% : Const delete {	// don’t respondTo delete
}
The only new idea are user defined functions which are implemented in the class
Fun:
% Fun setfun {
%casts
if (self —> fun) delete(self —> fun);
self —> fun = fun;
}
If we replace a function definition we must first delete the old expression tree, if any.
% Fun funvalue {
%casts
if (! self —> fun) error("undefined function");
setvalue(self, value); // argument for parameter return exec(self —> fun);
}
In order to compute the function value, we import the argument value so that Parm can use value() to retrieve it as a parameter value. exec() can then compute the function value from the expression tree.


Symtab
We could try to extend a List as a symbol table, but the binary search function used in chapter 5 must be applied to arrays and we only need the methods screen() and install():
// new(Symtab(), minimal—dimension) #include <stddef.h>
% Class Symtab: Object {
const void ** buf;	// const void * buf [dim] size_t dim;	// current buffer dimension
size_t count;	// # elements in buffer
%
void install (_self, const Symbol @ entry);
Symbol @ screen (_self, const char * name, int lex);
%}
The array is allocated just as for a List:
% Symtab ctor {
struct Symtab * self = super_ctor(Symtab(), _self, app);
if (! (self —> dim = va_arg(* app, size_t))) self —> dim = 1;
self —> buf = malloc(self —> dim * sizeof(void *)); assert(self —> buf);
return self;
}
search() is an internal function which uses binary() to search for a symbol with a particular name or to enter the name itself into the table:
static void ** search (struct Symtab * self, const char ** np)
{
if (self —> count >= self —> dim)
{	self —> buf = realloc(self —> buf,
(self —> dim *= 2) * sizeof(void *)); assert(self —> buf);
}
return binary(np, self —> buf, & self —> count,
sizeof(void *), cmp);
}
This is an internal function; therefore, we use a little trick: binary() will look for a symbol, but if it is not found binary() will enter the string at *np rather than a sym- bol. cmp() compares the string to a symbol — if we used a string class like Atom we could implement cmp() with differ():
static int cmp (const void * _key, const void * _elt)
{	const char * const * key = _key; const void * const * elt = _elt;
return strcmp(* key, name(* elt));
}


name() is a Symbol method returning the name of a symbol. We compare it to the string argument of search() and do not create a symbol before we know that the search really is unsuccessful.
With table search and entry in place, the actual Symtab methods are quite sim- ple to implement. install() is called with a second argument produced by new(). This way we can enter arbitrary Symbol objects into the symbol table:
% Symtab install { const char * nm; void ** pp;
%casts
nm = name(entry);
pp = search(self, & nm);
if (* pp != nm)	// found entry delete(* pp);
* pp = (void *) entry;
}
install() is willing to replace a symbol in the table.
% Symtab screen { void ** pp;
%casts
pp = search(self, & name);
if (* pp == name)	// entered name
{	char * copy = malloc(strlen(name) + 1);
assert(copy);
* pp = new(Symbol(), strcpy(copy, name), lex);
}
return * pp;
}
screen() either finds an entry by name or makes a new Symbol with a dynamically stored name. If we later decide that the table entry should rather belong to a sub- class of Symbol we can call install() to replace an entry in the table. While this is a bit inefficient, it requires no new functions for the symbol table interface.
The Abstract Base Classes
Symbol is the base class for symbol table entries. A Symbol consists of a name and a token value for the parser which are both passed in during construction:
Symbol.d
// new(Symbol(), "name", lex)	"name" must not change
% Class Symbol: Object { const char * name; int lex;
%
const char * name (const _self); int lex (const _self);
%}


Symbol.dc
% Symbol ctor {
struct Symbol * self = super_ctor(Symbol(), _self, app);
self —> name = va_arg(* app, const char *); self —> lex = va_arg(* app, int);
return self;
}
We let Symbol assume that external arrangements have been made for a symbol name to be reasonably permanent: either the name is a static string or the name must be saved dynamically before a symbol is constructed with it. Symbol neither saves the name nor deletes it. If screen() saves a name dynamically, and if we decide to replace a symbol using install(), we can simply copy the name from the previous symbol which is deleted by install() and avoid more traffic in dynamic memory. Using a class like Atom would be a much better strategy, however.
The really interesting class is Node, the abstract base class for all parts of an expression tree. All new nodes are collected into a linear list so that we can reclaim them in case of an error:
Node.d
% NodeClass: Class Node: Object { void * next;

%

%—

%+

%}
Node.dc


void sunder (_self); double exec (const _self);
void reclaim (const _self, Method how);

static void * nodes;	// chains all nodes
% Node new {
struct Node * result =
cast(Node(), super_new(Node(), _self, app));
result —> next = nodes, nodes = result; return (void *) result;
}
According to Webster’s, sunder means to ‘‘sever finally and completely or with violence’’ and this is precisely what we are doing:
% Node sunder {
%casts
if (nodes == self)	// first node nodes = self —> next;


else if (nodes)	// other node
{	struct Node * np = nodes;
while (np —> next && np —> next != self) np = np —> next;
if (np —> next)
np —> next = self —> next;
}
self —> next = 0;
}
Before we delete a node, we remove it from the chain:
% Node delete {
%casts
sunder(self); super_delete(Node(), self);
}
Plugging the Memory Leaks
Normally, the parser in parse.c will call delete() after it is done with an expression:
if (setjmp(onError))
{	++ errors; reclaim(Node(), delete);
}
while (gets(buf)) if (scan(buf))
{	void * e = stmt();
if (e)
{	printf("\t%g\n", exec(e)); delete(e);
}
}
If something goes wrong and error() is called, reclaim() is used to apply delete() to all nodes on the chain:
% Node reclaim {
%casts
while (nodes)
how(nodes);
}
This plugs the memory leak described at the beginning of this chapter — MallocDe- bug does not find any leaks, neither immediately after an error nor later. For test purposes we can
reclaim(Node, sunder);
after an error and let MallocDebug demonstrate that we really have lost nodes.
The elegance of the scheme lies in the fact that the entire mechanism is encap- sulated in the base class Node and inherited by the entire expression tree. Given


class functions, we can replace new() for a subtree of the class hierarchy. Replac- ing new() exactly for all nodes, but not for symbols or the symbol table, provides reclamation for broken expressions without damaging variables, functions, and the like.
Technically, reclaim() is declared as a class method. We do not need the ability to overwrite this function for a subclass of Node, but it does leave room for expan- sion. reclaim() permits a choice as to what should be applied to the chain. In case of an error this will be delete(); however, if we save an expression for a user defined function in a Fun symbol, we need to apply sunder() to the chain to keep the next error from wiping out the expression stored in the symbol table. When a function is replaced, setfun() will delete the old expression and delete() still uses sunder() — this is why sunder() does not demand to find its argument on the chain.

11.5Summary
Class Methods are applied to class descriptions, rather than to other objects. We need at least one class method: new() creates objects from a class description.
Just like other methods, class methods can have static or dynamic linkage, but the syntax of ooc only permits static linkage for class methods that apply to the root metaclass. Therefore, the term class method has been introduced here to only describe a method with dynamic linkage that is applied to a class description.
Since there are relatively few class descriptions, we can provide the dynamic linkage for a class method by storing it in the class description itself to which it applies. This has two advantages: we can overwrite class methods for a subclass without introducing a new metaclass to store it; and our basic scheme remains intact where objects point to class descriptions, class descriptions point to their own descriptions, and the latter can all be stored in struct Class, i.e., they will all point to Class, thus completing the class hierarchy in a clean fashion.
Defining new() as a class method for Object rather than as a method with static linkage for Class permits redefining new() for subtrees of the class hierarchy. This can be used for memory allocation tracking, memory sharing, etc. ooc makes no provisions for extending the data part of a class description. If it did, a class method could have local data applicable to its class as a whole and we could count objects per class, etc. static variables in an implementation file are not quite the same because they exist once for the class and all its subclasses.
There is a tradeoff between new() and a constructor. It is tempting to do all the work in new() and leave the constructor empty, but then invariants normally established by the constructor can break once new() is overwritten. Similarly, a constructor is technically capable of substituting a different memory area in place of the one passed in from new() — this was demonstrated in the implementation of Atom in section 2.6 — but a proper life cycle for this memory is difficult to main- tain.

11.6Exercises

141


As a rule of thumb, class methods like new() should only connect an allocation function with a constructor and refrain from doing any initializations themselves. Allocation functions such as allocate() should initialize the class description pointer
— too much can go horribly wrong if they do not. Reclamation functions such as delete() should let the destructor dispose of the resources which the constructor and the object’s life cycle have accumulated, and only pass the empty memory area to a recycler function like free():

allocate()		free()
	anObject	anObject
ctor()		dtor()
	aThing	aThing
new()		delete()
		
aThing
There is a balance: allocate() and free() deal with the same region of memory; by default, new() gives it to its constructor, delete() to its destructor; and the con- structor and destructor only deal with resources represented inside the object. new() and delete() should only be overwritten to interfere with the flow of memory from allocate() to free().

11.6 Exercises
For the ooc parser it makes absolutely no difference, if class methods are described before or after dynamically linked methods in the class description file, i.e., if %+ precedes or follows %. There is, however, a convincing point in favor of the arrangement described in this chapter. Why can the separators not be repeated to achieve an arbitrary mix of both types of methods?
There is a rather significant difference once delete() is implemented with dynamic linkage. What can no longer be passed to delete()?
It is not helpful to move value() back into the abstract base class Symbol and give it dynamic linkage there. mathvalue() is applied to a Math symbol and requires a function argument, value() is applied to a Var or Const symbol and has no use for an argument. Should we use variable argument lists?
We can detect recursion among user defined functions. We can use words like
$1 to support functions with more than one parameter. We can even add parame- ters with names that hide global variables.
If we add a generic pointer to the data area of Class in Object.d class methods can attach a chain of private data areas there. This can be used, e.g., to count objects or to provide object lists per class.