On the duality of operating system structures

On the Duality of Operating System Structures – Lauer and Needham, 1978

The pendulum currently says “threads and locks are bad, events are good.” Vigourous defences are mounted in favour of one system over the other, and manifestos are written. Nowadays this debate rages over the best way to build applications and frameworks, but it once raged over the best way to build operating systems. When Lauer and Needham looked at this question in 1978 they reached a conclusion that I suspect is still as surprising and controversial today as it was then: at the end of the day, they’re equivalent models (duals).

Our analysis is likely to be controversial. Like those of any empirical science, our conclusions cannot be accepted without a lot of thought and supporting experiments. We have found that the duality between the two categories of system design is a notion which defies belief amongst a non-trivial sample of our colleagues. The observations about the similarity of program logic, code, and performance are particularly hard to accept when the universe of discourse is not one of naturally occurring objects but man-made ones.

Lauer and Needham called the two classes of system Message-Oriented and Procedure-Oriented, and observed that most systems tend to use one style or the other.

Message-oriented systems are … characterized by facilities for passing messages (or events, or whatever they might be called in a particular system) easily and efficiently among processes. There also convenient facilities for queuing messages at destination processes until they can be acted upon. Processes are provided with primitive operations to send messages, wait for any message, wait for a particular class of message, and examine the state of the message queue.

In this style processes tend to be associated with system resources, and the needs of applications which the system exists to serve are encoded into data to be passed around in messages. “This style of system architecture is most common in the world of real-time systems and process control.” For those who think they’re breaking new ground by designing systems this way, the authors have news for you:

there are also a number of general purpose operating systems implemented in this way, including, for example, IBM’s OS/360.

The canonical message model defined in the paper includes messages and message identifiers, message channels and ports, operations for sending and receiving messages, and processes consisting of local data and algorithms that define message ports and receive messages on channels.

Note that if a whole system is built according to this style, then the sole means of interaction among the components of that system is by means of the message facility. Each process can operate in its own address space without interference from the others. Because of the serial way in which requests are handled, there is never any need to protect the state information of a process from multiple, simultaneous access and updating.

Procedure-oriented sytems by contrast are characterized by a protection and addressing mechanism oriented towards procedures and efficient procedure call facilities.

Cooperation among processes is achieved by some form of locks, semaphores, monitors, or other synchronizing data structures (we will use the term lock as a generic identification of these).

Sychronization of processes and queueing for congested resources occurs in the form of queues of processes waiting for locks associated with the corresponding data structures. Data is shared directly among processes, and processes tend to lock only small parts of the data structures for relatively short periods of time.

If a whole system is built in this style, then the sole means of interaction among its components is procedural. Processes move from one context to another by means of the procedure call facility across module boundaries, and they use asynchronous calls to stimulate concurrent activity. They depend upon monitor locks and condition variables to keep out of the way of each other. Thus no process can be
associated with a single address space unless that space be the whole system.

By showing how the primitives from one type of system correspond to those in the other, Lauer and Needham derive three observations:

  1. The two models are duals of each other. That is, a program or subsystem constructed
    strictly according to the primitives defined by one model can be mapped directly into
    a dual program or subsystem which fits the other model.

  2. The dual programs or subsystems are logically identical to each other. They can also be made textually very similar, differing only in non-essential details.

  3. The performance of a program or subsystem from one model, as reflected by its queue
    lengths, waiting times, service rates, etc. is identical to that of its dual system, given identical scheduling strategies. Furthermore, the primitive operations provided by
    the operating system of one model can be made as efficient as their duals of the other
    model.

You can follow the basic thrust of the argument by thinking about an object-oriented program, in which method invocations are replaced by message-sends. The OO community often uses language which expresses the duality in this context.

The principal conclusion we will draw from these observations is that the considerations for choosing which model to adopt in a given system are not found in the applications which that system is meant to support Instead, they lie in the substrate upon which the system is built and are a function of which set of primitive operations and mechanisms are easier to build or better suited to the constraints imposed by the machine architecture and hardware.

Which means that many heated discussions over which approach is better boil down to issues ‘which this analysis suggests are irrelevant!”

If this is the case, then the arguments about processes and synchronization which are often found in the corridors of organizations actively designing a new system (and which occasionally find their way into the literature) take on a decidedly non-technical tone. In our experience, they tend to be highly emotional, they consume far more energy than any other part of the system, they occasionally lead to organizational difficulties, and they are about issues which this analysis suggests are irrelevant.

There’s also a nice side-observation in the paper that waits in the middle of a process might be the concurrent equivalent of ‘goto’ (ii.e. a bad practice!).

More on this theme tomorrow…