Recursive Programming – Dijkstra 1960

* Updated link to one that is not behind a paywall – thanks to Graham Markall for the catch *

This paper deals with something we take so much for granted that it’s hard to imagine a time when it had yet to be introduced to the world. That time was 1960, the concept is the language runtime stack, and the author of the paper is none other than Dijkstra.

In fact, we’re so familiar with the stack, that it takes a while to get your head around what went before: each subroutine in a program had its own private fixed working space. That is, suppose you write a program that contains 30 subroutines – then there will be 30 reserved areas in memory, one for each subroutine to use.

Dijkstra points out a couple of difficulties with this arrangement:

In the first place, the storage allocations for all the subroutines together will, in general, occupy much more memory space than they ever need simultaneously, and the available memory space is therefore used rather uneconomically. Furthermore – and this is a more serious objection – it is then impossible to call a subroutine while one or more of the previous activations of the same subroutine have not yet come to an end, without losing the possibility of finishing them off properly later on.

In other words, if each subroutine has its own fixed storage area, then *recursive programming* is not possible (you’ll overwrite the state in the one fixed storage area for the subroutine). Since this rather limits the design space of programs, Dijkstra was interested in finding a technique that could eliminate the restriction:

The basic concept of the method is the so-called stack.

Dijkstra describes how to build a stack with a block of memory and a stack pointer – I’m going to assume you’re all familiar with the idea! The following insight is key to the use of the stack in this context:

If we mark off, on a time axis, the moments when a unit is added to or removed from the stack, by using an opening bracket for the addition of a unit, and a closing bracket for its removal, then we obtain a correctly nested bracket structure, in which the opening and closing brackets form pairs in the same way as they do in a normal algebraic expression involving brackets. This is closely related to the circumstance that we can use a stack for storing the intermediate results formed in the evaluation of an arbitrary algebraic expression by means of elementary algebraic operations. In this case, the interest is always restricted to the most recent element in the stack. As the intermediate results are used only once, use of an element implies its removal from the stack.

Take an expression

```
A + (B - C) * (D|E + F)
```

We know that we can record this in reverse polish notation as :

```
A, B, C, -, D, E, |, F, +, *, +
```

The above is well-known, and so elegant that we could not refrain from trying to extend this technique by consistent application of its principles.

The example above assumes that A, B, C etc are numerical values that can be found in memory. But Dijkstra points out that C could equally have been an expression (for example C = (P/(Q-R + S*T )), and by the time we have done evaluating C, the net result would be the same as if we had the value of C accessible directly.

In other words, it is immaterial to the “surroundings” in which the value C is used, whether the value C can be found ready-made in memory, or whether it is necessary to make temporary use of a number of the next stack locations for its evaluation.

Now suppose it wasn’t an expression that appears for the value of C, but a function (to be evaluated by a subroutine): “*this provides a strong argument for arranging the subroutine in such a way that it operates in the first free places of the stack, in just the same way as a compound term written out in full.*”

The stack can be used by a subroutine for its parameters, its local variables, and even anonymous intermediate results created during execution of the subroutine.

Inside the subroutine we store the most anonymous intermediate results in the “top” of the stack in just the same way. Every reference to a local quantiity, however, implies one is interested in a place that is situated deeper within the stack, and here one is interested in random access to the stack places, in other words we must be able to give the places deeper in the stack some kind of address.

The value of that reference point is, ‘to be derived from the value of the stack pointer at the moment of the call.’ With a final flourish, Dijkstra goes on to show that ‘link’ information must be preserved in the stack when calling subroutines, so that regardless of complexity we can pick up exactly where we left off (in the ALU)- this is to include a return address. We can now follow what happens when subroutine A calls subroutine B, and observe that:

In this process, nothings forbids A from being identical with B. The subroutine only has to appear in the memory once, but it may have more than one simultaneous “incarnation” from a dynamic point of view: the “innermost” activation causes the same piece of text to work in a higher part of the stack. Thus the subroutine has developed into a defining element that can be used completely recursively.

Tada!

Note: 1960 is also the year of “Recursive Functions of Symbolic Expressions and their Computation by Machine,” McCarthy’s famous paper which “describes a formalism for defining functions recursively,” and then shows how it can be implemented in the LISP programming system for the IBM 704.