On Distributed Communications Networks – Baran 1962

Way before it became fashionable to build large-scale distributed systems out of relatively unreliable commodity hardware, Baran was investigating the properties a network built in such a way might have. This paper is very much of its time, and all the more enjoyable for it: want to know how many weapons (each with a 0.5 probability of destroying its target) it takes to bisect one of Baran’s 32-link networks? Read on…

We’ll concentrate on his survivability analysis in a moment, but first of all here are some very prescient statements for 1962:

In communications, as in transportation, it is most economic for many users to share a common resource rather than each to build his own system – particularly when supplying intermittent or occasional service… Therefore we would like to consider the interconnection, one day, of many all digital links to provide a resource optimized for the handling of data for many intermittent users – a new common user system… Considering the size of the market there appears to be an incommensurately small amount of thinking about a national data plant designed primarily for data. Is it time now to start thinking about a new and possibly non-existent public utility, a common user digital data communication plant designed specifically for the transmission of digital data among a large set of subscribers?

I’m not sure it will catch on 😉

Baran defines survivability for a network as “the percentage of stations surviving a physical attack and remaining in electrical connection with the largest single group of surviving stations.” (You have to love all the Cold War overtones in the paper!)

Of the three basic network types (centralised, decentralised, and distributed), Baran shows that the distributed version has remarkable survivability characteristics.

(Figure: centralised, decentralised, and distributed network types)

The ‘redundancy level’ of a network is defined as the number of links in the network, divided by the number of links that would be needed to connect the same nodes in a minimum spanning tree. For the modelling, Baran also assumes ‘perfect switching’ – if you can draw a path between any two nodes in the graph, then you can communicate between them.
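To make those definitions concrete, here’s a minimal sketch (mine, not from the paper) that computes the redundancy level of a network and applies the perfect-switching assumption as a simple path-existence check:

```python
# Sketch (not from the paper): redundancy level and 'perfect switching'.
# A network on n nodes needs n - 1 links for a minimum spanning tree,
# so redundancy level R = (number of links) / (n - 1).

from collections import deque

def redundancy_level(n_nodes, links):
    """links: iterable of (u, v) pairs."""
    return len(links) / (n_nodes - 1)

def can_communicate(n_nodes, links, a, b):
    """Perfect switching: a and b can communicate iff any path exists."""
    adj = {v: set() for v in range(n_nodes)}
    for u, v in links:
        adj[u].add(v)
        adj[v].add(u)
    seen, queue = {a}, deque([a])
    while queue:
        u = queue.popleft()
        if u == b:
            return True
        for w in adj[u] - seen:
            seen.add(w)
            queue.append(w)
    return False

# A 4-node ring has 4 links vs. 3 for a spanning tree, so R ≈ 1.33
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(redundancy_level(4, ring))       # 1.333...
print(can_communicate(4, ring, 0, 2))  # True
```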

Let’s look at a distributed network under these conditions, and assume not just random failures (Baran makes Chaos Monkey look totally underwhelming!) but an enemy attack with missiles…

To bisect a 32-link width network requires direction of 288 weapons each with a probability of kill, Pk = 0.5, or 160 with a Pk = 0.7, to produce over an 0.9 probability of successfully bisecting the network.
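Here’s how those numbers might be checked (a back-of-the-envelope of my own, assuming the weapons are spread evenly over the 32 links of the cut, and that the attack succeeds only if every link in the cut is destroyed):

```python
# Rough check of Baran's bisection numbers (my arithmetic, assuming
# weapons are spread evenly across the 32 links of the cut).
def p_bisect(n_links, n_weapons, p_kill):
    per_link = n_weapons // n_links              # weapons aimed at each link
    p_link_dies = 1 - (1 - p_kill) ** per_link   # at least one weapon kills it
    return p_link_dies ** n_links                # every link in the cut dies

print(p_bisect(32, 288, 0.5))  # ~0.94  (288 weapons, Pk = 0.5)
print(p_bisect(32, 160, 0.7))  # ~0.92  (160 weapons, Pk = 0.7)
```

Both come out just over 0.9, matching the figures in the quote.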

But I know something you don’t know… 🙂

If hidden alternative command is allowed, then the largest single group would still have an expected value of almost 50% of the initial stations surviving intact. If this raid misjudges complete availability of weapons, or complete knowledge of all links in the cross-section, or the effects of the weapons against each and every link, the raid fails.

Baran also looks at the case of firing 2000 weapons at a 1000-node network. Using Monte Carlo simulations, the paper explores the relationship between redundancy level and survivability.
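The paper’s simulations are of course far more careful, but a toy Monte Carlo in the same spirit looks something like this (all structures and parameters are mine, and extending it to kill links as well as nodes is straightforward):

```python
import random
from collections import deque

def largest_group(alive, links):
    """Size of the largest connected group among surviving nodes."""
    adj = {v: set() for v in alive}
    for u, v in links:
        adj[u].add(v)
        adj[v].add(u)
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:
            for w in adj[queue.popleft()] - comp:
                comp.add(w)
                queue.append(w)
        seen |= comp
        best = max(best, len(comp))
    return best

def survivability(n_nodes, links, n_weapons, p_kill, trials=200):
    """One reading of Baran's metric: expected fraction of the initial
    stations that survive and sit in the largest single group."""
    total = 0.0
    for _ in range(trials):
        dead = {random.randrange(n_nodes)
                for _ in range(n_weapons) if random.random() < p_kill}
        alive = [v for v in range(n_nodes) if v not in dead]
        live = [(u, v) for u, v in links if u not in dead and v not in dead]
        total += largest_group(alive, live) / n_nodes
    return total / trials
```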

Two points are to be noticed. First, extremely survivable networks can be built using a moderately low redundancy of connectivity level. Redundancy levels on the order of only three permit withstanding extremely heavy level attacks with negligible additional loss to communications. Secondly, the survivability curves have sharp break-points. A network of this type will withstand an increasing attack level until a certain point is reached, beyond which the network rapidly deteriorates. Thus, the optimum degree of redundancy can be chosen as a function of the expected level of attack. Further redundancy buys little.

The initial simulations only looked at node failures; when extended to cover link failures as well, “it appears that what would today be regarded as an unreliable link can be used in a distributed network almost as effectively as perfectly reliable links.”

If n is made sufficiently large, it can be shown that highly survivable system structures can be built – even in the thermonuclear era. In order to build such networks and systems we will have to use a large number of elements. We are interested in knowing how inexpensive these elements may be and still permit the system to operate reliably… Our future systems design problem is that of building very reliable systems out of the described set of unreliable elements at lowest cost.

1962! And a pretty good description of exactly the challenge we face today…

I’ll skip the section on using “mass-produced microwave crystal receiver/klystron oscillator units mounted on telegraph poles” as possibly “the cheapest way of building large networks of the type to be described.” 🙂 Instead let me cherry-pick two last elements: the motivation for moving to packet-switched networks, and ‘hot-potato’ routing.

Suppose you are transmitting one line of a 60 word-per-minute teletype message over a high data rate ‘express route’ operating at 1.5 Mbps: a 1/3 ms burst would be sent every 12 seconds.
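The arithmetic behind that (with my own assumptions about line length and character encoding) works out neatly:

```python
# Back-of-the-envelope (my assumptions: ~12 words per teletype line,
# 6 characters per word including the space, ~7 bits per character).
line_bits = 12 * 6 * 7                  # ≈ 500 bits in one line
burst_ms = line_bits / 1.5e6 * 1000     # ≈ 0.34 ms to send at 1.5 Mbps
words_per_s = 60 / 60                   # 60 words/minute = 1 word/second
typing_s = 12 / words_per_s             # ≈ 12 s to type the line
print(burst_ms, typing_s)
```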

Present common carrier communications networks, used for digital transmission, use links and concepts originally designed for another purpose – voice. These systems are built around a frequency division multiplexing link-to-link interface standard. The standard between links is that of data rate. Time division multiplexing appears so natural to data transmission that we might wish to consider an alternative approach – a standardized message block as a network interface standard.

Otherwise, “we may soon reach a point where more time is spent setting the switches in a conventional circuit switched system for short holding-time messages than is required for the actual transmission of the data.”

Standardized data blocks permit many simultaneous users each with widely different bandwidth requirements to economically share a broadband network made up of varied data rate links.
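As a sketch of what a standardized message block buys you (my own illustration: Baran’s later memoranda settled on a 1024-bit block, but the padding scheme here is invented), any message, whatever the sender’s data rate, gets chopped into identical fixed-size blocks that any link can carry:

```python
# Sketch (mine): chopping an arbitrary message into fixed-size blocks lets
# users with very different bandwidth needs share the same varied-rate links.
BLOCK_BITS = 1024  # block size used in Baran's later design; illustrative here

def to_blocks(payload: bytes, block_bytes: int = BLOCK_BITS // 8):
    """Split a payload into fixed-size blocks, zero-padding the last one."""
    blocks = [payload[i:i + block_bytes]
              for i in range(0, len(payload), block_bytes)]
    if blocks and len(blocks[-1]) < block_bytes:
        blocks[-1] = blocks[-1].ljust(block_bytes, b"\x00")
    return blocks

print(len(to_blocks(b"x" * 300)))  # 300 bytes -> 3 blocks of 128 bytes
```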

Hot-potato routing is introduced as a way of doing store-and-forward messaging with extremely little storage at the nodes:

Each node will attempt to get rid of its messages by choosing alternate routes if its preferred route is busy or destroyed. Each message is regarded as a ‘hot potato,’ and the nodes are not wearing gloves. Rather than hold the ‘hot potato,’ the node tosses the message to its neighbour, who will now try to get rid of the message.
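Here’s a minimal sketch of that forwarding rule (my own illustration; the data structures are invented): a node never queues, it immediately relays each message on the best link that is both up and free.

```python
# Hot-potato forwarding sketch (my illustration, not Baran's design):
# relay immediately on the best available link, never hold the message.
def forward(dest, preferences, up, busy):
    """Pick an outgoing link for a message bound for `dest`.
    preferences: dict mapping destination -> links ordered best-first.
    up / busy: sets of currently working / occupied links."""
    for link in preferences[dest]:
        if link in up and link not in busy:
            return link   # toss the hot potato along this link
    return None           # nowhere to toss it: the message is lost

# Preferred link "north" is busy, so the message goes out on "east"
print(forward("D", {"D": ["north", "east"]},
              up={"north", "east"}, busy={"north"}))
```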

There’s also a description of how nodes can learn optimum traffic routes to destinations, and how to keep this knowledge up to date as parts of the network change or are damaged by hostile missile attacks! Baran introduces a ‘forgetting procedure’ which places more belief in recent measurements and less in older ones. It sounds quite Bayesian to me, though Baran doesn’t use that language.
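One simple way such a forgetting procedure could work (my interpretation, not Baran’s actual mechanism) is an exponentially weighted moving average, where the weighting factor controls how quickly old measurements are forgotten:

```python
# Sketch of a 'forgetting procedure' (my interpretation): an exponentially
# weighted moving average puts more belief in recent delay measurements
# and geometrically discounts older ones.
def update_estimate(old_estimate, measurement, alpha=0.3):
    """alpha near 1 forgets quickly; alpha near 0 trusts history."""
    return (1 - alpha) * old_estimate + alpha * measurement

# A route suddenly degrades (e.g. a link is damaged): the estimate tracks it.
est = 10.0
for delay in [10, 10, 50, 50, 50]:
    est = update_estimate(est, delay)
print(round(est, 1))  # drifts from 10 toward 50
```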