
Information distribution aspects of design methodology

October 17, 2016

Information distribution aspects of design methodology – Parnas, 1971

We’re continuing with Liskov’s list this week, and today’s paper is another classic from David Parnas in which you can see some of the same thinking as in ‘On the criteria…’. Parnas talks about the modules of a system (for a contemporary feel, we could call them ‘microservices’ once more), how we end up with inadvertent tangling / coupling between microservices, and what we can do to prevent that. One of his key recommendations is that information about a microservice be carefully controlled – releasing information about how it works internally outside of the team working on it is not just unhelpful, it is positively harmful. Writing in 1971, Parnas was concerned with information communicated primarily through documentation, and so this is what he discusses. I wonder if in 2016 he would make the claim “making the source code for a microservice available outside of the team working on it is harmful” ??? That would certainly be a statement likely to cause robust debate!


Update: David Parnas himself posted a comment answering my question. Wow! Here’s his reply:

You wrote, “I wonder if in 2016 he would make the claim “making the source code for a microservice available outside of the team working on it is harmful” ??? That would certainly be a statement likely to cause robust debate!” The answer is YES. Papers that I wrote subsequently talk about the need to design and then document interfaces and then give other developers the interface documentation instead of the code. In fact, you can find that in the 1973 papers. The difference is that I have much better techniques for documentation today than I did forty years ago.

David Lorge Parnas


 

A system has structure: the set of modules (microservices) in the system, and the connections between them.

Many assume that the “connections” are control transfer points, passed parameters, and shared data… Such a definition of “connection” is a highly dangerous oversimplification which results in misleading structure descriptions. The connections between microservices are the assumptions which the microservices make about each other.

Consider a change that needs to be made. “What changes can be made to one microservice without involving change to other services?”

We may make only those changes which do not violate the assumptions made by other microservices about the service being changed.
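
To make the connections-are-assumptions point concrete, here is a minimal sketch (hypothetical services and names, not taken from the paper). The real ‘connection’ between these two classes is not the method call; it is the assumption the report code makes about the internal format of the order ids.

```java
// Hypothetical illustration: the dangerous connection is an assumption, not a call.
public class HiddenConnectionSketch {

    static class OrderService {
        // Internal detail: ids currently happen to look like "ORD-2016-0042".
        String newOrderId(int year, int sequence) {
            return String.format("ORD-%d-%04d", year, sequence);
        }
    }

    static class ReportService {
        // This parse works only while OrderService keeps its current id format.
        // If the order team switches to opaque UUIDs, no declared interface changes,
        // yet this code breaks: the assumption was the connection.
        int yearOf(String orderId) {
            return Integer.parseInt(orderId.split("-")[1]);
        }
    }

    public static void main(String[] args) {
        String id = new OrderService().newOrderId(2016, 42);
        System.out.println(new ReportService().yearOf(id)); // prints 2016, for now
    }
}
```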

Making design decisions

During the design and development of a system we make decisions which eliminate some possibilities for system structure; the structure emerges gradually as those decisions accumulate. Three common considerations during the decision process are:

  1. Making sure the system delivers a great experience for its users
  2. Ensuring the system can be delivered as quickly as possible
  3. Making sure the system can easily support future changes

Each of these considerations suggests an optimal partial order for decision making, but those orderings are usually inconsistent!

  • Delivering a great experience suggests using an outside-in approach whereby the external characteristics are decided first (rather than letting them be unnoticed implications of decisions about other aspects of system structure).
  • Delivering the system as quickly as possible requires dividing it early into separate microservices:

Competitive pressures may require the use of large groups to produce a system in a sharply limited period of time. Additional developers speed up a project significantly only after the project has been divided into sub-projects in such a way that separate groups can work with little interaction (i.e. spending significantly less time in inter-group decisions than in intra-group decisions).

The desire to make the division early in the project lifecycle so that the team can ‘get on with it’ encourages a splitting along familiar lines and in agreement with the organisational structure (Conway).

Time pressures encourage groups to make the split before the externals are defined. Consequently, we find some adverse effect on the usability of the product. Haste also makes poor internal structure more likely.

  • When it comes to changeability, Parnas makes an interesting observation: the earlier a decision was made, the more difficult it is likely to be to change it because other parts of the system grow assumptions about it.

These considerations suggest that the early decisions should be those which are the least likely to change; i.e. those based on “universal” truths or reasoning which takes into account little about a particular environment… the possibility of change suggests using the most general information first.

Since the external characteristics are often what change the most, starting there may make the system harder to change.

Good programmers fight against the system

The crux of Parnas’ argument hinges on this assertion:

A good programmer makes use of the available information given him or her.

Sometimes those uses are obvious: calling a subroutine in another module, or reusing a reference table. Sometimes they are less obvious, for example exploiting knowledge that a list is searched or sorted in a certain order.

Such uses of information have been so costly that we observe a strange reaction. The industry has started to encourage bad programming…. Derogatory names such as “kludger,” “hacker,” and “bit twiddler” are used for the sort of fellow who writes terribly clever programs which cause trouble later on. They are subtly but effectively discouraged by being assigned to work on small independent projects such as application routines (the Siberia of the software world) or hardware diagnostic routines (the coal mines). In both situations the programmer has little opportunity to make use of information about other modules.

(I wonder what the modern Siberia and coal mines would be?)…

Those that remain (the non-bit-twiddlers) are usually poor programmers. While a few refrain from using information because they know it will cause trouble, most refrain because they are not clever enough to notice that the information can be used. Such people also miss opportunities to use facts which should be used. Poor programs result.

A programmer can disastrously increase the connectivity of the system structure by using information he or she possesses about other services.
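
As a (purely hypothetical) illustration of the kind of trouble Parnas has in mind: suppose a catalogue module happens to return its product codes sorted, only as a side effect of how it loads its data, and a clever caller notices and exploits that with a binary search.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: a "clever" use of information that was never part of any contract.
public class CleverCouplingSketch {

    static class CatalogService {
        // The list is sorted today only because of how the data happens to be loaded;
        // sortedness is not documented anywhere.
        List<String> productCodes() {
            return List.of("A-100", "B-200", "C-300", "D-400");
        }
    }

    public static void main(String[] args) {
        List<String> codes = new CatalogService().productCodes();
        // Exploits the undocumented ordering: correct today, silently wrong after an
        // internal change in CatalogService.
        int idx = Collections.binarySearch(codes, "C-300");
        System.out.println(idx >= 0 ? "found" : "missed");
    }
}
```

The binary search works and is faster than a linear scan, which is exactly why a good programmer writes it; but the caller is now coupled to an internal detail of the catalogue module, and the connectivity of the system has quietly increased.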

We must deliberately control information distribution

If you buy Parnas’ argument – that when you make information available, good programmers can’t help but make use of it, whether or not that serves the overall good of the system – then an obvious solution presents itself: don’t make the information available!

We can avoid many of the problems discussed here by rejecting the notion that design information (or source code?) should be accessible to everyone. Instead we should allow the designers, those who specify the structure, to control the distribution of design information as it is developed.

Say your system depends on an external service: it’s likely that all you know about that service is the documentation for its REST API, or the client SDK built on top of it. But when you depend on an internal service, you typically know much more…

We should not expect a programmer to decide not to use a piece of information, rather he should not possess information that he should not use.
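
In modern code the same idea shows up as publishing only an interface (plus its documented contract) and keeping the implementing class out of reach. A minimal sketch in Java, with hypothetical names:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class InformationHidingSketch {

    // What other teams are given: the interface and its documentation, nothing more.
    public interface AccountLookup {
        /** Returns the account for the given id, or an empty Optional if none exists. */
        Optional<Account> findById(String accountId);
    }

    public record Account(String id, String owner) {}

    // What stays with the owning team: callers outside this package cannot name this
    // class, so they cannot grow assumptions about its caching or storage choices.
    static final class CachedAccountLookup implements AccountLookup {
        private final Map<String, Account> cache = new ConcurrentHashMap<>();

        CachedAccountLookup() {
            cache.put("a-1", new Account("a-1", "Ada"));
        }

        @Override
        public Optional<Account> findById(String accountId) {
            return Optional.ofNullable(cache.get(accountId));
        }
    }

    public static void main(String[] args) {
        AccountLookup lookup = new CachedAccountLookup(); // wiring done by the owning team
        System.out.println(lookup.findById("a-1"));        // consumers see only the interface
    }
}
```

Consumers can only program against findById and its documented behaviour; they never learn whether there is a cache, a database, or a remote call behind it, and so cannot build assumptions on those details.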

The last word

I consider the internal restriction of information within development groups to be of far more importance than its restriction from users or competitors. Much of the information in a system document would only harm a competitor if they had it. (They might use it!).

And the modern ‘system document’ is generally the source code, written in a suitably high-level language.

Comments
  1. October 17, 2016 6:53 am

    You wrote, “I wonder if in 2016 he would make the claim “making the source code for a microservice available outside of the team working on it is harmful” ??? That would certainly be a statement likely to cause robust debate!” The answer is YES. Papers that I wrote subsequently talk about the need to design and then document interfaces and then give other developers the interface documentation instead of the code. In fact, you can find that in the 1973 papers. The difference is that I have much better techniques for documentation today than I did forty years ago.

    David Lorge Parnas

    • October 17, 2016 8:35 am

      Many thanks for posting the reply. I hope my interpretations of your papers captured the spirit! Very humbled that you would be reading them at all. Thanks, Adrian.

  2. kaptainkipper
    October 17, 2016 9:26 am

    Hi Adrian,

    Great post! One thought/observation – I don’t think we can always use microservice as another name for module and draw the same conclusions. For example, “Delivering the system as quickly as possible requires dividing it early into separate microservices” – I actually think that in many contexts the desire to deliver quickly means microservices might be a terrible idea because of the overhead they bring.

    I think that for some systems (many?), given sufficient care, microservices can eventually result in faster time to market. But that faster time to market often only arrives after sufficient investment, so in the short term you’ll have to go slower in order to go faster in the long term. In a way this is another tradeoff – you want to deliver faster now, but also need a platform that allows you to deliver faster tomorrow, so when delivering today you need to make sure the platform is good for tomorrow.

    The ‘cost’ of creating a module is much lower than the ‘cost’ of creating and managing a microservice. New tools and platforms can reduce this, but I think that the cost will, for the near future at least, remain an order of magnitude greater. The flipside, of course, is that in many situations the upsides of process separation, independent deployability, scalability and so on can deliver greater benefits too.

    I also found the discussion about exposing code to be fascinating, and wish I’d read this particular paper back when I was writing my book! One of the reasons why I think modules are so infrequently used successfully is that modern tools make it easy to import symbols from inside other modules – we let the IDE have access to our code, and use its features to import things we probably shouldn’t be importing. Treating modules more as black boxes keeps their use more coherent, but developers often rail against this.

    This is also interesting in the context of more people moving towards monorepos, a pattern which I dislike. This is often driven by the desire for people to be able to change more than one microservice at a time – something which I instinctively dislike, and which this paper and discussion now gives me a bit more of a firm intellectual foundation for. So thank you Adrian for the commentary, and thank you David for the original paper!

  3. mikhailfranco
    October 17, 2016 9:33 am

    ‘The Last Word’ sounds like IBM’s strategy for WS-* SOA in particular, and open source in general: release reams and reams of the most ridiculously complex over-engineered ‘standards’ and source code, in the hope that the competition will use it, and hence fail miserably in the Sisyphean task of trying to extract any value from it.

  4. Steve Loughran
    October 17, 2016 10:27 am

    I’m glad to see this paper getting more publicity; along with How Buildings Learn, it is one of my favourite papers on large system design.

    If you look at what we do with Hadoop, we explicitly call out semantic compatibility of interfaces, rather than just language-level API signatures.

    https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/Compatibility.html#Semantic_compatibility

    That was written with Parnas’s work in mind, and an example of it at work is in the Hadoop FileSystem Specification; my eternal project to try and define what goes on: https://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/filesystem/filesystem.html

    The problem we have, however, is not so much “all the source code is there to look at”; believe me, the code for HDFS is too complex to do that in any deep way. The problem we have is behaviours we have unintentionally implemented: specifically, behaviours that downstream code unintentionally coded against (their code worked and all was well) and which, in that act of use, implicitly added new aspects to that specification. Worst of all, we don’t even know about these expectations until we ship a new release and things break.

    The core example is HBase; there are not a few bugs filed against HDFS with the initial text being “HDFS-XXYY broke HBase”. As a key example, https://issues.apache.org/jira/browse/HADOOP-11708 : “CryptoOutputStream synchronization differences from DFSOutputStream break HBase”. After HDFS Encryption at Rest shipped, it turned out that Hadoop’s client-side HDFS output stream’s write operation was thread safe, even though the Java API spec of java.io.OutputStream says it need not be. You can see how the initial reporter proposed addressing this by making the HDFS concurrency optional, to stop people coding against a (possibly accidental) feature. You can also see how I and others rejected that on the basis that it was clearly part of the HDFS behaviour that apps expected, and that all we could do was document the behaviour, emphasise that if you want to support HBase you’d better follow that concurrency behaviour, and then get a patch in to CryptoOutputStream.

    We do not know the full effective API that other code relies on. Things that worry us: the ordering of listing results (POSIX: undefined; HDFS: turns out to be sorted, with no deliberate decision there). When do the data and metadata of a file being written become visible to other clients? Is close() on an output stream O(1)? No: it’s O(outstanding data), which, when writing to an Object Store, can become O(all-data).

    We learn these things as people complain, leaving us to choose between telling them off for using an unintentional implementation detail, or fixing our code and documenting it. And how do we know what semantics people code against? We wait.

  5. gasche
    October 17, 2016 4:24 pm

    An interesting paper and blog post, thanks! I think however that the part about “poor programmers” is a terrible way to talk about people, and if I had written a summary myself I would probably not have included it. (The comment on Siberia and coal mines is fun, though.)

  6. October 17, 2016 5:50 pm

    The idea of those 40+ year old papers was to replace information about the code with explicit information about the interface (the one they are supposed to use). People seem to have latched on to the idea of not releasing information about the code (or relying on its obscurity) but they have not been willing to invest the time needed to provide the replacement information. Methods of documenting the intended interface are not getting the attention of developers.

