Just a short observation to start the week, inspired by the All File Systems are Not Created Equal paper that we looked at last week.
Atomicity (or lack thereof) and (re-)ordering are common issues that crop up again and again in different guises to cause us problems in systems. It all boils down to these two factors:
- What seems to be a single atomic operation at one level of abstraction is often composed of (implemented as) multiple distinct operations at the next level down, and
- What appear to be sequential operations at one level of abstraction may be re-ordered (normally for performance reasons) at the next level down.
Unless great care is taken, these two facts cause visibility issues and crash-recovery issues. Visibility issues can arise when the results of a single higher-level operation become partially visible (the effects of some of the lower-level operations are seen, but not others). Visibility issues can also arise when the effects of a later operation are seen before those of an earlier one (out of order). If we now consider that the system might crash at any point, then we must also contend with partial results (partially completed operations) and out-of-order persisted results that we need to recover from. Oh, and by the way, it’s turtles all the way down…
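To make this concrete, here is a minimal Python sketch (the account names and the `transfer` function are hypothetical, purely for illustration): one logical operation at the top level turns out to be two distinct updates underneath, and anything that observes or persists the state in between sees a partial result.

```python
# A single "atomic-looking" operation is really multiple lower-level steps.
accounts = {"alice": 100, "bob": 50}

def transfer(src, dst, amount):
    # One logical operation to the caller, two distinct updates underneath.
    accounts[src] -= amount          # step 1: debit
    # <-- a crash, or a concurrent reader, at this point observes a partial
    #     result: money has left 'src' but has not yet arrived at 'dst'
    accounts[dst] += amount          # step 2: credit

def total_balance():
    # A higher-level observer expects this invariant to hold at all times...
    return sum(accounts.values())    # ...but between steps 1 and 2 it is violated

transfer("alice", "bob", 30)
assert total_balance() == 150        # true only because nobody looked in between
```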
Examples:
- A put operation on a distributed K-V store that ideally appears atomic (strict linearizability) from the client’s perspective, but is in fact composed of multiple updates to a collection of nodes under the covers – exposing anomalies (violations of read-your-writes, etc.) if this work-in-progress becomes partially visible.
- Out-of-order visibility of operations in a distributed datastore, leading to violations of causal consistency unless care is taken to prevent them.
- An application-level transaction encompassing multiple operations, where either all of the effects should be visible or none of them should be (protecting against partial visibility, and recovering to a consistent state).
- No-force and steal policies in ARIES that re-order when things are persistently stored, and recovery from partially completed transactions.
- Non-atomic file system calls in the POSIX interface, and re-ordering of system calls in the file system (see the file-update sketch after this list).
- Moving from single-threaded to concurrent programming models (atomicity and ordering are what make this hard… see the lost-update sketch after this list).
- Non-atomic hardware instructions and relaxed memory models.
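Returning to the POSIX bullet above: a “save the file” that looks like one operation to the application is really several system calls, and the file system may persist their effects out of order unless we say otherwise. Here is a minimal sketch (the function name and error handling are mine, not from the paper) of the common write-to-a-temporary-file, fsync, then rename pattern used to get crash-safe updates:

```python
import os

def save_atomically(path, data: bytes):
    """Write to a temp file, force it to disk, then rename over the original.
    Without the fsync the file system may persist the rename before the data
    blocks, so a crash could leave an empty or partially-written file."""
    tmp = path + ".tmp"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)             # make the data durable first...
    finally:
        os.close(fd)
    os.rename(tmp, path)         # ...then atomically swap in the new version
    # fsync the containing directory so the rename itself is durable too
    dirfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dirfd)
    finally:
        os.close(dirfd)
```

The fsync calls exist purely to impose an ordering on persistence that the file system would not otherwise guarantee.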
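And for the concurrency bullet: what looks like a single increment in the source code is a read-modify-write underneath, so two threads can interleave and lose updates. A minimal sketch (hypothetical counter, standard threading module) showing a lock being used to restore atomicity at that level:

```python
import threading

counter = 0
lock = threading.Lock()

def unsafe_increment(n):
    global counter
    for _ in range(n):
        counter += 1          # really: read counter, add one, write it back;
                              # interleaved threads can silently lose updates

def safe_increment(n):
    global counter
    for _ in range(n):
        with lock:            # the lock makes the read-modify-write atomic again
            counter += 1

threads = [threading.Thread(target=safe_increment, args=(100_000,)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
assert counter == 400_000     # holds with the lock; can fail with unsafe_increment
```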
So if you’re designing a layered system, or using someone else’s abstraction (even a hardware one), it pays to think about atomicity and ordering.
Careful with the word “atomicity”: it means different things to different people and in different contexts. Do you mean “all or nothing”? Total order?
Also, don’t confuse the total ordering of writes with causality. For instance, serialisability requires a total order but not a causal order (e.g. when implemented with 2PC). A client of a serialisable system might submit transactions T1 and T2 in that order, yet have them committed in the total order T2 then T1.
It’s a fair cop – I was primarily thinking of all-or-nothing, since I was interested in the relationship between a single operation at a high layer of abstraction (hence intuitively perceived as atomic by a client of that layer) and the multiple operations at the next layer down typically involved in implementing it…
It’s this high-level, well-informed viewpoint, drawing comparisons and spotting similarities, that marks this blog out as being special. Keep up the good work.
I just want to add to what @zteve mentioned earlier. It is because of concise pieces of information like this – which are lost in the day-to-day humdrum of usual IT, yet are very relevant if we want to understand and retain the basics – that I read this blog. I have to admit that I don’t understand every topic that is discussed, yet I feel that I have learnt a bit. Many thanks, @adriancoyler