Just a short observation to start the week, inspired by the All File Systems Are Not Created Equal paper that we looked at last week.
Atomicity (or lack thereof) and (re-)ordering are common issues that crop up again and again, in different guises, to cause us problems in systems. It all boils down to these two facts:
- What seems to be a single atomic operation at one level of abstraction is often composed of (implemented as) multiple distinct operations at the next level down, and
- What appear to be sequential operations at one level of abstraction may be re-ordered (normally for performance reasons) at the next level down.
Unless great care is taken, these two facts cause both visibility issues and crash recovery issues. Visibility issues can arise when the results of a single higher level operation become partially visible (the effects of some of the lower level operations are seen, but not others). They can also arise when the effects of a later operation are seen before those of an earlier one (out of order). If we now consider that the system might crash at any point, then we must also contend with partial results (partially completed operations) and with results that were persisted out of order, and we need to be able to recover from both. Oh, and by the way, it’s turtles all the way down…
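To make the two failure modes concrete, here is a minimal sketch in Python. It is a toy model rather than any real system: the `apply_prefix` helper, the two-account state, and the "transfer" operation are all hypothetical names invented for illustration. A crash (or an unlucky reader) is modelled as seeing only the first few lower level writes.

```python
def apply_prefix(ops, crash_after):
    """Apply only the first `crash_after` low level writes.

    Hypothetical helper: models a crash, or a reader looking in,
    after some but not all of the lower level operations have run.
    """
    state = {"a": 100, "b": 0}
    for key, delta in ops[:crash_after]:
        state[key] += delta
    return state


# One higher level operation, "transfer 30 from a to b",
# is really two distinct writes at the next level down.
transfer = [("a", -30), ("b", +30)]

# Partial visibility: the debit is seen, the credit is not.
partial = apply_prefix(transfer, crash_after=1)    # {'a': 70, 'b': 0}

# The intended outcome: both writes are seen.
complete = apply_prefix(transfer, crash_after=2)   # {'a': 70, 'b': 30}

# Re-ordering: the lower layer applies the writes in the other order,
# so a crash now leaves the credit in place without the debit.
reordered = apply_prefix(list(reversed(transfer)), crash_after=1)  # {'a': 100, 'b': 30}

print(partial, complete, reordered)
```

In the `partial` state money has vanished, and in the `reordered` state it has been created from nothing. Neither state can be reached if the transfer is truly atomic and ordered, which is exactly what a recovery procedure has to detect and repair.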
Examples:
- A put operation on a distributed K-V store that ideally appears atomic (strict linearizability) from the client’s perspective but is in fact composed of multiple updates to a collection of nodes under the covers, exposing anomalies (violations of read-your-writes, etc.) if this work-in-progress becomes partially visible.
- Out of order visibility of operations for a distributed datastore, leading to violations of causal consistency unless care is taken to prevent them.
- An application level transaction encompassing multiple operations where the effects should either all be visible, or none of them should be visible (protecting against partial visibility, and recovering to a consistent state).
- No-force and steal policies in ARIES that re-order when things are persistently stored, and recovery from partially completed transactions.
- Non-atomic file system calls in the POSIX interface, and re-ordering of system calls in the file system.
- Moving from single-threaded to concurrent programming models (atomicity and ordering are what make this hard…)
- Non-atomic hardware instructions and relaxed memory models.
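As an illustration of the file system item in the list above, here is a sketch of the commonly used "write a temporary file, fsync, then rename" pattern in Python. The `atomic_write` name is our own, and the sketch assumes a POSIX system. Whether each step is actually sufficient depends on the file system underneath, so treat it as a starting point under those assumptions rather than a guarantee.

```python
import os
import tempfile


def atomic_write(path, data: bytes):
    """Replace `path` with `data` so readers see the old or the new content.

    Hypothetical helper, assuming POSIX semantics for rename and fsync.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the rename
    # stays within a single file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Ordering: ask for the data to be durable *before* the rename.
            os.fsync(tmp.fileno())
        # Atomicity: swap the new file into place in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    if os.name == "posix":
        # Ask for the directory entry (the rename itself) to be durable too.
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

Calling `atomic_write("config.txt", b"new contents")` is one operation to the caller, but underneath it is a create, a write, two fsyncs, and a rename. Drop the first `fsync` and the lower layer is free to persist the rename before the data, which is the re-ordering problem again.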
So if you’re designing a layered system, or using someone else’s abstraction (even a hardware one), it pays to think about atomicity and ordering.