BigDebug: Debugging primitives for interactive big data processing in Spark

BigDebug: Debugging primitives for interactive big data processing in Spark - Gulzar et al. ICSE 2016

BigDebug provides real-time interactive debugging support for Data-Intensive Scalable Computing (DISC) systems, or more particularly, Apache Spark. It provides breakpoints, watchpoints, latency monitoring, forward and backward tracing, crash monitoring, and a real-time fix-and-resume capability. The overheads are low for ...

The O-Ring Theory of DevOps

The O-Ring Theory of Economic Development - Kremer 1993

Something a little different today, loosely based on the paper cited above, but not a direct review of it. I'm hosting a retrospective evening for the GOTO London conference tonight and plan to share some of these ideas there... The pursuit of excellence is no longer ...

Holistic Configuration Management at Facebook

Holistic Configuration Management at Facebook - Tang et al. (Facebook) 2015

This paper gives a comprehensive description of the use cases, design, implementation, and usage statistics of a suite of tools that manage Facebook’s configuration end-to-end, including the frontend products, backend systems, and mobile apps. The configuration for Facebook’s site is updated thousands of times ...

App-Bisect: Autonomous healing for microservices-based apps

App-Bisect: Autonomous healing for microservices-based apps - Rajagopalan & Jamjoom 2015

We've become comfortable with the idea of continuous deployment across multiple microservices, but what happens when that deployment introduces a problem? The standard answer comes in two parts: (a) use a canary when rolling out a new version to detect a potential problem before ...