Seer: leveraging big data to navigate the complexity of performance debugging in cloud microservices

Gan et al., ASPLOS'19. Last time around we looked at the DeathStarBench suite of microservices-based benchmark applications and learned that microservices systems can be especially latency sensitive, and that hotspots can propagate through a microservices architecture in interesting ways. Seer is ...

Maelstrom: mitigating datacenter-level disasters by draining interdependent traffic safely and efficiently

Veeraraghavan et al., OSDI'18. Here’s a really valuable paper detailing four-plus years of experience dealing with datacenter outages at Facebook. Maelstrom is the system Facebook use in production to mitigate and recover from datacenter-level disasters. The high-level idea is simple: drain traffic ...

Detecting spacecraft anomalies using LSTMs and nonparametric dynamic thresholding

Hundman et al., KDD'18. How do you effectively monitor a spacecraft? That was the question facing NASA’s Jet Propulsion Laboratory as they looked forward towards exponentially increasing telemetry data rates for Earth Science satellites (e.g., around 85 terabytes/day for a Synthetic Aperture Radar satellite). Spacecraft are ...

Log20: Fully automated optimal placement of log printing statements under specified overhead threshold

Zhao et al., SOSP’17. Logging has become an overloaded term. In this paper, logging means recording information about the execution of a piece of software for the purpose of aiding troubleshooting. For these kinds of logging statements ...

DBSherlock: A performance diagnostic tool for transactional databases

Yoon et al., SIGMOD ’16. …tens of thousands of concurrent transactions competing for the same resources (e.g. CPU, disk I/O, memory) can create highly non-linear and counter-intuitive effects on database performance. If you’re a DBA responsible for figuring out what’s going on, this presents quite a challenge. ...