Omid, reloaded: scalable and highly-available transaction processing

Omid, reloaded: scalable and highly-available transaction processing, Shacham et al., FAST '17. Omid is a transaction processing service powering web-scale production systems at Yahoo that digest billions of events per day and push them into a real-time index. It's also been open-sourced and is currently incubating at Apache as the Apache Omid project. What's interesting ...

Deconstructing Xen

Deconstructing Xen, Shi et al., NDSS 2017. Unfortunately, one of the most widely-used hypervisors, Xen, is highly susceptible to attack because it employs a monolithic design (a single point of failure) and comprises a complex set of growing functionality including VM management, scheduling, instruction emulation, IPC (event channels), and memory management. As of v4.0, Xen ...

Enlightening the I/O path: A holistic approach for application performance

Enlightening the I/O Path: A holistic approach for application performance, Kim et al., FAST '17. Lots of applications contain a mix of foreground and background tasks. Since we're at the file system level here, for application, think Redis, MongoDB, PostgreSQL and so on. Typically user requests are considered foreground tasks, and tasks such as housekeeping, ...

Chronix: Long term storage and retrieval technology for anomaly detection in operational data

Chronix: Long term storage and retrieval technology for anomaly detection in operational data, Lautenschlager et al., FAST 2017. Chronix (http://www.chronix.io/) is a time-series database optimised to support anomaly detection. It supports a multi-dimensional generic time series data model and has built-in high level functions for time series operations. Chronix also uses a scheme called "Date-Delta-Compaction" (DDC) ...

Redundancy does not imply fault tolerance: analysis of distributed storage reactions to single errors and corruptions

Redundancy does not imply fault tolerance: analysis of distributed storage reactions to single errors and corruptions, Ganesan et al., FAST 2017. It's a tough life being the developer of a distributed datastore. Thanks to the wonderful work of Kyle Kingsbury (aka @aphyr) and his efforts on Jepsen.io, awareness of data loss and related issues in ...