Granularity of Locks and Degree of Consistency in a Shared Data Base - Gray et al. 1975 This is part 2 of a 7 part series on (database) 'Techniques Everyone Should Know.' This is a paper of two halves, connected by the common theme of locking. The first part of the paper examines the tradeoff … Continue reading Granularity of Locks and Degree of Consistency in a Shared Data Base – Part I
Tag: Datastores
Databases of all shapes and sizes.
Access Path Selection in a Relational Database Management System
Access Path Selection in a Relational Database Management System - Selinger et al. 1979 This is part 1 of a 7 part series on (database) 'Techniques Everyone Should Know.' System R was a very influential Relational Database Management System (RDBMS) built at the IBM San Jose Research Laboratory starting in 1975. This paper introduces the … Continue reading Access Path Selection in a Relational Database Management System
(Database) Techniques Everyone Should Know
Welcome to 2016! To kick things off for the New Year, I thought we'd dip into the newly updated Red Book. In particular, I'm going to spend the next few days looking at the papers from Chapter 3, "Techniques Everyone Should Know". From Peter Bailis' introduction to the chapter: In this chapter, we present primary and … Continue reading (Database) Techniques Everyone Should Know
Fast Database Restarts at Facebook
Fast Database Restarts at Facebook - Goel et al. 2014 In security, you're only as secure as your weakest link in the chain. When it comes to agility, you're only as fast as your slowest link in the chain. Updating and evolving a stateless middle tier is usually pretty quick, but what if you need … Continue reading Fast Database Restarts at Facebook
A higher order estimate of the optimum checkpoint interval for restart dumps
A higher order estimate of the optimum checkpoint interval for restart dumps - Daly 2004 TL;DR: if you know how long it takes your system to create a checkpoint/snapshot (δ), and you know the expected mean-time between failures (M), then set the checkpoint interval to be √(2δM) - δ. OK, I grant that today's paper … Continue reading A higher order estimate of the optimum checkpoint interval for restart dumps
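To make the TL;DR formula concrete, here is a minimal Python sketch that evaluates √(2δM) - δ. The function name and the example numbers (a 5-minute checkpoint and a 24-hour mean time between failures) are assumptions chosen for illustration, not figures from the paper, and the sketch covers only the expression quoted above.

```python
import math


def checkpoint_interval(delta: float, mtbf: float) -> float:
    """Evaluate sqrt(2 * delta * M) - delta from the TL;DR above.

    delta: time to write one checkpoint; mtbf: mean time between failures (M).
    Both must use the same time unit; the result is in that unit too.
    """
    if delta <= 0 or mtbf <= 0:
        raise ValueError("delta and mtbf must be positive")
    # The expression is only positive while delta < 2 * mtbf.
    return math.sqrt(2.0 * delta * mtbf) - delta


# Hypothetical inputs: 300 s per checkpoint, 86,400 s (24 h) between failures.
interval = checkpoint_interval(300.0, 86_400.0)
print(f"checkpoint every {interval:.0f} s")  # 6900 s, i.e. 115 minutes
```

With these made-up inputs the square root comes out at exactly 7,200 seconds, so the suggested interval is 6,900 seconds between checkpoints.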
FaRM: Fast Remote Memory
FaRM: Fast Remote Memory - Dragojevic, et al. 2014 Yesterday we looked at Facebook's graph store, TAO, which can handle a billion reads/sec and millions of writes/sec. In today's choice, a team from Microsoft Research reimplemented TAO and beat those numbers by an order of magnitude! FaRM’s per-machine throughput of 6.3 million operations per second is … Continue reading FaRM: Fast Remote Memory
TAO: Facebook’s Distributed Data Store for the Social Graph
TAO: Facebook's Distributed Data Store for the Social Graph - Bronson et al. (Facebook) 2013 A single Facebook page may aggregate and filter hundreds of items from the social graph. We present each user with content tailored to them, and we filter every item with privacy checks that take into account the current viewer. This extreme … Continue reading TAO: Facebook’s Distributed Data Store for the Social Graph
Staring into the abyss: An evaluation of concurrency control with one thousand cores
Staring into the abyss: An evaluation of concurrency control with one thousand cores - Yu et al. 2014 A look at the 7 major concurrency control algorithms for OLTP DBMSs, and how well they perform when scaled to large numbers (1024) of cores. Each algorithm is optimised for the best in-memory performance possible, but … Continue reading Staring into the abyss: An evaluation of concurrency control with one thousand cores
Scaling Concurrent Log-Structured Data Stores
Scaling Concurrent Log-Structured Data Stores - Golan-Gueta et al. 2015 Key-value stores based on log-structured merge trees are everywhere. The original design was intended to mitigate slow disk I/O. Once this is achieved, as we scale to more and more cores the authors find that in-memory contention now becomes the bottleneck (see yesterday's piece on … Continue reading Scaling Concurrent Log-Structured Data Stores
Musketeer – Part II: all for one, and one for all in data processing systems
Musketeer: all for one, one for all in data processing systems - Gog et al. 2015 Musketeer gives you portability of data processing workflows across data processing systems. It can even analyse your workflow and recommend the best system to run it on, as well as combining systems for different parts of the workflow. … Continue reading Musketeer – Part II: all for one, and one for all in data processing systems