Incremental knowledge base construction using DeepDive

Shin et al., VLDB 2015. When I think about the most important CS foundations for the computer systems we build today and will build over the next decade, I think about: distributed systems; database systems / data stores (dealing with data at rest); stream processing (dealing with data in …

Write-limited sorts and joins for persistent memory

Viglas, VLDB 2014. This is the second of the two research-for-practice papers for this week. Once more the topic is how database storage algorithms can be optimised for NVM, this time examining the asymmetry between reads and writes on NVM. This is premised on Viglas’ assertion that: Writes …

Let’s talk about storage and recovery methods for non-volatile memory database systems

Arulraj et al., SIGMOD 2015. Update: fixed a bunch of broken links. I can't believe I only just found out about this paper! It's exactly what I've been looking for in terms of an analysis of the impacts of NVM on data storage …

DBSherlock: A performance diagnostic tool for transactional databases

Yoon et al., SIGMOD ’16. …tens of thousands of concurrent transactions competing for the same resources (e.g. CPU, disk I/O, memory) can create highly non-linear and counter-intuitive effects on database performance. If you’re a DBA responsible for figuring out what’s going on, this presents quite a challenge. …

Efficiently compiling efficient query plans for modern hardware

Neumann, VLDB 2011. Updated with direct links to Databricks blog post now that it is published. A couple of weeks ago I had a chance to chat with Reynold Xin and Richard Garris from Databricks / Spark at RedisConf, where we were both giving talks. Reynold and …

BTrDB: Optimizing Storage System Design for Timeseries Processing

Anderson & Culler 2016. It turns out you can accomplish quite a lot with 4,709 lines of Go code! How about a full time-series database implementation, robust enough to be run in production for a year where it stored 2.1 trillion data points, and supporting 119M …

Gorilla: A fast, scalable, in-memory time series database

Pelkonen et al. 2015. Error rates across one of Facebook's sites were spiking. The problem had first shown up through an automated alert triggered by an in-memory time-series database called Gorilla a few minutes after the problem started. One set of engineers mitigated the immediate issue. …

Granularity of Locks and Degree of Consistency in a Shared Data Base – Part II

Gray et al. 1975. This is part 3 of a 7 part series on (database) 'Techniques Everyone Should Know.' Today we'll look at the second part of this paper which introduces the notion of differing degrees of consistency, and how we can …