Barrier-enabled IO stack for Flash storage

Won et al., FAST’18. The performance of Flash storage has benefited greatly from concurrency and parallelism - for example, multi-channel controllers, large caches, and deep command queues. At the same time, the time to program an individual Flash cell has stayed fairly static (and even become slightly worse in …

Protocol aware recovery for consensus-based storage

Alagappan et al., FAST’18. Following on from their excellent previous work on ‘All file systems are not created equal’ (well worth a read if you haven’t encountered it yet), in this paper the authors look at how well some of our most reliable protocols - those used in …

Fail-slow at scale: evidence of hardware performance faults in large production systems

Gunawi et al., FAST’18. The first thing that strikes you about this paper is the long list of authors from multiple different establishments. That’s because it’s actually a study of 101 different fail-slow hardware incidents collected across large-scale cluster deployments in 12 different …

Dynamic word embeddings for evolving semantic discovery

Yao et al., WSDM’18. One of the most popular posts on this blog is my introduction to word embeddings with word2vec (‘The amazing power of word vectors’). In today’s paper choice Yao et al. introduce a lovely extension that enables you to track how the meaning of words …

Can you trust the trend? Discovering Simpson’s paradoxes in social data

Alipourfard et al., WSDM’18. In ‘Same stats, different graphs,’ we saw some compelling examples of how summary statistics can hide important underlying patterns in data. Today’s paper choice shows how you can detect instances of Simpson’s paradox, thus revealing the presence of interesting subgroups, …

Putting data in the driver’s seat: optimising earnings for on-demand ride hailing

Chaudhari et al., WSDM’18. (The link above is to the ACM Digital Library official version, which may not grant you access when clicked in your email client, but should do if you visit via the blog itself.) There is something deeply rooted in …