Mosaic: processing a trillion-edge graph on a single machine

Mosaic: Processing a trillion-edge graph on a single machine Maass et al., EuroSys'17 Unless your graph is bigger than Facebook's, you can process it on a single machine. With the inception of the internet, large-scale graphs comprising web graphs or social networks have become common. For example, Facebook recently reported their largest social graph comprises … Continue reading Mosaic: processing a trillion-edge graph on a single machine

BOAT: Building auto-tuners with structured Bayesian optimization

BOAT: Building auto-tuners with structured Bayesian optimization Dalibard et al., WWW'17 Due to their complexity, modern systems expose many configuration parameters which users must tune to maximize performance... From the number of machines used in a distributed application, to low-level parameters such as compiler flags, managing configurations has become one of the main challenges faced … Continue reading BOAT: Building auto-tuners with structured Bayesian optimization

CherryPick: Adaptively unearthing the best cloud configurations for big data analytics

CherryPick: Adaptively unearthing the best cloud configurations for big data analytics Alipourfard et al., NSDI'17 For big data analytics jobs, especially recurring jobs, finding a good cloud configuration (number and type of machines, CPU, memory ,disk and network options) can make a big different to overall cost and runtimes. Likewise, a poor choice can seriously … Continue reading CherryPick: Adaptively unearthing the best cloud configurations for big data analytics

Enlightening the I/O path: A holistic approach for application performance

Enlightening the I/O Path: A holistic approach for application performance Kim et al., FAST '17 Lots of applications contain a mix of foreground and background tasks. Since we're at the file system level here, for application, think Redis, MongoDB, PostgreSQL and so on. Typically user requests are considered foreground tasks, and tasks such as housekeeping, … Continue reading Enlightening the I/O path: A holistic approach for application performance

Self-driving database management systems

Self-driving database management systems Pavlo et al., CIDR 2017 We've previously seen many papers looking into how distributed and database systems technologies can support machine learning workloads. Today's paper choice explores what happens when you do it the other way round - i.e., embed machine learning into a DBMS in order to continuously optimise its … Continue reading Self-driving database management systems

Weld: A common runtime for high performance data analytics

Weld: A common runtime for high performance data analytics Palkar et al. CIDR 2017 This is the first in a series of posts looking at papers from CIDR 2017. See yesterday's post for my conference overview. We have a proliferation of data and analytics libraries and frameworks - for example, Spark, TensorFlow, MxNet, Numpy, Pandas, … Continue reading Weld: A common runtime for high performance data analytics

DQBarge: Improving data-quality tradeoffs in large-scale internet services

DQBarge: Improving data-quality tradeoffs in large-scale Internet services Chow et al. OSDI 2106 I'm sure many of you recall the 2009 classic "The Datacenter as a Computer," which encouraged us to think of the datacenter as a warehouse-scale computer. From being glad simply to have such a computer, the bar keeps on moving. We don't … Continue reading DQBarge: Improving data-quality tradeoffs in large-scale internet services

REX: A development platform and online learning approach for runtime emergent software systems

REX: A development platform and online learning approach for runtime emergent software systems Porter et al. OSDI 2016 If you can get beyond the (for my taste, ymmv) somewhat grand claims and odd turns of phrase (e.g., “how the software ‘feels’ at a given point in time” => metrics) then there’s something quite interesting at … Continue reading REX: A development platform and online learning approach for runtime emergent software systems