Mosaic: processing a trillion-edge graph on a single machine

May 30, 2017 ~ Adrian Colyer ~ 9 Comments

Mosaic: Processing a trillion-edge graph on a single machine Maass et al., EuroSys'17 Unless your graph is bigger than Facebook's, you can process it on a single machine. With the inception of the internet, large-scale graphs comprising web graphs or social networks have become common. For example, Facebook recently reported their largest social graph comprises ... Continue Reading

BOAT: Building auto-tuners with structured Bayesian optimization

May 18, 2017 ~ Adrian Colyer ~ 2 Comments

BOAT: Building auto-tuners with structured Bayesian optimization Dalibard et al., WWW'17 Due to their complexity, modern systems expose many configuration parameters which users must tune to maximize performance... From the number of machines used in a distributed application, to low-level parameters such as compiler flags, managing configurations has become one of the main challenges faced ... Continue Reading

CherryPick: Adaptively unearthing the best cloud configurations for big data analytics

May 4, 2017 (updated November 11, 2019) ~ Adrian Colyer ~ 14 Comments

CherryPick: Adaptively unearthing the best cloud configurations for big data analytics Alipourfard et al., NSDI'17 For big data analytics jobs, especially recurring jobs, finding a good cloud configuration (number and type of machines, CPU, memory, disk and network options) can make a big difference to overall cost and runtimes. Likewise, a poor choice can seriously ... Continue Reading

Improving user perceived page load time using gaze

April 26, 2017 (updated November 11, 2019) ~ Adrian Colyer ~ 2 Comments

Improving user perceived page load time using gaze Kelton, Ryoo, et al., NSDI 2017 I feel like I'm stretching things a little bit including this paper in an IoT flavoured week, but it does at least bridge from the physical world to the virtual, if only via a webcam. What's really interesting here to ... Continue Reading

Stochastic program optimization

March 30, 2017 (updated November 11, 2019) ~ Adrian Colyer ~ 9 Comments

Stochastic program optimization Schkufza et al., CACM 2016 Yesterday we saw that DeepCoder can find solutions to simple programming problems using a guided search. DeepCoder needs a custom DSL, and a maximum program length of 5 functions. In 'Stochastic program optimization' Schkufza et al. also use a search strategy to generate code that meets a ... Continue Reading

Enlightening the I/O path: A holistic approach for application performance

March 13, 2017 (updated November 11, 2019) ~ Adrian Colyer ~ 12 Comments

Enlightening the I/O Path: A holistic approach for application performance Kim et al., FAST '17 Lots of applications contain a mix of foreground and background tasks. Since we're at the file system level here, for application, think Redis, MongoDB, PostgreSQL and so on. Typically user requests are considered foreground tasks, and tasks such as housekeeping, ... Continue Reading

Self-driving database management systems

January 17, 2017 (updated November 11, 2019) ~ Adrian Colyer ~ 3 Comments

Self-driving database management systems Pavlo et al., CIDR 2017 We've previously seen many papers looking into how distributed and database systems technologies can support machine learning workloads. Today's paper choice explores what happens when you do it the other way round - i.e., embed machine learning into a DBMS in order to continuously optimise its ... Continue Reading

Weld: A common runtime for high performance data analytics

January 16, 2017 (updated November 11, 2019) ~ Adrian Colyer ~ 8 Comments

Weld: A common runtime for high performance data analytics Palkar et al. CIDR 2017 This is the first in a series of posts looking at papers from CIDR 2017. See yesterday's post for my conference overview. We have a proliferation of data and analytics libraries and frameworks - for example, Spark, TensorFlow, MxNet, Numpy, Pandas, ... Continue Reading

DQBarge: Improving data-quality tradeoffs in large-scale internet services

December 9, 2016 (updated November 11, 2019) ~ Adrian Colyer ~ Leave a comment

DQBarge: Improving data-quality tradeoffs in large-scale Internet services Chow et al. OSDI 2016 I'm sure many of you recall the 2009 classic "The Datacenter as a Computer," which encouraged us to think of the datacenter as a warehouse-scale computer. From being glad simply to have such a computer, the bar keeps on moving. We don't ... Continue Reading

REX: A development platform and online learning approach for runtime emergent software systems

December 5, 2016 (updated November 11, 2019) ~ Adrian Colyer ~ Leave a comment

REX: A development platform and online learning approach for runtime emergent software systems Porter et al. OSDI 2016 If you can get beyond the (for my taste, ymmv) somewhat grand claims and odd turns of phrase (e.g., “how the software ‘feels’ at a given point in time” => metrics) then there’s something quite interesting at ... Continue Reading

the morning paper

a random walk through Computer Science research, by Adrian Colyer

Performance