Azure accelerated networking: SmartNICs in the public cloud

May 1, 2018 ~ Adrian Colyer ~ 5 Comments

Azure accelerated networking: SmartNICs in the public cloud. Firestone et al., NSDI'18. We’re still on the ‘beyond CPUs’ theme today, with a great paper from Microsoft detailing their use of FPGAs to accelerate networking in Azure. Microsoft have been doing this since 2015, and hence this paper also serves as a wonderful experience report documenting ...

The evolution of continuous experimentation in software product development

September 29, 2017 ~ Adrian Colyer ~ 14 Comments

The evolution of continuous experimentation in software product development. Fabijan et al., ICSE'17 (Author personal version here). If you've been following along with the A/B testing related papers this week and thinking "we should probably do more of that in my company," then today's paper choice is for you. Anchored in experiences at Microsoft, the ...

Seven rules of thumb for web site experimenters

September 26, 2017 ~ Adrian Colyer ~ 6 Comments

Seven rules of thumb for web site experimenters. Kohavi et al., KDD'14. Following yesterday's 12 metric interpretation pitfalls, today we're looking at 7 rules of thumb for designing web site experiments. There's a little bit of duplication here, but the paper is packed with great real-world examples, and there is some very useful new ...

A dirty dozen: twelve common metric interpretation pitfalls in online controlled experiments

September 25, 2017 ~ Adrian Colyer ~ 10 Comments

A dirty dozen: twelve common metric interpretation pitfalls in online controlled experiments. Dmitriev et al., KDD 2017. Pure Gold! Here we have twelve wonderful lessons in how to avoid expensive mistakes in companies that are trying their best to be data-driven. A huge thank you to the team from Microsoft for sharing their hard-won experiences ...

Azure Data Lake Store: a hyperscale distributed file service for big data analytics

July 4, 2017 ~ Adrian Colyer ~ 3 Comments

Azure data lake store: a hyperscale distributed file service for big data analytics. Douceur et al., SIGMOD'17. Today's paper takes us inside Microsoft Azure's distributed file service called the Azure Data Lake Store (ADLS). ADLS is the successor to an internal file system called Cosmos, and marries Cosmos semantics with HDFS, supporting both Cosmos and ...

Dhalion: self-regulating stream processing in Heron

June 30, 2017 ~ Adrian Colyer ~ 1 Comment

Dhalion: Self-regulating stream processing in Heron. Floratou et al., VLDB 2017. Dhalion follows on nicely from yesterday's paper looking at the modular architecture of Heron, and aims to reduce the "complexity of configuring, managing, and deploying" streaming applications. In particular, streaming applications deployed as Heron topologies, although the authors are keen to point out the ...

Gray failure: the Achilles’ heel of cloud-scale systems

June 15, 2017 ~ Adrian Colyer ~ 18 Comments

Gray failure: the Achilles' heel of cloud-scale systems. Huang et al., HotOS'17. If you're going to fail, fail properly dammit! All this limping along in degraded mode, doing your best to mask problems, turns out to be one of the key causes of major availability breakdowns and performance anomalies in cloud-scale systems. Today's HotOS'17 paper ...

Usage patterns and the economics of the public cloud

May 16, 2017 ~ Adrian Colyer ~ 6 Comments

Usage patterns and the economics of the public cloud. Kilcioglu et al., WWW'17. Illustrating the huge diversity of topics covered at WWW, following yesterday's look at recovering mobile user trajectories from aggregate data, today's choice studies usage variation and pricing models in the public cloud. The basis for the study is data from 'a major ...

Dependency-driven analytics: a compass for uncharted data oceans

January 20, 2017 (updated November 11, 2019) ~ Adrian Colyer ~ 4 Comments

Dependency-driven analytics: a compass for uncharted data oceans. Mavlyutov et al., CIDR 2017. Like yesterday's paper, today's paper considers what to do when you simply have too much data to be able to process it all. Forget data lakes, we're in data ocean territory now. This is a problem Microsoft faced with their large clusters ...

Achieving human parity in conversational speech recognition

November 22, 2016 (updated November 11, 2019) ~ Adrian Colyer ~ 6 Comments

Achieving Human Parity in Conversational Speech Recognition. Xiong et al., Microsoft Technical Report, 2016. The headline story here is that for the first time a system has been developed that exceeds human performance in one of the most difficult of all human speech recognition tasks: natural conversations held over the telephone. This is known as ...

the morning paper

a random walk through Computer Science research, by Adrian Colyer

Microsoft