PlanAlyzer: assessing threats to the validity of online experiments

Tosch et al., OOPSLA'19
It’s easy to make experimental design mistakes that invalidate your online controlled experiments. At an organisation like Facebook (who kindly supplied the corpus of experiments used in this study), the state of the art is to have a pool of experts carefully review ...

Taiji: managing global user traffic for large-scale Internet services at the edge

Xu et al., SOSP'19
It’s another networking paper to close out the week (and our coverage of SOSP’19), but whereas Snap looked at traffic routing within the datacenter, Taiji is concerned with routing traffic from the edge to a datacenter. It’s been in ...

Scaling symbolic evaluation for automated verification of systems code with Serval

Nelson et al., SOSP'19
Serval is a framework for developing automated verifiers of systems software. It makes an interesting contrast with the approach Google took with Snap, which we looked at last time out. I’m sure that Google engineers do indeed take extreme care ...

The inflection point hypothesis: a principled approach to finding the root cause of a failure

The inflection point hypothesis: a principled debugging approach for locating the root cause of a failure, Zhang et al., SOSP'19
It’s been a while since we looked at debugging and troubleshooting on The Morning Paper (here’s a sample of earlier posts on the topic). Today’s paper introduces a root-cause-of-failure detector for those ...