Gandalf: an intelligent, end-to-end analytics service for safe deployment in cloud-scale infrastructure

Gandalf: an intelligent, end-to-end analytics service for safe deployment in cloud-scale infrastructure, Li et al., NSDI'20 Modern software systems at scale are incredibly complex ever changing environments. Despite all the pre-deployment testing you might employ, this makes it really tough to change them with confidence. Thus it's common to use some form of phased rollout, ... Continue Reading

Experiences with approximating queries in Microsoft’s production big-data clusters

Experiences with approximating queries in Microsoft’s production big-data clusters Kandula et al., VLDB'19 I’ve been excited about the potential for approximate query processing in analytic clusters for some time, and this paper describes its use at scale in production. Microsoft’s big data clusters have 10s of thousands of machines, and are used by thousands of ... Continue Reading

RobinHood: tail latency aware caching – dynamic reallocation from cache-rich to cache-poor

RobinHood: tail latency aware caching - dynamic reallocation from cache-rich to cache-poor Berger et al., OSDI'18 It’s time to rethink everything you thought you knew about caching! My mental model goes something like this: we have a set of items that probably follow a power-law of popularity. We have a certain finite cache capacity, and ... Continue Reading

Columnstore and B+ tree – are hybrid physical designs important?

Columnstore and B+ tree - are hybrid physical designs important? Dziedzic et al., SIGMOD'18 Earlier this week we looked at the design of column stores and their advantages for analytic workloads. What should you do though if you have a mixed workload including transaction processing, decision support, and operational analytics? Microsoft SQL Server supports hybrid ... Continue Reading

Medea: scheduling of long running applications in shared production clusters

Medea: scheduling of long running applications in shared production clusters Garefalakis et al., EuroSys'18 (If you don’t have ACM Digital Library access, the paper can be accessed either by following the link above directly from The Morning Paper blog site). We’re sticking with schedulers today, and a really interesting system called Medea which is designed ... Continue Reading

ServiceFabric: a distributed platform for building microservices in the cloud

ServiceFabric: a distributed platform for building microservices in the cloud Kakivaya et al., EuroSys'18 (If you don’t have ACM Digital Library access, the paper can be accessed either by following the link above directly from The Morning Paper blog site). Microsoft’s Service Fabric powers many of Azure’s critical services. It’s been in development for around ... Continue Reading