Dremel: interactive analysis of web-scale datasets - Melnik et al. (Google), 2010. Dremel is Google's interactive ad-hoc query system that can run aggregate queries over trillions of rows in seconds. It scales to thousands of CPUs and petabytes of data. It was also the inspiration for Apache Drill. Dremel borrows the idea of serving trees …
Tag: Analytics
Data analytics
The MADlib Analytics Library
The MADlib Analytics Library - MAD Skills, the SQL - Hellerstein et al. 2012. The way that we use large databases has evolved from being primarily in support of accounting and financial record-keeping, to primarily in support of predictive analytics over a wide range of potentially noisy data. Analytics at scale requires the marriage of …
Detecting Discontinuities in Large-Scale Systems
Detecting Discontinuities in Large-Scale Systems - Malik et al. 2014. The 7th IEEE/ACM International Conference on Utility and Cloud Computing is coming to London in a couple of weeks' time. Many of the papers don't seem to be online yet, but here's one that is. Malik et al. tackle the problem of long-term forecasting for …
Shark: SQL and Rich Analytics at Scale
Shark: SQL and Rich Analytics at Scale - Xin et al. 2013. Given the Databricks Spark result reported last week, it seems timely to look at Shark, a system built on top of Spark that ultimately informed the Spark SQL project. [Shark] leverages a novel distributed memory abstraction to provide a unified engine that can run …