Azure Data Lake Store: a hyperscale distributed file service for big data analytics

Azure data lake store: a hyperscale distributed file service for big data analytics - Douceur et al., SIGMOD'17. Today's paper takes us inside Microsoft Azure's distributed file service called the Azure Data Lake Store (ADLS). ADLS is the successor to an internal file system called Cosmos, and marries Cosmos semantics with HDFS, supporting both Cosmos and …

Chimera: Large-Scale Classification Using Machine Learning, Rules, and Crowdsourcing

Chimera: Large-Scale Classification Using Machine Learning, Rules, and Crowdsourcing - Sun et al. 2014 (WalmartLabs) Large-scale classification, where we need to classify hundreds of thousands or millions of items into thousands of classes, is becoming increasingly common in this age of Big Data... So far, however, very little has been published on how large-scale classification …

Enterprise Database Applications and the Cloud: A difficult road ahead

Enterprise Database Applications and the Cloud: A difficult road ahead - Stonebraker et al. 2014 In the rush to the cloud, stateless application components are well catered for but state always makes things more complicated. In this paper, Stonebraker et al. set out some of the reasons enterprise database applications present challenges to cloud migration. …

Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing

Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing - Google 2014 Mesa is another in the tapestry of systems that support Google's advertising business. Previous editions of The Morning Paper have covered Photon, Spanner, F1, and F1's online schema update mechanism. Mesa is a highly scalable analytic data warehousing system that stores critical measurement data related …

Detecting Discontinuities in Large-Scale Systems

Detecting Discontinuities in Large-Scale Systems - Malik et al. 2014. The 7th IEEE/ACM International Conference on Utility and Cloud Computing is coming to London in a couple of weeks' time. Many of the papers don't seem to be online yet, but here's one that is. Malik et al. tackle the problem of long-term forecasting for …