PharmaLeaks: Understanding the business of online pharmaceutical affiliate programs

PharmaLeaks: Understanding the business of online pharmaceutical affiliate programs - McCoy et al., USENIX Security, 2012

Yesterday we looked at the technology infrastructure supporting spam-based advertising businesses. Today's paper gives a fascinating look at the business model. How this is possible is itself a very interesting story which we'll get to shortly. The authors gained ...

Dynamic Time Warping averaging of time series allows faster and more accurate classification

Dynamic Time Warping averaging of time series allows faster and more accurate classification - Petitjean et al. ICDM 2014

For most time series classification problems, using the Nearest Neighbour algorithm (find the nearest neighbour within the training set to the query) is the technique of choice. Moreover, when determining the distance to neighbours, we want ...

Searching and mining trillions of time series subsequences under Dynamic Time Warping

Searching and mining trillions of time series subsequences under dynamic time warping - Rakthanmanon et al. SIGKDD 2012

What an astonishing paper this is! By 2012, Dynamic Time Warping had been shown to be the time series similarity measure that generally performs the best for matching, but because of its computational complexity researchers and practitioners ...

Towards parameter-free data mining

Towards Parameter-Free Data Mining - Keogh et al. SIGKDD 2004

Another time series paper today from the Facebook Gorilla references. Keogh et al. describe an incredibly simple and easy to implement scheme that does surprisingly well with clustering, anomaly detection, and classification tasks over time series data. As per the title of the paper, it ...

NOVA: A Log-Structured File System for Hybrid Volatile/Non-Volatile Main Memories

NOVA: A Log-structured file system for hybrid volatile/non-volatile main memories - Xu & Swanson 2016

Another paper looking at the design implications of mixed DRAM and NVMM systems (it's the future!), this time in the context of file systems. (NVMM = Non-volatile Main Memory). Hybrid DRAM/NVMM storage systems present a host of opportunities and challenges ...

Uncovering bugs in Distributed Storage Systems during Testing (not in production!)

Uncovering bugs in Distributed Storage Systems during Testing (not in production!) - Deligiannis et al. 2016

We interviewed technical leaders and senior managers in Microsoft Azure regarding the top problems in distributed system development. The consensus was that one of the most critical problems today is how to improve testing coverage so that bugs can ...