PharmaLeaks: Understanding the business of online pharmaceutical affiliate programs - McCoy et al., USENIX Security, 2012 Yesterday we looked at the technology infrastructure supporting spam-based advertising businesses. Today's paper gives a fascinating look at the business model. How this is possible is itself a very interesting story which we'll get to shortly. The authors gained …
Month: May 2016
Click Trajectories: End-to-end analysis of the spam value chain
Click Trajectories: End-to-end analysis of the spam value chain - Levchenko et al. IEEE Symposium on Security and Privacy, 2011 This week we're going to be looking at some of the less desirable corners of the internet: spam, malvertisements, click-jacking, typosquatting, and friends. To kick things off, today's paper gives an insight into the end-to-end …
Dynamic Time Warping averaging of time series allows faster and more accurate classification
Dynamic Time Warping averaging of time series allows faster and more accurate classification - Petitjean et al. ICDM 2014 For most time series classification problems, using the Nearest Neighbour algorithm (find the nearest neighbour within the training set to the query) is the technique of choice. Moreover, when determining the distance to neighbours, we want …
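To make the nearest-neighbour idea in the excerpt concrete, here is a minimal sketch of 1-NN classification over time series. The excerpt is cut off before it names the distance measure, so the use of Dynamic Time Warping (taken from the post title), the squared point cost, the function names, and the toy data are all assumptions for illustration. This is the textbook quadratic-time formulation, not the averaging technique the paper proposes.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) DTW with a squared per-point cost (assumed)."""
    n, m = len(a), len(b)
    # cost[i][j] holds the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def nearest_neighbour_label(query, training_set, distance=dtw_distance):
    """Return the label of the training series closest to the query."""
    best_label, best_dist = None, inf
    for series, label in training_set:
        d = distance(query, series)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


# Hypothetical toy data: two labelled series and a time-shifted query.
training_set = [
    ([0, 0, 1, 2, 1, 0, 0], "bump"),
    ([0, 1, 2, 3, 4, 5, 6], "ramp"),
]
query = [0, 1, 2, 1, 0, 0, 0]

print(nearest_neighbour_label(query, training_set))
```

Because DTW is allowed to stretch and compress the time axis, the shifted query aligns with the "bump" series at zero cost, which is the property that makes it attractive as the distance inside a nearest-neighbour classifier.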
Time series classification under more realistic assumptions
Time series classification under more realistic assumptions - Hu et al. ICDM 2013 This paper sheds light on the gap between research results in time series classification and what you're likely to see if you try to apply the results in the real world. And having identified the gap, of course, the authors go on …
Searching and mining trillions of time series subsequences under Dynamic Time Warping
Searching and mining trillions of time series subsequences under dynamic time warping - Rakthanmanon et al. SIGKDD 2012 What an astonishing paper this is! By 2012, Dynamic Time Warping had been shown to be the time series similarity measure that generally performs the best for matching, but because of its computational complexity researchers and practitioners …
Towards parameter-free data mining
Towards Parameter-Free Data Mining - Keogh et al. SIGKDD 2004 Another time series paper today from the Facebook Gorilla references. Keogh et al. describe an incredibly simple and easy-to-implement scheme that does surprisingly well with clustering, anomaly detection, and classification tasks over time series data. As per the title of the paper, it …
Finding surprising patterns in a time series database in linear time and space
Finding Surprising Patterns in a Time Series Database in Linear Time and Space - Keogh et al. SIGKDD 2002 In the Facebook Gorilla paper, the authors mentioned a number of additional time series analysis techniques they'd like to add to the system over time. Today's paper is one of them, and it deals with the …
NOVA: A Log-Structured File System for Hybrid Volatile/Non-Volatile Main Memories
NOVA: A Log-structured file system for hybrid volatile/non-volatile main memories - Xu & Swanson 2016 Another paper looking at the design implications of mixed DRAM and NVMM systems (it's the future!), this time in the context of file systems. (NVMM = Non-volatile Main Memory). Hybrid DRAM/NVMM storage systems present a host of opportunities and challenges …
Uncovering bugs in Distributed Storage Systems during Testing (not in production!)
Uncovering bugs in Distributed Storage Systems during Testing (not in production!) - Deligiannis et al. 2016 We interviewed technical leaders and senior managers in Microsoft Azure regarding the top problems in distributed system development. The consensus was that one of the most critical problems today is how to improve testing coverage so that bugs can …
BTrDB: Optimizing Storage System Design for Timeseries Processing
BTrDB: Optimizing Storage System Design for Timeseries Processing - Anderson & Culler 2016 It turns out you can accomplish quite a lot with 4,709 lines of Go code! How about a full time-series database implementation, robust enough to be run in production for a year where it stored 2.1 trillion data points, and supporting 119M …