StreamScope: Continuous Reliable Distributed Processing of Big Data Streams - Lin et al. NSDI '16 An emerging trend in big data processing is to extract timely insights from continuous big data streams with distributed computation running on a large cluster of machines. Examples of such data streams include those from sensors, mobile devices, and on-line … Continue reading StreamScope: Continuous reliable distributed processing of big data streams
Year: 2016
Efficiently compiling efficient query plans for modern hardware
Efficiently Compiling Efficient Query Plans for Modern Hardware - Neumann, VLDB 2011 Updated with direct links to Databricks blog post now that it is published. A couple of weeks ago I had a chance to chat with Reynold Xin and Richard Garris from Databricks / Spark at RedisConf, where we were both giving talks. Reynold and … Continue reading Efficiently compiling efficient query plans for modern hardware
The landscape of domain name typosquatting: techniques and countermeasures
The landscape of domain name typosquatting: techniques and countermeasures - Spaulding et al. arXiv upload 9 Mar 2016. We round up our series of posts on internet deceptions by looking at domain squatting. My "favourite" advanced technique is bitsquatting, which turns out to be a great demonstration of the inevitable failures that occur with sufficient … Continue reading The landscape of domain name typosquatting: techniques and countermeasures
Understanding malvertising through ad-injecting browser extensions
Understanding malvertising through ad-injecting browser extensions - Xing et al., WWW 2015. Be careful what browser extensions you install. Some ad networks have started to offer browser extension developers an opportunity to monetise their work, and in this study Xing et al. show that of the 292 Chrome browser extensions in their survey which inject ads, … Continue reading Understanding malvertising through ad-injecting browser extensions
Knowing your enemy: understanding and detecting malicious web advertising
Knowing your enemy: understanding and detecting malicious web advertising - Li et al. CCS, 2012 ... hackers and con-artists have found web ads to be a low-cost and highly effective means to conduct malicious and fraudulent activities. In this paper, we broadly refer to such ad-related malicious activities as malvertising, which can happen to any … Continue reading Knowing your enemy: understanding and detecting malicious web advertising
PharmaLeaks: Understanding the business of online pharmaceutical affiliate programs
PharmaLeaks: Understanding the business of online pharmaceutical affiliate programs - McCoy et al., USENIX Security, 2012 Yesterday we looked at the technology infrastructure supporting spam-based advertising businesses. Today's paper gives a fascinating look at the business model. How this is possible is itself a very interesting story which we'll get to shortly. The authors gained … Continue reading PharmaLeaks: Understanding the business of online pharmaceutical affiliate programs
Click Trajectories: End-to-end analysis of the spam value chain
Click Trajectories: End-to-end analysis of the spam value chain - Levchenko et al. IEEE Symposium on Security and Privacy, 2011 This week we're going to be looking at some of the less desirable corners of the internet: spam, malvertisements, click-jacking, typosquatting, and friends. To kick things off, today's paper gives an insight into the end-to-end … Continue reading Click Trajectories: End-to-end analysis of the spam value chain
Dynamic Time Warping averaging of time series allows faster and more accurate classification
Dynamic Time Warping averaging of time series allows faster and more accurate classification - Petitjean et al. ICDM 2014 For most time series classification problems, using the Nearest Neighbour algorithm (find the nearest neighbour within the training set to the query) is the technique of choice. Moreover, when determining the distance to neighbours, we want … Continue reading Dynamic Time Warping averaging of time series allows faster and more accurate classification
Time series classification under more realistic assumptions
Time series classification under more realistic assumptions - Hu et al. ICDM 2013 This paper sheds light on the gap between research results in time series classification, and what you're likely to see if you try to apply the results in the real world. And having identified the gap of course, the authors go on … Continue reading Time series classification under more realistic assumptions
Searching and mining trillions of time series subsequences under Dynamic Time Warping
Searching and mining trillions of time series subsequences under dynamic time warping - Rakthanmanon et al. SIGKDD 2012 What an astonishing paper this is! By 2012, Dynamic Time Warping had been shown to be the time series similarity measure that generally performs the best for matching, but because of its computational complexity researchers and practitioners … Continue reading Searching and mining trillions of time series subsequences under Dynamic Time Warping