Toward sustainable insights, or why polygamy is bad for you

Binning et al., CIDR 2017. Buckle up! Today we're going to be talking about statistics, p-values, and the multiple comparisons problem. Some good background resources here are: Statistics Done Wrong, by Alex Reinhart; p-values on Wikipedia; and Misunderstandings of p-values, also on Wikipedia. For my own ...

SnappyData: A unified cluster for streaming, transactions, and interactive analytics

Mozafari et al., CIDR 2017. Update: fixed broken paper link, thanks Zteve. On Monday we looked at Weld, which showed how to combine disparate data processing and analytic frameworks using a common underlying IR. Yesterday we looked at Peloton, which adapts to mixed OLTP and ...

Searching and mining trillions of time series subsequences under Dynamic Time Warping

Rakthanmanon et al., SIGKDD 2012. What an astonishing paper this is! By 2012, Dynamic Time Warping had been shown to be the time series similarity measure that generally performs best for matching, but because of its computational complexity researchers and practitioners ...

Towards parameter-free data mining

Keogh et al., SIGKDD 2004. Another time series paper today from the Facebook Gorilla references. Keogh et al. describe an incredibly simple, easy-to-implement scheme that does surprisingly well on clustering, anomaly detection, and classification tasks over time series data. As per the title of the paper, it ...