European Union regulations on algorithmic decision-making and a “right to explanation” Goodman & Flaxman, 2016 In just over a year, the General Data Protection Regulation (GDPR) becomes law in European member states. This paper focuses on just one particular aspect of the new law, Article 22, as it relates to profiling, non-discrimination, and the right …
How good are query optimizers, really? Leis et al., VLDB 2015 Last week we looked at cardinality estimation using index-based sampling, evaluated using the Join Order Benchmark. Today's choice is the paper that introduces the Join Order Benchmark (JOB) itself. It's a great evaluation paper, and along the way we'll learn a lot about mainstream …
Cardinality estimation done right: Index-based join sampling Leis et al., CIDR 2017 Let's finish up our brief look at CIDR 2017 with something closer to the core of database systems research: query optimisation. For good background on this topic a great place to start is Selinger's 1979 …
The truth, the whole truth, and nothing but the truth: A pragmatic guide to assessing empirical evaluations Blackburn et al., ACM Transactions on Programming Languages and Systems 2016 Yesterday we looked at some of the ways analysts may be fooled into thinking they've found a statistically significant result when in fact they haven't. Today's paper …
Toward sustainable insights, or why polygamy is bad for you Binning et al., CIDR 2017 Buckle up! Today we're going to be talking about statistics, p-values, and the multiple comparisons problem. Some good background resources here are: Statistics Done Wrong, by Alex Reinhart; p-values on Wikipedia; and Misunderstandings of p-values, also on Wikipedia. For my own …
Data provenance at Internet scale: Architecture, experiences, and the road ahead Chen et al., CIDR 2017 Provenance within the context of a single database has been reasonably well studied. In this paper, though, Chen et al. explore what happens when you try to trace provenance in a distributed setting and at larger scale. The context …
Ground: A Data Context Service Hellerstein et al., CIDR 2017 An unfortunate consequence of the disaggregated nature of contemporary data systems is the lack of a standard mechanism to assemble a collective understanding of the origin, scope, and usage of the data they manage. Put more bluntly, many organisations have only a fuzzy picture …