Semi-supervised sequence learning

Semi-supervised sequence learning - Dai & Le, NIPS 2015. The sequence to sequence learning approach we looked at yesterday has been used for machine translation, text parsing, image captioning, video analysis, and conversational modeling. In Semi-supervised sequence learning, Dai & Le use a clever twist on the sequence-to-sequence approach to enable it to be used …

Sequence to sequence learning with neural networks

Sequence to sequence learning with neural networks - Sutskever et al., NIPS 2014. Yesterday we looked at paragraph vectors, which extend the distributed word vectors approach to learn a distributed representation of a sentence, paragraph, or document. Today's paper tackles what must be one of the sternest tests of all when it comes to assessing how …

Distributed representations of sentences and documents

Distributed representations of sentences and documents - Le & Mikolov, ICML 2014. We've previously looked at the amazing power of word vectors to learn distributed representations of words that manage to embody meaning. In today's paper, Le and Mikolov extend that approach to also compute distributed representations for sentences, paragraphs, and even entire documents. They …

How to build static checking systems using orders of magnitude less code

How to build static checking systems using orders of magnitude less code - Brown et al., ASPLOS '16. You start with something simple. Then over time things get more and more complex, and before you know it, it's hard to know what's going on. Today's paper is a delightful reminder of the power of stripping back …

Why do record/replay tests of web applications break?

Why do Record/Replay Tests of Web Applications Break? - Hammoudi et al., ICST '16. Your web application regression tests created using record/replay tools are fragile and keep breaking. Hammoudi et al. set out to find out why. If we knew that, perhaps we could design mechanisms to automatically repair broken tests, or to build more …

HCloud: Resource-efficient provisioning in shared cloud systems

HCloud: Resource-efficient provisioning in shared cloud systems - Delimitrou & Kozyrakis, ASPLOS '16. Do you use the public cloud? If so, I'm pretty confident you're going to find today's paper really interesting. Delimitrou & Kozyrakis study the provisioning strategies that provide the best balance between performance and cost. The sweet spot, it turns out, is …

SocialHash: An assignment framework for optimizing distributed systems operations on social networks

SocialHash: An assignment framework for optimizing distributed systems operations on social networks - Shalita et al., NSDI '16. Large-scale systems frequently need to partition resources or load across multiple nodes. How you do that can make a big difference. A common approach is to use a random distribution (e.g. via consistent hashing), which usually …

StreamScope: Continuous reliable distributed processing of big data streams

StreamScope: Continuous Reliable Distributed Processing of Big Data Streams - Lin et al., NSDI '16. An emerging trend in big data processing is to extract timely insights from continuous big data streams with distributed computation running on a large cluster of machines. Examples of such data streams include those from sensors, mobile devices, and on-line …

Efficiently compiling efficient query plans for modern hardware

Efficiently Compiling Efficient Query Plans for Modern Hardware - Neumann, VLDB 2011. Updated with direct links to the Databricks blog post now that it is published. A couple of weeks ago I had a chance to chat with Reynold Xin and Richard Garris from Databricks / Spark at RedisConf, where we were both giving talks. Reynold and …