Slicer: Auto-sharding for datacenter applications

December 2, 2016 (updated July 31, 2017) ~ adriancolyer

Slicer: Auto-sharding for datacenter applications - Adya et al. (Google), OSDI 2016. Another piece of Google's back-end infrastructure is revealed in this paper, and it will no doubt spawn some new open-source implementations of the same ideas. Slicer is a general-purpose sharding service. I normally think of sharding as something that happens within a (typically …

Smart Reply: Automated response suggestion for email

November 24, 2016 (updated July 31, 2017) ~ adriancolyer

Smart Reply: Automated response suggestion for email - Kannan, Kaufmann, Kurach, et al., KDD 2016. I’m sure you’ve come across (or at least heard of) Google Inbox’s Smart Reply feature for mobile email by now. It’s currently used for 10% of all mobile replies, which must equate to a very large number of messages per day. …

Mastering the game of Go with deep neural networks and tree search

September 20, 2016 (updated July 31, 2017) ~ adriancolyer

Mastering the Game of Go with Deep Neural Networks and Tree Search - Silver, Huang et al., Nature vol. 529, 2016. Pretty much everyone has heard about AlphaGo’s tremendous Go-playing success, beating the European champion by 5 games to 0. In all the excitement at the time, less was written about how AlphaGo actually worked …

Deep neural networks for YouTube recommendations

September 19, 2016 (updated July 31, 2017) ~ adriancolyer

Deep Neural Networks for YouTube Recommendations - Covington et al., RecSys '16. The lovely people at InfoQ have been very kind to The Morning Paper, producing beautiful-looking "Quarterly Editions." Today's paper choice was first highlighted to me by InfoQ's very own Charles Humble. In it, Google describe how they overhauled the YouTube recommendation system using …

Goods: organizing Google’s datasets

July 12, 2016 (updated July 31, 2017) ~ adriancolyer

Goods: organizing Google’s datasets - Halevy et al., SIGMOD 2016. You can (try and) build a data cathedral. Or you can build a data bazaar. By data cathedral I’m referring to a centralised Enterprise Data Management (EDM) solution that everyone in the company buys into and pays homage to, making a pilgrimage to the EDM every time …

Distributed representations of sentences and documents

June 1, 2016 (updated July 27, 2017) ~ adriancolyer

Distributed representations of sentences and documents - Le & Mikolov, ICML 2014. We've previously looked at the amazing power of word vectors to learn distributed representations of words that manage to embody meaning. In today's paper, Le and Mikolov extend that approach to also compute distributed representations for sentences, paragraphs, and even entire documents. They …

The amazing power of word vectors

April 21, 2016 (updated July 27, 2017) ~ adriancolyer

For today's post, I've drawn material not just from one paper, but from five! The subject matter is 'word2vec' - the work of Mikolov et al. at Google on efficient vector representations of words (and what you can do with them). The papers are: Efficient Estimation of Word Representations in Vector Space - Mikolov et …
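
One of the best-known things you can do with word vectors is analogy arithmetic: the vector for "king", minus "man", plus "woman", lands closest to "queen". The snippet below is a minimal sketch of how such a query can be answered with a cosine-similarity search. It assumes a tiny hand-made vocabulary; the `EMBEDDINGS` values and the `analogy` helper are hypothetical illustrations, not trained word2vec vectors or any library's API.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical hand-made 3-d "embeddings", chosen so the analogy works out.
# Real word2vec vectors are learned from large corpora and have hundreds of dimensions.
EMBEDDINGS: Dict[str, List[float]] = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.2, 0.8],
    "prince": [0.8, 0.8, 0.2],
    "apple": [0.0, 0.1, 0.1],
}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a: str, b: str, c: str, embeddings: Dict[str, List[float]]) -> str:
    """Answer 'a is to b as c is to ?' by returning the word whose vector is
    most cosine-similar to vec(b) - vec(a) + vec(c), excluding the query words."""
    target = [vb - va + vc for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = (w for w in embeddings if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(embeddings[w], target))


print(analogy("man", "king", "woman", EMBEDDINGS))  # queen
```

Excluding the three query words matters: with real embeddings the nearest vector to the target is often one of the inputs (usually "king" itself), so it is standard practice in analogy evaluation to skip them.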

Maglev: A Fast and Reliable Software Network Load Balancer

March 21, 2016 (updated July 27, 2017) ~ adriancolyer

Maglev: A Fast and Reliable Software Network Load Balancer - Eisenbud et al., 2016. Maglev is Google's software load balancer, used within all their datacenters. It offers greater scalability and availability than hardware load balancers, enables quick iteration, and is much easier to upgrade. Maglev is just another distributed system running on the commodity …

HyperLogLog in Practice: Algorithmic Engineering of a State of the Art Cardinality Estimation Algorithm

March 17, 2016 (updated July 27, 2017) ~ adriancolyer

HyperLogLog in Practice: Algorithmic Engineering of a State of the Art Cardinality Estimation Algorithm - Heule et al., 2013. Continuing on the theme of approximations from yesterday, today's paper looks at what must be one of the best-known approximate data structures after the Bloom filter: HyperLogLog. It's HyperLogLog with a twist though - a …
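
As a reference point for the paper's refinements, here is a minimal Python sketch of the basic HyperLogLog estimator. Each item is hashed; the top p bits of the hash pick one of m = 2^p registers, and that register keeps the largest "rank" (leading zeros plus one) seen in the remaining bits. The sketch assumes a 64-bit hash, taken here from SHA-1 purely for illustration, and uses the standard linear-counting fallback for small cardinalities. It does not include the empirical bias correction or sparse representation described in the paper, so treat it as a baseline rather than the paper's algorithm; the `HyperLogLog` class and its method names are hypothetical.

```python
import hashlib
import math
from typing import List


class HyperLogLog:
    """Basic HyperLogLog cardinality sketch (hypothetical illustration).

    Uses a 64-bit hash and linear counting for small cardinalities; it does
    NOT include HyperLogLog++'s empirical bias correction or sparse encoding.
    """

    def __init__(self, p: int = 14) -> None:
        if not 4 <= p <= 18:
            raise ValueError("precision p must be between 4 and 18")
        self.p = p
        self.m = 1 << p                      # number of registers
        self.registers: List[int] = [0] * self.m
        # Constant alpha_m from the original HyperLogLog analysis.
        self.alpha = {16: 0.673, 32: 0.697, 64: 0.709}.get(
            self.m, 0.7213 / (1 + 1.079 / self.m)
        )

    @staticmethod
    def _hash64(item: str) -> int:
        # Take the first 64 bits of SHA-1 as the hash (an illustrative choice).
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item: str) -> None:
        x = self._hash64(item)
        q = 64 - self.p                      # bits left after the register index
        idx = x >> q                         # top p bits select a register
        w = x & ((1 << q) - 1)               # remaining q bits
        rank = q - w.bit_length() + 1        # leading zeros in w, plus one
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self) -> float:
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: linear counting over the empty registers.
            return self.m * math.log(self.m / zeros)
        return raw


sketch = HyperLogLog(p=14)
for i in range(100_000):
    sketch.add(f"user-{i}")
print(round(sketch.estimate()))  # prints an estimate near 100000
```

Memory stays fixed at 2^p small registers (16,384 for p = 14) no matter how many items are added, and re-adding an item never changes the estimate. The standard error of the basic estimator is about 1.04/sqrt(m), roughly 0.8% at this precision.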

Secrets, Lies, and Account Recovery: Lessons from the Use of Personal Knowledge Questions at Google

March 14, 2016 (updated July 27, 2017) ~ adriancolyer

Secrets, Lies, and Account Recovery: Lessons from the Use of Personal Knowledge Questions at Google - Bonneau et al., 2015. What was your mother's maiden name? What was your city of birth? What was the name of your first school? I don't know about you, but I always groan inwardly when a website asks such …