Petuum: A New Platform for Distributed Machine Learning on Big Data

Petuum: A New Platform for Distributed Machine Learning on Big Data - Xing et al. 2015 How do you perform machine learning with big models (big here could mean hundreds of billions of parameters!) over big data sets (terabytes or petabytes)? Take for example state-of-the-art image recognition systems that have embraced large-scale …

Asynchronous Complex Analytics in a Distributed Dataflow Architecture

Asynchronous Complex Analytics in a Distributed Dataflow Architecture - Gonzalez et al. 2015 Here's a theme we've seen before: the programming model offered by large-scale distributed systems doesn't always lend itself to efficient algorithms for solving certain classes of problems. In today's paper, Gonzalez et al. examine the growing gap between efficient machine learning …

Optimizing Search Engines using Clickthrough Data

Optimizing Search Engines using Clickthrough Data - Joachims, 2002 Today's choice is another KDD 'test-of-time' winner. The paper introduced the problem of ranking documents w.r.t. a query using not explicit user feedback but implicit user feedback in the form of clickthrough data. The author presented the Ranking SVM Algorithm to solve the proposed ranking problem. …

A Machine Learning Framework to Identify Students at Risk of Adverse Academic Outcomes

A Machine Learning Framework to Identify Students at Risk of Adverse Academic Outcomes - Lakkaraju et al. 2015 This is the first of a series of papers from the Knowledge Discovery and Data Mining (KDD'15) conference that we'll look at this week. Today's paper is all about helping high school students in the US who …

Distributed GraphLab: A framework for machine learning and data mining in the cloud

Distributed GraphLab: A framework for machine learning and data mining in the cloud - Low et al. 2012 Two years on from the initial GraphLab paper we looked at yesterday comes this extension to support distributed graph processing for larger graphs, including data mining use cases. In this paper, we extend the GraphLab framework to …

GraphLab: A new framework for parallel machine learning

GraphLab: A new framework for parallel machine learning - Low et al. 2010 In this paper we propose GraphLab, a new parallel framework for ML which exploits the sparse structure and common computational patterns of ML algorithms. GraphLab enables ML experts to easily design and implement efficient scalable parallel algorithms by composing problem specific computation, …

A few useful things to know about machine learning

A few useful things to know about machine learning - Domingos 2012 Developing successful machine learning applications requires a substantial amount of 'black art' that is hard to find in textbooks. This paper looks at twelve key lessons including pitfalls to avoid, important issues to focus on, and answers to common questions. The paper was …