Petuum: A New Platform for Distributed Machine Learning on Big Data - Xing et al. 2015 How do you perform machine learning with big models (big here could be 100s of billions of parameters!) over big data sets (terabytes or petabytes)? Take for example state of the art image recognition systems that have embraced large-scale …
Tag: Machine Learning
Asynchronous Complex Analytics in a Distributed Dataflow Architecture
Asynchronous Complex Analytics in a Distributed Dataflow Architecture - Gonzalez et al. 2015 Here's a theme we've seen before: the programming model offered by large scale distributed systems doesn't always lend itself to efficient algorithms for solving certain classes of problems. In today's paper, Gonzalez et al. examine the growing gap between efficient machine learning …
Optimizing Search Engines using Clickthrough Data
Optimizing Search Engines using Clickthrough Data - Joachims, 2002 Today's choice is another KDD 'test-of-time' winner. The paper introduced the problem of ranking documents w.r.t. a query using not explicit user feedback but implicit user feedback in the form of clickthrough data. The author presented the Ranking SVM Algorithm to solve the proposed ranking problem. …
Mining High-Speed Data Streams
Mining High-Speed Data Streams - Domingos & Hulten 2000 This paper won a 'test of time' award at KDD'15 as an 'outstanding paper from a past KDD Conference beyond the last decade that has had an important impact on the data mining community.' Here's what the test-of-time committee have to say about it: This paper …
A Machine Learning Framework to Identify Students at Risk of Adverse Academic Outcomes
A Machine Learning Framework to Identify Students at Risk of Adverse Academic Outcomes - Lakkaraju et al. 2015 This is the first of a series of papers from the Knowledge Discovery and Data Mining (KDD'15) conference that we'll look at this week. Today's paper is all about helping high school students in the US who …
Distributed GraphLab: A framework for machine learning and data mining in the cloud
Distributed GraphLab: A framework for machine learning and data mining in the cloud - Low et al. 2012 Two years on from the initial GraphLab paper we looked at yesterday comes this extension to support distributed graph processing for larger graphs, including data mining use cases. In this paper, we extend the GraphLab framework to …
GraphLab: A new framework for parallel machine learning
GraphLab: A new framework for parallel machine learning - Low et al. 2010 In this paper we propose GraphLab, a new parallel framework for ML which exploits the sparse structure and common computational patterns of ML algorithms. GraphLab enables ML experts to easily design and implement efficient scalable parallel algorithms by composing problem specific computation, …
Machine Learning Classification over Encrypted Data
Machine Learning Classification over Encrypted Data - Bost et al. 2015 This is the 2nd of three papers we'll be looking at this week from the NDSS '15 conference held earlier this month in San Diego. When it comes to providing an informed critique of the security techniques applied in this paper, I'm out of …
The Missing Piece in Complex Analytics
The Missing Piece in Complex Analytics: Low latency scalable model management and serving with Velox - Crankshaw et al. 2015. Analytics at scale can be used to create statistical models for making predictions about the world, but once the data scientists and analysts have done their initial work and a model has been built and …
A few useful things to know about machine learning
A few useful things to know about machine learning - Domingos 2012 Developing successful machine learning applications requires a substantial amount of 'black art' that is hard to find in textbooks. This paper looks at twelve key lessons including pitfalls to avoid, important issues to focus on, and answers to common questions. The paper was …