A brief history of Hadoop at Spotify

I was talking with some data engineers at Spotify and had a moment of nostalgia. In 2008 I was writing my master's thesis at Spotify and had to run a Hadoop job to extract some data from the logs.

From: Erik Bernhardsson

Luigi talk tomorrow

At NYC Data Science meetup! Unfortunately the space is full but the talk will be livestreamed – check out the meetup web page for a link tomorrow.

From: Erik Bernhardsson

Deep learning for… Go

This is the last post about deep learning for chess/go/whatever. But this really cool paper by Christopher Clark and Amos Storkey was forwarded to me by Michael Eickenberg. It's about using convolutional neural networks to play Go.

From: Erik Bernhardsson

Deep learning for… chess (addendum)

My previous blog post about deep learning for chess blew up and made it to Hacker News and a couple of other places. One pretty amazing thing was that the GitHub repo got 150 stars overnight.

From: Erik Bernhardsson

Deep learning for... chess

I've been meaning to learn Theano for a while and I've also wanted to build a chess AI at some point. So why not combine the two? That's what I thought, and I ended up spending way too much time on it.

From: Erik Bernhardsson

Optimizing things: everything is a proxy for a proxy for a proxy

Say you build a machine learning model, like a movie recommender system. You need to optimize for something. You have 1-5 stars as ratings so let's optimize for mean squared error. Great. Then let's say you build a new model.

From: Erik Bernhardsson

Luigi conquering the world

I keep forgetting to buy a costume for Halloween every year, so this year I prepared and got myself a Luigi costume a month in advance. Only to realize I was going to be out of town the whole weekend.

From: Erik Bernhardsson

Annoying blog post

I spent a couple of hours this weekend going through some pull requests and issues to Annoy, which is an open source C++/Python library for Approximate Nearest Neighbor search. I set up Travis-CI integration and spent some time on one of the issues that multiple people had reported.

From: Erik Bernhardsson

The Filter Bubble is Silly and you Can't Guess What Happened Next

I'm at RecSys 2014, meeting a lot of people and hanging out at talks. Some of the discussions here were about the filter bubble, which prompted me to formalize my own thoughts. I firmly believe that it's the role of a system to respect the user's intent.

From: Erik Bernhardsson

Detecting corporate fraud using Benford's law

Note: This is a silly application.

From: Erik Bernhardsson

The Strange Loop 2014

Last week I attended The Strange Loop in St Louis.

From: Sean Corfield: An Architect's View

Clojure in the Enterprise?

This was originally posted on corfield.

From: Sean Corfield: An Architect's View

Powered by JavaScript

The first annual Powered by JavaScript conference, organized by Manning Books, took place in St Louis this past week.

From: Sean Corfield: An Architect's View

Running Theano on EC2

Inspired by Sander Dieleman's internship at Spotify, I've been playing around with deep learning using Theano.

From: Erik Bernhardsson

In defense of false positives (why you can't fail with A/B tests)

Many years ago, I used to think that A/B tests were foolproof and all you need to do is compare the metrics for the two groups. The group with the highest conversion rate wins, right?

From: Erik Bernhardsson

Recurrent Neural Networks for Collaborative Filtering

I’ve been spending quite some time lately playing around with RNNs for collaborative filtering.

From: Erik Bernhardsson

ClojureBridge

Adapted from a post I made on my old blog in January, 2014, about the first few workshops being planned.

From: Sean Corfield: An Architect's View

Some thoughts on Java 8

Originally posted on Google Plus on June 14th, 2014. Why Java 8 might win me back...

From: Sean Corfield: An Architect's View

Where do locals go in NYC?

One obvious thing to anyone living in NYC is how tourists cluster in certain areas. I was curious about the larger patterns around this, so I spent some time looking at data. The thing I wanted to understand is: what areas are dominated by tourists?

From: Erik Bernhardsson

How to build up a data team (everything I ever learned about recruiting)

During my time at Spotify, I've reviewed thousands of resumes and interviewed hundreds of people. Lots of them were rejected but lots of them also got offers. Finally, I've also had my share of offers rejected by the candidate.

From: Erik Bernhardsson

Getting Started

Sometimes it's very enlightening to look back at the beginning of a project to see how things got set up and how we started down the path that led to where we are today.

From: Sean Corfield: An Architect's View

The power of ensembles

From my presentation at MLConf, one of the points I think is worth stressing again is how extremely well combining different algorithms works.

From: Erik Bernhardsson

MLConf 2014

Just spent a day at MLConf where I was talking about how we do music recommendations. There was a whole range of great speakers (actually almost 2/3 women which was pretty cool in itself). Here are my slides:

From: Erik Bernhardsson

Gravatars on a Grails Application

Gravatars in a Grails Application

From: Dan Vega