Leaving Spotify

February 6 was my last day at Spotify. In total I spent more than six years at Spotify and it was an amazing experience. I joined Spotify in Stockholm in 2008, mainly because a bunch of friends from programming competitions had joined already.

From: Erik Bernhardsson

Scala Data Pipelines for Music Recommendations

Chris Johnson’s presentation from Data Day Texas:

Everything I learned about technical debt

I just made it to Sweden suffering from jet lag induced insomnia, but this blog post will not cover that. Instead, I will talk a little bit about technical debt. The concept of technical debt always resonated with me, partly because I always like the analogy with “real” debt.

I already found the best gifs

Just search for “hackers gif”. There you go. Fun for your work emails for the next 500 years. From the awesome movie Hackers. That movie together with The Warriors convinced me that I wanted to live in NYC when I was like… 14 years old.

A brief history of Hadoop at Spotify

I was talking with some data engineers at Spotify and had a moment of nostalgia. In 2008 I was writing my master's thesis at Spotify and had to run a Hadoop job to extract some data from the logs.

Luigi talk tomorrow

At NYC Data Science meetup! Unfortunately the space is full but the talk will be livestreamed – check out the meetup web page for a link tomorrow.

Deep learning for… Go

This is the last post about deep learning for chess/go/whatever. But this really cool paper by Christopher Clark and Amos Storkey was forwarded to me by Michael Eickenberg. It's about using convolutional neural networks to play Go.

Deep learning for… chess (addendum)

My previous blog post about deep learning for chess blew up and made it to Hacker News and a couple of other places. One pretty amazing thing was that the GitHub repo got 150 stars overnight.

Deep learning for… chess

I've been meaning to learn Theano for a while and I've also wanted to build a chess AI at some point. So why not combine the two? That's what I thought, and I ended up spending way too much time on it.

Optimizing things: everything is a proxy for a proxy for a proxy

Say you build a machine learning model, like a movie recommender system. You need to optimize for something. You have 1-5 stars as ratings so let's optimize for mean squared error. Great. Then let's say you build a new model.

Luigi conquering the world

I keep forgetting to buy a costume for Halloween every year, so this year I prepared and got myself a Luigi costume a month in advance. Only to realize I was going to be out of town the whole weekend.

Annoying blog post

I spent a couple of hours this weekend going through some pull requests and issues to Annoy, which is an open source C++/Python library for Approximate Nearest Neighbor search. I set up Travis-CI integration and spent some time on one of the issues that multiple people had reported.

The Filter Bubble is Silly and you Can't Guess What Happened Next

I'm at RecSys 2014, meeting a lot of people and hanging out at talks. Some of the discussions here were about the filter bubble, which prompted me to formalize my own thoughts. I firmly believe that it's the role of a system to respect the user's intent.

Detecting corporate fraud using Benford's law

Note: This is a silly application.

Running Theano on EC2

Inspired by Sander Dieleman's internship at Spotify, I've been playing around with deep learning using Theano.

In defense of false positives (why you can't fail with A/B tests)

Many years ago, I used to think that A/B tests were foolproof and all you need to do is compare the metrics for the two groups. The group with the highest conversion rate wins, right?

Recurrent Neural Networks for Collaborative Filtering

I’ve been spending quite some time lately playing around with RNNs for collaborative filtering.

Where do locals go in NYC?

One obvious thing to anyone living in NYC is how tourists cluster in certain areas. I was curious about the larger patterns around this, so I spent some time looking at data. The thing I wanted to understand is: what areas are dominated by tourists?

How to build up a data team (everything I ever learned about recruiting)

During my time at Spotify, I've reviewed thousands of resumes and interviewed hundreds of people. Lots of them were rejected but lots of them also got offers. Finally, I've also had my share of offers rejected by the candidate.

The power of ensembles

From my presentation at MLConf, one of the points I think is worth stressing again is how extremely well combining different algorithms works.

MLConf 2014

Just spent a day at MLConf where I was talking about how we do music recommendations. There was a whole range of great speakers (actually almost 2/3 women which was pretty cool in itself). Here are my slides:

Music recommendations using cover images (part 1)

Scrolling through the Discover page on Spotify the other day, it occurred to me that the album cover is in fact a fairly strong visual proxy for what kind of content you can expect from the album. I started wondering if the album cover can in fact be used for recommendations.

Luigi success

So Luigi, our open sourced workflow engine in Python, just recently passed 1,000 stars on GitHub, then shortly after passed mrjob as (I think) the most popular Python package to do Hadoop stuff. This is exciting!

Welcome Echo Nest!

In case you missed it, we just acquired a company called Echo Nest in Boston. These people have been obsessed with understanding music for the past 8 years, ever since the company was founded by Brian Whitman and Tristan Jehan out of the MIT Media Lab.