I Wanna Be Developer

Ratio metrics

We run a ton of A/B tests at Spotify and we look at a ton of metrics. Defining metrics is a little bit of an art form. Ideally you want to define success metrics before you run a test to avoid cherry picking metrics.

From: Erik Bernhardsson

1/23/14, 12:00 AM Web Development

Benchmarking nearest neighbor libraries in Python

Radim Rehurek has put together an excellent summary of approximate nearest neighbor libraries in Python. This is exciting, because one of the libraries he's covering, annoy, was built by me. After introducing the problem, he goes through the list of contestants and sticks with five remaining ones.

From: Erik Bernhardsson

1/12/14, 12:00 AM Web Development

More recommender algorithms

I wanted to share some more insight into the algorithms we use at Spotify. One matrix factorization algorithm we have used for a while assumes that we have user vectors $$ \mathbf{a}_u $$ and item vectors $$ \mathbf{b}_i $$ .

From: Erik Bernhardsson

12/20/13, 12:00 AM Web Development

Microsoft's new marketing strategy: give up

I think it's funny how MS at some point realized they are not the cool kids and there's no reason to appeal to that target audience. Their new marketing strategy finally admits what's been long known: the correlation between “business casual” and using Microsoft products:

From: Erik Bernhardsson

12/12/13, 12:00 AM Web Development

Bagging as a regularizer

One thing I encountered today was a trick using bagging as a way to go beyond a point estimate and get an approximation for the full distribution. This can then be used to penalize predictions with larger uncertainty, which helps reduce false positives.
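As a rough illustration of the idea (not necessarily the setup used in the post), here is a minimal sketch: fit the same simple model on many bootstrap resamples, treat the spread of the resulting predictions as an approximate distribution, and score each prediction as mean minus a multiple of the standard deviation. The least-squares line, the function names, and the `penalty` parameter are all assumptions made for the example.

```python
import random
import statistics


def fit_line(points):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        # Degenerate resample (all x equal): fall back to a constant model.
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def bagged_predictions(points, x_new, n_models=200, seed=0):
    """Predict at x_new with n_models lines, each fit on a bootstrap resample."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_models):
        # Bootstrap: sample len(points) points with replacement.
        sample = [rng.choice(points) for _ in points]
        a, b = fit_line(sample)
        preds.append(a + b * x_new)
    return preds


def penalized_score(preds, penalty=1.0):
    """Mean prediction minus `penalty` standard deviations (hypothetical scoring rule)."""
    return statistics.mean(preds) - penalty * statistics.stdev(preds)
```

Predictions far from the training data tend to vary more across the bagged models, so they get a larger penalty and are less likely to show up as confident false positives.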

From: Erik Bernhardsson

12/6/13, 12:00 AM Web Development

Model benchmarks

A lot of people have asked me what models we use for recommendations at Spotify so I wanted to share some insights. Here are benchmarks for some models. Note that we don't use all of them in production.

From: Erik Bernhardsson

11/2/13, 12:00 AM Web Development

statself.com

Btw I just put something up online that I spent a couple of evenings on my couch putting together: it's a website where you can track any numerical data on the web. Want to know how many Twitter followers you have?

From: Erik Bernhardsson

10/18/13, 12:00 AM Web Development

Implicit data and collaborative filtering

A lot of people these days know about collaborative filtering.

From: Erik Bernhardsson

9/16/13, 12:00 AM Web Development

Vote for our SXSW panel!

If you have a few minutes, you should check out the panel proposal from Chris Johnson and me.

From: Erik Bernhardsson

9/4/13, 12:00 AM Web Development

What's up with music recommendations?

I just answered a Quora question about what, if any, are the differences in the algorithms that are behind recommendations for music and movies.

From: Erik Bernhardsson

8/17/13, 12:00 AM Web Development

3D

Andy Sloane decided to call my 2D visualization and raise it to 3D. (Looks a little weird in the iframe but check out the link). It's based on an LDA model with 200 topics, so the artists tend to stick to clusters where each cluster is a topic.

From: Erik Bernhardsson

8/12/13, 12:00 AM Web Development

2D embedding of 5k artists = WIN

I'm at KDD in Chicago for a few days.

From: Erik Bernhardsson

8/11/13, 12:00 AM Web Development

Delivering Music Recommendations

I've turned into a lazy bastard and I'm just posting presentations on this blog, but here's one from Rohan Singh at Spotify talking about the backend infrastructure of the Discover page.

From: Erik Bernhardsson

8/9/13, 12:00 AM Web Development

ML+Hadoop at NYC Predictive Analytics

I was just at the NYC Predictive Analytics meetup talking about how we build machine learning algorithms using Hadoop to power music recommendations. Great meetup, where we had two speakers, me and Blake Shaw from Foursquare.

From: Erik Bernhardsson

8/3/13, 12:00 AM Web Development

HubSpot's Picture Shows how to Maintain Monocultures in the 21st Century

I thought this article about the company culture at HubSpot is kind of funny. “HubSpot's Awesome Presentation Shows how to Create a 21st Century Culture”. Just FYI: You're not different. You're a bunch of white hipsters aged 25-30 dressed up in the same theme.

From: Erik Bernhardsson

7/28/13, 12:00 AM Web Development

More Luigi: Presentation from OSCON

I was in Portland, OR for a few days hanging out at OSCON. Was fun. I also talked a bit about Luigi. Next week I'm presenting at the NYC Predictive Analytics meetup together with Blake Shaw from Foursquare.

From: Erik Bernhardsson

7/27/13, 12:00 AM Web Development

Optimizing over multinomial distributions

Sometimes you have to maximize some function $$ f(w_1, w_2, \ldots, w_n) $$ where $$ w_1 + w_2 + \ldots + w_n = 1 $$ and $$ 0 \le w_i \le 1 $$ . Usually, $$ f $$ is concave and differentiable, so there's one unique global maximum and you can solve it by applying gradient ascent.
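One way to keep gradient ascent on the simplex is to optimize unconstrained variables z and set w = softmax(z), so the constraints hold by construction. The sketch below is an assumption made for illustration, not necessarily the approach taken in the post; the objective f(w) = sum of c_i log w_i is a toy example chosen because its maximum is known analytically (w_i = c_i when the c_i sum to 1). Note that a function concave in w is not guaranteed to be concave in z, although this particular one is.

```python
import math


def softmax(z):
    """Map unconstrained z to a point on the probability simplex."""
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def maximize_on_simplex(grad_f, n, step=0.5, iters=2000):
    """Gradient ascent on z where w = softmax(z); grad_f(w) returns df/dw."""
    z = [0.0] * n  # start at the uniform distribution
    for _ in range(iters):
        w = softmax(z)
        g = grad_f(w)
        # Chain rule through softmax: df/dz_j = w_j * (g_j - sum_i g_i * w_i)
        avg = sum(gi * wi for gi, wi in zip(g, w))
        z = [zj + step * wj * (gj - avg) for zj, wj, gj in zip(z, w, g)]
    return softmax(z)


# Toy objective (hypothetical): f(w) = sum_i c_i * log(w_i), gradient c_i / w_i.
c = [0.5, 0.3, 0.2]


def grad_f(w):
    return [ci / wi for ci, wi in zip(c, w)]


w_star = maximize_on_simplex(grad_f, len(c))
```

The step size and iteration count are arbitrary choices that work for this toy problem; a real objective would need its own tuning or a line search.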

From: Erik Bernhardsson

7/24/13, 12:00 AM Web Development

More Luigi!

Continuing in the same spirit of shameless self-promotion, here's some recent Luigi press: a Reddit thread; A Guide to Python Frameworks for Hadoop (slides from the NYC Hadoop User Group); this presentation from the Open Analytics NYC meetup about how Foursquare uses Luigi. Luigi is in the middle of a...

From: Erik Bernhardsson

6/26/13, 12:00 AM Web Development

hdfs2cass

Just open sourced hdfs2cass which is a Hadoop job (written in Java) to do efficient Cassandra bulkloading. The nice thing is that it queries Cassandra for its topology and uses that to partition the data so that each reducer can upload data directly to a Cassandra node.

From: Erik Bernhardsson

6/19/13, 12:00 AM Web Development

NoDoc

We had an unconference at Spotify last Thursday and I added a semi-trolling semi-serious topic about abolishing documentation. Or NoDoc, as I'm going to call this movement. This was meant to be mostly a thought experiment, but I don't see it as complete madness.

From: Erik Bernhardsson

6/16/13, 12:00 AM Web Development
