There is no magic trick

(Warning: super speculative, feel free to ignore) As Yogi Berra said, “It's tough to make predictions, especially about the future”. Unfortunately predicting is hard, and unsurprisingly people look for the Magic Trick™ that can resolve all the uncertainty.

From: Erik Bernhardsson

Installing TensorFlow on AWS

Curious about Google's newly released TensorFlow? I don't have a beefy GPU machine, so I spent some time getting it to run on EC2. The steps to reproduce it are pretty brutal, and I wouldn't recommend going through them unless you want to waste five hours of your life.

From: Erik Bernhardsson

Looking for smart people

I haven't mentioned what I'm currently up to. Earlier this year I left Spotify to join a small startup called Better. We're going after one of the biggest industries in the world that also turns out to be completely broken.

From: Erik Bernhardsson

MCMC for marketing data

The other day I was looking at marketing spend broken down by channel and wanted to compute some simple uncertainty estimates. I have data like this (columns: Total spend, Transactions): Channel A, 2292.

From: Erik Bernhardsson

Interview with a Data Scientist: Erik Bernhardsson

I was featured in Peadar Coyle's interview series interviewing various “data scientists” – which is kind of arguable since (a) all the other ppl in that series are much cooler than me (b) I'm not really a data scientist.

From: Erik Bernhardsson

Nearest neighbors and vector models – epilogue – curse of dimensionality

This is another post based on my talk at NYC Machine Learning. The previous two parts covered most of the interesting parts, but there are still some topics left to be discussed. To go back and read the meaty stuff, check out

From: Erik Bernhardsson

Nearest neighbors and vector models – part 2 – algorithms and data structures

This is a blog post rewritten from a presentation at NYC Machine Learning on Sep 17. It covers a library called Annoy that I have built that helps you do nearest neighbor queries in high dimensional spaces.

From: Erik Bernhardsson

Nearest neighbor methods and vector models – part 1

This is a blog post rewritten from a presentation at NYC Machine Learning last week. It covers a library called Annoy that I have built that helps you do (approximate) nearest neighbor queries in high dimensional spaces.

From: Erik Bernhardsson

Presentations about Spotify music recommendations

A couple of people in my old team have been around talking about how Spotify does music recommendations and put together some quite good presentations. First one is Neville Li's presentation about Scala Data Pipelines @ Spotify:

From: Erik Bernhardsson

Antipodes

I was playing around with D3 last night and built a silly visualization of antipodes and how our intuitive understanding of the world sometimes doesn't make sense.

From: Erik Bernhardsson

Software Engineers and Automation

Every once in a while when talking to smart people the topic of automation comes up. Technology has made lots of occupations redundant, so what's next? [Image: Switchboard operator, a long time ago] What about software engineers?

From: Erik Bernhardsson

coin2dice

Here's a problem that I used to give to candidates. I stopped using it seriously a long time ago since I don't believe in puzzles, but I think it's kind of fun. Let's say you have a function that simulates a random coin flip.

From: Erik Bernhardsson

Benchmark of Approximate Nearest Neighbor libraries

Annoy is a library written by me that supports fast approximate nearest neighbor queries. Say you have a high (1-1000) dimensional space with points in it, and you want to find the nearest neighbors to some point.
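For a concrete picture of the query itself, here is a minimal brute-force sketch in plain Python. It is not Annoy's API or algorithm: the function name `nearest_neighbors` and the toy data are made up for illustration. It compares the query against every single point, which is exactly the cost that approximate methods try to avoid.

```python
import heapq
import math
import random


def nearest_neighbors(points, query, k):
    """Return the indices of the k points closest to `query` (exact, brute force)."""
    # Compare the query against every point: O(n * d) work per query.
    return heapq.nsmallest(
        k,
        range(len(points)),
        key=lambda i: math.dist(points[i], query),
    )


# Hypothetical toy data: 1,000 random points in 50 dimensions.
rng = random.Random(0)
points = [[rng.gauss(0, 1) for _ in range(50)] for _ in range(1000)]

# Querying with a point from the data set returns that point first (distance 0).
neighbors = nearest_neighbors(points, points[42], k=5)
print(neighbors)
```

This needs Python 3.8+ for `math.dist`. With millions of points and many queries the full scan gets slow, which is where an approximate index earns its keep.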

From: Erik Bernhardsson

More Luigi alternatives

The workflow engine battle has intensified with some more interesting entries lately! Here are a couple I encountered in the last few days. I love that at least two of them are direct references to Luigi!

From: Erik Bernhardsson

3D in D3

I have spent some time lately with D3. It's a lot of fun to build interactive graphs. See for instance this demo (will provide a longer writeup soon). D3 doesn't have support for 3D but you can do projections into 2D pretty easily.

From: Erik Bernhardsson

The hardest challenge about becoming a manager

Note: this post is full of pseudo-psychology and highly speculative content. Like most fun stuff! I became a manager back in 2009. Being a developer is fun. You have this very tangible way to measure yourself.

From: Erik Bernhardsson

The lane next to you is more likely to be slower than yours

Saw this link on Hacker News the other day: “The Highway Lane Next to Yours Isn’t Really Moving Any Faster”. The article describes a phenomenon unique to traffic where cars spread out when they go fast and get more compact when they go slow.

From: Erik Bernhardsson

Better precision and faster index building in Annoy

Sometimes you have these awesome insights. A few days ago I got an idea for how to improve index building in Annoy. For anyone who isn't acquainted with Annoy – it's a C++ library with Python bindings that provides fast high-dimensional nearest neighbor search.

From: Erik Bernhardsson

Annoy – now without Boost dependencies and with Python 3 Support

Annoy is a C++/Python package I built for fast approximate nearest neighbor search in high dimensional spaces.

From: Erik Bernhardsson

Ping the world

I just pinged a few million random IP addresses from my apartment in NYC. Here's the result: Some notes: What's going on with Sweden? Too much torrenting? Ireland is likewise super slow, but not Northern Ireland. Eastern Ukraine is also super slow, maybe not surprising given current events.

From: Erik Bernhardsson

Black Box Machine Learning in the Cloud

There's a bunch of companies working on machine learning as a service. Some old companies like Google, but now also Amazon and Microsoft. Then there's a ton of startups: PredictionIO ($2.7M funding), BigML ($1.6M funding), Clarifai, etc, etc.

From: Erik Bernhardsson

It's called Berkson's paradox!

As noted by multiple tweets, my previous post describes a phenomenon known as Berkson's paradox. Here's another example: “Why Are Handsome Men Such Jerks?”

From: Erik Bernhardsson

Norvig's claim that programming competitions correlate negatively with being good on the job

I saw a bunch of tweets over the weekend about Peter Norvig claiming there's a negative correlation between being good at programming competitions and being good at the job. There were some decent Hacker News comments on it.

From: Erik Bernhardsson

Pinterest open sources Pinball

Pinterest just open sourced Pinball, which seems like an interesting Luigi alternative. There are two blog posts: “Pinball: Building workflow management” (from 2014) and “Open-sourcing Pinball” (from this week). The author has a comment in the comments thread on Hacker News:

From: Erik Bernhardsson

The relationship between commit size and commit message size

Wow I guess it was more than a year ago that I tweeted this. Crazy how time flies by. Anyway, here's my rationale: When I update one line of code I feel like I have to put in a long explanation about its side effects, why it's fully backwards compatible, and why it fixes some issue #xyz.

From: Erik Bernhardsson