More Luigi!
Elias Freider just talked about Luigi at PyData 2013: The presentation above is much better than one I put together a few weeks ago. In case anyone is interested I'll include it too:
From: Erik Bernhardsson
I recently came across this paper describing how they do ML at Twitter. TL;DR: everything is a Pig workflow, and they do everything as UDFs. The approach seems pretty interesting.
From: Erik Bernhardsson
This article from today in Mashable describes some of the fun stuff I get to work with: Erik Bernhardsson is technical lead at Spotify, where he helped to build a music recommendation system based on large-scale machine learning algorithms, mainly matrix factorization of big matrices using Hadoop.
From: Erik Bernhardsson
Slides from the talk. Slightly edited because (a) some of the slides make little sense taken out of context and (b) Slideshare seems to have problems converting some of the stuff. Collaborative filtering at Spotify from Erik Bernhardsson
From: Erik Bernhardsson
From the NYC Machine Learning talk I gave last week: I haven't looked at it yet except briefly. Unfortunately the quality isn't the best.
From: Erik Bernhardsson
The Economist just published an article called The best, the worst and the ugly.
From: Erik Bernhardsson
This was posted on the Twitter Engineering blog a few days ago: Dimension Independent Similarity Computation (DISCO) I just glanced at the paper, and there's some cool stuff going on from a theoretical perspective.
From: Erik Bernhardsson
Not sure how I managed to miss this, but I'm watching this Tumblr presentation and they talk about their projects named after Arrested Development topics: Gob, Parmesan, Buster, Jetpants, Oscar, George and Motherboy. Still, the best software project name is probably Apple's BHA.
From: Erik Bernhardsson
Something that pops up pretty frequently is the need to implement time decay, especially where you have recursive chains of jobs.
From: Erik Bernhardsson
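As a rough illustration of the time decay idea mentioned above, here is a minimal sketch, assuming an exponential decay where each job in the chain takes the previous job's output, scales it down, and adds the new day's counts. The function names, the dictionary representation and the `decay` factor are all assumptions made for this example, not anything taken from the original post.

```python
def decayed_update(previous, today, decay=0.5):
    """One step of the chain: decay yesterday's aggregate, then add today's counts.

    `previous` and `today` are hypothetical dicts mapping a key to a number.
    """
    keys = set(previous) | set(today)
    return {k: decay * previous.get(k, 0.0) + today.get(k, 0.0) for k in keys}


def decayed_total(daily_counts, decay=0.5):
    """Fold a sequence of daily counts (oldest first) into one decayed aggregate."""
    state = {}
    for day in daily_counts:
        state = decayed_update(state, day, decay)
    return state


if __name__ == "__main__":
    days = [{"a": 4}, {"a": 2, "b": 1}]
    print(decayed_total(days, decay=0.5))
```

The point of the recursive form is that each run only needs the previous aggregate and one new day of data, so old input never has to be reprocessed.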
I'm shamelessly promoting my first major open source project.
From: Erik Bernhardsson
In this article, I will show you how to create SQL insert statements from a spreadsheet.
From: Dan Vega
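One possible way to do something like this in code, shown here as a sketch and not as the method from the article itself: export the spreadsheet as CSV and turn each row into an `INSERT` statement. The helper names, the table name and the choice to treat every value as a string (with empty cells becoming `NULL`) are assumptions made for this example.

```python
import csv
import io


def sql_literal(value):
    """Render a cell as a SQL string literal; empty cells become NULL (an assumption)."""
    if value == "":
        return "NULL"
    # Escape single quotes by doubling them, as standard SQL requires.
    return "'" + value.replace("'", "''") + "'"


def insert_statements(csv_text, table):
    """Build one INSERT statement per data row, using the header row as column names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    statements = []
    for row in reader:
        columns = ", ".join(row.keys())
        values = ", ".join(sql_literal(v) for v in row.values())
        statements.append(f"INSERT INTO {table} ({columns}) VALUES ({values});")
    return statements


if __name__ == "__main__":
    sheet = "name,city\nO'Brien,Dublin\n"
    for statement in insert_statements(sheet, "people"):
        print(statement)
```

Generated statements like these are handy for one-off data loads; when inserting from application code, parameterized queries are usually the safer choice.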
These are some blog posts which have gotten a disproportionate amount of traffic (10,000+ page views): 2022 We are still early with the cloud: why software development is overdue for a change 2021 Storm in the stratosphere: how the cloud will be reshuffled Building a data team at a mid-stage star...
From: Erik Bernhardsson