Kaggle Learns, Too

This was definitely the buzziest meetup I attended this week, but hey: how do you get to be a nerdhero without a few buzzwords? Kaggle's CEO, Anthony Goldbloom, came to talk to the Statistical Programming DC / Data Science DC meetup about two things:

  1. What the typical Kaggle competition lifecycle looks like, and
  2. Which models keep on whoopin’ ass in Kaggle competitions.

For the former, his claim was that competition “showed people what was possible” and therefore pushed them to do better. I love his “invisible hand” interpretation of the data science marketplace, but I think the improvement may have way more to do with the best kagglers (?) out there sharing their secret recipes. I don’t know; that just seems way more reasonable to me.

As for which models are winning, as he put it, it boiled down to two epochs: first the reign of the random forest / gradient boosting / ensemble-of-decision-trees model types, and now the reign of neural networks. I appreciated the distinction he went on to make about realms of applicability and the typical workflow breakdown: his take was that any data scientist splits his or her time between feature selection and model tuning. The RF/boosting models put the emphasis on the features (the models themselves are basically braindead), while the neural network models don’t give a got-damn about the data (how many layers, bro?!?).
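To make that contrast concrete, here's a minimal sketch (mine, not from the talk) using scikit-learn: a gradient-boosted tree model that gets its lift from a couple of hand-built features, next to a small neural net where the knobs you'd actually fiddle with are the layers. The dataset, the engineered columns, and the layer sizes are all made up for illustration; in a real competition you'd swap in xgboost and a proper net.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

# Toy stand-in for a Kaggle-style tabular problem (purely illustrative).
X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           random_state=0)

# "Feature" side of the split: bolt on a couple of hand-built interaction /
# polynomial terms, the kind of thing the tree/boosting crowd sweats over.
X_eng = np.hstack([X,
                   (X[:, 0] * X[:, 1])[:, None],
                   (X[:, 2] ** 2)[:, None]])

gbm = GradientBoostingClassifier(random_state=0)
print("GBM on engineered features:",
      cross_val_score(gbm, X_eng, y, cv=5).mean())

# "Model tuning" side: hand the raw columns to a small neural net and spend
# your effort on layers / widths / iterations instead of on the features.
mlp = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
print("MLP on raw features:",
      cross_val_score(mlp, X, y, cv=5).mean())
```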

All in all, not bad. I’ll give them a second chance (and next time I’ll show up in time for some empanadas, so whichever d-bag packed ten of them into a ziplock (I’m serious) doesn’t eat them all before I get any).

Written on January 14, 2016
Keywords: kaggle, dc, district, columbia, statistics, machine, learning, neural, networks, xgboost, gradient, empanadas, ziplock, douchebag