Kaggle Learns, Too
This was definitely the buzziest meetup I attended this week, but hey: how do you get to be a nerdhero without a few buzzwords? The CEO of Kaggle (Anthony Goldbloom) came to talk to the Statistical Programming DC / Data Science DC meetup about two things:
- What the typical Kaggle competition lifecycle looks like, and
- Which models keep on whoopin’ ass in Kaggle competitions.
For the former, his claim was that competition “showed people what was possible” and therefore pushed them to do better. I love his “invisible hand” interpretation of the data science marketplace, but I think it may have way more to do with the best kagglers out there sharing their secret recipes. I don’t know, but that just seems way more reasonable.
As for which models are winning, as he put it, it boiled down to two epochs: first came the reign of the random forest / gradient boosting / ensemble-of-decision-trees model types, and now the reign of neural networks. I appreciated the distinction he went on to make about realms of applicability and the typical workflow breakdown: his take was that any data scientist splits his or her time between feature selection and model tuning. The RF/Boost models put the emphasis on feature selection (the models themselves are basically braindead), while the neural network models don’t give a got-damn about the data (how many layers, bro?!?).
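To make that contrast concrete, here’s a minimal sketch (mine, not from the talk) of the two epochs side by side in scikit-learn: a gradient-boosted-tree model, where your effort mostly goes into the features you feed it, versus a small neural net, where the knobs that matter are the layers themselves. The dataset and hyperparameters are just illustrative guesses.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Toy data standing in for a Kaggle-style tabular problem.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Epoch one: boosted trees -- largely hands-off on the model side,
# the wins come from which features you hand it.
gbt = GradientBoostingClassifier(n_estimators=200, random_state=0)
gbt.fit(X_train, y_train)
print("boosted trees:", accuracy_score(y_test, gbt.predict(X_test)))

# Epoch two: a neural net -- the tuning is the architecture itself
# (how many layers, bro?!?).
nn = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
nn.fit(X_train, y_train)
print("neural net:   ", accuracy_score(y_test, nn.predict(X_test)))
```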
All in all, not bad. I’ll give them a second chance (and next time I’ll show up in time for some empanadas, so whichever d-bag packed up 10 in a ziplock (I’m serious) doesn’t eat them all before I get any).