Black Hat Data Wrangler

I had a choice between meetups on Tuesday night and, frankly, I made the wrong one. Fortunately, others didn’t, including one faithful DC-area meetup streamer (Casey Driscoll). The video of the event was posted, allowing me to double-dip (so huge hat tip to Casey). Casey wasn’t alone in his awesomeness, though, as the presentation itself was posted to one of the presenters’ GitHub accounts. The materials are all there, and you should definitely check them out.

Many of the tricks and hacks the presenters talked about will be passé for anyone who has done any measure of web scraping, but some (particularly the hidden spans and font remapping) were super cool to see. The examples were clear and fun, and the presentation itself was entertaining enough.
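For the curious, here is roughly what getting past the hidden-span trick might look like. This is my own minimal sketch, not code from the talk: it assumes the decoy characters sit in elements hidden with an inline `display:none` style, and the `VisibleTextExtractor` class, the `visible_text` helper, and the sample markup are all made up for illustration. A real site could just as easily hide the decoys with CSS classes or an external stylesheet, in which case you would need a rendering browser (Selenium, say) rather than a plain parser.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not touch the stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class VisibleTextExtractor(HTMLParser):
    """Collect text, skipping anything inside an inline display:none element."""

    def __init__(self):
        super().__init__()
        self.parts = []
        # One flag per open element: True if it or any ancestor is hidden.
        self.hidden_stack = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # Normalize the style so "display: none" and "display:none" both match.
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        parent_hidden = self.hidden_stack[-1] if self.hidden_stack else False
        self.hidden_stack.append(parent_hidden or "display:none" in style)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.hidden_stack:
            self.hidden_stack.pop()

    def handle_data(self, data):
        # Keep text only when no enclosing element is hidden.
        if not (self.hidden_stack and self.hidden_stack[-1]):
            self.parts.append(data)


def visible_text(html):
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


# Hypothetical obfuscated price: the hidden spans inject decoy digits.
sample = (
    '<td>4<span style="display: none">9</span>2<span>.</span>'
    '<span style="display:none">7</span>50</td>'
)
print(visible_text(sample))
```

A naive scraper that grabs all the text in that cell would read `492.750`; skipping the hidden spans recovers what a human actually sees on the page.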

Excuse me while I go remap some fonts…

Written on January 13, 2016
Keywords: data, wrangle, wrangling, black, hat, white, obfuscate, obfuscation, python, selenium, ocr, ocropus, pdf, font remapping