MTG, or: how to be really, really, really cool

This article is part 1 in a series on MTG. If this is your bag, check out:

  • Part 1: In which our hero struggles to game a multi-thousand-dollar game about goblins
  • Part 2: In which our hero checks his resume's neo4j box

The coolest thing about me, probably, is my love for a novelty card game about fantasy battles. Think Pokémon for Lord of the Rings fans. For the record, I don’t really like just about anything about the fantasy genre outside of the context of “Magic: The Gathering.” I don’t even like the fact that the name of the game is “Magic: The Gathering.” I suppose I’m trying too hard to distance myself from the game, here; playing a little too cool for school. So let me just drop the coolness factor on down to 0 and bring the realness factor up to 100: I love this silly fantasy card game.

Unfortunately, however, my wallet doesn’t feel the same way. Magic cards are ridiculously expensive (see, e.g., one of the ‘power nine’, Black Lotus). You’re reading that right – we’re in the tens of thousands for that piece of cardboard. Like, desired enough to make a grownass adult cry out of joy. Personally, five minutes in MS Paint and some understanding playing partners seems like a reasonable reaction to that price, but, hey, Wizards of the Coast has to fleece the humble neckbeard residents of The Nerderlands somehow.

So to the point: why am I talking about this game on my ostensibly tech-themed blog? Easy: with neither the wealth of America-great-ifier Donald Trump nor the skill of [whoever is good at mtg this week], I am left with only my trusty Debian dev environment and my pythonic wits to attempt to eke out some kind of competitive advantage – so eke I shall.

In this series, my goal is to do the following:

  • obtain data on cards and, in particular, their possible membership in high-performing decks (collections of 60 or so cards with some composition regulations),
  • develop a metric that ranks cards based on previous success and various synergies,
  • predict which cards of the next set release will perform well or be particularly popular,
  • and, finally, attempt to use that information to get some sort of competitive advantage

As I go along I’m sure I’ll realize how ridiculously infeasible this proposal is and adjust it accordingly, but in the interest of full transparency into how naive I am, there it is.

So, without further ado:

MTZ: Magic the Zachering

Part 1: Krosan Datascraper

Data Sources

After quite a bit of searching, I found two pretty gnarly sources of data:

Card Info

For simple card info, you just cannot beat MTGJson (at least in my limited searching – as always, please let me know if you have found something better). Quick and easy JSON API. There is one thing it is missing that may, eventually, prove to be very valuable to me: community ratings. The MTG “main page” for card data is Gatherer, which has (among many other things) a “star rating” for community sentiment re: individual cards. Info like that may end up being a useful tiebreaker (or, simply, a primary feature) in my later models.
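To give a feel for what working with that kind of data looks like, here is a minimal sketch. The sample payload and its field names (name, manaCost, types) are my own assumptions for illustration and may not match the real MTGJson schema, and index_by_name is a hypothetical helper, not something from any library.

```python
import json

# Hypothetical sample payload; the real MTGJson schema may differ.
SAMPLE = """
[
  {"name": "Black Lotus", "manaCost": "{0}", "types": ["Artifact"]},
  {"name": "Goblin Guide", "manaCost": "{R}", "types": ["Creature"]}
]
"""


def index_by_name(raw_json):
    """Parse a JSON list of card records and key them by card name."""
    cards = json.loads(raw_json)
    return {card["name"]: card for card in cards}


cards_by_name = index_by_name(SAMPLE)
print(cards_by_name["Goblin Guide"]["types"])
```

Keying on the card name is just the simplest thing that could work here; anything with reprints or alternate versions would probably need a richer key.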

Deck Info

Magic is played by users who assemble approximately 60 cards into some synergistic whole. A deck must contain at least 60 cards, and conventional wisdom is to use exactly that minimum (ostensibly, to increase your chances of drawing the cards you most want). It is hard to overstate just how large the phase space of possible decks and games is. That being said, over time and through player interaction several “meta” strategies have emerged, and most active competition in the magic world involves creating a deck with one of a somewhat limited number of strategies. The relative power of these strategies then ebbs and flows as different cards come in and out of legality.

In any case, I am starting off with the assumption that knowing the cards (or perhaps simply the common properties of cards) which are “successful” in Magic competitions will give me information about under- or over-valued cards and overlooked synergies. To this end, I have to have a measure of the “successful-ness” of cards, and for that I am relying on their membership in championship-caliber decks. I found no better resource for tournament performance of cards than the Star City Games Deck Database.
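As a first pass at that “successful-ness” measure, one could simply count how many decks each card shows up in. The sketch below does that with made-up decklists; membership_counts is a hypothetical helper, and counting each card once per deck (rather than per copy) is my assumption about what “membership” should mean, not a settled metric.

```python
from collections import Counter


def membership_counts(decks):
    """Count how many decks each card appears in (not how many copies)."""
    counts = Counter()
    for deck in decks:
        # set() so that four copies of a card still count once per deck
        counts.update(set(deck))
    return counts


# Made-up decklists, for illustration only
decks = [
    ["Lightning Bolt", "Lightning Bolt", "Goblin Guide"],
    ["Lightning Bolt", "Counterspell"],
    ["Counterspell", "Island"],
]

counts = membership_counts(decks)
print(counts.most_common(2))
```

A raw count like this will obviously favor staples that go in everything, so it is a starting point at best.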

Scraping

All the code I wrote for scraping this data can be found on my GitHub repo here. The basic breakdown is simple:

  1. cards.py: a general purpose card parsing library. Because of the awesome above-mentioned MTGJson, this is as simple as creating a slightly modified dictionary object. There is really not much else to it
  2. decks.py: really just defining an abstract interface for deck objects
  3. scgdecks.py: okay, so this is where the meat is. The good folks at SCG have collected a ton of decklists, and slowly but surely this script will scrape them all. Currently it doesn’t persist them anywhere (bummer!) but it does load them into memory. There are enough of them that I simply don’t recommend doing that.
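To make the shape of that breakdown concrete, here is a rough sketch of a “slightly modified dictionary” card and an abstract deck interface. This is not the code from the repo: Card, Deck, InMemoryDeck, MIN_SIZE, and is_legal_size are all hypothetical names I am using for illustration, and the attribute-style access is just one guess at what a useful modification might be.

```python
from abc import ABC, abstractmethod


class Card(dict):
    """A card record: a plain dict plus attribute-style access."""

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key) from None


class Deck(ABC):
    """Abstract interface; concrete sources decide how cards are fetched."""

    MIN_SIZE = 60  # the minimum deck size described earlier

    @abstractmethod
    def cards(self):
        """Return a list of Card objects, one entry per physical copy."""

    def is_legal_size(self):
        return len(self.cards()) >= self.MIN_SIZE


class InMemoryDeck(Deck):
    """Hypothetical stand-in for a scraped deck: holds cards in a list."""

    def __init__(self, cards):
        self._cards = list(cards)

    def cards(self):
        return self._cards


bolt = Card(name="Lightning Bolt", types=["Instant"])
deck = InMemoryDeck([bolt] * 60)
print(bolt.name, deck.is_legal_size())
```

A scraper-backed deck would slot in as another Deck subclass whose cards() method does the fetching, which is the main reason to bother with an abstract interface at all.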

There is no command line interface yet (tsk, tsk, me – don’t worry, I feel as ashamed as you want me to) because there isn’t really a defined “thing” to do with the data once we have it. Although my natural inclination is to give these deck objects a Postgres persistence capability, I am instead going to be super buzzwordy and use neo4j. As a result, I’m holding off on that until next time.

Next time!

I’m going to persist all this madness into a locally installed neo4j instance. I’ll go over how I did the install first, offer some quick first-take reactions to the technology, and then dive into my own use case. Till then!

Written on February 28, 2016
Keywords: hack, mtg, magic, gathering, superfly, web, scraping