Postgres theory, feature discussion, and advocacy are fun. But even I’ll admit it’s nice to have some practical application every once in a while. This week, we’re going to build an actual database.
But what would be small enough for a proof of concept, yet somewhat interesting? Well, I’m a fan of Hearthstone. It’s a silly card game much like Magic: The Gathering, but has the distinct aura of “eSports!” Regardless, it’s a fun little time waster, and has a few hundred data points we can manipulate.
Annoyingly, there’s no official card list. But fans are nothing if not dedicated, and several folks maintain spreadsheets for their own purposes and often share them. This one had a pretty good breakdown and is up to date with the latest expansions, so it makes a good starting point.
The second page of this Excel workbook is a card list, which I saved as a quoted tab-delimited file. The first three lines are garbage, so away they went. The last four lines are also empty, so into the digital bucket in the sky they passed. What’s left is something that we can import into a junk table:
CREATE UNLOGGED TABLE hs_raw_import
(
  junk1            VARCHAR,
  junk2            VARCHAR,
  junk3            VARCHAR,
  junk4            VARCHAR,
  expansion        VARCHAR,
  card_name        VARCHAR,
  character_class  VARCHAR,
  rarity           VARCHAR,
  category         VARCHAR,
  tribe            VARCHAR,
  mana_cost        VARCHAR,
  attack           VARCHAR,
  hitpoints        VARCHAR,
  card_text        VARCHAR,
  junk5            VARCHAR,
  junk6            VARCHAR,
  junk7            VARCHAR
);

COPY hs_raw_import
  FROM '/tmp/hearthstone_cards.tsv'
  WITH CSV DELIMITER E'\t' HEADER;

ANALYZE hs_raw_import;
All of the “junk” columns are an unfortunate reality of the current COPY command; there’s no way to ignore columns in source files. That’s OK in this case because we’re going to discard the import table when everything is done, and it’s certainly speedy. Importing all 743 rows on a test VM required less than 5ms.
Now we need to transform the data into something a bit more normalized. To do that, we need to look at the data. A technique that works really well is to examine how the data is distributed by checking the contents of the pg_stats view. Let’s look at this data:
SELECT attname, n_distinct
  FROM pg_stats
 WHERE tablename = 'hs_raw_import'
   AND attname NOT LIKE 'junk%';

     attname     | n_distinct
-----------------+------------
 expansion       |          9
 card_name       |         -1
 hitpoints       |         12
 card_text       |   -0.89502
 character_class |         10
 rarity          |          5
 category        |          3
 tribe           |          7
 mana_cost       |         13
 attack          |         12
Looking at the statistics, any negative number means there’s a proportional relationship between the total row count and the number of distinct values: the value is the negated ratio of distinct values to rows, so -1 means every row is unique. Positive numbers are absolute counts, and those are the ones we want to focus on for normalization.
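To make the negative values easier to read, here’s a rough sketch that converts every n_distinct entry into an estimated absolute count. It leans on the reltuples estimate in pg_class, so the numbers are approximations from the last ANALYZE, not exact counts. The estimated_distinct column name is just an illustrative label.

```sql
-- Convert n_distinct into an approximate absolute count.
-- Negative values are a negated ratio of distinct values to total rows,
-- so multiply by the planner's row estimate (reltuples) to undo that.
SELECT s.attname,
       s.n_distinct,
       CASE WHEN s.n_distinct < 0
            THEN round((-s.n_distinct * c.reltuples)::NUMERIC)
            ELSE s.n_distinct::NUMERIC
       END AS estimated_distinct  -- illustrative column name
  FROM pg_stats s
  JOIN pg_class c ON (c.relname = s.tablename)
  JOIN pg_namespace n ON (n.oid = c.relnamespace AND
                          n.nspname = s.schemaname)
 WHERE s.tablename = 'hs_raw_import'
   AND s.attname NOT LIKE 'junk%'
 ORDER BY estimated_distinct;
```

Sorting by the estimate puts the best normalization candidates at the top of the list.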
After leveraging a bit of knowledge regarding the actual game mechanics, we can ignore elements that act as measurable metrics. Things like hitpoints, attack, and mana_cost may not have a lot of distinct values, but don’t have further associated information. But what about the rest? Let’s dig a little deeper:
- expansion : As with most CCGs, new cards and mechanics are added through expansions. These have their own associated attributes we might want to track independently. This one warrants a new table.
- character_class : In Hearthstone, cards are separated into nine distinct classes, with a tenth as neutral cards any class can leverage. Again, if we were writing an app, we might list several other data points about each class, so we’d want that in a separate table.
- rarity : This one is a bit tricky. There are currently only five rarity levels, each of which has an occurrence percentage when opening new card packs. Yet Blizzard is not likely to add further rarity levels. For now, let’s leave percentages as an academic exercise, and just leave rarity as a text description.
- category : Currently there are only minions, spells, and weapons. This is extremely likely to change in the future, assuming the game remains popular for several more decades, as Magic has. This one gets a new table.
- tribe : Some players consider this a subcategory. Minions can sometimes be further broken down into a taxonomy that will alter game mechanics via synergistic effects. We definitely want to track this separately.
Not listed here are attributes inherent in the card text itself. The taunt attribute, for example, is a significant variable game mechanic, which we’ll want to mine from the text. Since card text is often conditional, these optional elements should also be tracked separately through an attribute table.
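Before committing to a list of attributes, it can help to see how often a candidate keyword actually shows up in the card text. This is only a sketch: taunt and enrage come from this article, while charge and divine shield are extra keywords added here purely for illustration.

```sql
-- Count how many imported cards mention each candidate keyword.
-- The keyword list is illustrative; swap in whatever mechanics you
-- want to evaluate.
SELECT k.keyword, count(*) AS cards
  FROM (VALUES ('taunt'), ('enrage'),
               ('charge'), ('divine shield')) AS k (keyword)
  JOIN hs_raw_import r ON (r.card_text ~* k.keyword)
 GROUP BY k.keyword
 ORDER BY cards DESC;
```

The ~* operator is a case-insensitive regular expression match, so it also catches keywords embedded in longer words. If that produces false positives, wrapping the pattern in the \m and \M word-boundary escapes is one way to tighten it.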
Given all of this, here’s an extremely simplified architecture for representing the cards themselves, with ‘etc’ representing further descriptive columns. Queries to bootstrap them from the source table are at the end:
CREATE TABLE expansion
(
  expansion_id    SERIAL PRIMARY KEY,
  expansion_name  VARCHAR NOT NULL,
  release_date    DATE NULL,
  etc             VARCHAR NULL
);

CREATE TABLE character_class
(
  class_id    SERIAL PRIMARY KEY,
  class_name  VARCHAR NOT NULL,
  hero_power  VARCHAR NULL,
  etc         VARCHAR NULL
);

CREATE TABLE category
(
  category_id    SERIAL PRIMARY KEY,
  category_name  VARCHAR NOT NULL,
  etc            VARCHAR NULL
);

CREATE TABLE minion_tribe
(
  tribe_id    SERIAL PRIMARY KEY,
  tribe_name  VARCHAR NOT NULL,
  etc         VARCHAR NULL
);

CREATE TABLE attribute
(
  attribute_id  SERIAL PRIMARY KEY,
  att_name      VARCHAR NOT NULL,
  effect        TEXT NULL,
  etc           VARCHAR NULL
);

CREATE TABLE card
(
  card_id       SERIAL PRIMARY KEY,
  expansion_id  INT NOT NULL REFERENCES expansion,
  card_name     VARCHAR NOT NULL,
  class_id      INT NOT NULL REFERENCES character_class,
  rarity        VARCHAR NOT NULL,
  category_id   INT NOT NULL REFERENCES category,
  tribe_id      INT NULL REFERENCES minion_tribe,
  mana_cost     INT NOT NULL,
  attack        INT NULL,
  hitpoints     INT NULL,
  card_text     TEXT
);

CREATE INDEX idx_card_expansion ON card (expansion_id);
CREATE INDEX idx_card_class ON card (class_id);
CREATE INDEX idx_card_category ON card (category_id);
CREATE INDEX idx_card_tribe ON card (tribe_id);

CREATE TABLE card_attribute
(
  card_id       INT NOT NULL REFERENCES card,
  attribute_id  INT NOT NULL REFERENCES attribute
);

CREATE INDEX idx_card_attribute_attribute_id
    ON card_attribute (attribute_id);

-- Bootstrap the lookup tables from the raw import.

INSERT INTO expansion (expansion_name)
SELECT DISTINCT expansion FROM hs_raw_import;

INSERT INTO character_class (class_name)
SELECT DISTINCT character_class FROM hs_raw_import;

INSERT INTO category (category_name)
SELECT DISTINCT category FROM hs_raw_import;

-- Only minions have a tribe, so skip the NULLs.
INSERT INTO minion_tribe (tribe_name)
SELECT DISTINCT tribe FROM hs_raw_import
 WHERE tribe IS NOT NULL;

INSERT INTO attribute (att_name)
VALUES ('taunt'), ('enrage');

-- Swap the text labels for their new IDs and cast the numeric columns.
INSERT INTO card (
  expansion_id, card_name, class_id, rarity, category_id,
  tribe_id, mana_cost, attack, hitpoints, card_text
)
SELECT e.expansion_id, r.card_name, cl.class_id, r.rarity,
       ca.category_id, t.tribe_id, r.mana_cost::INT,
       r.attack::INT, r.hitpoints::INT, r.card_text
  FROM hs_raw_import r
  JOIN expansion e ON (e.expansion_name = r.expansion)
  JOIN character_class cl ON (cl.class_name = r.character_class)
  JOIN category ca ON (ca.category_name = r.category)
  LEFT JOIN minion_tribe t ON (t.tribe_name = r.tribe);

-- Link each card to every attribute its text mentions (case-insensitive).
INSERT INTO card_attribute (card_id, attribute_id)
SELECT c.card_id, a.attribute_id
  FROM card c
  JOIN attribute a ON (c.card_text ~* a.att_name);

ANALYZE attribute;
ANALYZE card_attribute;
ANALYZE expansion;
ANALYZE character_class;
ANALYZE category;
ANALYZE minion_tribe;
ANALYZE card;
Whew! That was a lot of work! But in the end, we have a constellation that accurately represents how cards work, with future potential of adding more mechanics without the necessity of reorganizing our architecture. There are obviously more potential attributes than taunt and enrage, but we’ll leave the data mining as an exercise for anyone interested enough to really flesh out the model.
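Since the card table was filled through a series of inner joins, a quick sanity check is worthwhile before the import table is discarded. This sketch simply compares row counts; any raw row with a missing expansion, class, or category would have been silently dropped by the joins, and the two numbers would differ.

```sql
-- Both counts should match if no rows were lost during the transform.
SELECT (SELECT count(*) FROM hs_raw_import) AS raw_rows,
       (SELECT count(*) FROM card) AS card_rows;

-- If they differ, list the raw cards that never made it into card.
SELECT r.card_name, r.expansion, r.character_class, r.category
  FROM hs_raw_import r
  LEFT JOIN card c ON (c.card_name = r.card_name)
 WHERE c.card_id IS NULL;
```

The second query assumes card names are unique enough to match on, which the -1 n_distinct value for card_name suggests is the case for this data set.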
Postgres offers us a nice way to leverage this data, too. Having to decode all of these IDs is pretty annoying, so we probably want a view. Further, cards don’t change very often, and new expansions are only released every few months. This makes them a great candidate for a materialized view. Here’s how we’d do that:
CREATE MATERIALIZED VIEW v_all_cards AS
SELECT e.expansion_name AS expansion, c.card_name,
       cl.class_name AS class, c.rarity,
       ca.category_name AS category,
       t.tribe_name AS tribe,
       c.mana_cost, c.attack, c.hitpoints, c.card_text,
       ARRAY(SELECT a.att_name
               FROM card_attribute tr
               JOIN attribute a USING (attribute_id)
              WHERE tr.card_id = c.card_id
       ) AS attributes
  FROM card c
  JOIN expansion e USING (expansion_id)
  JOIN character_class cl USING (class_id)
  JOIN category ca USING (category_id)
  LEFT JOIN minion_tribe t USING (tribe_id);

CREATE INDEX idx_all_expansion ON v_all_cards (expansion);
CREATE INDEX idx_all_class ON v_all_cards (class);
CREATE INDEX idx_all_category ON v_all_cards (category);
CREATE INDEX idx_all_tribe ON v_all_cards (tribe);

SELECT card_name, class, rarity
  FROM v_all_cards
 WHERE expansion = 'Classic'
   AND 'enrage' = ANY(attributes);

      card_name      |  class  |  rarity
---------------------+---------+-----------
 Angry Chicken       | Neutral | Rare
 Amani Berserker     | Neutral | Common
 Raging Worgen       | Neutral | Common
 Tauren Warrior      | Neutral | Common
 Spiteful Smith      | Neutral | Common
 Grommash Hellscream | Warrior | Legendary
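A materialized view is a snapshot, so it won’t notice new cards until it’s told to. A minimal sketch of what that might look like after loading a new expansion:

```sql
-- Rebuild the snapshot from the underlying tables, then update the
-- planner statistics for the refreshed contents.
REFRESH MATERIALIZED VIEW v_all_cards;
ANALYZE v_all_cards;
```

A plain refresh blocks readers of the view while it runs. With a few hundred rows that hardly matters, but Postgres also offers REFRESH MATERIALIZED VIEW CONCURRENTLY, which requires at least one unique index on the view.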
And of course, this particular rabbit hole can go much, much deeper. With only a few hundred rows, it may seem silly to design the database this way, but slap a front-end on this, and it’s a collection management tool that rivals the one built into the actual game client. The official client only allows players to search their collection by expansion, mana cost, and text, while our structure has far more flexibility.
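As a rough illustration of that flexibility, here’s the kind of search a front-end could run against the view. The 'Beast' and 'Hunter' labels are assumptions about how the source spreadsheet spells those values, so adjust them to match the imported data.

```sql
-- Cheap beasts a Hunter deck could use, including neutral cards.
-- 'Beast' and 'Hunter' are assumed spellings, not verified values.
SELECT card_name, class, mana_cost, attack, hitpoints
  FROM v_all_cards
 WHERE tribe = 'Beast'
   AND class IN ('Hunter', 'Neutral')
   AND mana_cost <= 3
 ORDER BY mana_cost, card_name;
```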
This is just one example of using Postgres in an everyday scenario. There’s much more out there if you’re interested in looking. Happy hunting!