Channel: Planet PostgreSQL

Shaun M. Thomas: PG Phriday: Database Creation Workshop

Postgres theory, feature discussion, and advocacy are fun. But even I’ll admit it’s nice to have some practical application every once in a while. This week, we’re going to build an actual database.

But what would be small enough for a proof of concept, yet somewhat interesting? Well, I’m a fan of Hearthstone. It’s a silly card game much like Magic: The Gathering, but has the distinct aura of “eSports!” Regardless, it’s a fun little time waster, and has a few hundred data points we can manipulate.

Annoyingly, there’s no official card list. But fans are nothing if not dedicated, and several folks maintain spreadsheets for their own purposes and often share them. This one had a pretty good breakdown and is up to date with the latest expansions, so it makes a good starting point.

The second page of this Excel workbook is a card list, which I saved as a quoted tab-delimited file. The first three lines are garbage, so away they went. The last four lines are also empty, so into the digital bucket in the sky they passed. What’s left is something that we can import into a junk table:

CREATE UNLOGGED TABLE hs_raw_import (
  junk1 VARCHAR,
  junk2 VARCHAR,
  junk3 VARCHAR,
  junk4 VARCHAR,
  expansion VARCHAR,
  card_name VARCHAR,
  character_class VARCHAR,
  rarity VARCHAR,
  category VARCHAR,
  tribe VARCHAR,
  mana_cost VARCHAR,
  attack VARCHAR,
  hitpoints VARCHAR,
  card_text VARCHAR,
  junk5 VARCHAR,
  junk6 VARCHAR,
  junk7 VARCHAR);
 
COPY hs_raw_import FROM '/tmp/hearthstone_cards.tsv' WITH CSV DELIMITER E'\t' HEADER;
 
ANALYZE hs_raw_import;

All of the “junk” columns are an unfortunate reality of the current COPY command; there’s no way to ignore columns in source files. That’s OK in this case because we’re going to discard the import table when everything is done, and it’s certainly speedy. Importing all 743 rows on a test VM required less than 5ms.
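Since every column lands as VARCHAR, it may be worth checking the numeric fields before casting them later. Here's a rough sketch, assuming the stat columns should only ever hold non-negative whole numbers. That's an assumption about the spreadsheet, not something the file guarantees:

```sql
-- Sketch: list rows whose stat columns would fail a later ::INT cast.
-- Assumes valid values are plain digit strings. NULLs (empty unquoted
-- fields in COPY's CSV mode) are not flagged, because NULL !~ '...'
-- evaluates to NULL rather than true.
SELECT card_name, mana_cost, attack, hitpoints
  FROM hs_raw_import
 WHERE mana_cost !~ '^\d+$'
    OR attack    !~ '^\d+$'
    OR hitpoints !~ '^\d+$';
```

An empty result suggests the text-to-integer casts should go through cleanly.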

Now we need to transform the data into something a bit more normalized. To do that, we need to look at the data. A technique that works really well is to examine how the data is distributed by checking the contents of the pg_stats view, which ANALYZE populated for us. Let’s look at this data:

SELECT attname, n_distinct
  FROM pg_stats
 WHERE tablename = 'hs_raw_import' AND attname NOT LIKE 'junk%';
 
     attname     | n_distinct 
-----------------+------------
 expansion       |          9
 card_name       |         -1
 hitpoints       |         12
 card_text       |   -0.89502
 character_class |         10
 rarity          |          5
 category        |          3
 tribe           |          7
 mana_cost       |         13
 attack          |         12

Looking at the statistics, a negative number is the ratio of distinct values to total rows, negated: -1 means every value is unique, and the estimate scales with the table as it grows. Positive numbers are absolute counts, and those are the ones we want to focus on for normalization.
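To make the negative values concrete: card_text’s -0.89502 across our 743 rows works out to roughly 665 distinct values. The query below is a sketch of that arithmetic for every column. Joining to pg_class for the reltuples row estimate is my own choice here, and it relies on the table having been analyzed:

```sql
-- Sketch: turn n_distinct into an estimated absolute count.
-- Negative values are multiplied by the planner's row estimate
-- (pg_class.reltuples); positive values are already counts.
SELECT s.attname,
       s.n_distinct,
       CASE WHEN s.n_distinct < 0
            THEN round(-s.n_distinct * c.reltuples)
            ELSE s.n_distinct
       END AS estimated_distinct
  FROM pg_stats s
  JOIN pg_namespace n ON (n.nspname = s.schemaname)
  JOIN pg_class c ON (c.relname = s.tablename
                      AND c.relnamespace = n.oid)
 WHERE s.tablename = 'hs_raw_import'
   AND s.attname NOT LIKE 'junk%';
```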

After leveraging a bit of knowledge regarding the actual game mechanics, we can ignore elements that act as measurable metrics. Things like hitpoints, attack, and mana_cost may not have a lot of distinct values, but don’t have further associated information. But what about the rest? Let’s dig a little deeper:

  • expansion : As with most CCGs, new cards and mechanics are added through expansions. These have their own associated attributes we might want to track independently. This one warrants a new table.
  • character_class : In Hearthstone, cards are separated into nine distinct classes, with a tenth as neutral cards any class can leverage. Again, if we were writing an app, we might list several other data points about each class, so we’d want that in a separate table.
  • rarity : This one is a bit tricky. There are currently only five rarity levels, each of which has an occurrence percentage when opening new card packs, which would argue for a table of its own. Then again, Blizzard is not likely to add further rarity levels. For now, let’s leave percentages as an academic exercise, and just leave rarity as a text description.
  • category : Currently there are only minions, spells, and weapons. This is extremely likely to change in the future, assuming the game remains popular for several more decades, as Magic has. This one gets a new table.
  • tribe : Some players consider this a subcategory. Minions can sometimes be further broken down into a taxonomy that will alter game mechanics via synergistic effects. We definitely want to track this separately.

Not listed here are attributes inherent in the card text itself. The taunt attribute, for example, is a significant variable game mechanic, which we’ll want to mine from the text. Since card text is often conditional, these optional elements should also be tracked separately through an attribute table.

Given all of this, here’s an extremely simplified architecture for representing the cards themselves, with ‘etc’ representing further descriptive columns. Queries to bootstrap them from the source table are at the end:

CREATE TABLE expansion (
  expansion_id    SERIAL PRIMARY KEY,
  expansion_name  VARCHAR NOT NULL,
  release_date    DATE NULL,
  etc             VARCHAR NULL);

CREATE TABLE character_class (
  class_id       SERIAL PRIMARY KEY,
  class_name     VARCHAR NOT NULL,
  hero_power     VARCHAR NULL,
  etc            VARCHAR NULL);

CREATE TABLE category (
  category_id    SERIAL PRIMARY KEY,
  category_name  VARCHAR NOT NULL,
  etc            VARCHAR NULL);

CREATE TABLE minion_tribe (
  tribe_id    SERIAL PRIMARY KEY,
  tribe_name  VARCHAR NOT NULL,
  etc         VARCHAR NULL);

CREATE TABLE attribute (
  attribute_id  SERIAL PRIMARY KEY,
  att_name      VARCHAR NOT NULL,
  effect        TEXT NULL,
  etc           VARCHAR NULL);

CREATE TABLE card (
  card_id       SERIAL PRIMARY KEY,
  expansion_id  INT NOT NULL REFERENCES expansion,
  card_name     VARCHAR NOT NULL,
  class_id      INT NOT NULL REFERENCES character_class,
  rarity        VARCHAR NOT NULL,
  category_id   INT NOT NULL REFERENCES category,
  tribe_id      INT NULL REFERENCES minion_tribe,
  mana_cost     INT NOT NULL,
  attack        INT NULL,
  hitpoints     INT NULL,
  card_text     TEXT
);

CREATE INDEX idx_card_expansion ON card (expansion_id);
CREATE INDEX idx_card_class ON card (class_id);
CREATE INDEX idx_card_category ON card (category_id);
CREATE INDEX idx_card_tribe ON card (tribe_id);

CREATE TABLE card_attribute (
  card_id       INT NOT NULL REFERENCES card,
  attribute_id  INT NOT NULL REFERENCES attribute
);

CREATE INDEX idx_card_attribute_attribute_id
    ON card_attribute (attribute_id);
 
INSERT INTO expansion (expansion_name)
SELECT DISTINCT expansion FROM hs_raw_import;

INSERT INTO character_class (class_name)
SELECT DISTINCT character_class FROM hs_raw_import;

INSERT INTO category (category_name)
SELECT DISTINCT category FROM hs_raw_import;

INSERT INTO minion_tribe (tribe_name)
SELECT DISTINCT tribe FROM hs_raw_import
 WHERE tribe IS NOT NULL;

INSERT INTO attribute (att_name) VALUES ('taunt'), ('enrage');

INSERT INTO card (
    expansion_id, card_name, class_id, rarity, category_id,
    tribe_id, mana_cost, attack, hitpoints, card_text
)
SELECT e.expansion_id, r.card_name, cl.class_id, r.rarity,
       ca.category_id, t.tribe_id, r.mana_cost::INT,
       r.attack::INT, r.hitpoints::INT, r.card_text
  FROM hs_raw_import r
  JOIN expansion e ON (e.expansion_name = r.expansion)
  JOIN character_class cl ON (cl.class_name = r.character_class)
  JOIN category ca ON (ca.category_name = r.category)
  LEFT JOIN minion_tribe t ON (t.tribe_name = r.tribe);

INSERT INTO card_attribute (card_id, attribute_id)
SELECT c.card_id, a.attribute_id
  FROM card c
  JOIN attribute a ON (c.card_text ~* a.att_name);
 
ANALYZE attribute;
ANALYZE card_attribute;
ANALYZE expansion;
ANALYZE character_class;
ANALYZE category;
ANALYZE minion_tribe;
ANALYZE card;

Whew! That was a lot of work! But in the end, we have a constellation that accurately represents how cards work, with future potential of adding more mechanics without the necessity of reorganizing our architecture. There are obviously more potential attributes than taunt and enrage, but we’ll leave the data mining as an exercise for anyone interested enough to really flesh out the model.
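For anyone taking up that exercise, adding another mechanic might look something like the sketch below. The keyword 'charge' is purely illustrative and isn’t part of the import above, and the \m and \M word-boundary escapes are my own tightening so the keyword only matches whole words:

```sql
-- Sketch: register one more attribute and tag every card whose text
-- mentions it. 'charge' is a hypothetical example keyword.
WITH new_att AS (
  INSERT INTO attribute (att_name)
  VALUES ('charge')
  RETURNING attribute_id, att_name
)
INSERT INTO card_attribute (card_id, attribute_id)
SELECT c.card_id, n.attribute_id
  FROM card c
  JOIN new_att n ON (c.card_text ~* ('\m' || n.att_name || '\M'));
```

Keep in mind that a text match like this tags any card that merely mentions the keyword, so the results probably deserve a manual review.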

Postgres offers us a nice way to leverage this data, too. Having to decode all of these IDs is pretty annoying, so we probably want a view. Further, cards don’t change very often, and new expansions are only released every few months. This makes them a great candidate for a materialized view. Here’s how we’d do that:

CREATE MATERIALIZED VIEW v_all_cards AS
SELECT e.expansion_name AS expansion, c.card_name,
       cl.class_name AS class, c.rarity,
       ca.category_name AS category,
       t.tribe_name AS tribe, c.mana_cost,
       c.attack, c.hitpoints, c.card_text,
       ARRAY(SELECT a.att_name
               FROM card_attribute tr
               JOIN attribute a USING (attribute_id)
              WHERE tr.card_id = c.card_id
       ) AS attributes
  FROM card c
  JOIN expansion e USING (expansion_id)
  JOIN character_class cl USING (class_id)
  JOIN category ca USING (category_id)
  LEFT JOIN minion_tribe t USING (tribe_id);
 
CREATE INDEX idx_all_expansion ON v_all_cards (expansion);
CREATE INDEX idx_all_class ON v_all_cards (class);
CREATE INDEX idx_all_category ON v_all_cards (category);
CREATE INDEX idx_all_tribe ON v_all_cards (tribe);

SELECT card_name, class, rarity
  FROM v_all_cards
 WHERE expansion = 'Classic' AND 'enrage' = ANY(attributes);
 
      card_name      |  class  |  rarity   
---------------------+---------+-----------
 Angry Chicken       | Neutral | Rare
 Amani Berserker     | Neutral | Common
 Raging Worgen       | Neutral | Common
 Tauren Warrior      | Neutral | Common
 Spiteful Smith      | Neutral | Common
 Grommash Hellscream | Warrior | Legendary
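A materialized view is a snapshot, so it needs a refresh whenever the underlying tables change, such as after loading a new expansion. Here's a minimal sketch. The unique index name is made up for illustration, and the CONCURRENTLY variant assumes Postgres 9.4 or later and that card_name stays unique, as the -1 n_distinct from the raw import suggests:

```sql
-- Rebuild the view contents after loading new cards.
REFRESH MATERIALIZED VIEW v_all_cards;

-- Optional: CONCURRENTLY lets readers keep querying during the
-- refresh, but it requires a unique index on the view.
-- Assumes card names remain unique; the index name is hypothetical.
CREATE UNIQUE INDEX udx_all_card_name ON v_all_cards (card_name);
REFRESH MATERIALIZED VIEW CONCURRENTLY v_all_cards;
```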

And of course, this particular rabbit hole can go much, much deeper. With only a few hundred rows, it may seem silly to design the database this way, but slap a front-end on this, and it’s a collection management tool that rivals the one built into the actual game client. The official client only allows players to search their collection by expansion, mana cost, and text, while our structure has far more flexibility.
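As a small illustration of that flexibility, here’s the kind of search the view makes easy. It’s only a sketch; the mana cutoff is arbitrary, and the class and attribute values are the ones already seen above:

```sql
-- Sketch: cheap neutral cards whose text mentions taunt.
SELECT card_name, mana_cost, attack, hitpoints
  FROM v_all_cards
 WHERE class = 'Neutral'
   AND mana_cost <= 3
   AND 'taunt' = ANY(attributes)
 ORDER BY mana_cost, card_name;
```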

This is just one example of using Postgres in an everyday scenario. There’s much more out there if you’re interested in looking. Happy hunting!

