April 19, 2019

Keyforge: IMPACT Deck Analysis

Like many, I heard about Keyforge, from Fantasy Flight Games, last year, months before its proper release. As a former Magic: The Gathering player (circa 1994-1997 and sporadically thereafter), I was curious about Keyforge and how it would play. As a designer, I was downright intrigued by Richard Garfield's creation. As a gamer staring down the barrel of less free time, the prospect of playing a collectible card game (CCG) without the expense and time most CCGs require - what with all the card collecting and deck construction furor - was enough to sell me on the idea.

Fast forward: I’m now the owner of eight decks and have managed to pull a significant number of friends and family under the Keyforge umbrella. After all, one does need opponents! Fortunately, it’s been universally well-received among kids and adults alike. It is remarkable how the procedurally generated decks nevertheless find ways, often clever or unexpected, to shine. I can’t shake the sense, when staring at a “weak”-seeming deck, that if I keep digging I’ll keep finding lines of play and nuanced card combos that keep the gameplay fresh and exciting.

As readers of this blog may know, I’m a data geek at heart. And so it should come as no surprise that I decided to dig into the numbers and data behind Keyforge. In particular, I was trying to understand how and why different decks perform the way that they do. Hence this blog post.

The good news is many people before me have had this same thought, and a variety of “rating systems” have sprung out of the aether-net to help players better understand their deck’s capabilities. Most prominent is the SAS/AERC system at DecksOfKeyforge.com. But there is also the ADHD system, best accessed at keyforge-compendium.com, and Baron Ashler’s system for generating a deck’s Expected Win Rates (EWR). ToyWiz has developed some of their own metrics, and connected the whole thing into a storefront for buying and selling decks.

Despite all these rating systems, none of them quite clicked with me. I feel that they didn’t fully or consistently appraise all facets of play and card effects, which results in skewing the perceived worth and values of certain decks. While I have tremendous appreciation for the SAS/AERC system (as it is the most thorough), the way in which SAS card ratings are subjectively determined and how the six AERC sub-values don’t fully cover all possible card effects means that some cards are either over/under rated, or whole swaths of card functionality are devalued. I was determined to change this. It’s data geek time.

IMPACT Deck Analysis, The Making Of

I penciled out an idea for a new deck rating system that was comprehensive and reasonably objective (or at the very least consistent) in how it determined card ratings. Here’s how it came into being:

Step 1 - The first thing I did was establish that a value of “1” was roughly analogous to 1 aember. For those that don’t know, the goal of Keyforge is to forge keys (clever name, eh?), which you do by collecting six aember tokens per key. The first player to forge 3 keys (18 total aember) wins. Thus, the effects of a particular card can be framed in terms of its IMPACT on the flow of aember.

Step 2 - The second thing I did was look across all 300+ cards and start to categorize the primary effects of each card into different categories. Some cards generate aember and others steal aember. Some abilities help you draw more cards and cycle your hand faster (giving you more flexibility and options) whereas other cards stall and slow down your opponent (thus hindering their options).

All in all, I identified twelve impact categories for card effects, which can be aggregated further into three big buckets: pacing, flexibility, and board. Pacing relates to effects that either speed up your aember production or stall your opponent’s. Flexibility is about manipulating your options through card cycling, recall effects (e.g. pulling creatures from play back into your hand) and activation, as well as messing with your opponent through hindering effects. Board (i.e. board control) is all about maintaining and leveraging creature and artifact power on the table. You can see descriptions of all 12 impact effect categories further below.

Step 3 - After identifying the impact categories, I developed a scoring rubric for how to assign value to each card based on the strength of its primary effect and modified by contingencies, penalties, bonuses, and other card attributes.

For example, I valued each point of creature power at 0.25, such that a 4-power creature would be valued at “1”, on the premise that a 4-power creature would be expected to survive at least one round and could reap (i.e. collect) one aember. Stronger or weaker creatures would be expected to gain more or less aember in proportion.
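The power valuation above is simple enough to write out. This is a minimal sketch of just that one rule, not the full rubric; the names `POWER_VALUE` and `power_impact` are made up for illustration.

```python
# Sketch of the creature-power rule described above: each point of
# power is worth 0.25, so a 4-power creature is worth about 1 aember.
# POWER_VALUE and power_impact are illustrative names, not from the tool.
POWER_VALUE = 0.25


def power_impact(power: int) -> float:
    """Return the base impact value of a creature from its power alone."""
    if power < 0:
        raise ValueError("power cannot be negative")
    return power * POWER_VALUE


if __name__ == "__main__":
    for power in (2, 4, 6):
        print(f"{power}-power creature -> {power_impact(power)}")
```

Contingencies, penalties, and bonuses would adjust this base number, and those adjustments are not modeled here.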

There are many cards with strong effects that are tempered by their effectiveness being contingent on specific in-game situations. Other strong cards might come with a penalty that can potentially harm you as much as your opponent if you don’t plan around it. These contingency and penalty effects, alongside secondary bonus effects, also play into the scoring rubric.

Step 4 - With the rubric in hand, the most laborious task was to go through the 300+ cards in the game and assign impact values to each card. However, this is where the strength of this rating system, I feel, shines through. While each card has a “total impact” value, this value is an aggregation of each of the 12 categories, allowing you to see in exactly what ways a given card’s effects might impact the game.

I was able to weave in some interesting modifiers too, like providing cards with a bonus when they have an “Outhouse” effect. In Keyforge, you are limited to playing and activating cards of only one house (i.e. suit) each turn. But some cards have effects that let you activate “out of house” cards (e.g. outhouse or “house cheats”), which can open up powerful lines of play. Accounting for outhouse abilities is, IMHO, pretty important for a rating system.

Step 5 - The final step was to assemble a tool, in this case a Google spreadsheet, that lets users plug in their deck and automatically generate an IMPACT score profile for their deck. Check it out.

Credit goes to Baron Ashler’s Expected Win Rate calculator, which provides the technical backend for querying the Keyforge Master Vault API and automatically plugging in the target deck’s unique card list. From there, some spreadsheet jujitsu (vlookups mostly) reads the master card rating table for the various IMPACT scores, then slices and dices the information in a bunch of different ways - including but not limited to a spider graph. Where would the world be without spider graphs?
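For the curious, here is roughly what the vlookup step amounts to, sketched in Python rather than spreadsheet formulas. Everything in it is a stand-in: the card names, the ratings, and the `deck_profile` helper are hypothetical placeholders, and the card list is hard-coded instead of being pulled from the Master Vault.

```python
from collections import defaultdict

# Hypothetical excerpt of a master rating table: card name -> {category: impact}.
# Names and numbers are placeholders, not the tool's actual ratings.
RATINGS = {
    "Card A": {"speed": 1.0, "power": 0.5},
    "Card B": {"stall": 1.5},
    "Card C": {"cycling": 0.75, "speed": 0.5},
}


def deck_profile(card_list, ratings):
    """Look up every card in the deck and total its impact per category.

    Returns (totals, missing), where missing lists cards that had no
    row in the rating table (the equivalent of a failed vlookup).
    """
    totals = defaultdict(float)
    missing = []
    for name in card_list:
        row = ratings.get(name)
        if row is None:
            missing.append(name)
            continue
        for category, value in row.items():
            totals[category] += value
    return dict(totals), missing


if __name__ == "__main__":
    # Duplicate cards count once per copy, just like in a real deck list.
    deck = ["Card A", "Card A", "Card B", "Card C", "Card D"]
    profile, missing = deck_profile(deck, RATINGS)
    print(profile)
    print("unrated cards:", missing)
```

Reporting the unrated cards separately, instead of quietly scoring them as zero, makes it obvious when the rating table has fallen behind the card pool.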

So how does this tool work? Read on my friends!

IMPACT Deck Analysis, Tool Description
Click on the link above to use the tool

Purpose: The IMPACT deck analysis tool is intended to provide a comprehensive assessment of a Keyforge deck’s composition, relative strengths/weaknesses, and potential playstyles.

Disclaimer: As with all Keyforge rating systems, this should not be viewed or used as if it provides an objective truth about a deck’s performance. Many facets of interactions between cards, both within the deck and with an opponent’s deck, are not fully captured by the system. Moreover, skillful play is always a significant factor in determining the winner of a match (as it should be), and so higher or lower relative IMPACT comparisons should not be seen as a reliable predictor of which deck will win.

Let's take a look at a personal deck of mine, the aptly named Oliver, Rock Viking.

The top section of the impact tool summarizes the raw scores and totals across the 12 different IMPACT factors, aggregating these into Pacing, Flexibility, and Board sub-scores.

PACING: Relates to ability to control the flow of Aember, both yours and your opponent's

* Speed: Aember generation from raw aember bonus, card effects, and buffs to any of the above.
* Stall: Aember control, or the ability to stall or delay your opponent from forging keys by stealing, capturing, or removing their aember, increasing key forge costs, etc.
* Forging: Ability to forge a key outside of the normal turn process - which can be a huge boost to your pacing.

FLEXIBILITY: Relates to the ability to manipulate your hand/deck or limit your opponent's options

* Cycling: Measure of hand cycling, deck stacking, archive effects etc.
* Recall: Ability to move cards between board/hand/discard, which is a big boost to flexibility (e.g. regrowth)
* Activation: Ability to change the exhausted/ready state of cards, unstun cards, activate neighboring creatures, etc.
* Hinder: Ability to mess with opponent’s flexibility (e.g. making them draw fewer cards, discarding their cards, forcing house selection, exhausting or stunning their cards)

BOARD: Relates to the ability to maintain or exert board presence through creatures and artifacts

* Damage: Ability to target direct damage or issue mass damage
* Neutralize: Eliminate or takeover threats - either single target or global "wipe" effects
* Artifact: Ability to control, destroy, steal or otherwise deal with hostile artifacts
* Power: Ability to deal damage with creatures, offensive boosts and effects (e.g. charging, skirmish)
* Stability: Defensive and staying power of creatures (heal, elusive, taunt, armor)
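The grouping above maps neatly onto a small lookup structure. Here is a sketch of how the twelve category scores could roll up into the three sub-scores; the grouping comes straight from the lists above, while the `bucket_scores` helper and the sample numbers (other than the Recall score of 10.5) are invented for illustration.

```python
# The three buckets and their twelve categories, as listed above.
BUCKETS = {
    "pacing": ["speed", "stall", "forging"],
    "flexibility": ["cycling", "recall", "activation", "hinder"],
    "board": ["damage", "neutralize", "artifact", "power", "stability"],
}


def bucket_scores(category_scores):
    """Sum per-category impact into Pacing, Flexibility, and Board sub-scores.

    Categories missing from the input are treated as zero.
    """
    return {
        bucket: sum(category_scores.get(cat, 0.0) for cat in categories)
        for bucket, categories in BUCKETS.items()
    }


if __name__ == "__main__":
    # Placeholder category totals for an imaginary deck.
    sample = {"speed": 2.0, "stall": 1.0, "recall": 10.5, "power": 3.0}
    print(bucket_scores(sample))
```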

These twelve impact categories are also aggregated into external vs. internal effects, giving an overall evaluation of how much your deck’s capabilities relate to managing and utilizing your own cards and assets versus interacting with those of your opponent (e.g. hindering effects, direct damage, stealing aember)

Another line of consideration is the distribution of effects relative to cards in your deck. For example, in the deck shown in the image (Oliver, Rock Viking), the Recall Impact of the deck is 10.5, which is linked to just five cards. This type of calculation is aggregated into a diffusion score, which is a measure of total impact divided by the total number of card effects. Decks with a lower diffusion value have their card effects spread out among relatively more cards, which can help improve the consistency of the deck compared to those where a lot of impact is concentrated in a smaller subset of stronger cards.
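To make the arithmetic concrete, here is the Recall example run through that ratio. This is a sketch under one assumption: each of the five contributing cards is counted as a single card effect, which may not be exactly how the spreadsheet tallies effects. The `diffusion` function name is made up.

```python
def diffusion(total_impact: float, effect_count: int) -> float:
    """Total impact divided by the number of card effects producing it."""
    if effect_count <= 0:
        raise ValueError("effect_count must be positive")
    return total_impact / effect_count


if __name__ == "__main__":
    # Recall impact of 10.5 spread over five cards (one effect each, by assumption).
    print(diffusion(10.5, 5))
    # The same impact spread over ten effects gives a lower, more diffuse value.
    print(diffusion(10.5, 10))
```

In the first case each recall card is carrying a bit more than 2 aember worth of impact, so drawing (or failing to draw) any one of them matters a lot.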

This middle section runs some statistics on the 12 criteria to determine the percentile score of each impact category relative to a larger pool of cards. These percentiles are graphed on the spider chart, giving a quick overview of the relative strengths and weaknesses of a given deck.

For example, in Oliver, Rock Viking, the deck scores above the 85th percentile for hinder, meaning that it has a lot of effects that allow you to mess with your opponent’s stuff. The 2x Tremor can each stun three creatures in your opponent’s line (often costing them the bulk of a turn to remove). The 2x Succubus can shrink your opponent’s hand size, limiting their options for future plays. And so on.

NOTE: These percentile scores should be taken with a grain of salt, as they are based on a small sample of about 30 decks. I need to figure out a way to build a more representative sample of decks for calculating percentages more accurately.
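For anyone wanting to reproduce the percentile idea outside the spreadsheet, here is one common way to compute a percentile rank against a sample of decks. It is a sketch only: the spreadsheet may use a different formula, the tie-handling rule is my assumption here, and the sample scores are invented.

```python
def percentile_rank(score, sample):
    """Percent of sample scores below `score`, counting ties as half.

    This is one standard definition of percentile rank; other
    definitions differ slightly in how they treat ties and endpoints.
    """
    if not sample:
        raise ValueError("sample must not be empty")
    below = sum(1 for s in sample if s < score)
    equal = sum(1 for s in sample if s == score)
    return 100.0 * (below + 0.5 * equal) / len(sample)


if __name__ == "__main__":
    # Invented hinder scores for a small pool of ten decks.
    hinder_sample = [2.0, 3.5, 4.0, 5.0, 6.5, 7.0, 8.0, 9.0, 10.0, 12.0]
    print(percentile_rank(9.5, hinder_sample))
```

With a pool this small, a single deck added to or removed from the sample can move a percentile by several points, which is the reason for the grain of salt.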

The impact + playstyle graph also shows a general relationship between each impact category and its aggression vs. control value and internal vs. external value. More aggressive decks will favor high scores in impact categories like speed, forging, and power, and will try to outpace their opponent through direct aember generation, key hacks, and using masses of creatures to reap. Conversely, a control-oriented deck will use stalling (discarding or stealing your opponent’s aember), artifact stealing, and creature stability effects to control the board space. These two dimensions intersect with impact effects that are more internally focused (e.g. card cycling) or externally focused (e.g. dealing direct damage).

The final section contains a detailed table of results for all of the cards in the deck and the types of impact provided by each card. The three columns to the right of the net impact provide a breakdown of some specifics, such as raw aember generation from cards, raw creature power, and the bonuses for outhouse abilities.

The hidden secret of all of this is that “deck construction” is still very much alive and kicking in Keyforge. It’s just shifted from buying individual cards and deck tuning into scouring the web for interesting decks with particular combinations of cards. Central to these scouring activities are tools that help people quickly appraise the composition of a deck and help winnow down the staggering range of possibilities.

Now it’s your turn…
Have you played Keyforge? Are you a data junky? Have you used other rating systems? If you plug your deck into the IMPACT system, how does it fare? Does it match your experiences using the deck?

The phones are open!


  2. I took my favourite decks and ran them through your spreadsheet. I get similar numbers to what I got previously from decksofkeyforge.com. However, I love the breakdown of the cards and how you're explaining the power of synergies. I find my decks play better the more I run them and learn how they work. One of my favourite decks is scoring an 83 and it's an absolute joy to play; even though I don't always win, I love the combos. Look forward to your future updates! (Previous reply had a typo)

    1. Thanks for sharing your thoughts. Update posted to the system.