August 30, 2018

Visualizing the BGG Game Database with Gephi. Whoa!


So I stumbled onto an interesting post over at r/boardgames from reddit user Shepperstein, who had downloaded a trove of data from BGG’s database. He then used Gephi to create some fantastic network models (aka graphs) depicting relationships between game categories. Very cool stuff. I urge you to check out his post and the links to his analysis.

Of course, I immediately wanted to start playing around with the data myself!

Fortunately, I’m no stranger to Excel AND I used Gephi several years ago, so I was already familiar with its basic functionality. Shepperstein also kindly provided a direct link to his database, so I could tap into that information right away. Are we excited yet?



Even more, this would prove to be an opportunity to tackle something I’ve long wanted to do. If you’ve read this blog before, you’ll know I’ve always had an interest in game classification and taxonomy. In particular, I’ve had a long-standing attraction to Selwyth’s Alternative Classification of Boardgames, which provides a comprehensive rework of BGG’s category and mechanism descriptors.

One of the challenges has always been finding a way (or perhaps simply the motivation) to “remap” BGG’s category and mechanism descriptors into new classes (based on Selwyth’s approach, for example). Ideally, these classes would better reflect the nature of the individual descriptors. For example, the 80+ descriptors in the category field are a total hodge-podge of thematic items (“farming”, “trading in the mediterranean”, etc.), mechanisms, domains (e.g. Wargame or Party Game), and more besides. Likewise, the mechanism attribute contains entries that aren’t really mechanisms at all.

Long story short, I remapped all of the categories and mechanisms from BGG’s system over to an “alternative” system. You can check out the category-mechanism reclassification tables to see what I did, if you’re so inclined. Armed with these reclassified tables and a trove of BGG database… uhh… data… I set about pulling it all into Gephi and having a look at what I could do.
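Mechanically, a remap like this boils down to a lookup table from each BGG descriptor to its new class. The sketch below assumes a hypothetical `REMAP` dictionary and `reclassify` helper; the class assignments in it are made-up illustrations, not the actual reclassification tables and not Selwyth’s scheme.

```python
from collections import defaultdict

# Hypothetical excerpt of a remapping table: BGG descriptor -> alternative class.
# These assignments are illustrative only, not the real reclassification tables.
REMAP = {
    "Wargame": "Genre",
    "Party Game": "Genre",
    "Farming": "Theme",
    "Dice Rolling": "Mechanism",
    "Hand Management": "Mechanism",
}


def reclassify(descriptors, remap=REMAP, default="Unmapped"):
    """Group one game's BGG descriptors under their alternative classes."""
    classes = defaultdict(list)
    for d in descriptors:
        # Descriptors missing from the table are collected under `default`.
        classes[remap.get(d, default)].append(d)
    return dict(classes)


game = ["Wargame", "Dice Rolling", "Hand Management", "Farming"]
print(reclassify(game))
# {'Genre': ['Wargame'], 'Mechanism': ['Dice Rolling', 'Hand Management'], 'Theme': ['Farming']}
```

Keeping an explicit “Unmapped” bucket makes it easy to spot descriptors the table hasn’t covered yet.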

In contrast to Shepperstein’s work, I wanted to use Gephi to visualize not just the BGG categories but also the mechanisms, AND do it in a way that the final output would indicate which new class each descriptor falls into. I wanted things Selwyth classified as mechanisms or genres to be identified as such. Of course, I also needed to balance this with the ability to logically discern groupings (aka “communities”) of related attributes.

The image below shows the culmination of this effort. If you want to read it, you really need to expand the image link and make it full screen. Have at it, and I’ll provide some discussion below.



A few technical notes about the above analysis.

(1) The database from Shepperstein only includes games from 1990 to 2018, although that still covers well over ten thousand games, and it skews toward more recent titles, which are more likely to be tagged with mechanisms and categories.

(2) In Gephi, I excluded nodes (i.e. descriptors) used by fewer than 50 games. Likewise, I excluded edges where the “weight” of the connection between two descriptors was less than 40. In other words, if fewer than 40 games share a given pairing of attributes, that relationship is ignored. With over 18,000 connections between nodes, it made sense to prune out the ones with fairly minimal impact.
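The two pruning rules in note (2) can be sketched in plain Python. This is a minimal sketch, not Gephi’s own filter code: it assumes the data has already been reshaped into one list of descriptors per game, counts a node’s size as the number of games tagged with it and an edge’s weight as the number of games sharing both tags, and uses a hypothetical `build_graph` helper.

```python
from collections import Counter
from itertools import combinations

MIN_GAMES_PER_NODE = 50  # drop descriptors used by fewer than 50 games
MIN_EDGE_WEIGHT = 40     # drop pairings shared by fewer than 40 games


def build_graph(games, min_node=MIN_GAMES_PER_NODE, min_edge=MIN_EDGE_WEIGHT):
    """games: iterable of descriptor lists, one per game.

    Returns (node_counts, edge_weights) after pruning both.
    """
    node_counts = Counter()
    edge_weights = Counter()
    for descriptors in games:
        tags = sorted(set(descriptors))  # dedupe; sorting makes (a, b) pairs canonical
        node_counts.update(tags)
        edge_weights.update(combinations(tags, 2))  # one co-occurrence per pair per game

    kept = {n: c for n, c in node_counts.items() if c >= min_node}
    edges = {
        (a, b): w
        for (a, b), w in edge_weights.items()
        if w >= min_edge and a in kept and b in kept  # edges need both endpoints kept
    }
    return kept, edges


# Tiny hypothetical example with thresholds lowered so something survives.
games = [
    ["Card Game", "Hand Management"],
    ["Card Game", "Hand Management", "Trick-taking"],
    ["Card Game", "Trick-taking"],
    ["Dice Rolling"],
]
nodes, edges = build_graph(games, min_node=2, min_edge=2)
print(nodes)  # {'Card Game': 3, 'Hand Management': 2, 'Trick-taking': 2}
print(edges)  # {('Card Game', 'Hand Management'): 2, ('Card Game', 'Trick-taking'): 2}
```

Here “Dice Rolling” falls below the node threshold, and the Hand Management/Trick-taking pairing (shared by only one game) falls below the edge threshold.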

(3) The fainter-shaded outer circles/colors around the nodes correspond to my reclassified descriptors discussed above.

(4) The colored “community” groupings come from running Gephi’s modularity statistic (I have no idea what it’s doing, just for the record), which assigns nodes to groupings based on how related they are to other nodes. After playing around with the tolerances, it ended up with 11 communities, shown in the brighter colors (e.g. all the “Wargame”-related stuff is red).
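For the curious: Gephi’s Modularity statistic is based on the Louvain method, which keeps moving nodes between communities (and merging communities) as long as that raises a score called modularity, Q. Roughly, Q compares how much edge weight falls inside each community with how much you’d expect by chance given the nodes’ degrees. The “tolerances” here are most likely Gephi’s resolution setting, where lower values yield more, smaller communities. The sketch below only *scores* a given partition rather than searching for the best one; the `modularity` function and the toy graph are hypothetical illustrations.

```python
from collections import defaultdict


def modularity(edges, community):
    """Weighted modularity Q of a partition of an undirected graph.

    edges: dict mapping (u, v) -> weight, each undirected edge listed once.
    community: dict mapping node -> community label.
    Q = sum over communities c of [ L_c / m - (d_c / (2m))^2 ], where m is the
    total edge weight, L_c the weight of edges inside c, and d_c the summed
    weighted degree of the nodes in c.
    """
    m = sum(edges.values())
    inside = defaultdict(float)  # L_c
    degree = defaultdict(float)  # d_c
    for (u, v), w in edges.items():
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            inside[community[u]] += w
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Hypothetical toy graph: two triangles joined by a single bridge edge.
edges = {
    ("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,  # triangle 1
    ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1,  # triangle 2
    ("c", "d"): 1,                                # bridge
}
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
q_split = modularity(edges, split)
q_single = modularity(edges, {n: 0 for n in split})
print(round(q_split, 3))  # 0.357
print(q_single)           # 0.0
```

Splitting along the bridge scores well above lumping everything together (which always scores zero), and that contrast is exactly what the Louvain search exploits.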

Now, I think some really cool things come out of this graph and its community groupings. Wargames, along with their frequently used mechanics (area movement, campaign/card driven, chit-pulling, point-to-point movement), are all clustered pretty well together. Likewise, we see a grouping around party games, which also contains the gamut of social deduction-style games.

Given the plethora of cooperative games with horror/zombie themes, roleplaying elements, and adventure, it was neat to see all those clustered together. Of course, this zone is pretty well intermingled with fantasy games that leverage variable player powers, fighting mechanics/genres, miniatures, and collectible components (e.g. LCGs). Science fiction is likewise ensconced in this zone of the graph.

Economic games are in the bottom right and constitute the bulk of what I see as mainline euro-style games. I like the little enclave of Route-Network Building, Transportation-theme, Train-theme, and Stock Holding down there; aka, the 18xx games and their ilk. I do think there is a high level of alignment between tile-laying games and eurogames, which is why they fell into the same community.

Another interesting result is that Area-Control / Area-Influence ended up as its own community, rightly situated between wargames and the more euro-style economic games. Area control games tend to have more direct player-to-player interaction on a map, and hence are associated somewhat with their wargaming neighbors. Is this the homeland of the wuero?

Abstract games are down at the bottom, at a logical point between euro-style economic games (which also tend to be somewhat abstract in nature) and children’s games, which are also quite abstract (perhaps as a way of keeping the mechanics simple, or perhaps just because they share some common descriptors?).

In the dead center are a few big communities, including card games and the obviously associated hand management, along with dice and press-your-luck systems. Some of these, like cards and dice, are so ubiquitous across game domains that it’s not at all surprising to see them in the middle of the graph with connections to just about everywhere. I tried excluding them from the graph and it had basically no structural impact, more or less confirming this assessment. Of course, “take that” and “trick-taking” games are very closely associated with card games, so I left them in for clarity and completeness.
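One quick way to put a number on “connections to just about everywhere” is to measure, for each node, what fraction of the other nodes it shares an edge with; hubs like cards and dice should sit near 1.0. This is a minimal sketch on a hypothetical toy edge list, with a hypothetical `neighbor_share` helper and an arbitrary 0.9 cutoff; it is not how Gephi ranks nodes.

```python
from collections import defaultdict


def neighbor_share(edges):
    """For each node, the fraction of all other nodes it shares an edge with."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    others = len(neighbors) - 1
    return {node: len(nbrs) / others for node, nbrs in neighbors.items()}


# Hypothetical toy edge list (already pruned), not the real BGG graph.
edges = [
    ("Card Game", "Wargame"),
    ("Card Game", "Party Game"),
    ("Card Game", "Economic"),
    ("Wargame", "Economic"),
]
share = neighbor_share(edges)
hubs = [n for n, s in share.items() if s >= 0.9]  # 0.9 is an arbitrary cutoff
print(hubs)  # ['Card Game']
```

Testing the “no structural impact” claim would then mean dropping every edge that touches a hub, re-running community detection on what remains, and checking whether the other groupings move.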

I also thought it was interesting to compare opposite sides of the graph. Wargames are directly opposite children’s games. Highly thematic games in the Fantasy/Fighting, Science fiction, and Cooperative realms are all opposite economic (euro-style) and abstract games. Likewise, games that focus on area control/majority elements and derive much of their deep strategic play from spatial positioning and the like are opposite party and deduction-style games, which emphasize an entirely different sort of player-to-player interaction.

Phew!

Having done all of this, I’m not sure what’s next! I’m tempted to refine the database pull to, for example, the top 10,000 ranked games or the top 10,000 most-owned games, irrespective of year. That would focus the data on games people are more likely to know, and would also capture more of the popular (or classic) games from before 1990. Much of the database is filled with relatively obscure games or print-and-play projects that don’t reflect fully published and circulated titles. For example, over 50% of the dataset (~8,200 records) are games with fewer than 250 owners. I have also pulled in BGG ranking data, average weights, number of owned copies, and more, but I’ll need to think more about how to make that interesting.
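Both ideas (keeping the top N most-owned games regardless of year, and measuring how much of the data sits under 250 owners) come down to a sort and a count. A minimal sketch, assuming each game is a record with a hypothetical `owners` field; the toy records and the `top_owned` and `share_below` helpers are illustrative only, and the real export’s field names may differ.

```python
import heapq


def top_owned(games, n):
    """Return the n games with the most owners, regardless of year."""
    return heapq.nlargest(n, games, key=lambda g: g["owners"])


def share_below(games, threshold=250):
    """Fraction of games with fewer than `threshold` owners."""
    return sum(g["owners"] < threshold for g in games) / len(games)


# Hypothetical toy records, not real BGG data.
games = [
    {"name": "Game A", "year": 1985, "owners": 12000},
    {"name": "Game B", "year": 2016, "owners": 40},
    {"name": "Game C", "year": 2012, "owners": 3100},
    {"name": "Game D", "year": 2018, "owners": 180},
]
print([g["name"] for g in top_owned(games, 2)])  # ['Game A', 'Game C']
print(share_below(games))  # 0.5
```

Because `top_owned` ignores the year field, a pre-1990 title like “Game A” survives the cut, which is the point of pulling by ownership instead of by date range.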

So for now, I guess it’s time to open the phones! Any reactions? Thoughts or ideas of other ways to slice the data? I’d love to hear from you all. Cheers.

