September 4, 2014

The Simpsons Social Network (Season 1)

I've been a fan of The Simpsons ever since Season #1 was first broadcast. So, I was recently thinking about visualizing the social network (no, not this one) of Simpsons characters.

Constructing the network of social relationships between the various Simpsons characters would be a difficult and time-consuming process (does Lisa even have any friends?). So I opted for a different network that can be constructed programmatically: the network of character co-appearances. In this network, two characters are connected if they appear in the same episode of The Simpsons. This network is similar to the one constructed for film actors that allows us to determine six degrees of Kevin Bacon.

The Simpsons co-appearances network can be constructed by parsing the episode pages of Wikisimpsons. Mathematically speaking, the network is a graph. Each node of the graph represents a Simpsons character. An (undirected) edge connects each pair of nodes whose characters appear in the same episode. To each edge I add a weight: the number of episodes in which the pair of characters co-appear. I also label each node with the number of episodes in which its character appears.
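As a rough illustration of that construction, here is a minimal Python sketch. It assumes the episode pages have already been parsed into a mapping from episode title to a list of character names; the function name `build_coappearance_graph` and the placeholder cast lists are my own inventions for the example, not the real Season 1 data or the code behind this post.

```python
from collections import Counter
from itertools import combinations


def build_coappearance_graph(episodes):
    """Return (node_counts, edge_weights) for a mapping of episode -> cast list."""
    node_counts = Counter()   # character -> number of episodes they appear in
    edge_weights = Counter()  # (character, character) -> number of shared episodes
    for cast in episodes.values():
        # De-duplicate and sort so each pair always has the same key order.
        characters = sorted(set(cast))
        node_counts.update(characters)
        edge_weights.update(combinations(characters, 2))
    return node_counts, edge_weights


# Placeholder casts, invented for illustration only.
episodes = {
    "Episode A": ["Homer", "Marge", "Bart"],
    "Episode B": ["Homer", "Bart", "Moe"],
}
nodes, edges = build_coappearance_graph(episodes)
```

Sorting each cast before pairing means an undirected edge is stored once, under a single key such as `("Bart", "Homer")`, rather than once per direction.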

Having constructed the graph, we can set about visualizing it. Visualizing a graph helps you understand the structure of the network, but the layout also shapes what you see, so the choice of graph-layout algorithm is critical. If you impose a hierarchical layout, you'll see hierarchies. If you impose a circular layout, you'll see circles.

For this reason I've used a force-directed layout, which attempts to position the nodes so that the distance between any pair of connected nodes shrinks as the weight on the edge between them grows. As a result, characters who co-appear often have their nodes positioned close together, while those who rarely or never co-appear have their nodes placed further apart.
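To make the idea concrete, here is a toy force-directed layout in plain Python. It is a simplified sketch, not Gephi's algorithm: I assume every pair of nodes repels with a force of `repulsion / distance` and every edge attracts with a force of `weight * distance`, and the function name, parameters and edge weights are all invented for the example.

```python
import math


def force_layout(nodes, weighted_edges, repulsion=1.0, step=0.01, iterations=500):
    """Toy force-directed layout; returns node -> [x, y]."""
    n = len(nodes)
    # Deterministic start: nodes evenly spaced on a unit circle.
    pos = {
        node: [math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
        for i, node in enumerate(nodes)
    }
    for _ in range(iterations):
        force = {node: [0.0, 0.0] for node in nodes}
        # Repulsion between every pair of nodes: magnitude repulsion / distance.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                push = repulsion / dist
                force[a][0] += push * dx / dist
                force[a][1] += push * dy / dist
                force[b][0] -= push * dx / dist
                force[b][1] -= push * dy / dist
        # Attraction along each edge: magnitude weight * distance.
        for (a, b), weight in weighted_edges.items():
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            force[a][0] += weight * dx
            force[a][1] += weight * dy
            force[b][0] -= weight * dx
            force[b][1] -= weight * dy
        # Move every node a small step along its net force.
        for node in nodes:
            pos[node][0] += step * force[node][0]
            pos[node][1] += step * force[node][1]
    return pos


def distance(pos, a, b):
    return math.hypot(pos[a][0] - pos[b][0], pos[a][1] - pos[b][1])


# Invented weights: one heavy edge and one light edge.
positions = force_layout(
    ["Burns", "Smithers", "Guest"],
    {("Burns", "Smithers"): 5, ("Smithers", "Guest"): 1},
)
close = distance(positions, "Burns", "Smithers")
far = distance(positions, "Smithers", "Guest")
```

In this model the heavily weighted pair settles closer together than the lightly weighted pair, so `close` comes out smaller than `far`. For two isolated nodes the forces balance at a distance of `sqrt(repulsion / weight)`, so distance falls as weight rises, though not in strict inverse proportion.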

To do this I used Gephi, the "open source graph visualization platform". Gephi allows you to experiment with various layout algorithms and customize the appearance of your graph. You can easily apply different colour maps, labelling and rendering attributes to your graph's nodes and edges. Gephi also has tools for filtering nodes and edges, and can calculate an arsenal of graph-theoretic indices.
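This post doesn't say which file format was used to get the graph into Gephi. One option, sketched below, is an edge-list CSV, which Gephi's spreadsheet import can read. The column names `Source`, `Target`, `Type` and `Weight` follow the convention I understand that importer to expect, and the function name and sample weights are made up for the example.

```python
import csv
import io


def write_edge_csv(edge_weights, handle):
    """Write (source, target) -> weight pairs as an undirected edge list."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["Source", "Target", "Type", "Weight"])
    for (source, target), weight in sorted(edge_weights.items()):
        writer.writerow([source, target, "Undirected", weight])


# Invented sample weights; write to an in-memory buffer instead of a file.
buffer = io.StringIO()
write_edge_csv({("Bart", "Homer"): 2, ("Homer", "Moe"): 1}, buffer)
edge_csv = buffer.getvalue()
```

To produce a file instead, pass a handle opened with `open("edges.csv", "w", newline="")` (the file name is arbitrary).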

I constructed a co-appearances graph for Season 1 of The Simpsons and loaded it into Gephi. I applied the following settings:
  • Layout: ForceAtlas 2
  • Node size and colour: number of episodes in which a character makes an appearance
  • Edge colour: number of episodes in which the characters connected by the edge co-appear
The resulting graph is shown below. High-resolution renderings are also available (PNG, PDF, SVG).
Graph of Simpsons characters co-appearances in Season 1.

The graph shows us several things. The "central" characters - Homer, Marge, Bart, Lisa and Maggie Simpson - form a cluster at the centre of the graph. They have the largest, darkest nodes because they appear in every episode of Season 1.

Around this central cluster are smaller, lighter nodes for characters who appear frequently but not in every episode: characters like Milhouse Van Houten, Moe Szyslak, Barney Gumble, Monty Burns and Waylon Smithers. Notice that Burns and Smithers, and Moe and Barney, are positioned close together, as they often appear in the same episodes.
The central cluster of the Simpsons co-appearance graph.

On the outer edges of the graph are clusters of characters who appear together in a single episode. Below we see the cluster (of minor characters) for episode 7 "The Call of the Simpsons". Between these episode clusters are positioned characters who appear in two or three episodes.
Cluster of minor characters appearing in episode 7 "The Call of the Simpsons".
If you'd like to experiment with this graph, you can download it from GitHub.