May 19, 2016

The Simpsons Social Networks: Seasons #1 - #26

Back in 2014 I used Gephi to layout a graph of co-appearances of characters in Season #1 of The Simpsons. I've now repeated that effort for Seasons #1 through #26.

To recap, the graphs visualize the co-appearance networks of each season of The Simpsons. Each graph vertex represents a character, edges connect the vertices of pairs of characters who appear together in an episode. Each edge carries a weight whose value is the number of episodes in the season in which the connected characters co-appear. The size of a vertex encodes the number of episodes in which a character appears in a given season. This value is also encoded in the vertex's colour.

The graphs have some common features. The largest nodes at the centre of each graph are the core Simpsons family unit: Homer, Marge, Bart, Lisa and Maggie. Occasionally, Maggie is absent from a few episodes, and in these seasons her vertex is slightly smaller than those of the rest of the family.

Surrounding the central family vertices are vertices of secondary characters who make frequent appearances although not in every episode. These include Abe Simpson (Grandpa), bartender Moe Szyslak, Homer's colleagues Carl Carlson and Lenny Leonard, Bart's school chum Milhouse Van Houten, and many more. Shown below is the central cluster for Season 14's graph.
The central cluster of the Season #14 co-appearance graph.
Further from the centre we find characters who make fewer appearances, and on the periphery are clusters of vertices representing characters who appear together in single episodes. Such a cluster is shown below for Episode 19 (Simple Simpson) of Season 15.
The characters whose only appearance in Season 15 is in Episode 19.
The graphs become larger and more complex with the progression of the seasons. Season #1's graph has 240 character vertices. This rises to 600 characters in Season #26. The graphs for Seasons #1 - #26 are shown below.

The Data

I obtained the data from Wikisimpsons. I wrote a PERL script to fetch and parse the characters appearing in each season's episodes. As is often the case sourcing and cleansing the data took considerable effort. Fortunately, Wikisimpsons is a wiki so I could correct some errors at source. Other problems require hacks and workarounds in the script. Even after this there are still some issues with the data that require attention.

This work assumes Wikisimpsons is 100% complete, consistent and correct. It isn't, so if you spot any problems then please contribute to this excellent wiki by fixing what you can.

The Graphs

My PERL script generates two files for each season: nodes.csv (vertices) and edges.csv (edges). I import these into Gephi and then layout the resulting graph. I used Gephi's force-directed algorithm ForceAtlas2. It attempts to layout the vertices such that those connected by edges are close together (the larger the edge weight, the shorter the edge) and those not connected by edges are kept separate.

ForceAtlas2 also has a parameter that tweaks the layout so that vertex overlap is avoided. I enabled this parameter once the layout had stabilized.

Gephi also supports manual layout. So once ForceAtlas2 had settled down I made some manual adjustments to bring outlying clusters closer to the main graph so as to produce a more compact layout.

The final graphs were exported from Gephi as SVG, converted to PNG images using Inkscape and labelled using ImageMagick.

Tools

Copyright

The graphs, SVGs, PNGs and script are available at GitHub under the MIT License.