May 19, 2016

The Simpsons Social Networks: Seasons #1 - #26

Back in 2014 I used Gephi to lay out a graph of co-appearances of characters in Season #1 of The Simpsons. I've now repeated that effort for Seasons #1 through #26.

To recap, the graphs visualize the co-appearance networks of each season of The Simpsons. Each graph vertex represents a character, and edges connect the vertices of pairs of characters who appear together in an episode. Each edge carries a weight equal to the number of episodes in the season in which the connected characters co-appear. The size of a vertex encodes the number of episodes in which a character appears in a given season. This value is also encoded in the vertex's colour.
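For readers who want to build a similar network, here is a minimal Python sketch of the counting described above. It is not my original script; the function name `co_appearance` and the three-episode sample season are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def co_appearance(episodes):
    """Return (vertex_counts, edge_weights) for one season.

    episodes: an iterable of character-name collections, one per episode.
    """
    vertex_counts = Counter()  # character -> number of episodes appeared in
    edge_weights = Counter()   # (char_a, char_b) -> number of shared episodes
    for cast in episodes:
        # De-duplicate the cast; sorting gives every pair one canonical order.
        cast = sorted(set(cast))
        vertex_counts.update(cast)
        edge_weights.update(combinations(cast, 2))
    return vertex_counts, edge_weights


# Hypothetical three-episode season, for illustration only.
season = [
    ["Homer", "Marge", "Bart"],
    ["Homer", "Bart", "Moe"],
    ["Homer", "Marge"],
]
vertex_counts, edge_weights = co_appearance(season)
```

Here `vertex_counts` drives the vertex size and colour, and `edge_weights` gives the weight of each undirected edge.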

The graphs have some common features. The largest nodes at the centre of each graph are the core Simpsons family unit: Homer, Marge, Bart, Lisa and Maggie. Occasionally, Maggie is absent from a few episodes, and in these seasons her vertex is slightly smaller than those of the rest of the family.

Surrounding the central family vertices are vertices of secondary characters who make frequent appearances although not in every episode. These include Abe Simpson (Grandpa), bartender Moe Szyslak, Homer's colleagues Carl Carlson and Lenny Leonard, Bart's school chum Milhouse Van Houten, and many more. Shown below is the central cluster for Season 14's graph.

The central cluster of the Season #14 co-appearance graph.

Further from the centre we find characters who make fewer appearances, and on the periphery are clusters of vertices representing characters who appear together in single episodes. Such a cluster is shown below for Episode 19 (Simple Simpson) of Season 15.

The characters whose only appearance in Season 15 is in Episode 19.

The graphs become larger and more complex with the progression of the seasons. Season #1's graph has 240 character vertices. This rises to 600 characters in Season #26. The graphs for Seasons #1 - #26 are shown below.

The Data

I obtained the data from Wikisimpsons. I wrote a Perl script to fetch and parse the characters appearing in each season's episodes. As is often the case, sourcing and cleansing the data took considerable effort. Fortunately, Wikisimpsons is a wiki, so I could correct some errors at source. Other problems required hacks and workarounds in the script. Even so, some issues with the data still require attention.

This work assumes Wikisimpsons is 100% complete, consistent and correct. It isn't, so if you spot any problems then please contribute to this excellent wiki by fixing what you can.

The Graphs

My Perl script generates two files for each season: nodes.csv (vertices) and edges.csv (edges). I import these into Gephi and then lay out the resulting graph using Gephi's force-directed algorithm, ForceAtlas2. It attempts to lay out the vertices such that those connected by edges are close together (the larger the edge weight, the shorter the edge) and those not connected by edges are kept apart.
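As a rough illustration of what such files might contain, here is a Python sketch (not the original script) that writes a node table and an edge table. The column names `Id`, `Label`, `Source`, `Target`, `Type` and `Weight` are the ones I understand Gephi's spreadsheet importer to recognise; the `Episodes` column and the sample data are assumptions for the example.

```python
import csv
import io


def write_nodes(vertex_counts, out):
    """Write one row per character: id, label and episode count."""
    writer = csv.writer(out)
    writer.writerow(["Id", "Label", "Episodes"])
    for name, episodes in sorted(vertex_counts.items()):
        writer.writerow([name, name, episodes])


def write_edges(edge_weights, out):
    """Write one undirected, weighted row per co-appearing pair."""
    writer = csv.writer(out)
    writer.writerow(["Source", "Target", "Type", "Weight"])
    for (source, target), weight in sorted(edge_weights.items()):
        writer.writerow([source, target, "Undirected", weight])


# Hypothetical counts, for illustration only.
vertex_counts = {"Homer": 3, "Marge": 2}
edge_weights = {("Homer", "Marge"): 2}

# In-memory buffers stand in for nodes.csv and edges.csv here.
nodes_csv = io.StringIO()
edges_csv = io.StringIO()
write_nodes(vertex_counts, nodes_csv)
write_edges(edge_weights, edges_csv)
```

To write real files, pass handles opened with `open("nodes.csv", "w", newline="")` instead of the `StringIO` buffers.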

ForceAtlas2 also has a parameter that tweaks the layout so that vertex overlap is avoided. I enabled this parameter once the layout had stabilized.

Gephi also supports manual layout. So once ForceAtlas2 had settled down I made some manual adjustments to bring outlying clusters closer to the main graph so as to produce a more compact layout.

The final graphs were exported from Gephi as SVG, converted to PNG images using Inkscape and labelled using ImageMagick.

Copyright

The graphs, SVGs, PNGs and script are available at GitHub under the MIT License.  

April 2, 2015

When Is Easter Sunday?

Easter Sunday is the first Sunday after the first full moon on or after March 21. A few years ago I published this histogram of Easter Sunday dates. A friend then asked whether it was possible to visualize Easter Sunday dates from one year to the next. I've finally gotten around to producing such a visualization.
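If you'd like to generate the dates yourself, here is a minimal Python sketch using the widely published "anonymous Gregorian" algorithm (also known as Meeus/Jones/Butcher). It is one common way to compute the date for Gregorian calendar years, not necessarily how the data for my chart was produced, and the function name `easter_sunday` is my own.

```python
from datetime import date


def easter_sunday(year):
    """Return the date of Easter Sunday for a Gregorian calendar year."""
    a = year % 19                      # position in the 19-year lunar cycle
    b, c = divmod(year, 100)           # century and year within century
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)


# Dates for a range of years, e.g. as input to a time-series chart.
easter_dates = {year: easter_sunday(year) for year in range(1943, 1948)}
```

For example, `easter_sunday(1943)` and `easter_sunday(2038)` both fall on April 25.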

The time-series chart below shows 1000 years of Easter Sunday dates. You can find an interactive version here (modern browser required). Use the mini-chart to focus the top chart on a specific range of years.


This chart allows us to see repeating patterns in the sequence of Easter Sunday dates. The patterns don't always repeat exactly but the human visual system is good at spotting similar patterns. For example, I was quickly able to match the sequence of Easter Sunday dates starting in 1943 and 2038 (April 25, 21, 17, 13, ...). The sequences are identical for almost 30 years. If you spot any others please let us know.

Implementing this Easter Sundays chart was really just an excuse for me to experiment with the NVD3 library, a collection of reusable charts built atop the excellent d3.js library. The beauty of NVD3 is that it allows you to create fairly complex, interactive charts with only a few lines of code. The code I used is available here.

Happy Easter!


February 15, 2015

Triple-J Hottest 100 Artists 1993 – 2014

Triple-J's Hottest 100 is an annual music poll conducted by the ABC's youth radio station JJJ. The poll counts votes for listeners' favourite songs of the previous year, and has been run (in its current form) each year since 1993 with more than two million votes cast in 2014's Hottest 100.

I was curious to know which artists feature most prominently in the Hottest 100. So, I scraped the poll results for each year from Wikipedia, and used KNIME to aggregate entries for each artist. I then ranked artists by number of entries, followed by median entry position. This allowed me to come up with the "Hottest 100 Artists for 1993 – 2014".
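I did the aggregation in KNIME, but the same ranking can be sketched in a few lines of Python. This is an illustration only: the function name `rank_artists` and the sample entries are invented, and I'm assuming that ties on number of entries are broken by the lower (better) median position.

```python
from collections import defaultdict
from statistics import median


def rank_artists(entries):
    """Rank artists by entry count, then by median poll position.

    entries: iterable of (artist, year, position) tuples.
    Returns a list of (artist, entry_count, median_position) tuples.
    """
    positions = defaultdict(list)
    for artist, _year, position in entries:
        positions[artist].append(position)
    # More entries first; ties broken by lower (better) median position.
    return sorted(
        ((artist, len(p), median(p)) for artist, p in positions.items()),
        key=lambda row: (-row[1], row[2]),
    )


# Made-up poll entries, for illustration only.
entries = [
    ("Artist A", 1999, 10), ("Artist A", 2000, 30),
    ("Artist B", 2001, 5), ("Artist B", 2002, 7),
    ("Artist C", 2003, 1),
]
ranking = rank_artists(entries)
```

In this toy data Artist B ranks ahead of Artist A because both have two entries but B has the better median, while Artist C trails with a single entry despite polling at number one.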

I created an interactive chart using d3.js to visualize the Hottest 100 Artists. An interactive version of the chart is available here (you'll need a modern browser) and a screenshot is shown below.

Hottest 100 Artists for 1993 – 2014. Interactive version here.

The chart is a scatter plot with a circle glyph for each Hottest 100 track by a popular artist. The position of a glyph is determined by its artist and the year it polled. Its colour denotes the track's rank in the poll (red=1; yellow=50; white=100). You can mouse-over a track to see more information about it, or mouse-over an artist/year label to highlight tracks for that artist/year.
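To make the colour encoding concrete, here is a small Python sketch of one way to map rank to colour. It assumes piecewise-linear interpolation in RGB between the three anchor colours; the chart's actual scale may interpolate differently, and `rank_colour` is a hypothetical name.

```python
def rank_colour(rank):
    """Map a poll rank (1-100) to an (r, g, b) tuple.

    Assumes linear RGB interpolation: red at 1, yellow at 50, white at 100.
    """
    if not 1 <= rank <= 100:
        raise ValueError("rank must be between 1 and 100")
    if rank <= 50:
        # Red (255, 0, 0) to yellow (255, 255, 0): ramp up the green channel.
        t = (rank - 1) / 49
        return (255, round(255 * t), 0)
    # Yellow (255, 255, 0) to white (255, 255, 255): ramp up the blue channel.
    t = (rank - 50) / 50
    return (255, 255, round(255 * t))
```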

We can see that Powderfinger currently tops the poll with 22 entries (median position 21.5; 1996 – 2009) followed by Foo Fighters (17 entries; median 37; 1995 – 2015). Lana Del Rey takes out the 100th position with five entries (median 32; 2011 – 2013).

I'll update and improve the chart in future years. Meanwhile, if you've any corrections or constructive criticism then please leave a comment. The source-code and data are available here.

December 11, 2014

Cartograms of the Periodic Table of Elements

I recently came across a couple of examples of cartograms of Mendeleev's periodic table of elements. Before sharing them let's travel back in time to the 1970s to see WF Sheehan's cartogram (shown below), which inspired these more recent works.
The Elements According to Relative Abundance

Sheehan mapped the relative abundance of elements in the earth's crust to the area assigned to each element in the table. As Sheehan said: "The chart emphasises that in real life a chemist will probably meet, O, Si, Al, ... and that he better do something about it."

More recently, the Big Picture team at Google Research produced an interactive version of Sheehan's cartogram. In the Google version you can choose among several mapping variables:
  • mentions in books
  • abundance in the human body
  • abundance in the earth's crust
  • abundance in the sea
  • abundance in the sun
  • volume
  • volume (excluding gases)
Below, for example, is the cartogram for relative abundance in the earth's crust.


Additionally, you can choose to represent the mapping variable in several ways:
  • bars
  • cubes (as shown above)
  • electron rings (not a mapping variable; shown below)
The on-line version is interactive so you can experiment with the settings. Mouse-over an element in the table to display a tool-tip with additional information about the element.

Along similar lines is the Elemental Cartograms tool developed by Babak Sanii, which allows you to specify your own table of elemental data and generates a cartogram accordingly. Below, for example, is the availability of elements for purchase on Amazon. You can find many more weird and wonderful examples on the Elemental Cartograms Tumblr feed.

November 26, 2014

Stacey Barr: The First Three Steps To Get KPI Buy-In

Last week I attended a webinar by Stacey Barr to launch her new book Practical Performance Measurement, which describes Stacey's PuMP Blueprint for developing performance measurement processes.

The webinar covered the preparatory steps in performing meaningful performance measurement, including:
  1. Why performance measurement is difficult
  2. What's wrong with current wisdom about KPIs
  3. What actually works
The webinar also provided a brief overview of the PuMP Blueprint.

If performance measurement is an important part of your work or that of your organisation then you can find out more here.

November 24, 2014

Stephen King Screen Adaptations (Plotly)

Stephen King is a prolific author, whose books I've enjoyed reading since I was a teenager. His prodigious written output has spawned many screen adaptations for film and television, but in many cases I've been disappointed by the screen versions; see, for example, the dreadful "Under the Dome" TV mini-series.

I decided to look at how well-received King's films have been compared with his books. I found a list of screen adaptations, and for each looked up the book's rating on Goodreads and the movie's rating on IMDb. I necessarily omitted screenplays, movie sequels (not adapted from a King book) and short stories that contributed to only a portion of a movie. I then imported this data into Plotly and produced the chart shown below.
Mouse over a glyph to display details.

The chart reveals a positive correlation between the ratings of King's books and their screen adaptations. Highly rated novels such as "The Green Mile", "Rita Hayworth & The Shawshank Redemption" and "The Body" produced well-regarded movies, whereas poorly rated stories such as "Trucks", "The Mangler" and "Tommyknockers" resulted in absolute stinkers on screen.

We can also see that TV adaptations (wide glyphs) were generally less well-received than film adaptations (tall glyphs). Similarly, short stories (orange glyphs) and their screen adaptations tend not to rate as highly as novels (blue glyphs) and novellas (green glyphs) and their adaptations.

Incidentally, this was my first time using Plotly. I was able to import my data and generate a scatter plot with relative ease. Customising it for my needs took a little longer as I was new to the tool. I'll definitely use Plotly again.

November 11, 2014

Visualizing how my personal tax was spent

I received my tax assessment yesterday. On the last page was the bar chart shown below, which visualizes where my "personal tax was spent, based on 2014-15 Budget estimates" (according to the caption). To the right of each bar is a dollar amount (obscured) that represents the portion of my taxes spent in each category.

Where my "personal tax was spent, based on 2014-15 Budget estimates".

I've not seen this chart on previous years' tax assessments. It provides a useful indication of where the Federal government expects to spend our personal taxes.

The chart is simple but effective. Sorting from largest to smallest is a good choice, as is the breakdown of the Welfare budget into sub-categories. I don't believe the colours encode any information. I'm glad they didn't use a (3D) pie chart which so often blights public reports of budget expenditure.

I'll be interested to see what charts accompany my tax assessment next year. I'd like to see some historical information, such as budgeted versus actual expenditure, or the change in the amount of tax paid.