April 2, 2015

When Is Easter Sunday?

Easter Sunday is the Sunday following the first full moon on or after March 21. A few years ago I published this histogram of Easter Sunday dates. A friend then asked whether it was possible to visualize how the date changes from one year to the next. I've finally gotten around to producing such a visualization.
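For readers who want to reproduce the dates, here is a minimal Python sketch. It assumes the Gregorian calendar and uses the widely published "anonymous Gregorian" computus (often credited to Meeus, Jones and Butcher), which works from the tabular church full moon rather than the astronomical one. It is not the code behind the chart, and the function name easter_sunday is my own.

```python
from datetime import date


def easter_sunday(year: int) -> date:
    """Gregorian Easter Sunday via the anonymous Gregorian algorithm."""
    a = year % 19                      # position in the 19-year lunar cycle
    b, c = divmod(year, 100)           # century and year within the century
    d, e = divmod(b, 4)                # leap-century corrections
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30  # offset to the tabular full moon
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7  # days on to the following Sunday
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)


print(easter_sunday(2015))  # 2015-04-05
```

Calling the function over a range of years gives the kind of series plotted in the chart, for example [easter_sunday(y) for y in range(1943, 1973)].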

The time-series chart below shows 1000 years of Easter Sunday dates. You can find an interactive version here (modern browser required). Use the mini-chart to focus the top chart on a specific range of years.

This chart allows us to see repeating patterns in the sequence of Easter Sunday dates. The patterns don't always repeat exactly but the human visual system is good at spotting similar patterns. For example, I was quickly able to match the sequence of Easter Sunday dates starting in 1943 and 2038 (April 25, 21, 17, 13, ...). The sequences are identical for almost 30 years. If you spot any others please let us know.

Implementing this Easter Sundays chart was really just an excuse for me to experiment with the NVD3 library, a collection of reusable charts built atop the excellent d3.js library. The beauty of NVD3 is that it allows you to create fairly complex, interactive charts with only a few lines of code. The code I used is available here.

Happy Easter!

February 15, 2015

Triple-J Hottest 100 Artists 1993 – 2014

Triple-J's Hottest 100 is an annual music poll conducted by the ABC's youth radio station JJJ. The poll counts votes for listeners' favourite songs of the previous year, and has been run (in its current form) each year since 1993 with more than two million votes cast in 2014's Hottest 100.

I was curious to know which artists feature most prominently in the Hottest 100. So, I scraped the poll results for each year from Wikipedia, and used KNIME to aggregate entries for each artist. I then ranked artists by number of entries, followed by median entry position. This allowed me to come up with the "Hottest 100 Artists for 1993 – 2014".
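The ranking step can be sketched in a few lines of Python. This is not the KNIME workflow I used, just an illustration of the same idea. The entries are made-up placeholders, the name rank_artists is hypothetical, and I'm assuming that a lower (better) median position wins a tie.

```python
from collections import defaultdict
from statistics import median


def rank_artists(entries):
    """Rank artists by number of entries, then by median poll position.

    entries: iterable of (artist, year, position) tuples.
    Returns a list of (artist, entry_count, median_position) rows.
    """
    positions = defaultdict(list)
    for artist, _year, position in entries:
        positions[artist].append(position)
    table = [(artist, len(p), median(p)) for artist, p in positions.items()]
    # Most entries first; ties broken by the lower (better) median position.
    table.sort(key=lambda row: (-row[1], row[2]))
    return table


# Placeholder data, not real poll results.
sample = [
    ("Artist A", 1999, 10), ("Artist A", 2000, 30),
    ("Artist B", 1999, 5), ("Artist B", 2001, 15),
    ("Artist C", 2000, 1),
]
ranking = rank_artists(sample)
for artist, count, med in ranking:
    print(artist, count, med)
```

Artist C has the best single position but only one entry, so it ranks below both artists with two entries.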

I created an interactive chart using d3.js to visualize the Hottest 100 Artists. An interactive version of the chart is available here (you'll need a modern browser) and a screenshot is shown below.

Hottest 100 Artists for 1993 – 2014. Interactive version here.

The chart is a scatter plot with a circle glyph for each Hottest 100 track by a popular artist. The position of a glyph is determined by its artist and the year it polled. Its colour denotes the track's rank in the poll (red=1; yellow=50; white=100). You can mouse-over a track to see more information about it, or mouse-over an artist/year label to highlight tracks for that artist/year.
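The colour ramp can be approximated with a piecewise-linear interpolation between the three anchor colours. The Python sketch below is an assumption about how such a ramp might work (linear in RGB, with a hypothetical function name rank_colour); the chart itself was built with d3.js and may interpolate differently.

```python
def rank_colour(rank: int) -> str:
    """Map a poll rank (1..100) to a hex colour: red=1, yellow=50, white=100."""
    if not 1 <= rank <= 100:
        raise ValueError("rank must be between 1 and 100")
    if rank <= 50:
        # Red to yellow: raise the green channel from 0 to 255.
        t = (rank - 1) / 49
        green, blue = round(255 * t), 0
    else:
        # Yellow to white: raise the blue channel from 0 to 255.
        t = (rank - 50) / 50
        green, blue = 255, round(255 * t)
    return f"#ff{green:02x}{blue:02x}"


print(rank_colour(1), rank_colour(50), rank_colour(100))
```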

We can see that Powderfinger currently tops the poll with 22 entries (median position 21.5; 1996 – 2009) followed by Foo Fighters (17 entries; median 37; 1995 – 2015). Lana Del Rey takes out the 100th position with five entries (median 32; 2011 – 2013).

I'll update and improve the chart in future years. Meanwhile, if you've any corrections or constructive criticism then please leave a comment. The source-code and data are available here.

December 11, 2014

Cartograms of the Periodic Table of Elements

I recently came across a couple of examples of cartograms of Mendeleev's periodic table of elements. Before sharing them let's travel back in time to the 1970s to see WF Sheehan's cartogram (shown below), which inspired these more recent works.
The Elements According to Relative Abundance

Sheehan mapped the relative abundance of elements in the earth's crust to the area assigned to each element in the table. As Sheehan said: "The chart emphasises that in real life a chemist will probably meet, O, Si, Al, ... and that he better do something about it."

More recently, the Big Picture team at Google Research produced an interactive version of Sheehan's cartogram. In the Google version you can choose among several mapping variables:
  • mentions in books
  • abundance in the human body
  • abundance in the earth's crust
  • abundance in the sea
  • abundance in the sun
  • volume
  • volume (excluding gases)
Below, for example, is the cartogram for relative abundance in the earth's crust.

Additionally, you can choose to represent the mapping variable in several ways:
  • bars
  • cubes (as shown above)
  • electron rings (not a mapping variable; shown below)
The on-line version is interactive so you can experiment with the settings. Mouse-over an element in the table to display a tool-tip with additional information about the element.

Along similar lines is the Elemental Cartograms tool developed by Babak Sanii, which allows you to specify your own table of elemental data and generates a cartogram accordingly. Below, for example, is the availability of elements for purchase on Amazon. You can find many more weird and wonderful examples on the Elemental Cartograms Tumblr feed.

November 26, 2014

Stacey Barr: The First Three Steps To Get KPI Buy-In

Last week I attended a webinar by Stacey Barr to launch her new book Practical Performance Measurement, which describes Stacey's PuMP Blueprint for developing performance measurement processes.

The webinar covered the preparatory steps in performing meaningful performance measurement, including:
  1. Why performance measurement is difficult
  2. What's wrong with current wisdom about KPIs
  3. What actually works
The webinar also provided a brief overview of the PuMP Blueprint.

If performance measurement is an important part of your work or that of your organisation then you can find out more here.

November 24, 2014

Stephen King Screen Adaptations (Plotly)

Stephen King is a prolific author, whose books I've enjoyed reading since I was a teenager. His prodigious written output has spawned many screen adaptations for film and television, but in many cases I've been disappointed by the screen versions; see, for example, the dreadful "Under the Dome" TV mini-series.

I decided to look at how well received King's films have been compared with his books. I found a list of screen adaptations, and for each looked up the book's rating on Goodreads and the movie's rating on IMDb. I necessarily omitted screenplays, movie sequels (not adapted from a King book) and short stories that contributed to only a portion of a movie. I then imported this data into Plotly and produced the chart shown below.
Mouse over a glyph to display details.

The chart reveals a positive correlation between the ratings of King's books and their screen adaptations. Highly rated novels such as "The Green Mile", "Rita Hayworth & The Shawshank Redemption" and "The Body" produced well-regarded movies, whereas poorly rated stories such as "Trucks", "The Mangler" and "Tommyknockers" resulted in absolute stinkers on screen.
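To put a number on a relationship like this, one option is the Pearson correlation coefficient. The Python sketch below is illustrative only: I read the correlation off the chart rather than computing it, the function name pearson is my own, and the ratings in the example are placeholders, not values from my data set.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / (spread_x * spread_y)


# Placeholder ratings (book out of 5, screen out of 10), not real data.
book_ratings = [3.5, 3.9, 4.1, 4.4]
screen_ratings = [4.8, 6.0, 7.1, 8.5]
r = pearson(book_ratings, screen_ratings)
print(round(r, 2))
```

A value near +1 indicates that higher book ratings go with higher screen ratings; a value near 0 would indicate no linear relationship.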

We can also see that TV adaptations (wide glyphs) were generally less well received than film adaptations (tall glyphs). Likewise, short stories (orange glyphs) and their screen adaptations tend not to rate as highly as novels (blue glyphs) and novellas (green glyphs) and their adaptations.

Incidentally, this was my first time using Plotly. I was able to import my data and generate a scatter plot with relative ease. Customising it for my needs took a little longer as I was new to the tool. I'll definitely use Plotly again.

November 11, 2014

Visualizing how my personal tax was spent

I received my tax assessment yesterday. On the last page was the bar chart shown below, which visualizes where my "personal tax was spent, based on 2014-15 Budget estimates" (according to the caption). To the right of each bar is a dollar amount (obscured) that represents the portion of my taxes spent in each category.

Where my "personal tax was spent, based on 2014-15 Budget estimates".

I've not seen this chart on previous years' tax assessments. It provides a useful indication of where the Federal government expects to spend our personal taxes.

The chart is simple but effective. Sorting from largest to smallest is a good choice, as is the breakdown of the Welfare budget into sub-categories. I don't believe the colours encode any information. I'm glad they didn't use a (3D) pie chart which so often blights public reports of budget expenditure.

I'll be interested to see what charts accompany my tax assessment next year. Some historical information, such as budgeted versus actual expenditure or the change in the amount of tax paid, would be a welcome addition.

September 4, 2014

The Simpsons Social Network (Season 1)

I've been a fan of The Simpsons ever since Season #1 was first broadcast. So, I was recently thinking about visualizing the social network (no, not this one) of Simpsons characters.

Constructing the network of social relationships between the various Simpsons characters would be a difficult and time-consuming process (does Lisa even have any friends?). So I opted for a different network that can be constructed programmatically: the network of character co-appearances. In this network, two characters are connected if they appear in the same episode of The Simpsons. This network is similar to the one constructed for film actors that allows us to determine six degrees of Kevin Bacon.

The Simpsons co-appearances network can be constructed by parsing the episode pages of Wikisimpsons. Mathematically speaking, the network is a graph. Each node of the graph represents a Simpsons character. An (undirected) edge connects each pair of nodes whose characters appear in the same episode. To each edge I add a weight: the number of episodes in which the pair of characters co-appear. I also label each node with the number of episodes in which its character appears.
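The construction described above can be sketched in Python as follows. This is a minimal illustration rather than the script I actually ran: it assumes the episode pages have already been parsed into a mapping from episode to cast list, the function name co_appearance_graph is hypothetical, and the casts in the example are placeholders.

```python
from collections import Counter
from itertools import combinations


def co_appearance_graph(episodes):
    """Build node and edge weights from a mapping of episode -> cast list.

    Returns (node_counts, edge_weights) where node_counts[name] is the
    number of episodes a character appears in and edge_weights[(a, b)]
    is the number of episodes in which a and b both appear (a < b).
    """
    node_counts = Counter()
    edge_weights = Counter()
    for cast in episodes.values():
        cast = sorted(set(cast))  # de-duplicate; sorting fixes the pair order
        node_counts.update(cast)
        edge_weights.update(combinations(cast, 2))
    return node_counts, edge_weights


# Placeholder casts, not parsed from Wikisimpsons.
episodes = {
    "episode 1": ["Homer", "Marge", "Bart"],
    "episode 2": ["Homer", "Bart", "Moe"],
}
nodes, edges = co_appearance_graph(episodes)
print(edges[("Bart", "Homer")])  # 2
```

Storing each pair in sorted order keeps the graph undirected: ("Bart", "Homer") and ("Homer", "Bart") are counted as the same edge.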

Having constructed the graph we can set about visualizing it. Visualizing graphs helps you understand the structure of a network. So the choice of graph-layout algorithm is critical. If you impose a hierarchical layout, you'll see hierarchies. If you impose a circular layout you'll see circles.

For this reason I've used a force-directed layout, which attempts to position the nodes such that the distance between any pair of connected nodes is inversely proportional to the weight on the edge between them. This results in characters who co-appear often having their nodes positioned close together, while those that don't will have their nodes separated.
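To make the idea concrete, here is a toy spring layout in Python. It is a deliberately simplified sketch, not ForceAtlas 2 or anything Gephi runs: it has attraction along edges only (real force-directed algorithms also push all nodes apart), it assumes a target distance of 1/weight for each edge, and the name spring_layout and its parameters are my own.

```python
from math import cos, sin, pi, hypot


def spring_layout(nodes, weighted_edges, iterations=500, step=0.1):
    """Place nodes so each edge length approaches 1 / weight.

    nodes: list of node names.
    weighted_edges: dict mapping (u, v) pairs to positive weights.
    """
    # Start from evenly spaced points on a unit circle (deterministic).
    count = len(nodes)
    pos = {
        name: [cos(2 * pi * i / count), sin(2 * pi * i / count)]
        for i, name in enumerate(nodes)
    }
    for _ in range(iterations):
        for (u, v), weight in weighted_edges.items():
            target = 1.0 / weight
            dx = pos[v][0] - pos[u][0]
            dy = pos[v][1] - pos[u][1]
            dist = hypot(dx, dy)
            if dist == 0:
                continue  # coincident nodes: no direction to move along
            # Move both endpoints a fraction of the way towards the target.
            shift = step * (dist - target) / dist
            pos[u][0] += shift * dx
            pos[u][1] += shift * dy
            pos[v][0] -= shift * dx
            pos[v][1] -= shift * dy
    return pos


def distance(pos, u, v):
    return hypot(pos[u][0] - pos[v][0], pos[u][1] - pos[v][1])


# A co-appears with B four times but with C only once.
layout = spring_layout(["A", "B", "C"], {("A", "B"): 4, ("A", "C"): 1})
print(distance(layout, "A", "B") < distance(layout, "A", "C"))  # True
```

In this toy example the heavily weighted pair ends up about four times closer together than the lightly weighted pair, which is the effect described above.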

To do this I used Gephi, the "open source graph visualization platform". Gephi allows you to experiment with various layout algorithms and customize the appearance of your graph. You can easily apply different colour maps, labelling and rendering attributes to your graph's nodes and edges. Gephi also has tools for filtering nodes and edges, and can calculate an arsenal of graph-theoretic indices.

I constructed a co-appearances graph for Season 1 of the Simpsons and loaded it into Gephi. I applied the following settings:
  • Layout: ForceAtlas 2
  • Node size and colour: number of episodes in which a character makes an appearance
  • Edge colour: number of episodes in which the characters connected by the edge co-appear
The resulting graph is shown below. High-resolution renderings are also available (PNG, PDF, SVG).
Graph of Simpsons characters co-appearances in Season 1.

The graph shows us several things. The "central" characters - Homer, Marge, Bart, Lisa and Maggie Simpson - form a cluster at the centre of the graph. They have the largest, darkest nodes because they appear in every episode of Season 1.

Around this central cluster are positioned smaller, lighter nodes for characters who appear frequently but not in every episode; characters like Milhouse Van Houten, Moe Szyslak, Barney Gumble, Monty Burns and Waylon Smithers. Notice that Burns and Smithers, and Moe and Barney, are positioned close together as they often appear in the same episodes.
The central cluster of the Simpsons co-appearance graph.

On the outer edges of the graph are clusters of characters who appear together in a single episode. Below we see the cluster (of minor characters) for episode 7 "The Call of the Simpsons". Between these episode clusters are positioned characters who appear in two or three episodes.
Cluster of minor characters appearing in episode 7 "The Call of the Simpsons".
If you'd like to experiment with this graph you can download it from Github.