VisLives! Visualization as I see it.<br />
<br />
Interactively ranking AFL teams by their player lists<br />
Posted by Chris on 2019-07-04<br />
<br />
It's been a while...<br />
<br />
I recently completed an interactive chart of AFL teams and the aggregated statistics for their player lists - see the image below or the <a href="https://cpudney.github.io/aflteams/" target="_blank">interactive version</a>.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhMTeFghLjKqXWowyqHB_KxMYgoiLAG4U0e-a-Y3hn_vCZJVflxgXLNqu-9nlRGcVl_Yu3WRDm7iL4zoz6ZKVUQMFPy05OizEiuYQDJfeKQGYeuBXdb6LaMrSsc4dsS8nh2qE-qyyd8TVNN/s1600/aflteams.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="773" data-original-width="1600" height="308" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhMTeFghLjKqXWowyqHB_KxMYgoiLAG4U0e-a-Y3hn_vCZJVflxgXLNqu-9nlRGcVl_Yu3WRDm7iL4zoz6ZKVUQMFPy05OizEiuYQDJfeKQGYeuBXdb6LaMrSsc4dsS8nh2qE-qyyd8TVNN/s640/aflteams.png" width="640" /></a></div>
<br />
Click the <i>Age</i>, <i>Games</i>, <i>Height</i> or <i>Weight</i> button to reorder the teams by their average measures. Mouse over the red circular markers to display the average measure in a tooltip.<br />
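Reordering boils down to averaging each measure over a team's player list and sorting the teams by that average. The chart itself is written in D3.js; this Python sketch only illustrates the calculation. Its record layout (a hypothetical "team" field plus one field per measure) and the toy data are assumptions, not the chart's source data.<br />

```python
from collections import defaultdict
from statistics import mean


def rank_teams(players, measure):
    """Average `measure` over each team's player list; return teams sorted by that average.

    players: iterable of dicts with a "team" key plus one key per measure
             ("age", "games", "height", "weight"). These field names are
             hypothetical, not the format of the chart's source data.
    Returns a list of (team, average) pairs, largest average first.
    """
    values_by_team = defaultdict(list)
    for player in players:
        values_by_team[player["team"]].append(player[measure])
    averages = {team: mean(values) for team, values in values_by_team.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Toy data for illustration only (not real AFL players).
players = [
    {"team": "Team A", "age": 22, "height": 190},
    {"team": "Team A", "age": 26, "height": 186},
    {"team": "Team B", "age": 25, "height": 187},
]
for team, average in rank_teams(players, "age"):
    print(f"{team}: {average:.1f}")
```

The sort here is largest-first; the post doesn't say which direction the chart uses.<br />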
<br />
I used <a href="http://d3js.org/" target="_blank">D3.js</a> v5 to implement the chart. The source code is available on <a href="https://github.com/cpudney/aflteams/" target="_blank">GitHub</a>.<br />
<br />
Data source: <a href="http://www.footywire.com/" target="_blank">FootyWire.com</a><br />
<br />
The Simpsons Social Networks: Seasons #1 - #26<br />
Posted by Chris on 2016-05-19<br />
<br />
Back in 2014 I used Gephi to lay out a <a href="http://www.vislives.com/2014/09/the-simpons-social-netwok-season-1.html" target="_blank">graph of co-appearances of characters in Season #1 of The Simpsons</a>. I've now repeated that effort for Seasons #1 through #26.<br />
<br />
To recap, the graphs visualize the co-appearance network of each season of The Simpsons. Each vertex represents a character, and an edge connects two characters who appear together in at least one episode. Each edge carries a weight equal to the number of episodes in the season in which the two characters co-appear. The size of a vertex encodes the number of episodes in the season in which the character appears; this value is also encoded in the vertex's colour.<br />
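The construction can be sketched in a few lines of Python. This is not the author's Perl script: it assumes each episode's cast has already been parsed into a set of character names, and the three-episode season below is invented for illustration. Vertex weights count episodes per character; edge weights count episodes per pair of characters.<br />

```python
from collections import Counter
from itertools import combinations


def build_coappearance_graph(episodes):
    """Build a weighted co-appearance graph from per-episode casts.

    episodes: list of sets, one set of character names per episode in a season.
    Returns (vertex_weights, edge_weights), where
      vertex_weights[c]    = number of episodes in which character c appears
      edge_weights[(a, b)] = number of episodes in which a and b appear together
                             (each pair stored once, names in alphabetical order)
    """
    vertex_weights = Counter()
    edge_weights = Counter()
    for cast in episodes:
        vertex_weights.update(cast)
        # Every unordered pair of characters in this episode co-appears once.
        for a, b in combinations(sorted(cast), 2):
            edge_weights[(a, b)] += 1
    return vertex_weights, edge_weights


# Made-up casts for three episodes, for illustration only.
season = [
    {"Homer", "Marge", "Bart"},
    {"Homer", "Bart"},
    {"Marge", "Moe"},
]
vertices, edges = build_coappearance_graph(season)
print(vertices["Homer"], edges[("Bart", "Homer")])  # 2 2
```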
<br />
The graphs have some common features. The largest nodes at the centre of each graph are the core Simpsons family unit: Homer, Marge, Bart, Lisa and Maggie. Occasionally, Maggie is absent from a few episodes, and in these seasons her vertex is slightly smaller than those of the rest of the family.<br />
<br />
Surrounding the central family vertices are vertices of secondary characters who make frequent appearances although not in every episode. These include Abe Simpson (Grandpa), bartender Moe Szyslak, Homer's colleagues Carl Carlson and Lenny Leonard, Bart's school chum Milhouse Van Houten, and many more. Shown below is the central cluster for <a href="https://simpsonswiki.com/wiki/Season_14" target="_blank">Season 14</a>'s graph.<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqva9nn8h4GGjEncVD9asv1hFm9w3NOBSLoz99d64VkhBheG-AtPQ0_xQiED3WhK3Ak21Zi6ihSD-U_3zkwxp0peSd_JkaFkqZPkMpTQpd-y3trQKjL87oqpTa1qiZj3rNUFKntrjmX90D/s1600/centre14.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="553" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqva9nn8h4GGjEncVD9asv1hFm9w3NOBSLoz99d64VkhBheG-AtPQ0_xQiED3WhK3Ak21Zi6ihSD-U_3zkwxp0peSd_JkaFkqZPkMpTQpd-y3trQKjL87oqpTa1qiZj3rNUFKntrjmX90D/s640/centre14.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">The central cluster of the Season #14 co-appearance graph.</td></tr>
</tbody></table>
Further from the centre we find characters who make fewer appearances, and on the periphery are clusters of vertices representing characters who appear together in single episodes. Such a cluster is shown below for Episode 19 (<i><a href="https://simpsonswiki.com/wiki/Simple_Simpson" target="_blank">Simple Simpson</a></i>) of Season 15.<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjSEpKqqfMHPNr8TbqpmIpY16ZMutAU7YZMhcaxBf04F6R4iQd3wwwD0OV8R9uY2kWXL2Nx5QJwMxfIxL2QLzM3lApTFF85fgc70P_TdjfrGMtBkOJsC45KsGN7yprb6Q9DfFPMkmhQKcST/s1600/season15ep19.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="628" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjSEpKqqfMHPNr8TbqpmIpY16ZMutAU7YZMhcaxBf04F6R4iQd3wwwD0OV8R9uY2kWXL2Nx5QJwMxfIxL2QLzM3lApTFF85fgc70P_TdjfrGMtBkOJsC45KsGN7yprb6Q9DfFPMkmhQKcST/s640/season15ep19.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">The characters whose only appearance in Season 15 is in Episode 19.</td></tr>
</tbody></table>
The graphs become larger and more complex with the progression of the seasons. Season #1's graph has 240 character vertices. This rises to 600 characters in Season #26. The graphs for Seasons #1 - #26 are shown below.<br />
<a href="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season01/graph.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season01/graph.png" width="150" /></a>
<a href="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season02/graph.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season02/graph.png" width="150" /></a>
<a href="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season03/graph.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season03/graph.png" width="150" /></a>
<a href="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season04/graph.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season04/graph.png" width="150" /></a>
<a href="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season05/graph.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season05/graph.png" width="150" /></a>
<a href="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season06/graph.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season06/graph.png" width="150" /></a>
<a href="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season07/graph.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season07/graph.png" width="150" /></a>
<a href="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season08/graph.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season08/graph.png" width="150" /></a>
<a href="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season09/graph.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season09/graph.png" width="150" /></a>
<a href="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season10/graph.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season10/graph.png" width="150" /></a>
<a href="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season11/graph.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season11/graph.png" width="150" /></a>
<a href="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season12/graph.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season12/graph.png" width="150" /></a>
<a href="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season13/graph.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season13/graph.png" width="150" /></a>
<a href="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season14/graph.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season14/graph.png" width="150" /></a>
<a href="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season15/graph.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season15/graph.png" width="150" /></a>
<a href="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season16/graph.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season16/graph.png" width="150" /></a>
<a href="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season17/graph.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season17/graph.png" width="150" /></a>
<a href="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season18/graph.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season18/graph.png" width="150" /></a>
<a href="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season19/graph.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season19/graph.png" width="150" /></a>
<a href="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season20/graph.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season20/graph.png" width="150" /></a>
<a href="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season21/graph.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season21/graph.png" width="150" /></a>
<a href="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season22/graph.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season22/graph.png" width="150" /></a>
<a href="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season23/graph.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season23/graph.png" width="150" /></a>
<a href="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season24/graph.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season24/graph.png" width="150" /></a>
<a href="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season25/graph.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season25/graph.png" width="150" /></a>
<a href="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season26/graph.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://github.com/cpudney/simpsons-seasons-networks/raw/master/season26/graph.png" width="150" /></a><br />
<div>
<h2>
The Data</h2>
I obtained the data from <a href="https://simpsonswiki.com/" target="_blank">Wikisimpsons</a>. I wrote a PERL script to fetch each season's episodes and parse the characters appearing in them. As is often the case, sourcing and cleansing the data took considerable effort. Fortunately, Wikisimpsons is a wiki, so I could correct some errors at source. Other problems required hacks and workarounds in the script. Even so, some issues with the data still require attention.<br />
<br />
This work assumes Wikisimpsons is 100% complete, consistent and correct. It isn't, so if you spot any problems then please contribute to this excellent wiki by fixing what you can.<br />
<div>
<h2>
The Graphs</h2>
My PERL script generates two files for each season: nodes.csv (vertices) and edges.csv (edges). I import these into Gephi and then lay out the resulting graph using Gephi's force-directed algorithm <a href="http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0098679" target="_blank">ForceAtlas2</a>. It attempts to lay out the vertices such that those connected by edges are close together (the larger the edge weight, the shorter the edge tends to be) and those not connected by edges are kept apart.<br />
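For readers who want to reproduce this step, the sketch below writes the two files in a form intended for Gephi's spreadsheet (CSV) import. The column names (Id and Label for nodes; Source, Target, Type and Weight for edges) are assumed Gephi import conventions, and the Episodes column is a hypothetical extra node attribute; the post doesn't say which columns the author's Perl script actually writes. The function takes the character and pair counts as plain dictionaries.<br />

```python
import csv


def write_gephi_csv(vertex_weights, edge_weights,
                    nodes_path="nodes.csv", edges_path="edges.csv"):
    """Write a weighted, undirected graph as two CSV files for Gephi's spreadsheet import.

    vertex_weights: {character: episode count}
    edge_weights:   {(character_a, character_b): co-appearance count}
    Column names are assumed Gephi conventions; "Episodes" is a hypothetical extra attribute.
    """
    with open(nodes_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Id", "Label", "Episodes"])
        for name, count in sorted(vertex_weights.items()):
            writer.writerow([name, name, count])  # the name serves as both Id and Label
    with open(edges_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Source", "Target", "Type", "Weight"])
        for (a, b), count in sorted(edge_weights.items()):
            writer.writerow([a, b, "Undirected", count])
```

Any pair of dictionaries of this shape will do, for example the counts from a per-episode co-appearance tally.<br />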
<br />
ForceAtlas2 also has a parameter that prevents vertices from overlapping. I enabled it once the layout had stabilized.<br />
<br />
Gephi also supports manual layout. So once ForceAtlas2 had settled down I made some manual adjustments to bring outlying clusters closer to the main graph so as to produce a more compact layout.<br />
<br />
The final graphs were exported from Gephi as SVG, converted to PNG images using Inkscape and labelled using ImageMagick.</div>
<h2>
Tools</h2>
<ul>
<li><a href="http://gephi.org/" target="_blank">Gephi</a> for graph layout</li>
<li><a href="http://www.perl.org/" target="_blank">PERL</a> for data scraping</li>
<li><a href="http://inkscape.org/" target="_blank">Inkscape</a> for SVG to PNG conversion</li>
<li><a href="http://www.imagemagick.com/" target="_blank">ImageMagick</a> for image labelling</li>
</ul>
<h3>
Copyright</h3>
<div>
The graphs, SVGs, PNGs and script are available at <a href="https://github.com/cpudney/simpsons-seasons-networks" target="_blank">GitHub</a> under the <a href="https://github.com/cpudney/simpsons-seasons-networks/blob/master/LICENSE" target="_blank">MIT License</a>. </div>
</div>
<br />
When Is Easter Sunday?<br />
Posted by Chris on 2015-04-02<br />
<br />
<a href="http://en.wikipedia.org/wiki/Easter#Date" target="_blank">Easter Sunday</a> is the first Sunday after the ecclesiastical full moon that falls on or after March 21. A few years ago I published <a href="http://www.vislives.com/2012/04/when-is-easter-sunday.html" target="_blank">this histogram of Easter Sunday dates</a>. A friend then asked whether it was possible to visualize how Easter Sunday dates change from one year to the next. I've finally gotten around to producing such a visualization.<br />
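The date can be computed with integer arithmetic alone. The sketch below uses the well-known anonymous Gregorian algorithm (also published by Meeus, Jones and Butcher); it is one standard way to do the calculation, not necessarily how the data for the chart was generated, and it applies to Gregorian calendar years only.<br />

```python
import datetime


def easter_sunday(year: int) -> datetime.date:
    """Gregorian Easter Sunday, using the anonymous Gregorian (Meeus/Jones/Butcher) algorithm."""
    a = year % 19                    # year's position in the 19-year lunar (Metonic) cycle
    b, c = divmod(year, 100)         # century, and year within the century
    d, e = divmod(b, 4)              # leap-year corrections for century years
    f = (b + 8) // 25
    g = (b - f + 1) // 3             # correction for drift of the lunar cycle
    h = (19 * a + b - d - g + 15) % 30                  # locates the Paschal full moon
    i, k = divmod(c, 4)
    weekday_offset = (32 + 2 * e + 2 * i - h - k) % 7   # days on to the following Sunday
    m = (a + 11 * h + 22 * weekday_offset) // 451       # adjustment for rare late cases
    month, day = divmod(h + weekday_offset - 7 * m + 114, 31)
    return datetime.date(year, month, day + 1)


print(easter_sunday(2015))  # 2015-04-05
```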
<br />
The time-series chart below shows 1000 years of Easter Sunday dates. You can find an interactive version <a href="http://bl.ocks.org/cpudney/raw/6d5ef950071e60b98122/" target="_blank">here</a> (modern browser required). Use the mini-chart to focus the top chart on a specific range of years.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjVSIt8IAGbI6y9OJ3M8etJ-T3zgW0XZ_rPnuezZeO5ZRX0z9JXGJWfH2JGR8uOWN-6onxh3Sp3Fe5D092blQ9TG5x3EtjaUcuVXvtSorNb2wlhh6ltN-RTuVOcL4ihk7TovlGO4O6_GluO/s1600/Easter+Sunday+Time+Series.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjVSIt8IAGbI6y9OJ3M8etJ-T3zgW0XZ_rPnuezZeO5ZRX0z9JXGJWfH2JGR8uOWN-6onxh3Sp3Fe5D092blQ9TG5x3EtjaUcuVXvtSorNb2wlhh6ltN-RTuVOcL4ihk7TovlGO4O6_GluO/s1600/Easter+Sunday+Time+Series.png" height="314" width="640" /></a></div>
<br />
This chart allows us to see repeating patterns in the sequence of Easter Sunday dates. The patterns don't always repeat exactly, but the human visual system is good at spotting similar patterns. For example, I was quickly able to match the sequences of Easter Sunday dates starting in 1943 and 2038 (April 25, 21, 17, 13, ...). The sequences are identical for almost 30 years. If you spot any others, please let me know.<br />
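Matches like this can also be checked in code. The hypothetical helper matching_run below counts how many consecutive years two sequences share exactly the same Easter date (the date function is the anonymous Gregorian algorithm, restated so the block stands alone). An exact comparison is stricter than the eye: a visual match may tolerate sequences that differ by a day here and there, which this check would not.<br />

```python
import datetime


def easter_sunday(year):
    """Gregorian Easter Sunday (anonymous Gregorian / Meeus-Jones-Butcher algorithm)."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    g = (b - (b + 8) // 25 + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    w = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * w) // 451
    month, day = divmod(h + w - 7 * m + 114, 31)
    return datetime.date(year, month, day + 1)


def matching_run(start_a, start_b, max_years=100):
    """Count consecutive years from start_a and start_b whose Easter month/day agree exactly."""
    run = 0
    for offset in range(max_years):
        date_a = easter_sunday(start_a + offset)
        date_b = easter_sunday(start_b + offset)
        if (date_a.month, date_a.day) != (date_b.month, date_b.day):
            break
        run += 1
    return run


print(matching_run(1943, 2038))
```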
<br />
Implementing this Easter Sundays chart was really just an excuse for me to experiment with the <a href="http://nvd3.org/" target="_blank">NVD3 library</a>, a collection of reusable charts built atop the excellent d3.js library. The beauty of NVD3 is that it allows you to create fairly complex, interactive charts with only a few lines of code. The code I used is available <a href="https://gist.github.com/cpudney/6d5ef950071e60b98122" target="_blank">here</a>.<br />
<br />
Happy Easter!<br />
<br />
<br />
<div style="text-align: center;">
<span style="font-size: xx-small;">
</span><span style="font-size: xx-small;">
This work is licensed under a <a href="http://creativecommons.org/licenses/by/4.0/" rel="license">Creative Commons Attribution 4.0 International License</a></span><br /><span style="font-size: xx-small;">
<a href="http://creativecommons.org/licenses/by/4.0/" rel="license"><img alt="Creative Commons License" src="http://i.creativecommons.org/l/by/4.0/80x15.png" style="border-width: 0;" /></a></span></div>
<br />
Triple-J Hottest 100 Artists 1993 – 2014<br />
Posted by Chris on 2015-02-15<br />
<br />
Triple-J's Hottest 100 is an annual music poll conducted by the ABC's youth radio station <a href="http://www.abc.net.au/triplej/" target="_blank">JJJ</a>. The poll counts listeners' votes for their favourite songs of the previous year. It has been run in its current form every year since 1993, and more than two million votes were cast in 2014's Hottest 100.<br />
<br />
I was curious to know which artists feature most prominently in the Hottest 100. So I scraped the poll results for each year from <a href="http://en.wikipedia.org/wiki/Category:Triple_J_Hottest_100" target="_blank">Wikipedia</a> and used <a href="http://knime.org/" target="_blank">KNIME</a> to aggregate the entries for each artist. I then ranked artists by number of entries, breaking ties by median entry position. This allowed me to come up with the "Hottest 100 Artists for 1993 – 2014".<br />
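The ranking step is easy to express outside KNIME too. In this Python sketch the tuple layout (year, position, artist) and the sample entries are hypothetical, and it assumes a lower median position (closer to #1) ranks higher when entry counts tie.<br />

```python
from collections import defaultdict
from statistics import median


def rank_artists(entries):
    """Rank artists by number of Hottest 100 entries, breaking ties by median position.

    entries: iterable of (year, position, artist) tuples, one per polled track.
    More entries rank higher; among equal counts, a lower median position ranks higher.
    """
    positions = defaultdict(list)
    for _year, position, artist in entries:
        positions[artist].append(position)
    return sorted(
        positions,
        key=lambda artist: (-len(positions[artist]), median(positions[artist])),
    )


sample = [(2001, 3, "Artist A"), (2002, 40, "Artist A"), (2001, 12, "Artist B")]
print(rank_artists(sample))  # ['Artist A', 'Artist B']
```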
<br />
I created a chart using <a href="http://d3js.org/" target="_blank">d3.js</a> to visualize the Hottest 100 Artists. The interactive version is available <a href="http://bl.ocks.org/cpudney/raw/06a28f9454d7e389b676/" target="_blank">here</a> (you'll need a modern browser), and a screenshot is shown below.<br />
<br />
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjnZyJlhaBcuNztA-6ZLm3Np-LCYqxi6OReHQVvNApN1TW9Ip762LbVKjPy59R-jbQpQFdLjPmyTfqSo3wBVw-MMuhuNeAfNf4IFRMUEf69q766I4SoOwO51BUcJVzIH_TDLLYYnk7aMmyo/s1600/jjj+hottest+100+artists.png" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjnZyJlhaBcuNztA-6ZLm3Np-LCYqxi6OReHQVvNApN1TW9Ip762LbVKjPy59R-jbQpQFdLjPmyTfqSo3wBVw-MMuhuNeAfNf4IFRMUEf69q766I4SoOwO51BUcJVzIH_TDLLYYnk7aMmyo/s1600/jjj+hottest+100+artists.png" height="636" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Hottest 100 Artists for 1993 – 2014. Interactive version <a href="http://bl.ocks.org/cpudney/raw/06a28f9454d7e389b676/" target="_blank">here</a>.</td></tr>
</tbody></table>
<br />
The chart is a scatter plot with a circle glyph for each Hottest 100 track by a popular artist. The position of a glyph is determined by its artist and the year it polled. Its colour denotes the track's rank in the poll (red=1; yellow=50; white=100). You can mouse-over a track to see more information about it, or mouse-over an artist/year label to highlight tracks for that artist/year.<br />
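The colour encoding is a piecewise-linear scale with three stops. This sketch assumes interpolation in RGB space between pure red (#ff0000), yellow (#ffff00) and white (#ffffff); the actual chart may use different colour values or a different interpolation, since the post gives only the three anchor points.<br />

```python
def rank_colour(rank: int) -> str:
    """Map a poll position (1-100) to a hex colour: red at 1, yellow at 50, white at 100.

    Assumes piecewise-linear interpolation of RGB components between the three stops.
    """
    if rank not in range(1, 101):
        raise ValueError("rank must be an integer from 1 to 100")
    if rank >= 50:
        # yellow (255, 255, 0) -> white (255, 255, 255): raise the blue component
        green, blue = 255, round(255 * (rank - 50) / 50)
    else:
        # red (255, 0, 0) -> yellow (255, 255, 0): raise the green component
        green, blue = round(255 * (rank - 1) / 49), 0
    return f"#ff{green:02x}{blue:02x}"


print(rank_colour(25))  # #ff7d00
```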
<br />
We can see that Powderfinger currently tops the poll with 22 entries (median position 21.5; 1996 – 2009), followed by Foo Fighters (17 entries; median 37; 1995 – 2015). Lana Del Rey takes out the 100th position with five entries (median 32; 2011 – 2013).<br />
<br />
I'll update and improve the chart in future years. Meanwhile, if you've any corrections or constructive criticism then please leave a comment. The source-code and data are available <a href="https://gist.github.com/cpudney/06a28f9454d7e389b676" target="_blank">here</a>.<br />
<br />
<div style="text-align: center;">
<span style="font-size: x-small;">
</span>
<span style="font-size: x-small;">
This work is licensed under a <a href="http://creativecommons.org/licenses/by/4.0/" rel="license">Creative Commons Attribution 4.0 International License</a><br />
<a href="http://creativecommons.org/licenses/by/4.0/" rel="license"><img alt="Creative Commons License" src="http://i.creativecommons.org/l/by/4.0/80x15.png" style="border-width: 0;" /></a></span><br />
<span style="font-size: x-small;">
</span></div>
<br />
Cartograms of the Periodic Table of Elements<br />
Posted by Chris on 2014-12-11<br />
<br />
I recently came across a couple of examples of cartograms of Mendeleev's periodic table of elements. Before sharing them, let's travel back in time to the 1970s to see WF Sheehan's cartogram (shown below), which inspired these more recent works.
<div class="visually_embed">
<img alt="The Elements According to Relative Abundance" class="visually_embed_infographic" src="http://thumbnails-visually.netdna-ssl.com/the-elements-according-to-relative-abundance_50882e44c2027_w538.jpeg" /><br />
<div class="visually_embed_cycle">
</div>
<script class="visually_embed_script" id="visually_embed_script_56012" src="http://a.visual.ly/api/embed/56012?width=538" type="text/javascript"></script></div>
<br />
Sheehan mapped the relative abundance of elements in the earth's crust to the area assigned to each element in the table. As Sheehan said:<i> The chart emphasises that in real life a chemist will probably meet, O, Si, Al, ... and that he better do something about it.</i><br />
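One detail matters when drawing such a cartogram: if areas are to be proportional to abundance, each square's side must scale with the square root of the value, not the value itself. This minimal sketch ignores the table layout entirely and just computes side lengths; the symbols and values are placeholders chosen to keep the arithmetic simple. An element with four times the value gets a square only twice as wide.<br />

```python
import math


def square_sides(values, total_area=1.0):
    """Side length of each element's square so that square AREAS are proportional to values.

    values: mapping of element symbol -> mapping variable (e.g. relative abundance).
    total_area: area shared among all squares, in whatever units the layout uses.
    """
    total = sum(values.values())
    return {symbol: math.sqrt(total_area * v / total) for symbol, v in values.items()}


# Placeholder symbols and values, for illustration only.
sides = square_sides({"A": 4, "B": 1}, total_area=5.0)
print(sides)  # {'A': 2.0, 'B': 1.0}
```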
<br />
More recently, the Big Picture team at Google Research produced an interactive version of Sheehan's cartogram. In the Google version you can choose from several mapping variables:<br />
<ul>
<li>mentions in books</li>
<li>abundance in the human body</li>
<li>abundance in the earth's crust</li>
<li>abundance in the sea</li>
<li>abundance in the sun</li>
<li>volume</li>
<li>volume (excluding gases)</li>
</ul>
Below, for example, is the cartogram for relative abundance in the earth's crust.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgZQgbS_SvHRn14ubcDBTvv4Mf0iHRgLF8skjaUZy4ugoCey3XyS3PuNPAAr89pMNPh41GyyBp7D8-jxwQfjthMNqhh9_r3Sy0n9nRAk_pDTH-sUqTotAoXR3HblocNi_5l0b7k6lEOqPyo/s1600/periodicTableEarth.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgZQgbS_SvHRn14ubcDBTvv4Mf0iHRgLF8skjaUZy4ugoCey3XyS3PuNPAAr89pMNPh41GyyBp7D8-jxwQfjthMNqhh9_r3Sy0n9nRAk_pDTH-sUqTotAoXR3HblocNi_5l0b7k6lEOqPyo/s1600/periodicTableEarth.png" height="338" width="640" /></a></div>
<br />
Additionally, you can choose to represent the mapping variable in several ways:<br />
<ul>
<li>bars</li>
<li>cubes (as shown above)</li>
<li>electron rings (not a mapping variable; shown below)</li>
</ul>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgM6CZS1aGqoY-0yHBBCQtr2CvzKkG9EbDZxC3A0UswJ9ULgsztqmX6BAZnNVpDJLpfzJJsbgS8i53bOPjl4CZW0CB7ZPrezEGviLUnTAGwalXJaQzpXDuAWtWrx85B4oxhP45t9cklp16c/s1600/periodicTableElectrons.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgM6CZS1aGqoY-0yHBBCQtr2CvzKkG9EbDZxC3A0UswJ9ULgsztqmX6BAZnNVpDJLpfzJJsbgS8i53bOPjl4CZW0CB7ZPrezEGviLUnTAGwalXJaQzpXDuAWtWrx85B4oxhP45t9cklp16c/s1600/periodicTableElectrons.png" height="338" width="640" /></a><br />
The <a href="https://research.google.com/bigpicture/elements/" target="_blank">online version</a> is interactive, so you can experiment with the settings. Mouse over an element in the table to display a tooltip with additional information about the element.<br />
<br />
Along similar lines is the <a href="http://bsanii.jsd.claremont.edu/ElementalCartograms.html" target="_blank">Elemental Cartograms</a> tool developed by <a href="http://faculty.kecksci.claremont.edu/bsanii/" target="_blank" title="Babak's homepage">Babak Sanii</a>, which lets you supply your own table of elemental data and generates a cartogram from it. Below, for example, is a cartogram of the availability of elements for purchase on Amazon. You can find many more weird and wonderful examples on the <a href="http://elementalcartograms.tumblr.com/" target="_blank">Elemental Cartograms Tumblr feed</a>.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://40.media.tumblr.com/e9cc59df5d2c2c104ee46c0caa13daa8/tumblr_n6kxfeYpUe1tbdaaso1_1280.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://40.media.tumblr.com/e9cc59df5d2c2c104ee46c0caa13daa8/tumblr_n6kxfeYpUe1tbdaaso1_1280.jpg" height="480" width="640" /></a></div>
<br />
<br />
Stacey Barr: The First Three Steps To Get KPI Buy-In<br />
Posted by Chris on 2014-11-26<br />
<br />
Last week I attended a webinar by Stacey Barr to launch her new book <a href="http://www.practicalperformancemeasurement.com/" target="_blank">Practical Performance Measurement</a>, which describes Stacey's <a href="http://staceybarr.com/about/pump/" target="_blank">PuMP Blueprint</a> for developing performance measurement processes.<br />
<br />
The webinar covered the preparatory steps for meaningful performance measurement, including:<br /><ol>
<li>Why performance measurement is difficult</li>
<li>What's wrong with current wisdom about KPIs </li>
<li>What actually works</li>
</ol>
The webinar also provided a brief overview of the PuMP Blueprint.<br />
<br />
If performance measurement is an important part of your work or that of your organisation, you can find out more <a href="http://staceybarr.com/webinars/3stepstogetkpibuyin/" target="_blank">here</a>.<br />
<br />
Stephen King Screen Adaptations (Plotly)<br />
Posted by Chris on 2014-11-24<br />
<br />
Stephen King is a prolific author whose books I've enjoyed reading since I was a teenager. His prodigious output has spawned many film and television adaptations, but the screen versions have often disappointed me; see, for example, the dreadful "Under the Dome" TV mini-series.<br />
<br />
I decided to look at how well-received King's films have been compared with his books. I found a <a href="http://stephenking.com/library/video.html" target="_blank">list of screen adaptations</a>, and for each looked up the book's rating on <a href="http://goodreads.com/" target="_blank">Goodreads</a> and the adaptation's rating on <a href="http://imdb.com/" target="_blank">IMDb</a>. I necessarily omitted screenplays, movie sequels (not adapted from a King book) and short stories that contributed to only a portion of a movie. I then imported this data into <a href="http://plotly.com/" target="_blank">Plotly</a> and produced the chart shown below.<br />
<div style="text-align: center;">
<iframe frameborder="0" height="480" scrolling="no" seamless="seamless" src="https://plot.ly/~ChrisPudney/18.embed?width=640&height=480" width="640"></iframe>
<span style="font-size: x-small;"><i>Mouse over a glyph to display details.</i></span></div>
<br />
The chart reveals a positive correlation between the ratings of King's books and their screen adaptations. Highly rated novels such as "The Green Mile", "Rita Hayworth & The Shawshank Redemption" and "The Body" produced well-regarded movies, whereas poorly rated stories such as "Trucks", "The Mangler" and "Tommyknockers" resulted in absolute stinkers on screen.<br />
<br />
We can also see that TV adaptations (wide glyphs) were generally less well-received than film adaptations (tall glyphs). Similarly, short stories (orange glyphs) and their screen adaptations tend to rate lower than novels (blue glyphs), novellas (green glyphs) and their adaptations.<br />
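To put a number on the positive correlation the chart shows, one could compute Pearson's correlation coefficient r between the book ratings and the screen ratings. This is a minimal sketch (Python 3.10 and later also provide statistics.correlation for the same purpose); the ratings themselves would come from the Goodreads and IMDb figures behind the chart, which are not reproduced here.<br />

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length rating lists."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / (spread_x * spread_y)
```

Pass the two rating lists in matching order, so that the i-th book rating and the i-th screen rating refer to the same adaptation. Values near +1 indicate a strong positive linear relationship.<br />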
<br />
Incidentally, this was my first time using Plotly. I was able to import my data and generate a scatter plot with relative ease. Customising it for my needs took a little longer as I was new to the tool. I'll definitely use Plotly again.<br />
<br />
Visualizing how my personal tax was spent<br />
Posted by Chris on 2014-11-11<br />
<br />
I received my tax assessment yesterday. On the last page was the bar chart shown below, which visualizes where my "personal tax was spent, based on 2014-15 Budget estimates" (according to the caption). To the right of each bar is a dollar amount (obscured) that represents the portion of my taxes spent in each category.<br />
<br />
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhxzIXyBaoOMSm9B15AuomkX4BGBc2SFn4N3M-TeqoxCy7n1RF7K7CLTmGAwbIEC16WJAFyXuDvgRqD-MpiiS-5BFlqreNhLS_qZG4nQdEi8fYq4UoC6KeS-fQJIcvvr4seSftptxnZ6SHi/s1600/tax+assessment.png" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="566" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhxzIXyBaoOMSm9B15AuomkX4BGBc2SFn4N3M-TeqoxCy7n1RF7K7CLTmGAwbIEC16WJAFyXuDvgRqD-MpiiS-5BFlqreNhLS_qZG4nQdEi8fYq4UoC6KeS-fQJIcvvr4seSftptxnZ6SHi/s640/tax+assessment.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Where my "personal tax was spent, based on 2014-15 Budget estimates".</td></tr>
</tbody></table>
<div style="clear: both;"></div>
I've not seen this chart on previous years' tax assessments. It provides a useful indication of where the Federal government expects to spend our personal taxes.<br />
<br />
The chart is simple but effective. Sorting from largest to smallest is a good choice, as is the breakdown of the Welfare budget into sub-categories. I don't believe the colours encode any information. I'm glad they didn't use a (3D) pie chart, which so often blights public reports of budget expenditure.<br />
<br />
I'll be interested to see what charts accompany my tax assessment next year. I'd particularly like to see some historical information, such as budgeted versus actual expenditure or the change in the amount of tax paid.Chrishttp://www.blogger.com/profile/05461074601050876693noreply@blogger.comtag:blogger.com,1999:blog-3184694675311940812.post-64550280780254020232014-09-04T18:30:00.000-07:002014-09-04T22:54:33.172-07:00The Simpsons Social Network (Season 1)I've been a fan of The Simpsons ever since Season #1 was first broadcast. So, I was recently thinking about visualizing the social network (no, not <a href="http://simpsonswiki.com/wiki/The_D%27oh-cial_Network" target="_blank">this one</a>) of Simpsons characters.<br />
<br />
Constructing the network of social relationships between various Simpsons characters would be a difficult and time-consuming process (<a href="http://simpsonswiki.com/wiki/Pay_Pal" target="_blank">does Lisa even have any friends</a>?), so I opted for a different network that can be constructed programmatically: the network of character co-appearances. In this network, two characters are connected if they appear in the same episode of The Simpsons. This network is similar to the one constructed for film actors that allows us to determine <a href="http://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon" target="_blank">six degrees of Kevin Bacon</a>.<br />
<br />
The Simpsons co-appearances network can be constructed by parsing the episode pages of <a href="http://simpsonswiki.com/" target="_blank">Wikisimpsons</a>. Mathematically speaking, the network is a graph. Each node of the graph represents a Simpsons character. An (undirected) edge connects each pair of nodes whose characters appear in the same episode. To each edge I add a weight: the number of episodes in which the pair of characters co-appear. I also label each node with the number of episodes in which its character appears.<br />
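<br />
The graph construction described above can be sketched in a few lines of Python. This is a minimal sketch, not the original code: it assumes the episode pages have already been parsed into a dictionary mapping each episode to the set of characters appearing in it, and the episode names and cast lists below are hypothetical.<br />
```python
from collections import Counter
from itertools import combinations


def build_coappearance_graph(episodes):
    """episodes maps an episode title to the set of characters appearing in it.

    Returns (node_weight, edge_weight):
      node_weight[character] -> number of episodes the character appears in
      edge_weight[(a, b)]    -> number of episodes in which a and b co-appear
    """
    node_weight = Counter()
    edge_weight = Counter()
    for characters in episodes.values():
        node_weight.update(characters)
        # Sorting gives each unordered pair one canonical key, i.e. an undirected edge.
        for a, b in combinations(sorted(characters), 2):
            edge_weight[(a, b)] += 1
    return node_weight, edge_weight


# Hypothetical, abbreviated cast lists for illustration only.
episodes = {
    "Episode A": {"Homer", "Marge", "Bart", "Mr. Burns"},
    "Episode B": {"Homer", "Marge", "Moe", "Barney"},
    "Episode C": {"Homer", "Bart", "Moe"},
}
nodes, edges = build_coappearance_graph(episodes)
print(nodes["Homer"])             # 3
print(edges[("Homer", "Marge")])  # 2
```
Each episode with <i>n</i> characters contributes <i>n</i>(<i>n</i> - 1)/2 edges, so episodes with large casts dominate the edge count.<br />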
<br />
Having constructed the graph, we can set about visualizing it. Visualizing a graph helps you understand the structure of a network, so the choice of graph-layout algorithm is critical: a layout can impose a structure of its own. If you impose a hierarchical layout, you'll see hierarchies. If you impose a circular layout, you'll see circles.<br />
<br />
For this reason I've used a <a href="http://en.wikipedia.org/wiki/Force-directed_graph_drawing" target="_blank">force-directed layout</a>, which attempts to position the nodes such that the distance between any pair of connected nodes is inversely proportional to the weight on the edge between them. As a result, characters who often co-appear have their nodes positioned close together, while those who rarely do have their nodes kept apart.<br />
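<br />
To make the idea concrete, here is a basic spring embedder in Python. It is a simplification, not Gephi's ForceAtlas 2: every pair of nodes is treated as a spring, connected pairs with a rest length of 1/weight and unconnected pairs with an arbitrary rest length (<code>far=2.0</code>, an assumption), and the nodes are nudged repeatedly to relax the springs. The weights in the example are hypothetical.<br />
```python
import math


def spring_layout(nodes, weights, iterations=500, step=0.05, far=2.0):
    """Basic spring embedder (NOT ForceAtlas 2).

    nodes:   list of node names
    weights: dict mapping an (a, b) pair to its edge weight
    Connected pairs are springs with rest length 1 / weight; unconnected
    pairs are springs with the (arbitrary) rest length `far`.
    """
    n = len(nodes)
    # Deterministic start: nodes evenly spaced around a unit circle.
    pos = {v: [math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
           for i, v in enumerate(nodes)}
    for _ in range(iterations):
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                w = weights.get((a, b)) or weights.get((b, a))
                rest = 1.0 / w if w else far
                dx = pos[b][0] - pos[a][0]
                dy = pos[b][1] - pos[a][1]
                d = math.hypot(dx, dy) or 1e-9  # avoid dividing by zero
                # Move both ends along the line between them so the spring
                # loses a fraction (2 * step) of its stretch or compression.
                f = step * (d - rest) / d
                pos[a][0] += f * dx
                pos[a][1] += f * dy
                pos[b][0] -= f * dx
                pos[b][1] -= f * dy
    return pos


# Hypothetical weights: Homer and Marge co-appear often, Sideshow Bob rarely.
nodes = ["Homer", "Marge", "Sideshow Bob"]
weights = {("Homer", "Marge"): 13, ("Homer", "Sideshow Bob"): 1, ("Marge", "Sideshow Bob"): 1}
pos = spring_layout(nodes, weights)
for name, (x, y) in pos.items():
    print(f"{name:13s} ({x:+.2f}, {y:+.2f})")
```
After relaxation Homer and Marge sit roughly 1/13 apart while Sideshow Bob sits about one unit from both. Real layout algorithms add refinements such as repulsion between all nodes and adaptive step sizes.<br />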
<br />
To do this I used <a href="https://gephi.github.io/" target="_blank">Gephi</a>, the "open source graph visualization platform". Gephi allows you to experiment with various layout algorithms and customize the appearance of your graph. You can easily apply different colour maps, labelling and rendering attributes to your graph's nodes and edges. Gephi also has tools for filtering nodes and edges, and can calculate an arsenal of graph-theoretic indices.<br />
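<br />
One way to get a graph like this into Gephi is as a pair of CSV spreadsheets, one for edges and one for nodes. The Python sketch below writes the co-appearance data in that form. It assumes Gephi's spreadsheet importer recognises the Source, Target, Type and Weight edge columns and the Id and Label node columns (check the import dialog in your version of Gephi); the Episodes column name and the weights are hypothetical, and this is not necessarily how the original graph was loaded.<br />
```python
import csv
import io


def write_gephi_edges(edges, out):
    """Write weighted, undirected edges as CSV with Source/Target/Type/Weight headers."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Type", "Weight"])
    for (a, b), weight in sorted(edges.items()):
        writer.writerow([a, b, "Undirected", weight])


def write_gephi_nodes(node_weights, out):
    """Write one row per character; the extra Episodes column can drive node size and colour."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Id", "Label", "Episodes"])
    for name, episodes in sorted(node_weights.items()):
        writer.writerow([name, name, episodes])


edges = {("Homer", "Marge"): 13, ("Barney", "Moe"): 2}  # hypothetical weights
buf = io.StringIO()
write_gephi_edges(edges, buf)
print(buf.getvalue())
```
In practice you would pass open files instead of <code>io.StringIO</code> buffers.<br />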
<br />
I constructed a co-appearances graph for <a href="http://simpsonswiki.com/wiki/Season_1" target="_blank">Season 1</a> of the Simpsons and loaded it into Gephi. I applied the following settings:<br />
<ul>
<li>Layout: <i>ForceAtlas 2</i></li>
<li>Node size and colour: number of episodes in which a character makes an appearance </li>
<li>Edge colour: number of episodes in which the characters connected by the edge co-appear</li>
</ul>
The resulting graph is shown below. High-resolution renderings are also available (<a href="https://gist.github.com/cpudney/6dfc60b2cf1d4d390e2e/raw/077ae8ef6c882cd8f81811695fbd3aea905dabca/season1v1-large.png" target="_blank">PNG</a>, <a href="https://gist.github.com/cpudney/6dfc60b2cf1d4d390e2e/raw/0786257246695422cfb2107fc6434e7b96df4274/season1v1.pdf" target="_blank">PDF</a>, <a href="https://gist.github.com/cpudney/6dfc60b2cf1d4d390e2e/raw/ff46d9a49b5e926d9d5e071787f75b98602ccb6b/season1v1.svg" target="_blank">SVG</a>).<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://gist.github.com/cpudney/6dfc60b2cf1d4d390e2e/raw/season1v1.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="640" src="https://gist.github.com/cpudney/6dfc60b2cf1d4d390e2e/raw/season1v1.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Graph of Simpsons characters co-appearances in Season 1.</td></tr>
</tbody></table>
<br />
The graph shows us several things. The "central" characters - Homer, Marge, Bart, Lisa and Maggie Simpson - form a cluster at the centre of the graph. They have the largest, darkest nodes because they appear in every episode of Season 1.<br />
<br />
Around this central cluster are positioned smaller, lighter nodes for characters who appear frequently but not in every episode: characters like Milhouse Van Houten, Moe Szyslak, Barney Gumble, Monty Burns and Waylon Smithers. Notice that Burns and Smithers, and Moe and Barney, are positioned close together, as they often appear in the same episodes.<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiTOXfid1wfVXJtOTv7EBnQ8OCM8_T2ADbnVZaNGzfbVPqL0tUqBEUllj6tqqMCfVG3awAqtr13jT3Q1IwmxXVNndxHl8ScxNN7TNiTOcC0FmbyAsm0dmsyn1hyQaQLjad5jwP2E3SID7q2/s1600/season1v1-centre.png" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiTOXfid1wfVXJtOTv7EBnQ8OCM8_T2ADbnVZaNGzfbVPqL0tUqBEUllj6tqqMCfVG3awAqtr13jT3Q1IwmxXVNndxHl8ScxNN7TNiTOcC0FmbyAsm0dmsyn1hyQaQLjad5jwP2E3SID7q2/s1600/season1v1-centre.png" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">The central cluster of the Simpsons co-appearance graph.</td></tr>
</tbody></table>
<br />
On the outer edges of the graph are clusters of characters who appear together in a single episode. Below we see the cluster (of minor characters) for <a href="http://simpsonswiki.com/wiki/The_Call_of_the_Simpsons" target="_blank">episode 7 "The Call of the Simpsons"</a>. Between these episode clusters are positioned characters who appear in two or three episodes.<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhI_szVrKqv0Sc2P2FSzFXpfdoMaYnBXh2TKeD6WLr8PS_XFaOF-46DnA4HsnQEnmtqfBm7FfZZX0uXZxoSaZ1f5cmxrMFgVp3HfTTZ2Rnw3tcknovCRnsMHtLQGOMvt6jsrPVdEOiBMsRQ/s1600/season1v1-cluster.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhI_szVrKqv0Sc2P2FSzFXpfdoMaYnBXh2TKeD6WLr8PS_XFaOF-46DnA4HsnQEnmtqfBm7FfZZX0uXZxoSaZ1f5cmxrMFgVp3HfTTZ2Rnw3tcknovCRnsMHtLQGOMvt6jsrPVdEOiBMsRQ/s1600/season1v1-cluster.png" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Cluster of minor characters appearing in episode 7 "The Call of the Simpsons".</td></tr>
</tbody></table>
If you'd like to experiment with this graph you can <a href="https://gist.github.com/cpudney/6dfc60b2cf1d4d390e2e" target="_blank">download it from Github</a>.<br />
<br />
<div style="text-align: center;">
<span style="font-size: x-small;"><a href="http://creativecommons.org/licenses/by-sa/3.0/" rel="license"><img alt="Creative Commons License" src="https://i.creativecommons.org/l/by-sa/3.0/80x15.png" style="border-width: 0px;" /></a><br />This work is licensed under a <a href="http://creativecommons.org/licenses/by-sa/3.0/" rel="license">Creative Commons Attribution-ShareAlike 3.0 Unported License</a>. </span></div>
Chrishttp://www.blogger.com/profile/05461074601050876693noreply@blogger.comtag:blogger.com,1999:blog-3184694675311940812.post-30216811953229937972014-06-19T17:55:00.002-07:002014-07-02T22:42:37.663-07:00Australian Federal Budget 2014/15: Changes to Public Service Staffing Levels Visualized Using a D3.js Zoomable Treemap<br />
A friend recently drew my attention to <a href="http://ausviz.com/wordpress/" target="_blank">Ausviz</a>, a site focussed on visualizations of Australian data, particularly data sets from <a href="http://data.gov.au/" target="_blank">data.gov.au</a>. One of the first Ausviz visualizations I looked at uses a force-directed graph to <a href="http://ausviz.com/wordpress/blog/2014/05/17/budget-2014-public-service-gainslosses/" target="_blank">visualize changes in public service staffing levels</a> arising from the 2014/15 Federal Budget. The graph represents the hierarchy of ministries and departments, with the size and colour of leaf nodes encoding the change in departmental headcounts.<br />
<br />
An alternative way of visualizing hierarchies is to use a treemap. The hierarchy is represented by a nested layout of rectangles, and the size and colour of the rectangles are used to encode dimensions of the data.<br />
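<br />
The simplest treemap layout, slice-and-dice, divides a rectangle among its children in proportion to their sizes, alternating between vertical and horizontal slices at each level of the hierarchy. The Python sketch below illustrates that idea. D3's treemap layout uses the squarified algorithm by default, which produces rectangles closer to square, so this is not what D3 does. The hierarchy and staff numbers are hypothetical.<br />
```python
def total_size(node):
    """Size of a leaf, or the summed size of all leaves beneath an internal node."""
    return node.get("size", 0) + sum(total_size(c) for c in node.get("children", []))


def treemap(node, x, y, w, h, depth=0, out=None):
    """Slice-and-dice treemap layout.

    node is {"name": ..., "size": n} for a leaf or
            {"name": ..., "children": [...]} for an internal node.
    Returns a list of (name, x, y, width, height) tuples, one per leaf.
    """
    if out is None:
        out = []
    if "children" not in node:
        out.append((node["name"], x, y, w, h))
        return out
    total = total_size(node)
    offset = 0.0
    for child in node["children"]:
        frac = total_size(child) / total
        if depth % 2 == 0:  # even depths: slice the width
            treemap(child, x + offset * w, y, frac * w, h, depth + 1, out)
        else:               # odd depths: slice the height
            treemap(child, x, y + offset * h, w, frac * h, depth + 1, out)
        offset += frac
    return out


root = {"name": "Government", "children": [  # hypothetical staffing levels
    {"name": "Ministry A", "children": [
        {"name": "Dept. A1", "size": 300},
        {"name": "Dept. A2", "size": 100},
    ]},
    {"name": "Ministry B", "children": [
        {"name": "Dept. B1", "size": 600},
    ]},
]}
for name, x, y, w, h in treemap(root, 0, 0, 100, 100):
    print(f"{name}: x={x:.0f} y={y:.0f} w={w:.0f} h={h:.0f} area={w * h:.0f}")
```
Each leaf's area is proportional to its size (here 10 square units per staff member), which is what lets the eye compare departments.<br />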
<br />
So, taking inspiration from the Ausviz visualization, I implemented a treemap to visualize the same data. The layout of rectangles represents the hierarchy of Federal Government ministries and departments. Rectangle sizes encode the numbers of staff in each department (2013/14 or 2014/15). Rectangle colours encode the changes in staffing levels (absolute or relative). The colour scale ranges from red (staff decrease) through white (no change) to green (staff increase).<br />
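<br />
A red-white-green scale like this is a diverging colour scale: it interpolates from white towards red for negative values and towards green for positive ones. The Python sketch below shows one way to compute it. The exact colours, the linear RGB interpolation and the saturation threshold <code>max_abs</code> are assumptions, not taken from the treemap's source.<br />
```python
def diverging_colour(change, max_abs):
    """Map a staffing change onto a red-white-green scale as a hex colour string.

    Negative changes -> shades of red, zero -> white, positive -> shades of green.
    Changes whose magnitude reaches max_abs get the fully saturated colour.
    """
    t = max(-1.0, min(1.0, change / max_abs))    # clamp to [-1, 1]
    white = (255, 255, 255)
    end = (0, 128, 0) if t > 0 else (255, 0, 0)  # green for gains, red for losses
    s = abs(t)
    # Linear interpolation in RGB space from white towards the end colour.
    r, g, b = (round(w + s * (e - w)) for w, e in zip(white, end))
    return f"#{r:02x}{g:02x}{b:02x}"


print(diverging_colour(0, 2000))      # #ffffff
print(diverging_colour(2000, 2000))   # #008000
print(diverging_colour(-3740, 2000))  # #ff0000
```
The same function serves both encodings: pass a headcount difference with, say, <code>max_abs=2000</code> for absolute change, or a fraction with <code>max_abs=1.0</code> for relative change (both thresholds hypothetical).<br />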
<br />
The treemap is shown below. An interactive version can be found <a href="http://bl.ocks.org/cpudney/d372d3158b1dd82aaecd" target="_blank">here</a> (<a href="http://bl.ocks.org/cpudney/raw/d372d3158b1dd82aaecd/" target="_blank">fullscreen</a>). You will need a "modern" browser to use the interactive version, which supports the following operations:<br />
<ul>
<li>change the size encoding (2013/14 or 2014/15)</li>
<li>change the colour encoding (absolute or relative)</li>
<li>drill down into a ministry (click on a rectangle)</li>
<li>mouse over a rectangle to display a departmental tool-tip</li>
</ul>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjyncAaDBxK92cKI9nwS4aenKJ6p6TT9v-moZNURNq-g3oc11IkyZMzgZEjnnDD50ggUbe8u_WShxEtlmk4Zhv6QFqjZduAuO_-du9yEPqNgU83JWh7Nez3dxHVoVZIMi58bSVquHoBjPxj/s1600/d3js+treemap.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjyncAaDBxK92cKI9nwS4aenKJ6p6TT9v-moZNURNq-g3oc11IkyZMzgZEjnnDD50ggUbe8u_WShxEtlmk4Zhv6QFqjZduAuO_-du9yEPqNgU83JWh7Nez3dxHVoVZIMi58bSVquHoBjPxj/s1600/d3js+treemap.png" height="500" width="640" /></a></div>
<br />
<br />
<br />
<br />
The treemap allows us to quickly see where the biggest changes, both absolute and relative, are to occur:<br />
<ul>
<li>Size: <i>2013/14</i>; Change: <i>absolute </i>(we see the big winners and losers)</li>
<ul>
<li>Gain: Dept. Foreign Affairs & Trade - 1659, 42%</li>
<li>Gain: Dept. Prime Minister & Cabinet - 1543, 200%</li>
<li>Gain: Dept. Defence - 604, 1%</li>
<li>Loss: Dept. Employment, Education & Workplace Relations - 3740, 100%</li>
<li>Loss: Australian Taxation Office - 2954, 13%</li>
</ul>
<li>Size: <i>2014/15</i>; Change: <i>absolute </i>(we see the new, large departments and agencies)</li>
<ul>
<li>New: Dept. Employment - 1716 </li>
<li>New: Dept. Education - 1823</li>
<li>New: National Disability Insurance Agency - 798</li>
</ul>
<li>Size: <i>2013/14</i>; Change: <i>relative </i>(we see shut down departments and agencies)</li>
<ul>
<li>Gone: AusAID - 1982 </li>
<li>Gone: Dept. Resources, Energy & Tourism - 655</li>
<li>Gone: Dept. Regional Australia, Local Govt. Arts & Sports - 482 </li>
<li>Gone: Health Workforce Australia - 140</li>
<li>Gone: Clean Energy Finance Corp. - 50</li>
<li>Gone: Wine Australia Corp. - 49</li>
<li>Gone: Australian National Preventative Health Agency - 40</li>
<li>Gone: Climate Change Authority - 35</li>
<li>Gone: Telecommunications Universal Service Management Agency - 17 </li>
<li>Gone: Grape & Wine R&D Corp. - 11 </li>
<li>Gone: Sugar Development R&D Corp. - 8</li>
</ul>
<li>Size: <i>2014/15</i>; Change: <i>relative </i>(we see the new, small departments and agencies)</li>
<ul>
<li>New: Australian Grape & Wine Authority - 55</li>
</ul>
</ul>
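The absolute and relative figures above come from comparing each department's 2013/14 and 2014/15 staffing levels. A minimal Python sketch of that comparison follows; the function name and sample figures are hypothetical. It also shows why a 100% loss means a department has gone, and why new agencies have no relative change: dividing by a 2013/14 level of zero is undefined.<br />
```python
def staffing_change(before, after):
    """Classify a department's change in staffing level.

    Returns (status, absolute change, relative change as a fraction of `before`).
    """
    if before == 0:
        return "new", after, None  # relative change is undefined
    absolute = after - before
    relative = absolute / before
    if after == 0:
        status = "gone"
    elif absolute > 0:
        status = "gain"
    elif absolute == 0:
        status = "unchanged"
    else:
        status = "loss"
    return status, absolute, relative


print(staffing_change(1000, 1420))  # ('gain', 420, 0.42)
print(staffing_change(500, 0))      # ('gone', -500, -1.0)
print(staffing_change(0, 55))       # ('new', 55, None)
```
<br />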
There are better ways of visualizing changes of this kind (e.g. a bump chart or a sortable table), but the advantage of a treemap is that it also shows the structure of the public service.<br />
<br />
The treemap was implemented using <a href="http://d3js.org/" target="_blank">D3.js</a>, and borrowed heavily from a couple of excellent examples:<br />
<ul>
<li><a href="http://www.billdwhite.com/wordpress/2012/12/16/d3-treemap-with-title-headers/" target="_blank">Bill White's treemap with title headers</a></li>
<li><a href="http://bl.ocks.org/tgk/6044254">Thomas G. Kristensen's treemap with tooltips</a></li>
</ul>
Source data comes from <a href="http://data.gov.au/dataset/budget-2014-15-tables-and-data/resource/ed28dae5-4403-4a0d-a517-54b19c113db8">Budget Paper 4 Table 2.2 Average Staffing Table</a>. <b>Note </b>the many footnotes associated with this data.<br />
<br />
The source-code is available on <a href="https://gist.github.com/cpudney/d372d3158b1dd82aaecd" target="_blank">Github</a> and licensed under a <a href="http://creativecommons.org/licenses/by/4.0/" target="_blank">Creative Commons Attribution 4.0 International License</a>.Chrishttp://www.blogger.com/profile/05461074601050876693noreply@blogger.comtag:blogger.com,1999:blog-3184694675311940812.post-83365156649100122152014-03-28T00:28:00.003-07:002014-03-28T00:28:50.808-07:00A Distorted and Incomplete Picture<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg4kaom8U0v1kowYb8MfI6Hu0mdTPFdcvQS6fDww5TTRACx-UtNORugofydhNcNwHkdWF_4uSxY_9i66CarND-1joa_6wgZWoYrVWOwZqsqWVJ0CwZg4jlW2AsqFzRnVAy6twDMRHcke-iK/s1600/081012-fc222.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg4kaom8U0v1kowYb8MfI6Hu0mdTPFdcvQS6fDww5TTRACx-UtNORugofydhNcNwHkdWF_4uSxY_9i66CarND-1joa_6wgZWoYrVWOwZqsqWVJ0CwZg4jlW2AsqFzRnVAy6twDMRHcke-iK/s1600/081012-fc222.jpg" /></a></div>
When I saw Ben Goldacre's latest book <i>Bad Pharma: How Drug Companies Mislead Doctors and Harm Patients</i> on the new releases bookshelf of my local library, I borrowed it immediately. Not because I thought it would inform my data visualization work but because I really enjoyed Ben's previous book <i>Bad Science.</i><br />
<br />
So I was pleased to find, when I read <i>Bad Pharma</i>, that it focuses on data, the raw material we work with when creating visualizations. I was also deeply disturbed by the book, since it details how modern evidence-based medicine is broken. Ben provides a useful summary of <i>Bad Pharma</i> in the book's introduction:<br />
<blockquote class="tr_bq">
Drugs are tested by the people who manufacture them, in poorly designed
trials, on hopelessly small numbers of weird, unrepresentative patients,
and analysed using techniques which are flawed by design, in such a way
that they exaggerate the benefits of treatments. Unsurprisingly, these
trials tend to produce results that favour the manufacturer. When trials
throw up results that companies don't like, they are perfectly entitled
to hide them from doctors and patients, so we only ever see a distorted
picture of any drug's true effects. Regulators see most of the trial
data, but only from early on in a drug's life, and even then they don't
give this data to doctors or patients, or even to other parts of
government. This distorted evidence is then communicated and applied in a
distorted fashion. In their forty years of practice after leaving
medical school, doctors hear about what works through ad hoc oral
traditions, from sales reps, colleagues or journals. But those
colleagues can be in the pay of drug companies – often undisclosed – and
the journals are too. And so are the patient groups. And finally,
academic papers, which everyone thinks of as objective, are often
covertly planned and written by people who work directly for the
companies, without disclosure. Sometimes whole academic journals are
even owned outright by one drug company. Aside from all this, for
several of the most important and enduring problems in medicine, we have
no idea what the best treatment is, because it's not in anyone's
financial interest to conduct any trials at all. These are ongoing
problems, and although people have claimed to fix many of them, for the
most part they have failed; so all these problems persist, but worse
than ever, because now people can pretend that everything is fine after
all.</blockquote>
But enough about medicine. What makes <i>Bad Pharma</i> interesting to a data visualization practitioner is not its charts and graphs (there are only a few in the book) but its discussion of data. The book's first chapter, <i>Missing Data</i>, describes how drug trials performed by pharmaceutical companies overwhelmingly produce results that are favourable to the companies. Goldacre argues that this arises for several reasons:
<ul>
<li>flawed experimental design: trials are designed in ways likely to produce a favourable outcome</li>
<li>flawed data analysis: see my post on <a href="http://www.vislives.com/2014/02/statistics-done-wrong.html" target="_blank">Alex Reinhart's Statistics Done Wrong</a></li>
<li>publication bias: trials that produce unfavourable outcomes are
simply not published, skewing published data towards favourable results</li>
</ul>
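Publication bias is easy to demonstrate with a small simulation. The Python sketch below is an illustration, not an example from the book: it assumes a drug with no effect at all, normally distributed outcomes with a known standard deviation, and a simple one-sided z-test, and it "publishes" only the trials that come out significantly positive. Averaging the published trials then suggests a benefit that does not exist.<br />
```python
import random
import statistics


def simulate_publication_bias(n_trials=1000, n_patients=20, seed=1):
    """Simulate trials of a drug with NO true effect, then 'publish' only the
    trials whose result is significantly positive (one-sided z > 1.96)."""
    rng = random.Random(seed)
    all_effects, published = [], []
    for _ in range(n_trials):
        # Each patient's outcome: true effect 0, standard deviation 1 (assumed known).
        outcomes = [rng.gauss(0.0, 1.0) for _ in range(n_patients)]
        effect = statistics.fmean(outcomes)
        all_effects.append(effect)
        z = effect * n_patients ** 0.5  # z-statistic for the mean
        if z > 1.96:
            published.append(effect)
    return statistics.fmean(all_effects), published


overall, published = simulate_publication_bias()
print(f"mean effect, all trials:       {overall:+.3f}")
print(f"mean effect, published trials: {statistics.fmean(published):+.3f} ({len(published)} trials)")
```
Every published trial must have cleared the significance threshold, so the published average is biased upwards by construction.<br />
<br />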
This reminds us to be circumspect about the data we visualize. We should ask:<br />
<ul>
<li>How was the data collected?</li>
<li>How has the data been transformed or processed?</li>
<li>Is the data complete?</li>
</ul>
The answers to these questions are metadata that we need to communicate as part of any visualization we create. Without them, we risk painting a distorted and incomplete picture of the data we are visualizing.Chrishttp://www.blogger.com/profile/05461074601050876693noreply@blogger.comtag:blogger.com,1999:blog-3184694675311940812.post-14697038900472699532014-03-21T02:22:00.003-07:002014-03-21T02:30:46.812-07:00Lyra: the Interactive Visualization Design EnvironmentI recently spent some time using Lyra, an "interactive visualization design environment" that allows you to create visualizations without writing a single line of code. It's being developed by <span class="author"><a class="at" href="http://arvindsatya.com/" target="_blank">Arvind Satyanarayan</a></span>, <span class="author"><a href="http://kanitw.yellowpigz.com/" target="_blank">Kanit “Ham” Wongsuphasawat</a></span> and <span class="author"><a class="at" href="http://homes.cs.washington.edu/%7Ejheer/" target="_blank">Jeffrey Heer</a> (think Prefuse, Protovis, D3, Vega) at the University of Washington's <a href="http://idl.cs.washington.edu/" target="_blank">Interactive Data Lab</a>.</span><br />
<br />
Lyra is a bit like other interactive visualization design tools such as Tableau and Spotfire. However, under the hood it's powered by D3 (like <a href="http://plot.ly/" target="_blank">Plot.ly</a>). Now, I enjoy coding visualizations directly using <a href="http://www.vislives.com/search/label/d3.js" target="_blank">D3</a>, but I realise not everyone shares my enthusiasm or has the time to learn D3. Lyra gives you access to the expressiveness of D3 without requiring you to learn its API (or JavaScript).<br />
<br />
The Lyra application is shown below and consists of three panels:<br />
<ul>
<li>the left-hand panel manages <i>Data Pipelines</i>, where you define and transform (sort, group, filter, window, apply formula) data sources</li>
<li>the
centre panel displays your visualization, where you interactively select
and modify visualization elements: marks (rectangles, symbols, arcs,
areas, lines and text), axes and layers</li>
<li>the right-hand panel provides access to attributes of the elements in your visualization</li>
</ul>
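A data pipeline is simply a chain of transforms, each taking a table of rows and returning a new one. The Python sketch below illustrates the concept with filter, group, sort, window and formula steps; it is not Lyra's or Vega's implementation, and the transform names and sample data are hypothetical.<br />
```python
from itertools import accumulate, groupby
from operator import itemgetter


def run_pipeline(rows, *transforms):
    """Apply each transform (a function from rows to rows) in order."""
    for transform in transforms:
        rows = transform(rows)
    return rows


def filter_rows(predicate):
    return lambda rows: [r for r in rows if predicate(r)]


def sort_by(field, reverse=False):
    return lambda rows: sorted(rows, key=itemgetter(field), reverse=reverse)


def group_sum(key, field):
    def transform(rows):
        # groupby only merges adjacent rows, so sort by the grouping key first.
        ordered = sorted(rows, key=itemgetter(key))
        return [{key: k, field: sum(r[field] for r in group)}
                for k, group in groupby(ordered, key=itemgetter(key))]
    return transform


def running_total(field, out_field):
    # A simple window transform: cumulative sum in the rows' current order.
    def transform(rows):
        totals = accumulate(r[field] for r in rows)
        return [{**r, out_field: t} for r, t in zip(rows, totals)]
    return transform


def formula(out_field, fn):
    # Add a derived field computed from each row.
    return lambda rows: [{**r, out_field: fn(r)} for r in rows]


rows = [  # hypothetical sales records
    {"region": "North", "sales": 120},
    {"region": "South", "sales": 80},
    {"region": "North", "sales": 40},
    {"region": "East", "sales": 10},
]
result = run_pipeline(
    rows,
    filter_rows(lambda r: r["sales"] >= 20),  # drop small sales
    group_sum("region", "sales"),             # total sales per region
    sort_by("sales", reverse=True),           # largest first
    running_total("sales", "cumulative"),     # window: running total
    formula("label", lambda r: f"{r['region']} ({r['sales']})"),
)
for row in result:
    print(row)
```
Because each step returns a new table, steps can be reordered or removed independently, which is what makes a pipeline easy to edit interactively.<br />
<br />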
Once you've created a visualization you can export it as an image (PNG or SVG) or a <a href="http://trifacta.github.io/vega/" target="_blank">Vega</a> specification. <br />
<br />
<div style="text-align: left;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh5pElDMNUjjIpR2XnEDFWyz0q9Z3DQfrvULrRYL7BEXVsdK29KFLVFSpqRy9xOJCmb6jq_GtKKSYcKpogSigq5cqvpc2fQHNy7QPtA9mMw57U_orxrcp83ix1hUR_yHKP438Y_LWG5RijG/s1600/lyra.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh5pElDMNUjjIpR2XnEDFWyz0q9Z3DQfrvULrRYL7BEXVsdK29KFLVFSpqRy9xOJCmb6jq_GtKKSYcKpogSigq5cqvpc2fQHNy7QPtA9mMw57U_orxrcp83ix1hUR_yHKP438Y_LWG5RijG/s1600/lyra.png" height="305" width="640" /></a></div>
<br />
<br />
<br />
<br />
If you're familiar with D3 then you'll recognise some of its idioms in Lyra. For example, D3's data-binding mechanism becomes a matter of dragging and dropping data variables onto the attributes of visual elements.<br />
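<br />
For readers who haven't used D3, binding data to attributes means creating one mark per data item and computing each attribute from that item, usually through a scale; in D3 this is written with calls such as <code>.attr("height", function(d) { return y(d.value); })</code>. The Python sketch below models only that idea and is not D3's or Lyra's API; the function names and data are hypothetical.<br />
```python
def linear_scale(domain, range_):
    """Return a function mapping the interval `domain` linearly onto `range_`."""
    (d0, d1), (r0, r1) = domain, range_
    return lambda v: r0 + (v - d0) * (r1 - r0) / (d1 - d0)


def bind(data, **encodings):
    """Create one mark per datum; each encoding maps a datum to an attribute value."""
    return [{attr: encode(d) for attr, encode in encodings.items()} for d in data]


data = [{"name": "A", "value": 10}, {"name": "B", "value": 20}, {"name": "C", "value": 40}]
y = linear_scale((0, 40), (0, 200))
# Roughly what dropping the "value" field onto a bar's height does in Lyra.
marks = bind(data, height=lambda d: y(d["value"]), label=lambda d: d["name"])
print(marks)
```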
<br />
If you want to try Lyra then you have several options:<br />
<ul>
<li>visit the <a href="http://idl.cs.washington.edu/projects/lyra/" target="_blank">Lyra home page</a> and watch the introductory video</li>
<li><a href="http://idl.cs.washington.edu/projects/lyra/app/" target="_blank">run Lyra</a> in your Web browser - you'll need a "modern" browser</li>
<li>read Jim Vallandingham's excellent <a href="http://vallandingham.me/make_a_barchart_with_lyra.html" target="_blank">introductory tutorial</a></li>
<li><a href="https://github.com/uwdata/lyra" target="_blank">fork Lyra</a> on Github to run your own instance or develop/improve the code</li>
</ul>
Bear in mind that Lyra is alphaware. I did encounter a few issues, e.g. saving and restoring work didn't appear to function properly. The authors are interested in constructive feedback.Chrishttp://www.blogger.com/profile/05461074601050876693noreply@blogger.comtag:blogger.com,1999:blog-3184694675311940812.post-44564263490492261102014-02-20T23:57:00.001-08:002014-02-25T20:55:31.015-08:00Statistics Done Wrong - Alex ReinhartI've just finished reading Alex Reinhart's excellent <a href="http://www.statisticsdonewrong.com/" target="_blank">Statistics Done Wrong</a>: a guided tour of common statistical fallacies and misconceptions. It covers <i>p</i>-values, statistical power, statistical significance, pseudo-replication and stopping rules.<br />
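<br />
Stopping rules are perhaps the easiest of these to demonstrate. The Python simulation below is an illustrative sketch, not an example from the book: it runs experiments in which there is no real effect and compares testing once at the end with "peeking" after every 10 observations and stopping as soon as the result looks significant. The known-variance z-test, sample sizes and seed are assumptions chosen to keep the code short.<br />
```python
import math
import random


def is_significant(sample):
    # Two-sided z-test of mean 0 at the 5% level, assuming a known SD of 1.
    z = sum(sample) / len(sample) * math.sqrt(len(sample))
    return abs(z) > 1.96


def false_positive_rate(peek, n_experiments=2000, max_n=100, step=10, seed=7):
    """Fraction of no-effect experiments that end up declared 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        sample = []
        for _look in range(max_n // step):
            sample.extend(rng.gauss(0.0, 1.0) for _ in range(step))
            if peek and is_significant(sample):
                hits += 1  # stopped early and declared an effect
                break
        else:
            # No early stop: test the full sample once
            # (when peeking, this repeats the final look, so it adds no hits).
            if is_significant(sample):
                hits += 1
    return hits / n_experiments


fixed = false_positive_rate(peek=False)
peeking = false_positive_rate(peek=True)
print(f"test once at n=100:  {fixed:.3f}")
print(f"peek every 10 obs.:  {peeking:.3f}")
```
Testing once gives a false-positive rate close to the nominal 5%; peeking ten times inflates it several-fold, even though every individual test uses the same 5% threshold.<br />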
<br />
<i>Statistics Done Wrong</i> is written with all scientists in mind, assuming no knowledge of statistical methods. It's essential reading for all data scientists, including data visualization practitioners: even if we're not data analysts, we need at least a basic level of statistical literacy.<br />
<br />
The problems Alex describes are rife in the scientific peer-reviewed literature. By coincidence I'm reading Ben Goldacre's <a href="https://en.wikipedia.org/wiki/Bad_Pharma" target="_blank">Bad Pharma,</a> which focusses on how clinical trials data is distorted by the pharmaceutical industry. Many of the issues Alex raises are seen in practice in <i>Bad Pharma</i>.<br />
<br />
<i>Statistics Done Wrong</i> concludes with <i>What Can Be Done?</i> Here I quote the <a href="http://www.statisticsdonewrong.com/what-next.html#your-job" target="_blank">Your Job section</a>:<br />
<blockquote class="tr_bq">
Your task can be expressed in four simple steps:<br />
<ol class="arabic simple">
<li>Read a statistics textbook or take a good statistics course. Practice.</li>
<li>Plan your data analyses carefully and deliberately, avoiding the
misconceptions and errors you have learned.</li>
<li>When you find common errors in the scientific literature – such as a simple
misinterpretation of <i>p</i> values – hit the perpetrator over the head with your
statistics textbook. It’s therapeutic.</li>
<li>Press for change in scientific education and publishing. It’s our
research. Let’s not screw it up.</li>
</ol>
</blockquote>
<br />
<i>Statistics Done Wrong</i> is on-line, free and should take you no more than an hour to read. Once you've read it share it with your data scientist colleagues. And if you want to learn more about data analysis then I recommend Coursera's Data Analysis MOOC - read my account of it <a href="http://www.vislives.com/search/label/coursera" target="_blank">here</a>.Chrishttp://www.blogger.com/profile/05461074601050876693noreply@blogger.comtag:blogger.com,1999:blog-3184694675311940812.post-70970349452336299372014-01-26T17:13:00.004-08:002014-01-26T17:21:28.974-08:00Review: Infoactive (beta)<a href="http://infoactive.co/" target="_blank">Infoactive</a> is an on-line tool for creating infographics, and is similar to Infogram, Easel.ly and Venngage - see my earlier <a href="http://www.vislives.com/2012/07/pushbutton-infographics.html" target="_blank">review</a> of these offerings.<br />
<br />
Infoactive garnered considerable attention from the visualization community as a result of its highly successful <a href="https://www.kickstarter.com/projects/trinachi/infoactive-drop-live-data-into-interactive-infogra" target="_blank">Kickstarter campaign</a>, which raised $55,109 (more than quadrupling its $12,000 target) from 1,448 backers. The promotional video clip is shown below.<br />
<br />
<iframe frameborder="0" height="480" scrolling="no" src="https://www.kickstarter.com/projects/trinachi/infoactive-drop-live-data-into-interactive-infogra/widget/video.html" width="640"> </iframe>
<br />
<br />
I was one of those backers, which granted me early access to the Infoactive beta program. What follows are my impressions of the tool after a few hours of experimenting with it.<br />
<br />
At the outset, it's important to stress that at the time of writing Infoactive is in beta. I did encounter several problems that made working with the tool difficult. So, if you're expecting to start using Infoactive and be immediately productive then you're going to be somewhat disappointed.<br />
<br />
With that out of the way let's focus on what you can do with Infoactive. The tool is very easy to use. A panel on the left-hand side of the page holds a palette of graphical elements that you can drag-and-drop onto your infographic canvas.<br />
<br />
Two features that distinguish Infoactive from its rivals are:<br />
<ul>
<li>Connect to live data sources: you can provide the URL of a public Google Drive spreadsheet to serve as your data source. If the data changes, then so too does the infographic connected to it.</li>
<li>Infographics created with Infoactive are interactive: they support filtering and display details on mouse-over.</li>
</ul>
Below is a sample infographic created using Infoactive. I would have preferred to include my own example but due to some of the bugs I encountered I'm including the example created by the Infoactive team:<br />
<br />
<iframe allowfullscreen="" frameborder="0" height="1500" src="https://infoactive.co/plays/3908" width="600"></iframe>
<br />
<h2>
Data</h2>
Two types of data source can be used: public Google Drive spreadsheets or CSV files uploaded to Infoactive. You can specify multiple data sources for each infographic, with each chart connected to a specific source. An editor is provided that allows you to modify cell values in each data source.<br />
<h2>
Charts</h2>
Several chart types are provided:<br />
<ul>
<li>Line and area charts</li>
<li>Horizontal and vertical column charts</li>
<li>Pie and donut charts</li>
<li>Gauges </li>
<li>Maps</li>
</ul>
You can drag-and-drop charts into your infographic. Once in place, you can configure various attributes of the chart such as its title, data set and the data columns assigned to each axis.<br />
<h2>
Filters</h2>
Filters are a useful interactive element. When placed in an infographic they allow the user to focus on a subset of the data defined by a categorical data column. Charts associated with the filter are updated in response to the user's selections. You can configure the data source, data column and layout of each filter.<br />
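<br />
Conceptually, a filter keeps only the rows whose category the user has selected, and every chart connected to it recomputes its data from the remaining rows. The Python sketch below illustrates that behaviour; it is not Infoactive's implementation, and the function names and data are hypothetical.<br />
```python
def apply_filter(rows, column, selected):
    """Keep only rows whose value in the categorical `column` is currently selected."""
    return [r for r in rows if r[column] in selected]


def totals_by(rows, category, value):
    """Aggregate the filtered rows into the series a bar chart would draw."""
    totals = {}
    for r in rows:
        totals[r[category]] = totals.get(r[category], 0) + r[value]
    return totals


rows = [  # hypothetical data, e.g. the rows of a spreadsheet
    {"year": 2012, "region": "North", "visitors": 30},
    {"year": 2012, "region": "South", "visitors": 20},
    {"year": 2013, "region": "North", "visitors": 35},
    {"year": 2013, "region": "South", "visitors": 25},
]
# Each time the user changes the filter selection, recompute the chart's data.
for selection in ({"North", "South"}, {"North"}):
    print(sorted(selection), totals_by(apply_filter(rows, "region", selection), "year", "visitors"))
```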
<h2>
Other</h2>
A variety of text blocks (header, sub-header, text, logo) is provided. Two default colour themes (classic, earth) are available - you can also create a custom colour palette.<br />
<h2>
Publishing </h2>
Once you've created an infographic you can publish it. This provides you with a URL which displays the infographic on its own page (for sharing on social media), or an iframe for embedding in a web page.<br />
<h2>
Conclusion</h2>
It's early days for Infoactive. Many people have pledged support, so expectations are high. Similar tools are available, but the live and interactive aspects of
Infoactive infographics differentiate them from the others. Infoactive is a promising tool that is easy to use, but work is needed to iron out the bugs.Chrishttp://www.blogger.com/profile/05461074601050876693noreply@blogger.comtag:blogger.com,1999:blog-3184694675311940812.post-74407205768620366282013-09-11T21:55:00.001-07:002013-09-23T22:07:30.106-07:00Over the RainbowAbout a year ago Visual.ly posted on their blog an <a href="http://blog.visual.ly/subtleties-of-color-connecting-color-to-meaning/" target="_blank">open letter</a> to NASA asking them to avoid using the rainbow (spectral) colour scale for representing continuous data. The letter listed five problems with the rainbow colour scale:<br />
<ul>
<li>Colour-blind people can't perceive the scale properly</li>
<li>Divisions between hues produce false visual artefacts</li>
<li>The order of the hues has no inherent meaning</li>
<li>Yellow appears much brighter than the other hues</li>
<li>It is more difficult to see detail than with scales that vary in brightness</li>
</ul>
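Two of these problems, the overly bright yellow and the difficulty of seeing detail, come down to brightness. A quick check is to compute the relative luminance of each hue. The Python sketch below uses the WCAG 2.x relative-luminance formula on a simplified five-hue rainbow (the hue stops are an assumption; real rainbow colour maps differ). Luminance rises and falls along the rainbow and peaks at yellow, whereas a grey scale increases steadily.<br />
```python
def relative_luminance(rgb):
    """WCAG 2.x relative luminance of an sRGB colour given as 0-255 integers."""
    def linear(c):
        c /= 255
        # Undo the sRGB gamma curve (WCAG uses the 0.03928 threshold).
        return ((c + 0.055) / 1.055) ** 2.4 if c > 0.03928 else c / 12.92
    r, g, b = (linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


# A simplified rainbow scale, ordered from low to high data values.
rainbow = {"blue": (0, 0, 255), "cyan": (0, 255, 255), "green": (0, 255, 0),
           "yellow": (255, 255, 0), "red": (255, 0, 0)}
# A sequential grey scale over the same range, for comparison.
greys = [(v, v, v) for v in (0, 64, 128, 192, 255)]

for name, rgb in rainbow.items():
    print(f"{name:7s} {relative_luminance(rgb):.3f}")
print([round(relative_luminance(g), 3) for g in greys])
```
Because brightness is what the eye reads most reliably as "more" or "less", a scale whose luminance is not monotonic can suggest boundaries and orderings that are not in the data.<br />
<br />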
I'm ashamed to admit that I've used the rainbow colour scale in my own work, often in response to pressure from users who are used to seeing the scale used elsewhere and want to apply it to their own data.<br />
<br />
NASA's response to Visual.ly's open letter has come from <a href="http://earthobservatory.nasa.gov/blogs/elegantfigures/" target="_blank">Robert Simmon</a> of the Earth Observatory. Robert has been on a similar crusade within NASA to eradicate use of the rainbow colour scale, apparently with some success.<br />
<br />
Robert's response has been a series of six blog posts on how to use colour for data visualization. It is the best tutorial I've come across on this subject. The posts are:<br />
<ul>
<li><a href="http://blog.visual.ly/subtleties-of-color/">Subtleties of Color</a> </li>
<li><a href="http://blog.visual.ly/subtleties-of-color-the-perfect-palette/">Subtleties of Color: The “Perfect” Palette</a> </li>
<li><a href="http://blog.visual.ly/subtleties-of-color-different-types-of-data-require-different-color-schemes/">Subtleties of Color: Different Data, Different Colors</a> </li>
<li><a href="http://blog.visual.ly/subtleties-of-color-connecting-color-to-meaning/">Subtleties of Color: Connecting Color to Meaning</a> </li>
<li><a href="http://blog.visual.ly/subtleties-of-color-tools-and-techniques/">Subtleties of Color: Tools and Techniques</a> </li>
<li><a href="http://blog.visual.ly/subtleties-of-color-references-and-resources-for-visualization-professionals/" target="_blank">Subtleties of Color: References and Resources</a></li>
</ul>
The use of colour is a vital but under-appreciated aspect of data visualization. It's all too easy to use the default colour scales provided by the tools we use to create visualizations. Unfortunately, these defaults are often inadequate. Rather than using the defaults, spend some time thinking about how you are using colour to represent data. If you refer to Robert Simmon's "Subtleties of Color" tutorial when doing so, you can't go wrong.<br />
<br />
[ 2013-09-24 ] Robert posted this <a href="http://blog.visual.ly/subtleties-of-color-addendum/" target="_blank">addendum</a>. Chrishttp://www.blogger.com/profile/05461074601050876693noreply@blogger.comtag:blogger.com,1999:blog-3184694675311940812.post-81934448998263644732013-07-23T18:01:00.000-07:002013-07-23T18:05:41.583-07:00Mobile VisualizationOne of my favourite podcasts is <a href="http://datastori.es/" target="_blank">Data Stories</a>. Who'd have thought a purely audio presentation of a visual subject would work? But it does, mostly due to its two charming hosts: <a href="http://moritz.stefaner.eu/" target="_blank">Moritz Stefaner</a> and <a href="http://enrico.bertini.me/" target="_blank">Enrico Bertini</a>, and the expert guests they interview. <br />
<br />
The guest on <a href="http://datastori.es/data-stories-25-mobile-touch-vis/" target="_blank">episode #25</a> was <a href="http://do.minik.us/blog" target="_blank">Dominikus Baur</a>, whose speciality is delivering visualizations on mobile, touch-based devices. Dominikus is part of the team that created <a href="http://do.minik.us/projects/touchwave" target="_blank">Touchwave</a>, an iOS toolkit for multi-touch interaction with stacked charts.<br />
<br />
The podcast is well worth listening to if you're interested in developing visualizations for mobile devices. The guys discuss the challenges and opportunities presented by mobile platforms.<br />
<br />
<iframe frameborder="0" height="24" scrolling="no" src="http://datastori.es/?powerpress_embed=395-mp3&powerpress_player=default" width="300"></iframe>
<br />
<br />
Small screens and limited processing power are obvious challenges. The latter motivated the choice of a native iOS implementation of Touchwave rather than a platform-neutral implementation based on HTML5/JavaScript.<br />
<br />
Touch-based user interfaces, especially multi-touch ones, represent an opportunity for new and interesting ways of interacting with visualizations compared with the traditional keyboard-and-pointer interfaces of desktop and notebook PCs.<br />
<br />
Dominikus mentioned the support for mobile devices provided by <a href="http://www.tableausoftware.com/new-features/drag-and-drop-editing" target="_blank">Tableau</a>. I know that other visualization products, including <a href="http://spotfire.tibco.com/en/discover-spotfire/why-spotfire/universal-adaptability.aspx" target="_blank">Spotfire</a>, <a href="http://www.panopticon.com/Mobile-Dashboards" target="_blank">Panopticon</a>, <a href="http://www.qlikview.com/au/explore/products/qv-for-mobile" target="_blank">QlikView</a> and <a href="http://www.dundas.com/dashboard/features/mobile/" target="_blank">Dundas</a>, support deploying visualizations to mobile devices. I can't say how well these implementations work, as I haven't used them (please leave a comment if you have some experience).<br />
<br />
<a href="http://www.vislives.com/search/label/d3.js" target="_blank">My own work with D3.js</a> performs poorly on mobile devices. I developed these visualizations with desktop PC users in mind (large screens, pointer interfaces). They won't even load on my Android phone. On my Android tablet they load, but performance is sluggish and interaction is awkward. In time, I expect the performance problem will be resolved as mobile processors improve. However, the interaction problems will remain.<br />
<br />
There is a distinct lag between the adoption of mobile devices and the development of data visualization interfaces that work effectively on them. There is a clear need for new techniques, such as those developed by Dominikus and his colleagues, if we're to provide interactive visualizations that work effectively on mobile devices.Chrishttp://www.blogger.com/profile/05461074601050876693noreply@blogger.comtag:blogger.com,1999:blog-3184694675311940812.post-2254432682215579312013-03-29T02:14:00.000-07:002013-04-12T00:26:01.460-07:00Coursera Data Analysis MOOC: Wrap-UpThe <a href="https://www.coursera.org/course/dataanalysis" target="_blank">Coursera Data Analysis MOOC</a> has concluded. You can read my earlier posts on the course (<a href="http://www.vislives.com/2013/01/coursera-data-analysis-mooc-first.html" target="_blank">first impressions</a>, <a href="http://www.vislives.com/2013/02/courser-data-analysis-mooc-half-time.html" target="_blank">half-time</a>, <a href="http://www.vislives.com/2013/03/coursera-data-analysis-graduation.html" target="_blank">graduation</a>).<br />
<br />
If you're interested in the course content, it has been made public. The video lectures have been published to <a href="https://www.youtube.com/user/jtleek2007/videos?sort=dd&tag_id=UC8xNPQ-3a5t9uMU7Vah-jWA.3.coursera&view=46" target="_blank">Prof. Jeff Leek's YouTube channel</a>, and the slide decks can be <a href="https://github.com/jtleek/dataanalysis" target="_blank">downloaded from GitHub</a>.<br />
<br />
Jeff was interviewed by Roger Peng on the <a href="http://simplystatistics.org/" target="_blank">Simply Statistics</a> podcast, in which he reflects on his experience of the MOOC.<br />
<br />
<iframe width="560" height="315" src="http://www.youtube.com/embed/qO2xUvogyJE" frameborder="0" allowfullscreen></iframe>
<br />
Jeff shared some interesting data regarding the course:<br />
<ul>
<li>102,000 students enrolled</li>
<li>51,000 watched lectures</li>
<li>20,000 answered quizzes</li>
<li>5,500 completed & graded assignments</li>
</ul>
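Jeff's figures are easier to compare as a funnel. The short Python sketch below is my own arithmetic on the four numbers above, not something from the podcast: it expresses each stage as a percentage of enrolments. Roughly half of those who enrolled watched lectures, about one in five answered quizzes, and a little over 5% completed the graded assignments.<br />

```python
# Participation figures quoted by Prof. Leek for the first run of the course.
FUNNEL = [
    ("enrolled", 102_000),
    ("watched lectures", 51_000),
    ("answered quizzes", 20_000),
    ("completed and graded assignments", 5_500),
]


def funnel_percentages(stages):
    """Return (stage, count, percentage of enrolments) for each stage."""
    enrolled = stages[0][1]
    return [(name, count, 100.0 * count / enrolled) for name, count in stages]


for name, count, pct in funnel_percentages(FUNNEL):
    print(f"{name:34s} {count:7,d} {pct:5.1f}%")
```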
Chrishttp://www.blogger.com/profile/05461074601050876693noreply@blogger.comtag:blogger.com,1999:blog-3184694675311940812.post-5785737325386405302013-03-15T03:10:00.002-07:002013-03-29T02:00:18.696-07:00Coursera Data Analysis MOOC: GraduationI've completed <a href="https://www.coursera.org/course/dataanalysis" target="_blank">Coursera's 2013 Data Analysis</a> course. You can read my earlier posts on the course <a href="http://www.vislives.com/2013/01/coursera-data-analysis-mooc-first.html" target="_blank">here</a> and <a href="http://www.vislives.com/2013/02/courser-data-analysis-mooc-half-time.html" target="_blank">here</a>.<br />
<br />
I was initially motivated to enrol so I could learn about Massive Open Online Courses (MOOCs). Once the course started I realised it would take significant effort on my part to see it through. I could easily have given up but decided to invest the time needed to complete the course.<br />
<br />
I'm glad I did because I gained the following:<br />
<ul>
<li>knowledge of how Coursera works</li>
<li>a broad overview of the statistical techniques that can be used for data analysis</li>
<li>improved ability to use R - a tool I often use for work</li>
</ul>
<b>How Coursera Works</b><br />
I was out with friends one evening mid-way through the course, and mentioned I'd enrolled with Coursera. I said that the course was free, and was asked "What is Coursera's business model?" I didn't know at the time but I've since read that various revenue streams are being considered:<br />
<ul>
<li>certification fees</li>
<li>introducing students to employers and recruiters</li>
<li>tutoring</li>
<li>sponsorship</li>
<li>tuition fees</li>
</ul>
According to <a href="https://en.wikipedia.org/wiki/Coursera#Business_model" target="_blank">Wikipedia</a>, Coursera was not generating revenue as of March 2012.<br />
<br />
The mechanics of a Coursera course are similar to those of a college or university course, except that it takes place online and thousands of students are enrolled.<br />
<ul>
<li>lectures: content is presented as video lectures.</li>
<li>quizzes: regular online, multiple-choice tests must be completed.</li>
<li>assignments: assignments are submitted online. The lecturer can't assess them all, so students mark their peers' work.</li>
<li>getting help: the lecturer can't answer all questions so students post queries to an online forum. Students help each other out with answers, and each course has a handful of knowledgeable TAs who monitor the forum and post replies. You can vote up a post - those with the most votes are handled with the highest priority.</li>
<li>wiki: each course has a wiki to which useful course-related information can be added.</li>
<li>meetups: if you want to take things off-line, <a href="http://www.meetup.com/" target="_blank">Meetups</a> can be organised to discuss the course face-to-face with fellow Courserians.</li>
</ul>
<b>Data Analysis Course Content</b> <br />
Ultimately, the quality of a course, whether traditional or MOOC, hinges on its content. A friend of mine, who is a university maths lecturer, enrolled in a Coursera programming language course but found the content so poor he gave up.<br />
<br />
Overall, the Data Analysis course was of good quality. It was the first time <a href="http://www.biostat.jhsph.edu/~jleek/" target="_blank">Prof. Leek</a> had given the course, so there were a few mistakes in the course material. These were picked up by students, who posted corrections to the online forum.<br />
<br />
There were also logistical difficulties for students in some time zones. To accommodate them, deadlines for quizzes and assignments were adjusted.<br />
<br />
I expect the course will be given again, so future enrolees will enjoy the benefits of the road-testing performed by my cohort of students.<br />
<br />
Data analysis is a very broad subject, so it was difficult for Prof. Leek to provide a detailed presentation of the techniques covered in the eight-week course. Instead, a basic introduction was presented for each technique, with examples of how to perform the analysis using R. Links to further resources were provided for those students with the time and inclination to delve deeper into the underlying mathematics. This was something I didn't have time for but at least I now know where to start.<br />
<br />
<b>Conclusion</b> <br />
Coursera offers <a href="https://www.coursera.org/courses" target="_blank">a broad range of courses</a>, and then there are courses offered by <a href="http://www.mooc-list.com/" target="_blank">others</a>. Having completed my first Coursera MOOC I'm tempted to enrol in another but they do require a significant investment of time and effort. For now I'm content to consolidate what I've learned and wait for something to come along that piques my interest sufficiently for me to put in the effort required.Chrishttp://www.blogger.com/profile/05461074601050876693noreply@blogger.comtag:blogger.com,1999:blog-3184694675311940812.post-47424668477373715552013-02-22T02:10:00.000-08:002013-02-22T02:17:01.006-08:00Coursera Data Analysis MOOC: Half-Time EntertainmentIt's week five of the Coursera Data Analysis MOOC, and it's been a busy time since the course commenced (see <a href="http://www.vislives.com/2013/01/coursera-data-analysis-mooc-first.html" target="_blank">my first impressions</a>). I've just completed the weekly quiz so have time to come up for air and post a progress report.<br />
<br />
The course is similar in many respects to my time at uni: I've been late to lectures, and had to scramble to complete tests and assignments. As I mentioned previously, one of my main motivations was to learn about MOOCs. So, I wasn't too worried when by the middle of week one I hadn't been spoon-fed course material. I'd expected a flurry of emails with course information but my inbox was quiet. In fact, I needed to actually <i>visit</i> the Coursera Data Analysis Web site to attend class. By the time I did I found the course well under way, and I had a lot of catching up to do.<br />
<br />
I also realised that, just like uni, turning up wasn't going to be enough; I needed to invest serious time in understanding what was being taught and applying it to tests and assignments. So, I pulled my finger out and put aside some other projects to clear time each week to devote to the course. <br />
<br />
Having a weekly quiz with a hard deadline has been a useful motivator. It would have been easy to chuck it in - after all, enrollment is free - or let things slip until I had more time. With the quiz deadline I have a weekly goal that keeps me working on the course each day.<br />
<br />
I've just completed the first assignment. It was an interesting project focussed on a data set from the <a href="https://www.lendingclub.com/" target="_blank">Lending Club</a>, a peer-to-peer lending service. We were given two weeks to submit our work. After that, we had a week to mark at least four of our peers' assignments (failing to do so incurs a 20% penalty on your own assignment). We were provided with a simple assessment template to guide us through marking.<br />
<br />
This is the first time Coursera has presented the Data Analysis course, and there have been a few hiccups along the way. Lecture notes included a few typos, scheduling of deadlines needed to be fine-tuned, and the requirements of the assignments were changed due to security issues (running a stranger's R code is inherently risky).<br />
<br />
Many of the changes have come about from feedback via the course forum. I've not had much time to participate in the forum other than occasionally scanning the top-voted posts.<br />
<br />
I've found the course material challenging and rewarding. It's clear that data analysis requires a strong grounding in statistics. Prof. Leek has provided us with a tool kit for data analysis: techniques and how to apply them using R. However, the underlying mathematics is not covered (the course is only eight weeks long). Prof. Leek has provided links to further resources that supply this background, but I haven't had time to delve into them.<br />
<br />
That being said, I am becoming more proficient with R, which is useful in my day-to-day work. And I have gained a better understanding of the techniques available to me for data analysis work.<br />
<br />
I'll post another update at the end of the course in March.Chrishttp://www.blogger.com/profile/05461074601050876693noreply@blogger.comtag:blogger.com,1999:blog-3184694675311940812.post-24623077859453965642013-02-01T02:26:00.000-08:002013-02-22T04:36:25.301-08:00Lines vs. Bars for Categorical DataI recently commented on a thread started by Joey Cherdarchuk in the <a href="http://www.linkedin.com/groups?home=&gid=2244682&trk=anet_ug_hm" target="_blank">LinkedIn Data Visualization group</a>. The thread discusses Joey's reworking of an infographic about social media demographics. Joey used <i>diverging stacked bar charts</i> to significantly improve upon the original, which used pie charts. You can read Joey's blog post in full <a href="http://darkhorseanalytics.com/blog/diverging-stacked-bars/?goback=.gde_2244682_member_206622895" target="_blank">here</a>.<br />
<br />
I suggested an alternative would be to use a simple line chart. This is a technique often advocated by Kaiser Fung on his excellent Junk Charts blog. Here's an example of <a href="http://junkcharts.typepad.com/junk_charts/2012/03/guess-which-day-i-made-this-chart.html" target="_blank">his approach</a>. Many people react negatively to this technique as you can see in the comments section of Kaiser's post. Here's his response:<br />
<blockquote class="tr_bq">
<i>You won't
be the only reader to feel this way. Over the years, I have had
complaints from readers about lines connecting categorical data every
time I put up such a chart. Here's my reasoning: follow your eyes as you
read a dot plot, you are visually tracing the lines that I have drawn,
why not just draw the lines?</i></blockquote>
I happen to agree with Kaiser; using lines helps tie together the separate data points so you can more easily see trends and make comparisons.<br />
<br />
I applied this treatment to the social media demographics data from the original infographic. You can see the results below (interactive version <a href="https://docs.google.com/spreadsheet/pub?key=0AtHbnCtagetZdFFiRnVlVUEybVBWVzZvdE9wdDZBUVE&gid=4" target="_blank">here</a>):<br />
<br />
<img src="https://docs.google.com/spreadsheet/oimg?key=0AtHbnCtagetZdFFiRnVlVUEybVBWVzZvdE9wdDZBUVE&oid=2&zx=lupv2k2dk88" />
<br />
<br />
This approach certainly has its merits. You can clearly see that for most social media platforms, participation rates increase with age. Google+ is the obvious exception, and the trend for Reddit is flat. As you'd expect, the trend is starkest for LinkedIn, the professional network.<br />
<br />
The <a href="https://docs.google.com/spreadsheet/pub?key=0AtHbnCtagetZdFFiRnVlVUEybVBWVzZvdE9wdDZBUVE&gid=4" target="_blank">interactive version</a> also has examples of the data plotted using point charts and bar charts (stacked and clustered), none of which, in my view, works as well as the simple line chart. For example, here's a clustered bar chart:<br />
<br />
<img src="https://docs.google.com/spreadsheet/oimg?key=0AtHbnCtagetZdFFiRnVlVUEybVBWVzZvdE9wdDZBUVE&oid=4&zx=uk0vedybs1wn" />
<br />
<br />
I think it's important not to reflexively rule out line charts when dealing with categorical data as the technique can yield useful insights.<br />
<br />
<b>Update</b> (2013-02-22)<br />
<br />
During further discussion in the LinkedIn Data Visualization group, <a class="commenter" href="http://www.linkedin.com/groups?viewMemberFeed=&gid=2244682&memberID=65844814" target="_blank" title="See this member's activity">Bill Droogendyk</a> referenced an excellent article on visualizing quantitative data by Stephen Few, one of my favourite visualization thought leaders. The article, entitled "<a href="http://www.perceptualedge.com/articles/dmreview/quant_vs_cat_data.pdf" target="_blank">Quantitative vs. Categorical Data: A Difference Worth Knowing</a>", discusses the different types of categorical data:<br />
<ul>
<li>nominal</li>
<li>ordinal</li>
<li>interval</li>
</ul>
Using Few's nomenclature, the <i>Age</i> axis used in the charts above is an interval scale, for which Few recommends line (and bar) charts. <a href="http://junkcharts.typepad.com/junk_charts/2012/03/guess-which-day-i-made-this-chart.html" target="_blank">Kaiser Fung's example</a> uses an ordinal scale. At first glance some readers interpret it as nominal because they miss the following treatment:<br />
<blockquote class="tr_bq">
<i>I sorted the schools by the ratio of three-pointers to midrange jump shots.</i></blockquote>
<br />
By ranking the schools, the scale Fung uses is ordinal. Now here is where Fung and Few differ. Few advises against using line charts with ordinal scales, whereas Fung does so <a href="http://junkcharts.typepad.com/junk_charts/line_chart/" target="_blank">quite often</a>.<br />
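The effect of Fung's sorting step can be sketched in a few lines of Python. The school names and shot counts below are made up for illustration and are not Fung's data; the point is that ranking otherwise unordered categories by a computed ratio gives the axis an order, so a line through the points follows that ranking rather than an arbitrary list.<br />

```python
# Hypothetical shot counts per school (illustrative only, not Fung's data).
SHOTS = {
    "School A": {"three_pointers": 420, "midrange": 300},
    "School B": {"three_pointers": 250, "midrange": 500},
    "School C": {"three_pointers": 380, "midrange": 380},
    "School D": {"three_pointers": 510, "midrange": 170},
}


def rank_by_ratio(shots):
    """Order schools by the ratio of three-pointers to midrange jump shots.

    Sorting turns an unordered (nominal) list of schools into an ordinal
    sequence, highest ratio first.
    """
    ratios = {
        school: counts["three_pointers"] / counts["midrange"]
        for school, counts in shots.items()
    }
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


for rank, (school, ratio) in enumerate(rank_by_ratio(SHOTS), start=1):
    print(f"{rank}. {school}: {ratio:.2f}")
```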
<br />
I sit on the fence: I reckon it's worth considering a line chart for
categorical data (interval & ordinal) and seeing for yourself.Chrishttp://www.blogger.com/profile/05461074601050876693noreply@blogger.comtag:blogger.com,1999:blog-3184694675311940812.post-75938483001246564742013-01-30T17:15:00.000-08:002013-02-22T02:10:42.621-08:00Coursera Data Analysis MOOC: First ImpressionsOn the spur of the moment I decided to enrol in Coursera's <a href="https://www.coursera.org/course/dataanalysis" target="_blank">Data Analysis course</a>. I've been curious about MOOCs (massive open on-line courses) for some time, so when I came across this one, I decided it was time to find out more. Plus, the course topic is well-suited to the kind of work I do.<br />
<br />
The course is given by Jeff Leek, an Assistant Professor in Biostatistics at the Johns Hopkins Bloomberg School of Public Health. Jeff's introductory video is shown below.<br />
<br />
<iframe allowfullscreen="" frameborder="0" height="315" src="http://www.youtube.com/embed/-lutj1vrPwQ" width="560"></iframe>
<br />
The course is run over eight weeks and is delivered as a set of video lectures. Topics covered include:<br />
<ul>
<li>The structure of a data analysis (steps in the process, knowing when to quit, etc.)</li>
<li>Types of data (census, designed studies, randomized trials)</li>
<li>Types of data analysis questions (exploratory, inferential, predictive, etc.)</li>
<li>How to write up a data analysis (compositional style, reproducibility, etc.)</li>
<li>Obtaining data from the Web (through downloads mostly)</li>
<li>Loading data into R from different file types</li>
<li>Plotting data for exploratory purposes (boxplots, scatterplots, etc.)</li>
<li>Exploratory statistical models (clustering)</li>
<li>Statistical models for inference (linear models, basic confidence intervals/hypothesis testing)</li>
<li>Basic model checking (primarily visually)</li>
<li>The prediction process</li>
<li>Study design for prediction</li>
<li>Cross-validation</li>
<li>A couple of simple prediction models</li>
<li>Basics of simulation for evaluating models</li>
<li>Ways you can fool yourself and how to avoid them (confounding, multiple testing, etc.)</li>
</ul>
Each lecture can be viewed in your Web browser or downloaded (MP4) for off-line viewing. The lectures are slide presentations with audio of the lecturer explaining the content. You can download the slides (PDF) and transcripts if you prefer.<br />
<br />
A 10-question quiz must be completed by the end of each week. It has hard and soft deadlines. If you miss the soft deadline you can still submit answers before the hard deadline but a penalty is applied to your score. You can attempt each quiz four times.<br />
<br />
Two peer-assessed assignments must be completed: one in week 3 (due at the end of week 4) and the other in week 6 (due at the end of week 7). The assignments are graded by your student peers, and you must grade at least four peer assignments to avoid a 20% penalty. Your grade is based on the median of the grades you receive from your peers.<br />
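The grading rule can be written down as a small function. This Python sketch assumes the 20% penalty is applied as a multiplier on the median peer grade (the course description doesn't say exactly how it is applied), and the function and constant names are my own. Using the median means one unusually harsh or generous marker barely moves the result.<br />

```python
from statistics import median

REQUIRED_REVIEWS = 4  # peer assignments each student must grade
PENALTY = 0.20        # penalty for grading fewer than that


def assignment_score(peer_grades, reviews_done):
    """Median of the grades received, reduced by 20% if too few reviews were done.

    Assumption: the penalty is applied as a multiplier on the median grade.
    """
    score = median(peer_grades)
    if reviews_done >= REQUIRED_REVIEWS:
        return score
    return score * (1 - PENALTY)


grades = [10, 34, 35, 38, 40]  # hypothetical marks from five peers; one is harsh
print(assignment_score(grades, reviews_done=4))  # full credit
print(assignment_score(grades, reviews_done=2))  # penalised
```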
<br />
An interesting aspect of the course is the forum, to which students can post questions. Prof. Leek obviously can't answer all the questions, as the course has 100,000 students. So, you can vote on questions and the lecturer responds to the top few. Students can help each other out by responding to questions too.<br />
<br />
The course requires a working knowledge of R. I've been using R increasingly as part of my day-to-day work, so I'm comfortable with this. Some (optional) background lectures on R are provided in the course material, along with links to other resources.<br />
<br />
Successful completion of the course confers no official qualification or accreditation. I've enrolled purely for my own edification: to learn about MOOCs like Coursera and to sharpen my data analysis skills.<br />
<br />
I'll post follow-ups as the course progresses.Chrishttp://www.blogger.com/profile/05461074601050876693noreply@blogger.comtag:blogger.com,1999:blog-3184694675311940812.post-16240698742276993622013-01-01T09:03:00.001-08:002013-01-01T09:03:39.586-08:00Comparison of Australian Car ValuesI was recently in the market for new wheels, and so spent a bit of time researching the Australian car market at <a href="http://www.carsales.com.au/">carsales.com.au</a> and <a href="http://www.redbook.com.au/">RedBook.com.au</a>. It got me thinking about the rates of depreciation in value of different makes of car. So, I set about creating a chart that would help me visualize this kind of information.<br />
<br />
The result is the <a href="http://bl.ocks.org/d/2913621/" target="_blank">interactive line chart</a>
shown below. You can use the interactive version of the visualization
if you have a modern, standards-compliant browser (Firefox, Chrome,
Opera, Safari, etc.) or you can try <a href="http://www.google.com/chromeframe">Chrome Frame</a> (Internet Explorer).<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjVhvwTVquhrraFiozRUqkRJubniOgHJvbPzYUkKz0eBFxuO0cmJFK1wx-x0nF7-GwR7Nb_wXCcT5hIAQGDLqEH_82qNuseLtqZqnAI5Drqr06XZ_Qt2gfhtjWYjlu9sSRcrLt7NuNw-r8q/s1600/car_values_chart.png" style="margin-left: auto; margin-right: auto;"><img border="0" height="269" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjVhvwTVquhrraFiozRUqkRJubniOgHJvbPzYUkKz0eBFxuO0cmJFK1wx-x0nF7-GwR7Nb_wXCcT5hIAQGDLqEH_82qNuseLtqZqnAI5Drqr06XZ_Qt2gfhtjWYjlu9sSRcrLt7NuNw-r8q/s1600/car_values_chart.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Resale value (%) for several popular models of Australian car.</td></tr>
</tbody></table>
<br />
<br />
<b>Interaction</b><br />
The chart plots a line for each of several popular models of car. The lines can be made to represent several different values:<br />
<ul>
<li>Sticker price: price when new</li>
<li>Resale value: price when selling the car on the private market</li>
<li>Resale value (%): the resale value as a percentage of the sticker price</li>
</ul>
You can also highlight individual models using the checkboxes or by moving the mouse cursor over a line.<br />
<br />
<b>Data</b><br />
Obtaining the data was laborious. I first determined popular makes by looking at the numbers of cars for sale at carsales.com.au. I focussed on sedans, ignoring SUVs, vans, utes etc. For each popular make I selected a couple of popular models - small and large.<br />
<br />
Then I visited RedBook.com.au to research price history, but encountered a couple of hurdles. Firstly, it isn't possible to get 10 years of price data for an individual model because RedBook only publishes the sticker price and <i>current</i> resale values (not the resale value last year, the year before and so on). To work around this, I used the sticker price and current resale value of comparable entry-level models from 2001 - 2011, treating each older model year as a stand-in for an older car. <br />
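Turning those model-year snapshots into a depreciation curve can be sketched as follows. In this Python example the prices are invented, and the reference year used to convert a model year into an age is an assumption; each entry-level model year contributes one point, giving the car's age and its resale value as a percentage of the sticker price.<br />

```python
# Hypothetical RedBook-style records for one model: the model year, the
# sticker price when new and the current resale value (illustrative only).
RECORDS = [
    (2001, 22_000, 5_500),
    (2004, 23_000, 8_000),
    (2007, 23_500, 11_500),
    (2011, 24_000, 19_000),
]

CURRENT_YEAR = 2012  # assumption: the year the resale values were looked up


def depreciation_curve(records, current_year=CURRENT_YEAR):
    """Treat older model years as proxies for older cars.

    Each record becomes (age in years, resale value as % of sticker price),
    sorted by age, approximating how one car loses value over time.
    """
    curve = [
        (current_year - year, 100.0 * resale / sticker)
        for year, sticker, resale in records
    ]
    return sorted(curve)


for age, pct in depreciation_curve(RECORDS):
    print(f"{age:2d} years old: {pct:5.1f}% of sticker price")
```

The approximation only holds if the comparable models are genuinely similar from year to year, which is why the choice of entry-level variants matters.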
<br />
The second problem was that for some makes (BMW, Kia, Mercedes and Nissan) it wasn't possible to find two models that had been sold in Australia every year for the last 10 years. And no single model of Hyundai (a very popular make) has been sold continuously for the last decade.<br />
<br />
<b>Insights</b><br />
Once I had the data I was able to visualize it, and there were a couple of surprises. Firstly, sticker price has remained fairly stable across all makes and models. I expected this would have decreased more recently, especially for imported cars, with the appreciation in the value of the Australian dollar.<br />
<br />
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgIPWBliz7SVAkNOPiF93gjOjMZMqxvPJzl4a0HxTHkaefsEVUMEh45rvMUWoJeLX6_gt7y83AHLs5dbwbBkb-yVhhgLsDLaWwk8tCfCFnTMfezYqVFtmVfqrMs9zWD_em72pdL649sfFw1/s1600/sticker_price_cars.png" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="310" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgIPWBliz7SVAkNOPiF93gjOjMZMqxvPJzl4a0HxTHkaefsEVUMEh45rvMUWoJeLX6_gt7y83AHLs5dbwbBkb-yVhhgLsDLaWwk8tCfCFnTMfezYqVFtmVfqrMs9zWD_em72pdL649sfFw1/s1600/sticker_price_cars.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">New car prices have remained fairly stable for the past decade in spite of the appreciation in value of the Australian dollar.</td></tr>
</tbody></table>
<br />
<br />
<br />
Resale values drop precipitously in the first few years of a car's life - no surprises there.<br />
<br />
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjjTQTnNDWsLFYPcC_FnFUcGZxoKFNIXvIWDDCdx4P1xkFvNOTQKwf7QCzDK_K_eAibKjV7Z254Hjgt1N-Lahz4SJwhaYDx4tV5bkpB1DLW2DN5lLJomq2quey_CMnYJU9ojRgkMccONtJ2/s1600/resale_prices_cars_chart.png" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="310" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjjTQTnNDWsLFYPcC_FnFUcGZxoKFNIXvIWDDCdx4P1xkFvNOTQKwf7QCzDK_K_eAibKjV7Z254Hjgt1N-Lahz4SJwhaYDx4tV5bkpB1DLW2DN5lLJomq2quey_CMnYJU9ojRgkMccONtJ2/s1600/resale_prices_cars_chart.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Resale values drop significantly in the first few years.</td></tr>
</tbody></table>
<br />
<br />
<br />
<br />
What did surprise me was that, when resale value is expressed as a percentage of sticker price, small Japanese cars fared best. I had expected the European marques (Audi, BMW, Mercedes and Volkswagen) to top this ranking, or even popular family sedans, but it's the compact Japanese Mazda 3, Toyota Corolla, Subaru Impreza and Honda Civic, along with the Holden Barina (a Japanese import), that top the list. The smallish VW Golf also holds its value well, as does the Merc. <br />
<br />
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgMB8aKPyCERTUA_w-t8leNIuOeVKbP8vju34gprxNrwBCBpHXkvyv9VeoddGYSbx_JRybgIknsbFKu7XvKNhZwXRDZik1gQHCZLGfsiqpmoSxsriO8VN2HpNyZCByOh2Q3l8G6-DPPM7DE/s1600/resale_pct_chart.png" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="326" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgMB8aKPyCERTUA_w-t8leNIuOeVKbP8vju34gprxNrwBCBpHXkvyv9VeoddGYSbx_JRybgIknsbFKu7XvKNhZwXRDZik1gQHCZLGfsiqpmoSxsriO8VN2HpNyZCByOh2Q3l8G6-DPPM7DE/s1600/resale_pct_chart.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Compact Japanese cars hold their value best over the 10 years considered.</td></tr>
</tbody></table>
<br />
<br />
<br />
<b>Commodore vs. Falcon</b> <br />
In Australia there is a long-running rivalry between the Holden Commodore and the Ford Falcon. The charts below compare the two cars. The sticker prices of the Falcon and Commodore track each other closely. However, the Falcon's resale value falls away from the Commodore's in the first few years,
<a name='more'></a>although after 10 years the two are close together again. As a percentage of sticker price, the Falcon's resale price is one of the lowest of the models compared, especially during its mid-life, but the Commodore doesn't fare much better.<br />
<br />
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgSDyYAUU_vd0Fc2R7y052Y88qjAeiV7vVXcKmirj222w2FXnbIATS_AWA0N5IIfW3Iimru5t6PD1qXDwo1UkWcy6_alWT42HqqWBGawvxXBQkFRjEOKBVJGbXoSdcc2wQl7Be_z4rlxDjd/s1600/sticker_price_chart_commodore_falcon.png" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="310" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgSDyYAUU_vd0Fc2R7y052Y88qjAeiV7vVXcKmirj222w2FXnbIATS_AWA0N5IIfW3Iimru5t6PD1qXDwo1UkWcy6_alWT42HqqWBGawvxXBQkFRjEOKBVJGbXoSdcc2wQl7Be_z4rlxDjd/s1600/sticker_price_chart_commodore_falcon.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">The sticker prices of Commodores and Falcons track each other closely.</td></tr>
</tbody></table>
<br />
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjqbTvroRYmrtExgnHgxWcgPEojVz2TA91hBkDoIm0UeYmCYzZVNX7R1gysBLCkfxtfdFW2f5Ml4Hz4JECdRcY0RNlT0dgnj7wNUHijqNnuBLNO4211dX3Wcc5vpQA2RJ31hR6butp_b5xG/s1600/resale_pct_chart_commodore_falcon.png" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="310" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjqbTvroRYmrtExgnHgxWcgPEojVz2TA91hBkDoIm0UeYmCYzZVNX7R1gysBLCkfxtfdFW2f5Ml4Hz4JECdRcY0RNlT0dgnj7wNUHijqNnuBLNO4211dX3Wcc5vpQA2RJ31hR6butp_b5xG/s1600/resale_pct_chart_commodore_falcon.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">The resale value of the Falcon drops below that of the Commodore, but after 10 years the two are similar.</td></tr>
</tbody></table>
<br />
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgZ4JD_vV_RglZfkU3-mXt7gmWmr-8Ax0adbLiNS4hR2tWFTCCXw5nPXk3iZIwgvF4bAkg0dq5ISt-uiJ2qQ6cc9KrqkynwBuarhVOC_Vizdf_MgKLj0hKsp3XARKYjHFbWh0Pxm8IOilPD/s1600/resale_pct_chart_commodore_falcon.png" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="318" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgZ4JD_vV_RglZfkU3-mXt7gmWmr-8Ax0adbLiNS4hR2tWFTCCXw5nPXk3iZIwgvF4bAkg0dq5ISt-uiJ2qQ6cc9KrqkynwBuarhVOC_Vizdf_MgKLj0hKsp3XARKYjHFbWh0Pxm8IOilPD/s1600/resale_pct_chart_commodore_falcon.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">The resale value of the Falcon as a percentage of sticker price is one of the lowest of the models considered. The Commodore doesn't fare much better.</td></tr>
</tbody></table>
<br />
This visualization was created with <a href="http://d3js.org/" target="_blank">D3.js</a>, is shared using a <a href="https://creativecommons.org/licenses/by-sa/3.0/" target="_blank">Creative Commons license</a>, and the source-code is available on <a href="https://gist.github.com/3279878" target="_blank">GitHub.</a> Chrishttp://www.blogger.com/profile/05461074601050876693noreply@blogger.comtag:blogger.com,1999:blog-3184694675311940812.post-64111898876024756372012-10-31T18:06:00.001-07:002012-10-31T18:13:07.770-07:00Summer Olympics: Home Ground AdvantageThe 2012 Summer Olympic Games have come and gone. Congratulations to Great Britain for hosting the event so magnificently, and to Team GB for its outstanding performance.<br />
<br />
This reminded me of how well Australian athletes performed at the Sydney Olympics in 2000, as did Greek athletes at the 2004 Athens Games, and Chinese athletes at Beijing in 2008.<br />
<br />
I wondered: <b>is the performance of the Summer Olympics host nation exceptional?</b><br />
<br />
I began by gleaning medal counts (gold, silver, bronze and total) and rankings (by number of gold medals, with ties broken by silver and then bronze) for all host nations at all modern Summer Olympic Games. I also included the <a href="https://en.wikipedia.org/wiki/1906_Intercalated_Games" target="_blank">1906 Athens Intercalated Games</a>. I found the data at <a href="http://www.sports-reference.com/olympics/summer/" target="_blank">Sports Reference</a> and
<a href="https://en.wikipedia.org/wiki/Summer_Olympic_Games" target="_blank">Wikipedia.</a><br />
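<br />
The ranking rule described above (most gold medals first, with ties broken by silver and then bronze) is a lexicographic sort. The sketch below shows one way to compute it in JavaScript. The record fields (<code>nation</code>, <code>gold</code>, <code>silver</code>, <code>bronze</code>) are assumptions, as is giving identical medal tallies a shared rank; it is not the code behind the visualization.<br />
<br />
```javascript
// Compare two medal tallies: more golds ranks higher; ties are broken by
// silvers, then bronzes. Field names are assumptions for illustration.
function compareByMedals(a, b) {
  return b.gold - a.gold || b.silver - a.silver || b.bronze - a.bronze;
}

// Return a new array sorted by medal tally, each record annotated with a rank.
// Identical gold/silver/bronze counts share a rank ("1, 2, 2, 4" style);
// this tie handling is an assumption.
function rankNations(records) {
  const sorted = [...records].sort(compareByMedals); // copy: input is not mutated
  const ranked = [];
  sorted.forEach((record, i) => {
    const tied = i > 0 && compareByMedals(sorted[i - 1], record) === 0;
    ranked.push({ ...record, rank: tied ? ranked[i - 1].rank : i + 1 });
  });
  return ranked;
}

// Usage with hypothetical nations (not real results):
const table = rankNations([
  { nation: "A", gold: 1, silver: 0, bronze: 5 },
  { nation: "B", gold: 2, silver: 0, bronze: 0 },
  { nation: "C", gold: 1, silver: 3, bronze: 0 },
  { nation: "D", gold: 1, silver: 3, bronze: 0 },
]);
// Ranks in order: B 1, C 2, D 2, A 4
```
<br />
Note that a single extra gold outweighs any number of silvers or bronzes under this rule, which is why nation A ranks last despite having the most medals in total.<br />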
<br />
Then I created an <a href="http://bl.ocks.org/d/3279878/" target="_blank">interactive line chart visualization</a> of this data using <a href="http://d3js.org/" target="_blank">D3.js</a>. You can use the interactive version of the visualization if you have a modern, standards-compliant browser (Firefox, Chrome, Opera, Safari, etc.); Internet Explorer users can try <a href="http://www.google.com/chromeframe" target="_blank">Chrome Frame</a>.<br />
<br />
Two types of interaction are possible:<br />
<ul>
<li>transition between medal counts and rankings using the radio buttons</li>
<li>highlight host nation performance by mousing over lines or Olympiad labels on the <i>x</i>-axis </li>
</ul>
<br />
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgtScpkBaUt70Spu_EcE49cvdmMF_t1gzQrKe6V1e71PG0S_moJHO0X8Y35nccN9aFzRBIKF6KHVJUFDt27zo8rwLQA7yNFalEu10ej4ykMU8hLWuTP_s9zne0aquMPkcOdwglmUc27nxpa/s1600/olympics_australia.png" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="430" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgtScpkBaUt70Spu_EcE49cvdmMF_t1gzQrKe6V1e71PG0S_moJHO0X8Y35nccN9aFzRBIKF6KHVJUFDt27zo8rwLQA7yNFalEu10ej4ykMU8hLWuTP_s9zne0aquMPkcOdwglmUc27nxpa/s1600/olympics_australia.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Australia's performance peaks at the Melbourne and Sydney Olympic Games</td></tr>
</tbody></table>
<br />
<br />
<br />
<br />
<br />
Interacting with the chart makes it clear that the performance of host nations does peak at their home Games. The image above shows peaks in Australia's total medal counts for the Melbourne and Sydney Summer Olympics.<br />
<br />
There are a few exceptions; for example, there was no peak for the US team at the Atlanta Games. There are also peaks that don't coincide with a nation hosting the Games. A good example is the L.A. Games (XXIII Olympiad), which were boycotted by the Soviet Union and most of the Eastern Bloc. As a consequence, there is a spike in medal counts for the nations that did attend; see the image below.<br />
<br />
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi7UWD276Dyoe6nH2vf9Yqcyv_gy328wq57Si9FrbTOTIiUcA0NIQUboAkS7QnFmOt6kP79NLOcMa6NxNoKpGIEwiAivmyOM_jVGT4rYKsrTIUHsv-Cf5JdPBZos6B0dFOaJYX_bfb8h3bg/s1600/olympics_usa.png" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="417" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi7UWD276Dyoe6nH2vf9Yqcyv_gy328wq57Si9FrbTOTIiUcA0NIQUboAkS7QnFmOt6kP79NLOcMa6NxNoKpGIEwiAivmyOM_jVGT4rYKsrTIUHsv-Cf5JdPBZos6B0dFOaJYX_bfb8h3bg/s1600/olympics_usa.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Total medal counts for the USA: a) there is no peak for the Atlanta Games, and b) all participants in the L.A. Games received a boost (the Soviet Union boycotted).</td></tr>
</tbody></table>
<br />
<br />
<br />
<br />
<br />
One obvious trend the visualization reveals is that the spread of rankings has broadened over time. This is due to increased participation in the Olympics: from 12 nations in 1896 to 205 in 2012.<br />
<br />
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; margin-right: 1em; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj1aiKQ2dZHCooxBDxUTGV8MoP7Vp8Ntu33zdiHOArXerMIhyphenhyphenZfyE-3iTAb1Cq3GLXck3xAY6JrclQUGoz1mupQ7qnmIzamgsLe5We3k8vm_6Ib51EZnl2s-tv_gb4Ud0OhfJ0Q9AghiG6F/s1600/olympic_rank_greece.png" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="365" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj1aiKQ2dZHCooxBDxUTGV8MoP7Vp8Ntu33zdiHOArXerMIhyphenhyphenZfyE-3iTAb1Cq3GLXck3xAY6JrclQUGoz1mupQ7qnmIzamgsLe5We3k8vm_6Ib51EZnl2s-tv_gb4Ud0OhfJ0Q9AghiG6F/s1600/olympic_rank_greece.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Host nations by rank (Greece highlighted). The spread of rankings has broadened as more nations have participated with each Olympiad.</td></tr>
</tbody></table>
<br />
<br />
<br />
The visualization helped answer my original question and spot a couple of other interesting phenomena. On reflection I think a <a href="https://en.wikipedia.org/wiki/Small_multiple" target="_blank">small multiples</a> visualization would have served me better than the single combined line chart. Something for a future version perhaps.<br />
<br />
This visualization is shared using a <a href="https://creativecommons.org/licenses/by-sa/3.0/" target="_blank">Creative Commons license</a>, and the source-code is available on <a href="https://gist.github.com/3279878" target="_blank">GitHub.</a>Chrishttp://www.blogger.com/profile/05461074601050876693noreply@blogger.comtag:blogger.com,1999:blog-3184694675311940812.post-58199895886360891942012-10-16T17:26:00.001-07:002012-10-16T17:27:28.734-07:00Brownlow Medallists Visualization Updated: Jobe WatsonCongratulations Jobe Watson, winner of the 2012 Brownlow Medal.<br />
<br />
I've updated my <a href="http://www.vislives.com/2011/09/brownlow-medal-winners-visualization.html" target="_blank">Brownlow Medallists visualization</a> to reflect this "new data", including the playing histories of previous years' medallists who are still active.Chrishttp://www.blogger.com/profile/05461074601050876693noreply@blogger.comtag:blogger.com,1999:blog-3184694675311940812.post-63413323843529476072012-07-08T23:15:00.000-07:002012-07-08T23:16:58.508-07:00Pushbutton InfographicsThere's been a recent flurry of new offerings in the world of on-line tools for creating and publishing infographics, so I thought I'd provide a brief overview of the main protagonists. Some are new (still in beta); others are more established (they have been around for a couple of years). The tools all follow a similar format: they provide a set of templates to which you add your content, and then you publish the result. Easy.<br />
<br />
<a href="http://visual.ly/" target="_blank"><b>Visual.ly</b></a><br />
Visual.ly has been around for a while and is probably best known as a clearinghouse for infographics - a YouTube for infographics, if you like. Visual.ly recently added the ability to create infographics. Currently, it offers only a handful of templates based on social media (Facebook and Twitter) themes. To use these, you must sign in to your Facebook or Twitter account and allow access to the Visual.ly app. An example infographic created from one of these templates is shown below.<br />
<br />
At this stage Visual.ly's offering is quite limited, but more "customizable infographics in popular categories, like sports, politics and food" are promised soon.<br />
<br />
I think Visual.ly's most valuable resource is its <a href="http://blog.visual.ly/" target="_blank">blog</a>. Making it <i>easy</i> to create infographics is only one piece of the puzzle. It's important to be able to create <i>good</i> infographics, especially if you want to stand out from the deluge of rubbish that's out there. Visual.ly's blog offers valuable advice on how to craft high-quality infographics.<br />
<br />
<div class="visually_embed" data-category="Social Media" rel="infographic">
<img class="visually_embed_infographic" rel="http://visually.visually.netdna-cdn.com/LifeofaHashtaginfographic_4f5e39d67820c.png" src="http://visually.visually.netdna-cdn.com/LifeofaHashtaginfographic_4f5e39d67820c_w587.png" /><br />
<div class="visually_embed_bar">
<br />
<br /></div>
<a href="http://visual.ly/life-hashtag-infographic" id="visually_embed_view_more" target="_blank"></a><link href="http://visual.ly/embeder/style.css" rel="stylesheet" type="text/css"></link> <script src="http://visual.ly/embeder/embed.js" type="text/javascript">
</script></div>
<a href="http://easel.ly/" target="_blank"><b>Easel.ly</b></a><br />
Easel.ly is a more recent entrant in the field of push-button infographics - it's still in beta. I've included Easel.ly's promotional video below, which provides a quick introduction to how it works.<br />
<br />
<iframe allowfullscreen="" frameborder="0" height="337" mozallowfullscreen="" src="http://player.vimeo.com/video/37781587?color=ffffff" webkitallowfullscreen="" width="600"></iframe><br />
<br />
Easel.ly describes itself as "a theme-based web-app for creating infographics and data visualizations." Like Visual.ly it provides a selection of templates. The choice is broader than that offered by Visual.ly - 15 templates are currently available - and you don't have to connect your Facebook or Twitter account. You can also start with a blank canvas.<br />
<br />
You can then drag-and-drop "objects" (icons from a variety of categories), "shapes" (arrows, symbols, etc.) and text boxes onto your template. These can be customised (colour and size) once in place. You can also upload your own images for inclusion in your infographic.<br />
<br />
Once your infographic is complete it can be published via Easel.ly for embedding in other Web pages.<br />
<br />
<a href="http://infogr.am/" target="_blank"><b>Infogr.am</b></a><br />
<br />
<a href="http://infogr.am/i/frontpage/01-2.jpg"><img border="0" src="http://infogr.am/i/frontpage/01-2.jpg" /></a>
<br />
<br />
Infogr.am is another new kid on the infographics block. It is quite similar to the others in that it's template-based. There are two types of template to choose from:<br />
<ul>
<li>infographics: eight templates are available</li>
<li>charts: bar chart, pie chart, line chart, glyph matrix and <a href="http://infogr.am/1341296133" target="_blank">frog chart</a> (yes, really)</li>
</ul>
Each of the infographics templates includes one or more charts. One nice feature is that when customising a chart you supply the actual data for it to visualize, entered via a spreadsheet-style GUI. As well as charts, you can add accompanying text (title, quotes, free text) and insert your own images.<br />
<br />
Once you've completed an infographic you can publish it via Twitter, Facebook or Pinterest, or embed it in other Web pages.<br />
<br />
<a href="http://venngage.com/" target="_blank"><b>Venngage</b></a><br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjiZEEXEiQQh7uwCHT17mZezlpXZub46Hgxe6zczc_SQbJru-xyrDeBp0HzLiIa1r6-TRIVURHg8QqbQh3a2mjqDc69hNTY7j_3ch4PIC3w-4QgLgNHaFPybQ2wEVNQ9XLpCK2HxWobtwc-/s1600/venngage.png" imageanchor="1"><img border="0" height="330" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjiZEEXEiQQh7uwCHT17mZezlpXZub46Hgxe6zczc_SQbJru-xyrDeBp0HzLiIa1r6-TRIVURHg8QqbQh3a2mjqDc69hNTY7j_3ch4PIC3w-4QgLgNHaFPybQ2wEVNQ9XLpCK2HxWobtwc-/s640/venngage.png" width="640" /></a>
<br />
<br />
Venngage is the latest offering from the creators of <a href="http://www.vislives.com/2011/08/vizualizeme-resume-as-infographic.html">Vizualize.me</a>. It is perhaps the most complete on-line infographic tool of those considered here, so much so that Venngage costs money ($99 per month for individuals; $249 per month for teams).<br />
<br />
As with the other tools, Venngage is template-based; you can also start with a blank canvas. Venngage's infographics editor is a point-and-click affair. A large selection of charts is available, each backed by data that you either upload or enter via a spreadsheet UI. Shapes, text and images can also be added.<br />
<br />
<a href="http://numberpicture.com/" target="_blank"><b>Number Picture</b></a><br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjUJLVw-25UGJO0fEJzLpm0nxq8NVdZ00awuudXCSr8RJ6Xtbl5jJlTZ5sUoriPJq0uYOD3K_LsJdM9bbvavUxg9oKWr6FDzPSJnqgLUYINiwEkMcVD8TzL6ln7khGsxyT7aCunLnJvOlYR/s1600/numberpicture.png" imageanchor="1"><img border="0" height="638" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjUJLVw-25UGJO0fEJzLpm0nxq8NVdZ00awuudXCSr8RJ6Xtbl5jJlTZ5sUoriPJq0uYOD3K_LsJdM9bbvavUxg9oKWr6FDzPSJnqgLUYINiwEkMcVD8TzL6ln7khGsxyT7aCunLnJvOlYR/s640/numberpicture.png" width="640" /></a>
<br />
<br />
Number Picture has been around the longest of the services considered here - I've blogged about it <a href="http://www.vislives.com/2011/05/number-picture-new-kind-of-social.html" target="_blank">before</a>. The infographics templates that Number Picture provides are fairly simple, having only a title, "blurb" (text block) and "picture" (chart). To create an infographic you supply the text and data. The latter is rendered as a picture.<br />
<br />
Number Picture's emphasis is different from the others' in that it encourages users to create and share templates. The templates are created using <a href="http://processingjs.org/" target="_blank">Processing.js</a>. Working with Processing.js is fairly easy for those of us from coding backgrounds, but for non-coders it might be a problem, especially if the existing templates don't provide what's needed.<br />
<br />
<br />
<b>Conclusion</b><br />
Most of the tools I've discussed are fairly basic but have enough functionality to allow you to create a simple infographic. Venngage is the most complete offering but whether it's worth the money they're asking remains to be seen. All the tools make it <i>easy</i> to create infographics. However, creating <i>good</i> infographics is a different story.<br />
<br />
There are many other on-line tools for creating data visualizations - too many for a single post - so here I've focussed on infographics tools. If I've missed any (or you have anything to contribute), please leave a comment.<b><br /></b>Chrishttp://www.blogger.com/profile/05461074601050876693noreply@blogger.com