February 1, 2013

Lines vs. Bars for Categorical Data

I recently commented on a thread started by Joey Cherdarchuk in the LinkedIn Data Visualization group. The thread discusses Joey's reworking of an infographic about social media demographics. Joey used diverging stacked bar charts to significantly improve upon the original, which used pie charts. You can read Joey's blog post in full here.

I suggested that an alternative would be to use a simple line chart. This is a technique often advocated by Kaiser Fung on his excellent Junk Charts blog. Here's an example of his approach. Many people react negatively to this technique, as you can see in the comments section of Kaiser's post. Here's his response:
You won't be the only reader to feel this way. Over the years, I have had complaints from readers about lines connecting categorical data every time I put up such a chart. Here's my reasoning: follow your eyes as you read a dot plot, you are visually tracing the lines that I have drawn, why not just draw the lines?
I happen to agree with Kaiser; using lines helps tie together the separate data points so you can more easily see trends and make comparisons.

I applied this treatment to the social media demographics data from the original infographic. You can see the results below (interactive version here):

This approach certainly has its merits. You can clearly see that for most social media platforms, participation rates increase with age. Google+ is the obvious exception, and the trend for Reddit is flat. As you'd expect, the trend is starkest for LinkedIn, the professional network.

The interactive version also includes the data plotted as point charts and bar charts (stacked and clustered), none of which, I feel, work as well as the simple line chart. For example, here's a clustered bar chart:

I think it's important not to reflexively rule out line charts when dealing with categorical data, as the technique can yield useful insights.

Update (2013-02-22)

During further discussion in the LinkedIn Data Visualization group, Bill Droogendyk referenced an excellent article on the subject by one of my favourite visualization thought leaders, Stephen Few. The article, entitled "Quantitative vs. Categorical Data: A Difference Worth Knowing", discusses the different types of categorical data:
  • nominal
  • ordinal
  • interval
Using Few's nomenclature, the Age axis in the charts above is an interval scale, for which Few recommends line (and bar) charts. Kaiser Fung's example uses an ordinal scale. At first glance, some readers interpret it as nominal, failing to notice the following treatment:
I sorted the schools by the ratio of three-pointers to midrange jump shots.

Because the schools are ranked, Fung's scale is ordinal. This is where Fung and Few differ: Few advises against using line charts with ordinal scales, whereas Fung uses them quite often.
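Fung's sorting step is what turns an unordered (nominal) list of schools into an ordinal axis. Here's a minimal Python sketch of that step. The `School` class, the `rank_schools` function, the school names and all the shot counts are hypothetical illustrations, not Fung's actual data, and sorting in descending order is an arbitrary choice on my part.

```python
from dataclasses import dataclass


@dataclass
class School:
    name: str
    three_pointers: int    # hypothetical count of three-point attempts
    midrange_jumpers: int  # hypothetical count of midrange jump shots

    @property
    def ratio(self) -> float:
        # Ratio of three-pointers to midrange jump shots
        return self.three_pointers / self.midrange_jumpers


def rank_schools(schools):
    """Sort schools by descending ratio, so the category order carries meaning (ordinal)."""
    return sorted(schools, key=lambda s: s.ratio, reverse=True)


# Hypothetical figures for illustration only; not taken from Fung's chart.
schools = [
    School("School A", 120, 300),
    School("School B", 250, 200),
    School("School C", 180, 180),
]

for rank, s in enumerate(rank_schools(schools), start=1):
    print(f"{rank}. {s.name}: {s.ratio:.2f}")
```

Before sorting, the schools are just names in arbitrary order, and a line connecting them would imply a trend that isn't there. After sorting, the position along the axis encodes rank, which is the property Fung relies on when he connects the points.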

I sit on the fence: I reckon it's worth considering a line chart for categorical data (interval & ordinal) and seeing for yourself.