January 30, 2013

Coursera Data Analysis MOOC: First Impressions

On the spur of the moment I decided to enrol in Coursera's Data Analysis course. I've been curious about MOOCs (massive open on-line courses) for some time, so when I came across this one, I decided it was time to find out more. Plus, the course topic is well-suited to the kind of work I do.

The course is given by Jeff Leek, an Assistant Professor in Biostatistics at the Johns Hopkins Bloomberg School of Public Health. Jeff has also recorded an introductory video for the course.


The course is run over eight weeks and is delivered as a set of video lectures. Topics covered include:
  • The structure of a data analysis (steps in the process, knowing when to quit, etc.)
  • Types of data (census, designed studies, randomized trials)
  • Types of data analysis questions (exploratory, inferential, predictive, etc.)
  • How to write up a data analysis (compositional style, reproducibility, etc.)
  • Obtaining data from the Web (through downloads mostly)
  • Loading data into R from different file types
  • Plotting data for exploratory purposes (boxplots, scatterplots, etc.)
  • Exploratory statistical models (clustering)
  • Statistical models for inference (linear models, basic confidence intervals/hypothesis testing)
  • Basic model checking (primarily visually)
  • The prediction process
  • Study design for prediction
  • Cross-validation
  • A couple of simple prediction models
  • Basics of simulation for evaluating models
  • Ways you can fool yourself and how to avoid them (confounding, multiple testing, etc.)
Each lecture can be viewed in your Web browser or downloaded (MP4) for off-line viewing. The lectures are slide presentations with audio of the lecturer explaining the content. You can download the slides (PDF) and transcripts if you prefer.

A 10-question quiz must be completed by the end of each week. Each quiz has a soft and a hard deadline: if you miss the soft deadline, you can still submit answers before the hard deadline, but a penalty is applied to your score. You can attempt each quiz four times.

Two peer-graded assignments must be completed: one set in week 3 (due at the end of week 4) and the other in week 6 (due at the end of week 7). The assignments are graded by your fellow students, and you must grade at least four of your peers' assignments to avoid a 20% penalty. Your grade is based on the median of the grades you receive from your peers.
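The grading rule can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the median and the four-review, 20% penalty figures come from the course description, but treating the penalty as a 20% reduction of the median score, and the names `assignment_score`, `REQUIRED_REVIEWS` and `PENALTY_FRACTION`, are hypothetical.

```python
from statistics import median

# From the course description: grade at least four peer assignments
# or lose 20%. Applying the penalty as a 20% reduction of the median
# score is an assumption made for this sketch.
REQUIRED_REVIEWS = 4
PENALTY_FRACTION = 0.20


def assignment_score(peer_grades, reviews_completed):
    """Return the median peer grade, reduced if too few reviews were done."""
    if not peer_grades:
        raise ValueError("at least one peer grade is required")
    score = median(peer_grades)
    if reviews_completed < REQUIRED_REVIEWS:
        # Hypothetical penalty model: keep 80% of the median grade.
        score *= 1 - PENALTY_FRACTION
    return score


# Illustrative (made-up) grades from four peers.
print(assignment_score([70, 85, 90, 60], reviews_completed=4))
print(assignment_score([70, 85, 90, 60], reviews_completed=3))
```

Using the median rather than the mean means a single unusually harsh or generous reviewer cannot drag the grade far on their own.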

An interesting aspect of the course is the forum, where students can post questions. With 100,000 students enrolled, Prof. Leek obviously can't answer every question, so students vote on questions and he responds to the top few. Students can also help each other out by answering questions themselves.

The course requires a working knowledge of R. I've been using R increasingly in my day-to-day work, so I'm comfortable with it. Some optional background lectures on R are provided in the course material, along with links to other resources.

Successful completion of the course confers no official qualification or accreditation. I've enrolled purely for my own edification: to learn about MOOCs like Coursera and to sharpen my data analysis skills.

I'll post follow ups as the course progresses.

January 1, 2013

Comparison of Australian Car Values

I was recently in the market for new wheels, and so spent a bit of time researching the Australian car market at carsales.com.au and RedBook.com.au. It got me thinking about how quickly different makes of car depreciate, so I set about creating a chart that would help me visualize this kind of information.

The result is the interactive line chart shown below. You can use the interactive version of the visualization if you have a modern, standards-compliant browser (Firefox, Chrome, Opera, Safari, etc.); if you use Internet Explorer, you can try Chrome Frame.

Resale value (%) for several popular models of Australian car.


Interaction
The chart plots a line for each of several popular models of car. The lines can be made to represent several different values:
  • Sticker price: price when new
  • Resale value: price when selling the car on the private market
  • Resale value (%): the resale value as a percentage of the sticker price
You can also highlight individual models using the checkboxes or by moving the mouse cursor over a line.
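The third measure is a simple ratio. A minimal Python sketch of it, with a hypothetical function name and made-up prices:

```python
def resale_percent(resale_value, sticker_price):
    """Resale value as a percentage of the price paid when new."""
    if sticker_price <= 0:
        raise ValueError("sticker price must be positive")
    return 100.0 * resale_value / sticker_price


# Made-up example: bought for $20,000, now worth $12,000 privately.
print(resale_percent(12_000, 20_000))  # 60.0
```

Expressing resale value as a percentage puts a cheap hatchback and an expensive luxury sedan on the same scale, which is what makes a cross-model comparison of depreciation meaningful.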

Data
Obtaining the data was laborious. I first determined popular makes by looking at the numbers of cars for sale at carsales.com.au. I focussed on sedans, ignoring SUVs, vans, utes etc. For each popular make I selected a couple of popular models - small and large.

Then I visited RedBook.com.au to research price history, but encountered a couple of hurdles. Firstly, it isn't possible to get 10 years of price data for an individual model, because RedBook only publishes the sticker price and current resale value (not the resale value last year, the year before and so on). To overcome this, I used the sticker price and current resale value of comparable entry-level models from each model year between 2001 and 2011, so that older model years stand in for older cars.
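The model-year workaround can be sketched as follows. The `Listing` record, the `depreciation_curve` function and the `snapshot_year` parameter are hypothetical; the sketch assumes one snapshot of sticker price and current resale value per model year, and treats snapshot year minus model year as the car's age.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    """Hypothetical record: one entry-level model from a given model year."""
    model_year: int
    sticker_price: float
    resale_value: float  # current private-sale value at the snapshot date


def depreciation_curve(listings, snapshot_year):
    """Map approximate age in years to resale value as a % of sticker price.

    Only the current resale value of each model year is available, so each
    older model year stands in for an older car of the same model.
    """
    curve = {}
    # Newest model year first, so the dict is ordered from youngest to oldest.
    for item in sorted(listings, key=lambda entry: entry.model_year, reverse=True):
        age = snapshot_year - item.model_year
        curve[age] = 100.0 * item.resale_value / item.sticker_price
    return curve


# Made-up figures for illustration only.
data = [Listing(2001, 20_000, 4_000), Listing(2011, 20_000, 15_000)]
print(depreciation_curve(data, snapshot_year=2012))  # {1: 75.0, 11: 20.0}
```

The approach assumes that a 2001 car today behaves much as a 2011 car will in ten years' time, which ignores any changes in build quality or market conditions between model years.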

The second problem was that for some makes (BMW, Kia, Mercedes and Nissan) it wasn't possible to find two models that had been sold in Australia every year for the last 10 years. And no single model of Hyundai (a very popular make) has been sold continuously for the last decade.

Insights
Once I had the data I was able to visualize it, and there were a couple of surprises. Firstly, sticker prices have remained fairly stable across all makes and models. I had expected them to fall in recent years, especially for imported cars, given the appreciation of the Australian dollar.
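One way to put a number on "fairly stable" is the spread of a model's sticker prices across model years relative to their average. This is an illustrative measure of my own choosing rather than how the chart itself was read; the prices are nominal (not adjusted for inflation), and the name `sticker_price_spread` is hypothetical.

```python
from statistics import mean


def sticker_price_spread(prices_by_year):
    """Range of sticker prices across model years, as a % of their mean.

    `prices_by_year` maps model year -> nominal sticker price. A small
    result means new-car prices for that model barely moved.
    """
    prices = list(prices_by_year.values())
    if not prices:
        raise ValueError("no sticker prices supplied")
    return 100.0 * (max(prices) - min(prices)) / mean(prices)


# Made-up prices for one model across three model years.
print(sticker_price_spread({2001: 19_000, 2006: 20_000, 2011: 21_000}))  # 10.0
```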

New car prices have remained fairly stable for the past decade in spite of the appreciation in value of the Australian dollar.



Resale values drop precipitously in the first few years of a car's life - no surprises there.

Resale values drop significantly in the first few years.





What did surprise me was that when resale value is expressed as a percentage of sticker price, small Japanese cars fared best. I had expected the European marques - Audi, BMW, Mercedes and Volkswagen - to top this ranking, or even the popular family sedans, but it's the compact Japanese Mazda 3, Toyota Corolla, Subaru Impreza, Honda Civic and the Holden Barina (Japanese import) that top the list. The smallish VW Golf also holds its value well, as does the Merc.
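The ranking described above can be sketched as a sort on resale percentage at a fixed age. The function name, the choice to rank at a single age (rather than, say, averaging across all ages), and the model names are assumptions for illustration; no real figures are used.

```python
def rank_by_retained_value(curves, age):
    """Return (model, resale %) pairs at `age` years, best value-holders first.

    `curves` maps model name -> {age in years: resale value as % of sticker
    price}. Models with no data point at that age are left out.
    """
    scored = [(model, curve[age]) for model, curve in curves.items() if age in curve]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up curves, for illustration only.
curves = {
    "Model A": {1: 80.0, 10: 30.0},
    "Model B": {1: 85.0, 10: 45.0},
    "Model C": {1: 70.0},
}
print(rank_by_retained_value(curves, age=10))
# [('Model B', 45.0), ('Model A', 30.0)]
```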

Compact Japanese cars hold their value best over the 10 years considered.



Commodore vs. Falcon
In Australia there is a long-running rivalry between the Holden Commodore and the Ford Falcon. Below are the charts comparing the two cars. The Falcon and Commodore track each other closely on sticker price. However, the resale value of Falcon falls away from that of the Commodore in the first few years