February 20, 2014

Statistics Done Wrong - Alex Reinhart

I've just finished reading Alex Reinhart's excellent Statistics Done Wrong, a guided tour of common statistical fallacies and misconceptions. It covers p-values, statistical power, statistical significance, pseudo-replication and stopping rules.

Statistics Done Wrong is written with all scientists in mind and assumes no knowledge of statistical methods. It's essential reading for all data scientists, including data visualization practitioners. Even though we're not data analysts, we need at least a basic level of statistical literacy.

The problems Alex describes are rife in the peer-reviewed scientific literature. By coincidence, I'm also reading Ben Goldacre's Bad Pharma, which focusses on how clinical trial data is distorted by the pharmaceutical industry. Many of the issues Alex raises are seen in practice in Bad Pharma.

Statistics Done Wrong concludes with "What Can Be Done?" Here I quote the "Your Job" section:
Your task can be expressed in four simple steps:
  1. Read a statistics textbook or take a good statistics course. Practice.
  2. Plan your data analyses carefully and deliberately, avoiding the misconceptions and errors you have learned.
  3. When you find common errors in the scientific literature – such as a simple misinterpretation of p values – hit the perpetrator over the head with your statistics textbook. It’s therapeutic.
  4. Press for change in scientific education and publishing. It’s our research. Let’s not screw it up.

Statistics Done Wrong is free to read online and should take you no more than an hour. Once you've read it, share it with your data scientist colleagues. And if you want to learn more about data analysis, I recommend Coursera's Data Analysis MOOC - read my account of it here.