March 28, 2014

A Distorted and Incomplete Picture

When I saw Ben Goldacre's latest book Bad Pharma: How Drug Companies Mislead Doctors and Harm Patients on the new releases bookshelf of my local library, I borrowed it immediately. Not because I thought it would inform my data visualization work but because I really enjoyed Ben's previous book Bad Science.

So I was pleased to find, when I read Bad Pharma, that it focuses on data, the raw material we work with when creating visualizations. I was also deeply disturbed by the book, given that it details how modern evidence-based medicine is broken. Ben provides a useful summary of Bad Pharma in the book's introduction:
Drugs are tested by the people who manufacture them, in poorly designed trials, on hopelessly small numbers of weird, unrepresentative patients, and analysed using techniques which are flawed by design, in such a way that they exaggerate the benefits of treatments. Unsurprisingly, these trials tend to produce results that favour the manufacturer. When trials throw up results that companies don't like, they are perfectly entitled to hide them from doctors and patients, so we only ever see a distorted picture of any drug's true effects. Regulators see most of the trial data, but only from early on in a drug's life, and even then they don't give this data to doctors or patients, or even to other parts of government. This distorted evidence is then communicated and applied in a distorted fashion. In their forty years of practice after leaving medical school, doctors hear about what works through ad hoc oral traditions, from sales reps, colleagues or journals. But those colleagues can be in the pay of drug companies – often undisclosed – and the journals are too. And so are the patient groups. And finally, academic papers, which everyone thinks of as objective, are often covertly planned and written by people who work directly for the companies, without disclosure. Sometimes whole academic journals are even owned outright by one drug company. Aside from all this, for several of the most important and enduring problems in medicine, we have no idea what the best treatment is, because it's not in anyone's financial interest to conduct any trials at all. These are ongoing problems, and although people have claimed to fix many of them, for the most part they have failed; so all these problems persist, but worse than ever, because now people can pretend that everything is fine after all.
But enough about medicine. What makes Bad Pharma interesting to a data visualization practitioner is not its charts and graphs (there are only a few in the book) but its discussion of data. The book's first chapter, Missing Data, describes how drug trials performed by pharmaceutical companies overwhelmingly produce results that are favourable to the companies. Goldacre argues that this arises for several reasons:
  • flawed experimental design: trials are designed in ways likely to produce a favourable outcome
  • flawed data analysis: see my post on Alex Reinhart's Statistics Done Wrong
  • publication bias: trials that produce unfavourable outcomes are simply not published, skewing published data towards favourable results
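
To see how publication bias alone can skew a picture, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the trials are simulated rather than real, the treatment's true effect is set to zero, and the names `simulate_trials` and `published_only` are hypothetical. It is not taken from the book.

```python
import random
import statistics

def simulate_trials(n_trials, true_effect=0.0, noise=1.0, seed=42):
    """Return one measured effect per simulated trial (true effect plus random noise)."""
    rng = random.Random(seed)
    return [rng.gauss(true_effect, noise) for _ in range(n_trials)]

def published_only(effects):
    """Keep only the trials whose measured effect favours the treatment."""
    return [e for e in effects if e > 0]

# Assumption: 1000 trials of a treatment that, in truth, does nothing.
all_effects = simulate_trials(1000)
published = published_only(all_effects)

mean_all = statistics.mean(all_effects)
mean_published = statistics.mean(published)

print(f"trials run: {len(all_effects)}, trials published: {len(published)}")
print(f"mean effect, all trials:     {mean_all:+.2f}")
print(f"mean effect, published only: {mean_published:+.2f}")
```

Charting only the published subset would show a treatment that appears to work, even though the full set of trials averages out close to no effect.
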
This reminds us to be circumspect about the data we visualize. We should ask:
  • How was the data collected?
  • How has the data been transformed or processed?
  • Is the data complete?
The answers to these questions are metadata that we need to communicate as part of any visualization we create. Without it, we risk painting a distorted and incomplete picture of the data we are visualizing.
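
One way to keep those answers attached to a dataset is to carry them around as a small record and render them as a caption. This is only a sketch under my own assumptions: the `Provenance` class, its field names and the example values are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Provenance:
    """Answers to the three questions, stored alongside a dataset."""
    collected_how: str                                    # How was the data collected?
    transformations: list = field(default_factory=list)  # How has it been processed?
    complete: bool = False                                # Is the data complete?
    known_gaps: str = ""                                  # What is known to be missing?

    def caption(self):
        """Build a one-line note suitable for display beneath a chart."""
        steps = "; ".join(self.transformations) or "none"
        if self.complete:
            coverage = "complete"
        else:
            coverage = f"incomplete ({self.known_gaps or 'gaps not documented'})"
        return f"Source: {self.collected_how}. Processing: {steps}. Coverage: {coverage}."

# Hypothetical example values, for illustration only.
note = Provenance(
    collected_how="published trial reports",
    transformations=["excluded trials with no reported outcome", "averaged by drug"],
    complete=False,
    known_gaps="unpublished trials are not included",
)
print(note.caption())
```
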

March 21, 2014

Lyra: the Interactive Visualization Design Environment

I recently spent some time using Lyra, an "interactive visualization design environment" that allows you to create visualizations without writing a single line of code. It's being developed by Arvind Satyanarayan, Kanit “Ham” Wongsuphasawat and Jeffrey Heer (think Prefuse, Protovis, D3, Vega) at the University of Washington's Interactive Data Lab.

Lyra is a bit like other interactive visualization design tools such as Tableau and Spotfire. However, under the hood it's powered by D3 (like Plot.ly). Now, I enjoy coding visualizations directly using D3, but I realise not everyone shares my enthusiasm or has the time to learn D3. Lyra gives you access to the expressiveness of D3 without requiring you to learn its API (or JavaScript).

The Lyra application is shown below and consists of three panels:
  • the left-hand panel manages Data Pipelines, where you define and transform (sort, group, filter, window, apply formula) data sources
  • the centre panel displays your visualization, where you interactively select and modify visualization elements: marks (rectangles, symbols, arcs, areas, lines and text), axes and layers
  • the right-hand panel provides access to attributes of the elements in your visualization
Once you've created a visualization you can export it as an image (PNG or SVG) or a Vega specification.
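
For readers who have not met the transforms offered by a data pipeline, here is a rough Python sketch of what filter, formula, sort and group steps do to a table. It is an analogy under my own assumptions, not Lyra's implementation: the function names and the sample rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows, for illustration only.
rows = [
    {"region": "north", "sales": 120},
    {"region": "south", "sales": 40},
    {"region": "north", "sales": 80},
    {"region": "south", "sales": 200},
]

def filter_rows(rows, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [r for r in rows if predicate(r)]

def apply_formula(rows, name, fn):
    """Add a derived field called `name`, computed from each row."""
    return [{**r, name: fn(r)} for r in rows]

def sort_rows(rows, key):
    """Order rows by the value of one field."""
    return sorted(rows, key=itemgetter(key))

def group_rows(rows, key):
    """Collect rows into a dict of lists keyed by one field's value."""
    ordered = sorted(rows, key=itemgetter(key))  # groupby needs sorted input
    return {k: list(g) for k, g in groupby(ordered, key=itemgetter(key))}

# Chain the steps in the order a pipeline might apply them.
step1 = filter_rows(rows, lambda r: r["sales"] >= 50)
step2 = apply_formula(step1, "sales_k", lambda r: r["sales"] / 1000)
step3 = sort_rows(step2, "sales")
grouped = group_rows(step3, "region")

for region, members in grouped.items():
    print(region, [m["sales"] for m in members])
```
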

If you're familiar with D3 then you'll recognise some of its idioms in Lyra. For example, D3's data-binding mechanism is implemented by dragging-and-dropping data variables onto the attributes of visual elements.
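
The idea behind that data binding can be sketched in a few lines of Python: each data item produces one mark, and each attribute of the mark is computed from a field of the item. This is a simplified illustration under my own assumptions, not D3's or Lyra's API; `linear_scale`, `bind` and the sample data are hypothetical.

```python
def linear_scale(domain, range_):
    """Return a function mapping values in `domain` linearly onto `range_`."""
    d0, d1 = domain
    r0, r1 = range_
    def scale(value):
        return r0 + (value - d0) * (r1 - r0) / (d1 - d0)
    return scale

def bind(data, encodings):
    """Create one mark (a dict of attributes) per data item.

    `encodings` maps an attribute name to a function of the data item,
    which plays the role of dropping a data variable onto that attribute.
    """
    return [{attr: fn(d) for attr, fn in encodings.items()} for d in data]

# Hypothetical data: three items with a category name and a value.
data = [
    {"name": "a", "value": 10},
    {"name": "b", "value": 25},
    {"name": "c", "value": 40},
]

# Map values 0..40 onto bar heights of 0..200 pixels.
height = linear_scale((0, 40), (0, 200))

marks = bind(data, {
    "height": lambda d: height(d["value"]),  # "value" drives the height attribute
    "label": lambda d: d["name"],            # "name" drives the label attribute
})

for mark in marks:
    print(mark)
```
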

If you want to try Lyra then you have several options:
Bear in mind that Lyra is alphaware. I did encounter a few issues; for example, saving and restoring work didn't appear to function properly. The authors are interested in constructive feedback.