http://amplab.colostate.edu/sytycg.html
For Season 1 of So You Think You Can Graph (hosted by Jessica K. Witt of Colorado State University and Gary H. McClelland of University of Colorado - Boulder), graphs were evaluated for how effectively they presented the magnitude of an effect.
Human participants (recruited from the Psychology Participant Pool at CSU) judged whether a graph showed no effect or a small, medium, or big effect. Each participant viewed 4 different graph types from the pool of 26 contest entries. For each entry, 40 graphs were created: 10 showing a null effect, 10 showing a small (d = .3) effect, 10 showing a medium (d = .5) effect, and 10 showing a large (d = .8) effect. Participants made judgments for all 40 renditions of each of the 4 graph types to which they were assigned. Order was randomized, and participants completed 3 blocks of trials.
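For readers who want the trial structure spelled out, here is a minimal sketch of how one participant's trial list could be assembled. All names and the dictionary layout are hypothetical illustrations, not the experiment's actual code; the counts (4 graph types, 10 renditions per effect size, 3 blocks, randomized order) follow the description above.

```python
import random

# Hypothetical sketch of the trial structure described above.
EFFECT_SIZES = {"null": 0.0, "small": 0.3, "medium": 0.5, "large": 0.8}
RENDITIONS_PER_EFFECT = 10
N_BLOCKS = 3

def build_trials(assigned_graph_types):
    """Return a randomized trial list for one participant: each block contains
    every assigned graph type x 4 effect sizes x 10 renditions (160 trials per
    block when 4 graph types are assigned), repeated for 3 blocks."""
    trials = []
    for block in range(1, N_BLOCKS + 1):
        block_trials = [
            {"block": block, "graph_type": g, "effect": label, "d": d, "rendition": r}
            for g in assigned_graph_types
            for label, d in EFFECT_SIZES.items()
            for r in range(RENDITIONS_PER_EFFECT)
        ]
        random.shuffle(block_trials)  # order randomized within each block
        trials.extend(block_trials)
    return trials
```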
Winner: Congratulations Lindsay Juarez and Andrea Dinneen of the Duke Center for Advanced Hindsight!!!
Juarez & Dinneen slightly edged out close runner-up Ed Vul of University of California - San Diego!
Here is an example graph from the winner:
Here is an example graph from the runner-up:
Host Comments:
We were gratified by the number of people who submitted entries. It was exciting to see a wide variety of creative ideas for communicating effect size. The best entries explicitly provided lines and labels for the different degrees of effect size. We had hoped for graphs from which viewers would infer effect size, so when asked, we discouraged some contestants from using precise labels. (We did disallow one entry that used only labels and did not present any statistics or data in the graph.) We regret that now and hope those we discouraged will accept our apologies. We will be less restrictive in the next competition and/or more precise with the instructions. The winning entries suggest that researchers should be using more annotations and verbal explanations in their graphs. In the early days of statistical graphics (see Galton’s regression graph below), verbal explanations on the graph itself were common. And consider how differently we might regard published small effects if the authors had been required to extend the axes of their graphs to show what the size of medium and large effects would have been.
Another surprise was the amount of variance in sensitivity slopes within graph types. Some graphs induced fairly uniform responses, while for others the slopes varied widely. All else being equal, lower variance should be preferred, and we will try to add that to the scoring for the next competition.
We thank everyone for participating and hope you and others will participate in our next competition.
WINNER CALCULATIONS:
To determine the winner, we calculated a graph score for each participant for each entry. The graph score was calculated as 10 × sensitivity − 5 × bias − 2 × reaction time, with each component z-scored as described below.
To calculate sensitivity and bias, we conducted linear regressions for each participant for each entry, including only trials on which a non-null effect was presented. The dependent measure was the estimated effect size (coded as 0, 1, 2, and 3 for “null”, “small”, “medium”, and “big” effects). The independent measure was the depicted effect size (coded as -1, 0, and 1 for small, medium, and big). Sensitivity corresponded to the slope, with 1 being perfect performance and 0 being chance performance. Bias was based on the intercept and was calculated as the absolute difference between the intercept and 3; lower values correspond to better performance. Reaction time was the median reaction time (in ms); lower values correspond to faster responses. All three components were z-scored to put them on the same scale.
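Since the scoring pipeline is described step by step, here is a minimal sketch of how it could be implemented. The function and argument names and the NumPy-based fit are our illustration, not the hosts' analysis code; the variable codings, the bias definition (absolute difference of the intercept from 3), and the 10/−5/−2 weights follow the description above.

```python
import numpy as np

def component_scores(depicted, estimated, rts):
    """Per-participant, per-entry components, following the description above.
    Hypothetical arguments: `depicted` is the depicted effect size on non-null
    trials (coded -1/0/1 for small/medium/big), `estimated` is the judged
    effect size (coded 0/1/2/3 for null/small/medium/big), and `rts` are the
    reaction times (ms) on those trials."""
    slope, intercept = np.polyfit(depicted, estimated, 1)  # linear regression
    sensitivity = slope                 # 1 = perfect, 0 = chance
    bias = abs(intercept - 3)           # absolute difference from 3, as stated above
    median_rt = np.median(rts)          # lower = faster
    return sensitivity, bias, median_rt

def graph_scores(sensitivity, bias, median_rt):
    """Z-score each component across participant-by-entry scores, then combine:
    10 * sensitivity - 5 * bias - 2 * reaction time."""
    z = lambda x: (np.asarray(x, float) - np.mean(x)) / np.std(x)
    return 10 * z(sensitivity) - 5 * z(bias) - 2 * z(median_rt)
```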
Here is a plot of the graph scores by graph type:
Here are plots of the individual participant responses for each contest entry (slightly prettier versions are further down):