http://amplab.colostate.edu/sytycg.html
For Season 1 of So You Think You Can Graph (hosted by Jessica K. Witt of Colorado State University and Gary H. McClelland of University of Colorado - Boulder), graphs were evaluated for how effectively they presented the magnitude of an effect.
Human participants (recruited from the Psychology Participant Pool at CSU) judged whether a graph showed no effect or a small, medium, or big effect. Each participant viewed 4 different graph types from the pool of 26 contest entries. For each entry, 40 graphs were created: 10 showing a null effect, 10 showing a small (d = .3) effect, 10 showing a medium (d = .5) effect, and 10 showing a large (d = .8) effect. Participants made judgments for all 40 renditions of each of the 4 graph types to which they were assigned. Order was randomized, and participants completed 3 blocks of trials.
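For readers who want the trial structure spelled out, here is a minimal sketch of how one participant's trial list could be assembled. All names and the dictionary layout are hypothetical illustrations, not the experiment's actual code; the counts (4 graph types, 10 renditions per effect size, 3 blocks, randomized order) follow the description above.

```python
import random

# Hypothetical sketch of the trial structure described above.
EFFECT_SIZES = {"null": 0.0, "small": 0.3, "medium": 0.5, "large": 0.8}
RENDITIONS_PER_EFFECT = 10
N_BLOCKS = 3

def build_trials(assigned_graph_types):
    """Return a randomized trial list for one participant: each block contains
    every assigned graph type x 4 effect sizes x 10 renditions (160 trials per
    block when 4 graph types are assigned), repeated for 3 blocks."""
    trials = []
    for block in range(1, N_BLOCKS + 1):
        block_trials = [
            {"block": block, "graph_type": g, "effect": label, "d": d, "rendition": r}
            for g in assigned_graph_types
            for label, d in EFFECT_SIZES.items()
            for r in range(RENDITIONS_PER_EFFECT)
        ]
        random.shuffle(block_trials)  # order randomized within each block
        trials.extend(block_trials)
    return trials
```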
Winner: Congratulations Lindsay Juarez and Andrea Dinneen of the Duke Center for Advanced Hindsight!!!
Juarez & Dinneen slightly edged out close runner-up Ed Vul of University of California - San Diego!
Here is an example graph from the winner:
Here is an example graph from the runner-up:
Host Comments:
We were gratified by the number of people who submitted entries. It was exciting to see a wide variety of creative ideas for communicating effect size. The best entries explicitly provided lines and labels for the different degrees of effect size. We had hoped for graphs from which viewers would infer effect size, so when asked, we discouraged some contestants from using precise labels. (We did disallow one entry that used only labels and did not present any statistics or data in the graph.) We regret that now and hope those we discouraged will accept our apologies. We will be less restrictive in the next competition and/or more precise with the instructions. The winning entries suggest that researchers should be using more annotations and verbal explanations in their graphs. In the early days of statistical graphics (see Galton’s regression graph below), verbal explanations on the graph itself were common. And consider how differently we might regard published small effects if the authors had been required to extend the axes of their graphs to show what the size of medium and large effects would have been.
Another surprise was the amount of variance in sensitivity slopes within graph types. Some graphs induced fairly uniform responses, while for others the slopes varied widely. All else being equal, lower variance should be preferred, and we will try to add that to the scoring for the next competition.
We thank everyone for participating and hope you and others will participate in our next competition.
WINNER CALCULATIONS:
To determine the winner, we calculated a graph score for each participant for each entry. The graph score was calculated as 10 × sensitivity − 5 × bias − 2 × reaction time, with each component z-scored as described below.
To calculate sensitivity and bias, we conducted linear regressions for each participant for each entry, including only trials on which a non-null effect was presented. The dependent measure was the estimated effect size (coded as 0, 1, 2, and 3 for “null”, “small”, “medium”, and “big” effects). The independent measure was the depicted effect size (coded as -1, 0, and 1 for small, medium, and big). Sensitivity corresponded to the slope, with 1 being perfect performance and 0 being chance performance. Bias was based on the intercept and was calculated as the absolute difference between the intercept and 3; lower values correspond to better performance. Reaction time was the median reaction time (in ms); lower values correspond to faster responses. All three components were z-scored to put them on the same scale.
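Since the scoring pipeline is described step by step, here is a minimal sketch of how it could be implemented. The function and argument names and the NumPy-based fit are our illustration, not the hosts' analysis code; the variable codings, the bias definition (absolute difference of the intercept from 3), and the 10/−5/−2 weights follow the description above.

```python
import numpy as np

def component_scores(depicted, estimated, rts):
    """Per-participant, per-entry components, following the description above.
    Hypothetical arguments: `depicted` is the depicted effect size on non-null
    trials (coded -1/0/1 for small/medium/big), `estimated` is the judged
    effect size (coded 0/1/2/3 for null/small/medium/big), and `rts` are the
    reaction times (ms) on those trials."""
    slope, intercept = np.polyfit(depicted, estimated, 1)  # linear regression
    sensitivity = slope                 # 1 = perfect, 0 = chance
    bias = abs(intercept - 3)           # absolute difference from 3, as stated above
    median_rt = np.median(rts)          # lower = faster
    return sensitivity, bias, median_rt

def graph_scores(sensitivity, bias, median_rt):
    """Z-score each component across participant-by-entry scores, then combine:
    10 * sensitivity - 5 * bias - 2 * reaction time."""
    z = lambda x: (np.asarray(x, float) - np.mean(x)) / np.std(x)
    return 10 * z(sensitivity) - 5 * z(bias) - 2 * z(median_rt)
```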
Here is a plot of the graph scores by graph type:
Here are plots of the individual participant responses for each contest entry (slightly prettier versions are further down):