• In a land before ggplot2
• Auditioning graphics
• ggplot2 for rapid prototyping
• ggplot2 in the print graphics workflow
• We produce a *lot* of charts at the FT
• Approximately 50 data visualisations per day
• Mostly lines and bars
• I have nothing against lines and bars, but...
1. Ubiquity: if your chart becomes 'just another bar chart popping up on Twitter', it's useless as an attention-grabbing device
2. Discourages creativity: if your charts are perceived by your own journalists simply as 'jobs' to be done, they will be misused. The worst case scenario is that they become seen simply as 'things to break up text', interchangeable with quotes, photographs
"[Data visualisation] is not meant just to be seen but to be read, like written text.” - Alberto Cairo
Bars and lines are NIBs, 'more to follow' stories. These have their place, but you need to add some GaaS (Graphic as a Story)
...Isn't this talk meant to be about ggplot2?
We need to produce high quality, striking data visualisations
Until recently, such narrative graphics would be the job of an information designer, most likely trained in tools like Adobe Illustrator. The output would be beautiful, but the journey could be long and winding
To reach the same end point more quickly, we need to be able to audition different visual treatments as quickly as possible
Enter, stage right...
Makes it easy to quickly experiment with:
• Different ways of encoding data in shape, space and colour
• Small multiples vs multiple series
• Display ratios
• Layering visual elements
An example:
Like all good projects, inspired by a tweet
The top-end quality, strength-in-depth and competitive balance of European football leagues have changed markedly over the last 15 years. We need to show:
1. Which countries have the best teams?
2. Which leagues are the most/least balanced?
3. What is the 'quality gap' between a given pair of leagues?
4. How does the nth best team in league x today compare to its predecessors?
5. How have all of the above changed over time?
1. Which countries have the best teams? linear comparison
2. Which leagues are the most/least balanced? distribution of parts within the whole
3. What is the 'quality gap' between a given pair of leagues? difference in area between two curves
4. How does the nth best team in league x today compare to its predecessors? value in context
5. How have all of the above changed over time? evolution of an already detailed pattern over time
1. Which countries have the best teams? linear comparison >> points
2. Which leagues are the most/least balanced? distribution of parts within the whole >> points on a line
3. What is the 'quality gap' between a given pair of leagues? difference in area between two curves >> ribbon*
4. How does the nth best team in league x today compare to its predecessors? value in context >> shaded range
5. How have all of the above changed over time? evolution of an already detailed pattern over time >> faceted plots
#1
ggplot(oneSeason,
aes(x=1,score,col=country)) +
geom_point(size=3) +
theme_bw() +
theme(legend.position="right", legend.direction="vertical") +
xlab('')
(with room to breathe)
#2
ggplot(oneSeason
,aes(rank,score,col=country)) +
geom_point(size=3) +
theme_bw() +
theme(legend.position="top")
(with room to breathe)
#3
ggplot(oneSeason
,aes(rank,score,col=country)) +
geom_path() +
geom_point(size=3) +
theme_bw() +
theme(legend.position="top")
#4 step 1: preparing the data
engEsp <- oneSeason %>%
filter(country %in% c('ENG','ESP')) %>%
dplyr::select(score, rank, country) %>%
spread(., country, score) %>%
rowwise() %>%
mutate(
gap = ESP-ENG,
max = max(ESP,ENG),
min = min(ESP,ENG)
)
#4 step 2: the plot
ggplot(engEsp,
aes(rank,score,fill=gap>0)) +
geom_rect(aes(xmin = rank-0.5, xmax=rank+0.5, ymin=min, ymax=max), alpha=0.5) +
theme_bw() +
theme(legend.position="none")
#5
ggplot(oneSeason %>%
filter(country == 'ENG'),
aes(rank,score,col=country)) +
geom_ribbon(aes(ymin=atw, ymax=atb,
fill=country), col='transparent', alpha=0.5) +
geom_path(alpha=1, size=1.5) +
geom_point(size=3) +
theme_bw() +
theme(legend.position="none")
#6
ggplot(oneSeason %>%
filter(country == 'ENG'),
aes(rank,score,col=country, fill=country)) +
geom_rect(aes(xmin=rank-0.5, xmax=rank+0.5, ymin = atw, ymax = atb), col='transparent', alpha=0.5) +
geom_point(size=5) +
theme_bw() +
theme(legend.position="none")
#7
ggplot(oneSeason %>%
filter(country == 'ENG'),
aes(rank,score,col=country, fill=country)) +
geom_rect(aes(xmin=rank-0.45, xmax=rank+0.45, ymin=atw, ymax=atb),col='transparent', alpha=0.5) +
geom_point(size=5) +
theme_bw() +
theme(legend.position="none")
First, the top teams and balance within each league
#8
ggplot(allSeasons,
aes(rank,score,col=country)) +
geom_path() +
geom_point(size=1) +
facet_wrap(~year, ncol=4) +
theme_bw() +
theme(legend.position="top")
First, the top teams and balance within each league
Quality gap between a pair of leagues
#9 step 1: preparing the data
engEspAllYears <- allSeasons %>%
filter(country %in% c('ENG','ESP')) %>%
dplyr::select(score,year,rank,country) %>%
spread(., country, score) %>%
rowwise() %>%
mutate(
gap = ESP-ENG,
max = max(ESP,ENG),
min = min(ESP,ENG)
)
Quality gap between a pair of leagues
#9 step 2: the plot
ggplot(engEspAllYears,
aes(rank,score,fill=gap>0)) +
geom_rect(aes(xmin = rank-0.5, xmax=rank+0.5, ymin=min, ymax=max), alpha=0.5) +
facet_wrap(~year, ncol=4) +
theme_bw() +
theme(legend.position="none")
Quality gap between a pair of leagues
Individual teams in national, historical context
#10
ggplot(allSeasons %>% filter(country=='ENG'),
aes(rank,score,col=country,fill=country))+
geom_rect(aes(xmin=rank-0.45, xmax=rank+0.45, ymin=atw, ymax=atb), col='transparent', alpha=0.5) +
geom_path(alpha=1) +
geom_point(size=1) +
facet_wrap(~year, ncol=4) +
theme_bw() +
theme(legend.position="none")
Individual teams in national, historical context
Dataset and all plot-generating R scripts are here
128 charts in total 😎
ggplot2 is awesome because:
• It takes minimal time and effort to audition multiple different ideas for a graphic, and to iterate on them
• It gets people thinking in the continuous visual encoding space, rather than the discrete chart-type space
• ggplot2 %>%
printNewspaper (!!!)
Tweet @jburnmurdoch or email john.burn-murdoch@ft.com
These slides are at bit.ly/ggplot-dataviz