boB Rudis
5 years ago
22 changed files with 454 additions and 121 deletions
@ -0,0 +1,15 @@ |
|||||
|
#' @md |
||||
|
#' @title June 2019 U.S. Democratic Debate Candidate/Topic Times |
||||
|
#' @description The New York Times and other media outlets kept track of the time each |
||||
|
#' candidate spent talking including the timestamp of the start of the blathering |
||||
|
#' and the topic up for debate. This dataset only includes candidates and |
||||
|
#' topic times. The complete datasets (See References) also include moderator |
||||
|
#' metadata and opening/closing statement records. |
||||
|
#' @format data frame with columns: `elapsed` (dbl), `timestamp` (drtn), `speaker` (chr), `topic` (chr) |
||||
|
#' @docType data |
||||
|
#' @keywords datasets |
||||
|
#' @name debates2019 |
||||
|
#' @references <https://www.nytimes.com/interactive/2019/admin/100000006581096.embedded.html> |
||||
|
#' @references <https://www.nytimes.com/interactive/2019/admin/100000006584572.embedded.html> |
||||
|
#' @usage data("debates2019") |
||||
|
NULL |
@ -0,0 +1,4 @@ |
|||||
|
`%l0%` <- function(x, y) if (length(x) == 0) y else x |
||||
|
`%||%` <- function(x, y) if (is.null(x)) y else x |
||||
|
`%@%` <- function(x, name) attr(x, name, exact = TRUE) |
||||
|
`%nin%` <- function(x, table) match(x, table, nomatch = 0) == 0 |
Before Width: | Height: | Size: 270 KiB After Width: | Height: | Size: 250 KiB |
@ -0,0 +1,14 @@ |
|||||
|
## code to prepare `debates2019` dataset goes here |
||||
|
|
||||
|
read_csv( |
||||
|
file = "https://rud.is/data/2019-dem-debates.csv.gz", |
||||
|
col_types = cols( |
||||
|
elapsed = col_double(), |
||||
|
timestamp = col_time(format = ""), |
||||
|
speaker = col_character(), |
||||
|
topic = col_character() |
||||
|
) |
||||
|
) -> debates2019 |
||||
|
|
||||
|
|
||||
|
usethis::use_data(debates2019, overwrite = TRUE) |
Binary file not shown.
@ -0,0 +1,23 @@ |
|||||
|
% Generated by roxygen2: do not edit by hand |
||||
|
% Please edit documentation in R/datasets.R |
||||
|
\docType{data} |
||||
|
\name{debates2019} |
||||
|
\alias{debates2019} |
||||
|
\title{June 2019 U.S. Democratic Debate Candidate/Topic Times} |
||||
|
\format{data frame with columns: \code{elapsed} (dbl), \code{timestamp} (drtn), \code{speaker} (chr), \code{topic} (chr)} |
||||
|
\usage{ |
||||
|
data("debates2019") |
||||
|
} |
||||
|
\description{ |
||||
|
The New York Times and other media outlets kept track of the time each |
||||
|
candidate spent talking including the timestamp of the start of the blathering |
||||
|
and the topic up for debate. This dataset only includes candidates and |
||||
|
topic times. The complete datasets (See References) also include moderator |
||||
|
metadata and opening/closing statement records. |
||||
|
} |
||||
|
\references{ |
||||
|
\url{https://www.nytimes.com/interactive/2019/admin/100000006581096.embedded.html} |
||||
|
|
||||
|
\url{https://www.nytimes.com/interactive/2019/admin/100000006584572.embedded.html} |
||||
|
} |
||||
|
\keyword{datasets} |
After Width: | Height: | Size: 257 KiB |
After Width: | Height: | Size: 107 KiB |
@ -0,0 +1,2 @@ |
|||||
|
*.html |
||||
|
*.R |
After Width: | Height: | Size: 102 KiB |
@ -0,0 +1,133 @@ |
|||||
|
--- |
||||
|
title: "Using {ggchicklet}" |
||||
|
output: |
||||
|
rmarkdown::html_vignette: |
||||
|
df_print: kable |
||||
|
vignette: > |
||||
|
%\VignetteIndexEntry{Using {ggchicklet}} |
||||
|
%\VignetteEncoding{UTF-8} |
||||
|
%\VignetteEngine{knitr::rmarkdown} |
||||
|
editor_options: |
||||
|
chunk_output_type: console |
||||
|
--- |
||||
|
|
||||
|
```{r, include = FALSE} |
||||
|
knitr::opts_chunk$set( |
||||
|
message = FALSE, |
||||
|
warning = FALSE, |
||||
|
collapse = TRUE, |
||||
|
comment = "## " |
||||
|
) |
||||
|
``` |
||||
|
|
||||
|
The New York Times reporters [kept track of the candiate speaking time spent per-topic](https://www.nytimes.com/interactive/2019/admin/100000006581096.embedded.html) for the June 2019 initial U.S. Democratic debates. They used a segmented, rounded-corner bar chart --- ordered by timestamp --- that I've dubbed a "chicklet" chart since they look like the fairly well-known gum/candy. This is the image from one of them: |
||||
|
|
||||
|
![](nytimes.png) |
||||
|
|
||||
|
The rounded corners aesthetic looked great and said feature begat the creation of {ggchicklet}. |
||||
|
|
||||
|
Let's load up the packages we'll need: |
||||
|
|
||||
|
```{r setup} |
||||
|
library(hrbrthemes) # my preferred theme |
||||
|
library(ggchicklet) # this pacakge! |
||||
|
library(dplyr) # we need to do a bit of data wrangling |
||||
|
library(forcats) # so we include {dplyr} and {forcats} |
||||
|
library(ggplot2) # duh! |
||||
|
``` |
||||
|
|
||||
|
If you peek at the source code for the New York Times javascript-created charts you'll see |
||||
|
that all the data is right there. Rather than make you figure out how to wrangle it, a majority |
||||
|
subset has been included in the package and can be accessed via: |
||||
|
|
||||
|
```{r data} |
||||
|
data("debates2019") |
||||
|
|
||||
|
head(debates2019, 10) |
||||
|
``` |
||||
|
|
||||
|
The `elapsed` column contains how long the candidate spoke and `timestamp` is the time they started speaking. We'll use both to control the look and feel of the {ggchicklet} chart. |
||||
|
|
||||
|
There are also candidates: |
||||
|
|
||||
|
```{r data-ex-01} |
||||
|
distinct(debates2019, speaker) %>% |
||||
|
arrange(speaker) %>% |
||||
|
print(n=nrow(.)) |
||||
|
``` |
||||
|
|
||||
|
and the topics debates: |
||||
|
|
||||
|
```{r data-ex-02} |
||||
|
distinct(debates2019, topic) %>% |
||||
|
arrange(topic) %>% |
||||
|
print(n=nrow(.)) |
||||
|
``` |
||||
|
|
||||
|
First, we'll use `forcats::fct_reorder()` to reorder the `speaker`s by total speaking time to |
||||
|
make it easier to compare the differences in total time spoken between candidate. |
||||
|
|
||||
|
Then, we'll use `forcats::fct_other()` to limit the number of `topic`s to only those highlighted |
||||
|
by the New York Times (and to show how to do that). |
||||
|
|
||||
|
We need to use `group = timestamp` to ensure the segments are ordered by time (vs category/topic) |
||||
|
and `fill = topic` to color them appropriately. Note that just using `fill = topic` would group |
||||
|
the segments by topic. |
||||
|
|
||||
|
```{r chicklet, fig.width=600/72, fig.height=600/72} |
||||
|
debates2019 %>% |
||||
|
mutate(speaker = fct_reorder(speaker, elapsed, sum, .desc=FALSE)) %>% |
||||
|
mutate(topic = fct_other( |
||||
|
topic, |
||||
|
c("Immigration", "Economy", "Climate Change", "Gun Control", "Healthcare", "Foreign Policy")) |
||||
|
) %>% |
||||
|
ggplot(aes(speaker, elapsed, group = timestamp, fill = topic)) + |
||||
|
geom_chicklet(width = 0.75) + |
||||
|
scale_y_continuous( |
||||
|
expand = c(0, 0.0625), |
||||
|
position = "right", |
||||
|
breaks = seq(0, 14, 2), |
||||
|
labels = c(0, sprintf("%d min.", seq(2, 14, 2))) |
||||
|
) + |
||||
|
scale_fill_manual( |
||||
|
name = NULL, |
||||
|
values = c( # NYTimes colors |
||||
|
"Immigration" = "#ae4544", |
||||
|
"Economy" = "#d8cb98", |
||||
|
"Climate Change" = "#a4ad6f", |
||||
|
"Gun Control" = "#cc7c3a", |
||||
|
"Healthcare" = "#436f82", |
||||
|
"Foreign Policy" = "#7c5981", |
||||
|
"Other" = "#cccccc" |
||||
|
), |
||||
|
breaks = setdiff(unique(debates2019$topic), "Other") |
||||
|
) + |
||||
|
guides( |
||||
|
fill = guide_legend(nrow = 1) |
||||
|
) + |
||||
|
coord_flip() + |
||||
|
labs( |
||||
|
x = NULL, y = NULL, fill = NULL, |
||||
|
title = "How Long Each Candidate Spoke", |
||||
|
subtitle = "Nights 1 & 2 of the June 2019 Democratic Debates", |
||||
|
caption = "Each bar segment represents the length of a candidate’s response to a question.\n\nOriginals <https://www.nytimes.com/interactive/2019/admin/100000006581096.embedded.html?>\n<https://www.nytimes.com/interactive/2019/admin/100000006584572.embedded.html?>\nby @nytimes Weiyi Cai, Jason Kao, Jasmine C. Lee, Alicia Parlapiano and Jugal K. Patel\n\n#rstats reproduction by @hrbrmstr" |
||||
|
) + |
||||
|
theme_ipsum_rc(grid="X") + |
||||
|
theme(axis.text.x = element_text(color = "gray60", size = 10)) + |
||||
|
theme(legend.position = "top") |
||||
|
``` |
||||
|
|
||||
|
You can use `ggplot2::geom_col()` to create a similar chart without the rounded rectangles but `geom_chicklet()` sets some useful defaults: |
||||
|
|
||||
|
- "`white`" stroke for the chicklet/segment (`geom_col()` has `NA` for the stroke) |
||||
|
- automatic reversing of the `group` order (`geom_col()` uses the standard sort order) |
||||
|
- radius setting of `unit(3, "px")` |
||||
|
- chicklet legend geom |
||||
|
|
||||
|
You will need to modify `colour`/`color` to use something besides "`white`" if you are using |
||||
|
a non-white background and do not want a white stroke. Larger width chicklet segments may |
||||
|
look better with a larger radius. |
||||
|
|
||||
|
Note also that the `group`ing column does not need to be a time-like object; any type of |
||||
|
ordered column will work to set the display order. |
||||
|
|
Loading…
Reference in new issue