“I Have The Best Words.” How Trump’s First SOTU Compares To All The Others.

Data and R code for the analysis supporting this Jan. 31, 2018 BuzzFeed News post on the text of every State of the Union address, including the 2018 speech by President Donald Trump. Supporting files are in this GitHub repository.

Data

We gathered the text of every State of the Union address from the American Presidency Project at the University of California, Santa Barbara. These include speeches to joint sessions of Congress in presidents’ first years of office, which are not officially State of the Union addresses. Some of the addresses were written, some spoken. Where there was both a spoken address and a written message, we used the speech. In 1973, Richard Nixon sent an overview, plus multiple reports to Congress on various areas of policy. Here, we used his overview message.

Setting up

# load required packages
library(readr)
library(dplyr)
library(stringr)
library(lubridate)
library(tidyr)
library(tidytext)
library(quanteda)
library(ggplot2)
library(DT)

Load and process data

# load data
sou <- read_csv("data/sou.csv")
presidents <- read_csv("data/presidents.csv")

sou <- sou %>%
  left_join(presidents)
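
The join above relies on dplyr matching whatever column names the two files share. As a minimal sketch of how to make the key explicit and check for unmatched speeches, the example below uses two toy tables. The column name `president` and all values are hypothetical, chosen only for illustration, and are not the real schema of `sou.csv` or `presidents.csv`.

```r
library(dplyr)

# hypothetical toy tables: column names and values are assumptions,
# not the contents of data/sou.csv or data/presidents.csv
speeches <- tibble(
  president = c("A", "B", "C"),
  date = as.Date(c("1990-01-31", "2000-01-27", "2010-01-27"))
)
parties <- tibble(
  president = c("A", "B"),
  party = c("Party X", "Party Y")
)

# name the key explicitly instead of relying on shared column names
joined <- left_join(speeches, parties, by = "president")

# speeches with no matching row in the lookup table (party would be NA)
unmatched <- anti_join(speeches, parties, by = "president")
```

An empty `unmatched` table would indicate that every speech found a party; here the third toy row is deliberately left without a match.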

The Addresses Have Gotten Easier To Understand.

We counted the words, syllables, and sentences in each address and used these counts to calculate its Flesch-Kincaid reading grade level. The score is expressed as a US school grade level, and is widely used in education and to assess the readability of official documents.

# color palette for parties
party_pal <- c("#1482EE","#228B22","#E9967A","#686868","#FF3300","#EEC900")

# word, sentence, and syllable counts, plus reading scores
sou <- sou %>%
  mutate(year = year(date),
         syllables = nsyllable(text),
         sentences = nsentence(text),
         words = ntoken(text, remove_punct = TRUE),
         fk_ease = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words),
         fk_grade = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59) %>%
  arrange(date)
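
To make the grade-level arithmetic concrete, here is a small sketch that applies the same formula as the `fk_grade` column to raw counts. The helper name `fk_grade_level` and the counts are hypothetical, used only for illustration, and are not taken from any address.

```r
# Flesch-Kincaid grade level from raw counts
# (same coefficients as the fk_grade column above)
fk_grade_level <- function(words, sentences, syllables) {
  0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
}

# hypothetical text: 100 words, 150 syllables, split into 5 sentences
long_sentences <- fk_grade_level(words = 100, sentences = 5, syllables = 150)

# the same words and syllables split into 10 shorter sentences
short_sentences <- fk_grade_level(words = 100, sentences = 10, syllables = 150)

print(c(long_sentences = long_sentences, short_sentences = short_sentences))
```

With these made-up counts, halving the average sentence length lowers the score from about grade 9.9 to about grade 6.0, which shows how strongly sentence length drives the measure.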

# reading score chart
ggplot(sou, aes(x=date, y=fk_grade, color=party, size=words)) +
  geom_point(alpha=0.5) +
  geom_smooth(se = FALSE, color="black", method="lm", size=0.5, linetype = "dotted") +
  scale_size_area(max_size = 10, guide = "none") +
  scale_color_manual(values = party_pal, name = "", breaks = c("Democratic","Republican","Whig","Democratic-Republican","Federalist","None")) +
  scale_y_continuous(limits = c(4,27), breaks = c(5,10,15,20,25)) +
  theme_minimal(base_size = 24, base_family = "ProximaNova-Semibold") +
  xlab("") +
  ylab("Reading level") +
  guides(col = guide_legend(ncol = 2, override.aes = list(size = 4))) +
  theme(legend.position=c(0.3,0.22),
        legend.text = element_text(color="#909090", size = 18),
        panel.grid.minor = element_blank())