A Readability Analysis of Campaign Speeches from the 2016 US Presidential Campaign
📝 Abstract
Readability is defined as the reading level of the speech from grade 1 to grade 12. It results from the use of the REAP readability analysis (vocabulary - Collins-Thompson and Callan, 2004; syntax - Heilman et al., 2006, 2007), which uses the lexical contents and grammatical structure of the sentences in a document to predict the reading level. After analysis, results were grouped into the average readability of each candidate, the evolution of each candidate’s readability over time, and the standard deviation, or how much each candidate varied their speech from one venue to another. For comparison, one speech from each of four past presidents and the Gettysburg Address were also analyzed.
📄 Content
Elliot Schumacher, Maxine Eskenazi CMU-LTI-16-001 March 15, 2016.
Language Technologies Institute School of Computer Science Carnegie Mellon University 5000 Forbes Ave., Pittsburgh, PA 15213 www.lti.cs.cmu.edu
© 2016, Elliot Schumacher, Maxine Eskenazi
Introduction
The goal of this report is to assess the readability of the campaign speeches of five presidential
candidates in the 2016 US presidential race and to examine their evolution over time and
according to the type of speech. Readability can be defined here as the reading level, from grade
1 to grade 12, of a document. It is determined by looking at the lexical contents and the
grammatical structure of the sentences in a document. It is based on the observation that some
words (and grammatical structures) appear with greater frequency at one grade level than
another. For example, we would expect that we could see the word “win” fairly frequently in
third grade documents while the word “successful” would be more frequent in, say, seventh
grade documents. We would not see dependent clauses very often at the second grade level
whereas they would be quite frequent at the seventh grade level.
For this analysis, we use a readability model, REAP, that was developed for vocabulary by
Collins-Thompson and Callan (2004) and further developed for grammar by Heilman et al. (2006,
2007). It is based on a database of sets of texts, one set for each grade level. Most of the texts
come from student-written texts that teachers have published on their websites, noting the grade
that each represents. The lexical reading difficulty measure is based on the smoothed individual
probabilities of words occurring at each reading level. For example, the word “determine” was
predictive of Grade 11 text, and was more predictive of high school-level text than lower-level
text. The grammar reading difficulty measure is based on the one- to three-level depth parse trees
of the sentences. This means that the measure is based on typical grammatical constructions in
sentences of each grade level.
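The lexical measure described above can be illustrated with a minimal sketch. It is not the REAP implementation: the toy corpus, the function names (`train`, `predict_grade`) and the add-one smoothing are all assumptions made for illustration, and the smoothing actually used by REAP may differ. The sketch only shows the general idea of scoring a text against per-grade word probabilities and picking the most likely grade.

```python
import math
from collections import Counter

# Hypothetical toy corpus: a few short texts per grade level (not REAP's data).
TOY_CORPUS = {
    3: ["we win the game", "we win and we play"],
    7: ["a successful plan", "the successful team will determine the plan"],
}

def train(texts_by_grade):
    """Count word occurrences per grade from a {grade: [text, ...]} mapping."""
    counts = {
        grade: Counter(w for text in texts for w in text.lower().split())
        for grade, texts in texts_by_grade.items()
    }
    vocab = set().union(*counts.values())
    return counts, vocab

def log_likelihood(words, grade_counts, vocab_size, alpha=1.0):
    """Log-probability of the words under one grade's add-alpha smoothed unigram model."""
    total = sum(grade_counts.values())
    return sum(
        math.log((grade_counts[w] + alpha) / (total + alpha * vocab_size))
        for w in words
    )

def predict_grade(text, counts, vocab):
    """Return the grade whose smoothed unigram model gives the text the highest likelihood."""
    words = text.lower().split()
    vocab_size = len(vocab) + 1  # one extra slot reserved for unseen words
    return max(counts, key=lambda g: log_likelihood(words, counts[g], vocab_size))
```

With this toy data, a text made of words seen mostly in the grade 3 set (such as “we win”) is assigned to grade 3, while one using “successful” is assigned to grade 7, mirroring the example given in the Introduction.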
Background
Early readability measures made assumptions about what a difficult text was. The Dale-Chall
Readability Formula (Dale and Chall, 1948) defined the readability level as a linear function of
the average number of words in a sentence and the percentage of rare words in the document.
Flesch-Kincaid (Kincaid et al., 1975) was based on the average sentence length and the average
number of syllables per word.
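As an illustration of how such a surface-based formula works, the following sketch computes the Flesch-Kincaid grade level using the commonly published coefficients (0.39, 11.8 and 15.59). The syllable counter is a rough vowel-group heuristic chosen for brevity, and the function names are hypothetical; a real implementation would typically use a pronunciation dictionary.

```python
import re

def count_syllables(word):
    """Rough heuristic (an assumption): one syllable per group of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level from average sentence length and syllables per word."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
```

Because the score depends only on sentence length and word length, short sentences of one-syllable words receive a very low grade regardless of which words or constructions are used.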
More recently, the Lexile Framework (version 1.0, Stenner, 1996) uses word frequency estimates
as a measure of lexical difficulty and sentence length as a grammatical feature. Other approaches
characterized text in more holistic terms. Coh-Metrix (Graesser et al., 2011) measures text
cohesiveness, accounting for both the reading difficulty of the text and other lexical and syntactic
measures as well as a measure of prior knowledge needed for comprehension and the genre of
the text. These factors account for the difficulty of constructing the mental representation of the
text.
All of these measures, REAP included, were originally developed to help teachers choose
appropriate documents for their students in reading classes. The campaign speeches, while most
were written in advance, are meant to be spoken. Written language is very different from spoken
language. When we speak, we usually use less structured language with shorter sentences. So while
measures such as Flesch-Kincaid are appropriate for written text, they do not really reflect
the structure of spoken language. REAP was also trained on written texts, as described above,
but it concentrates on how often words and grammatical constructs are used at each grade level
and less on the length of the sentence and of each word. REAP is therefore better suited to the
analysis of spoken language than its predecessors.
Methodology
A database was collected containing documents from each of the five current presidential candidates: Ted Cruz (5), Hillary Clinton (7), Marco Rubio (6), Bernie Sanders (6), Donald Trump (8) (see References and Appendix). The documents are transcriptions of their campaign speeches. They range from declaration of candidacy speeches to campaign trail speeches to victory and defeat speeches. As the numbers in parentheses (speeches per candidate) show, it was sometimes difficult to find transcriptions rather than videos. In the future, an Automatic Speech Recognition (ASR) system could be used to obtain text from the videos. Given that this process would introduce some errors, it was not used for the present study. For comparison, we also analyzed the readability of Lincoln’s Gettysburg Address (Bliss version) and one speech each from Barack Obama, George W. Bush, Bill Clinton and Ronald Reagan (the latter two at the same venue in different years). Two levels of analysis were