From Language to Data

Land Acknowledgment

This class at the University of Virginia respectfully acknowledges the custodians of the land we are on today, the Monacan Nation, and pays respect to their elders past, present, and emerging. To learn more about UVA’s past and present connections to the Monacan Nation, I recommend this website from the Office for Equal Opportunity and Civil Rights. To learn more about the Monacan people, read their history.

“Empirical and Scientific Engagement” description

This course, one of four “Engagements” courses you’ll take as a first-year student, focuses on what we can learn from examining the ways in which facts and evidence are identified, collected, analyzed, and interpreted. Both within and beyond the university, you will encounter claims about the natural and social worlds and be confronted with situations that require you to evaluate and make decisions based on evidence. Empirical methods are crucial to addressing and answering such a broad range of essential questions. We will explore how questions and hypotheses are formulated and evaluated.

“From Language to Data” description

Targeted advertisements make people think that their phones are listening to them. Getting an ad for something you were just talking or thinking about feels uncanny. But your phone does not need to listen to you to effectively target you. Targeted advertising fuels the internet. And the successful transformation of language into data fuels targeted advertising.

In this course, we study how and why people transform language into data. What do we gain? What do we lose? We consider several types of texts—including speech, digital communications, and published texts—and many ways they get turned into data.

This course not only engages questions from academic fields like digital humanities and computational linguistics, but also practical questions of everyday life in the twenty-first century, from communicating with our friends to researching new topics.

Goals

By the end of this course, students will be able to…

Syllabus

The official course syllabus is the version of this document currently available on our course site.

Calendar

Date | Mtg. | Guiding question | Readings | Assignment due
Mar 15 | 1 | Why transform language into data? | - | -
Mar 20 | 2 | Why not transform language into data? | Chun [1]; Zuboff [2]; Haugen (22:00–33:50) [3] | My data
Mar 22 | 3 | What is (big) data? | Drucker [4]; D’Ignazio and Klein [5]; Borges [6] | -
Mar 27 | 4 | What is metadata? | Pomerantz [7]; Krause [8]; Irani [9]; Perrigo [10] | -
Mar 29 | 5 | CLASS CANCELLED | - | -
Apr 3 | 6 | What is a corpus? | Algee-Hewitt and McGurl [11]; Risam [12] | -
Apr 5 | 7 | How have people transformed language into data? | Holmes and Kardos [13]; Mendenhall [14]; Weizenbaum [15] | -
Apr 10 | 8 | What can counting types and tokens teach us? | Archer [16]; Le et al. [17]; Bailey [18] | My corpus
Apr 12 | 9 | What can counting (with) metadata teach us? | Porter [19]; Johnson [20]; Anderson and Daniels [21] | -
Apr 17 | 10 | How should we think about change over time? | Heuser and Le-Khac [22]; Underwood et al. [23] | My proposal
Apr 19 | 11 | What is a topic model? | Blei [24]; Lucy et al. [25] | -
Apr 24 | 12 | What are word embeddings? | Smith [26]; Gavin [27] | -
Apr 26 | 13 | Final project preparation in class | - | -
May 1 | 14 | Final project presentations | - | Final presentations

Health

If you are sick and need to miss class, email me. Students who miss class due to illness will receive excused absences and alternative assignments.

Grading

This class uses contract grading. Contract grading aspires to be more transparent and equitable than traditional grading. It should motivate you to focus on the course material instead of worrying about whether you have a 92.4% or a 92.6%.

Here is how it works. You will tell me what grade you intend to earn. Then, you will sign a contract stating that you will satisfactorily complete a specific amount of work that will earn you that grade. You can contract for an A, B, or C. If you do not satisfactorily complete enough work to fulfill the terms of your contract, you may still receive a D or F.

Read the contract for a full explanation.

Email

I reply to emails within two business days. If it has been more than two business days, please write me again.

Office Hours

I will schedule office hours in person or virtually at a mutually convenient time.

Engagements Experience

As a student at the University of Virginia, you are part of an exciting and robust intellectual community. The goal of the Engagements Experience (EE) is to welcome you into this community, where you’ll discover new ideas and meet fascinating people. The EE is part of the Engagements program and runs alongside the Engagements course you are currently taking.

The EE must be satisfactorily completed for every contract. There are three components this spring:

  1. Attend the Spring '23 Engagements Experience Lecture

  2. Complete an Academic Advising Activity

  3. Complete an Engaging Grounds experience

Component 1: Engagements Experience Lecture

Attend ONE Engagements lecture each quarter, either virtually or in person. This spring, the second event is Eze Amos’ talk about the series of photographs he took of the Unite the Right rally and the counter-protests on August 11–12, 2017. The talk will be held on April 11th at 7 pm at the Paramount Theater in downtown Charlottesville. The talk will be recorded; if you are unable to attend in person, you must watch the recording.

If you attend in person, you MUST plan to stay the entire time. This is a chance to practice scholarly norms, which include staying for the whole talk and the Q&A.

Component 2: Academic Advising Activity

As you approach the culmination of your first year at UVA, you will have made significant progress towards the completion of the Engagements Pathway requirements. You will have taken a course in each Engagements pillar and made headway into completing your Literacy and Disciplines Requirements. The EE will help keep you on track by asking you to complete one academic advising activity during each quarter. These activities will help prepare you for coursework in the following semester.

Meet with your academic advisor

In advance of registering for classes for the Fall 2023 semester, you must attend your scheduled faculty advising meeting. If you have not received a message from your advisor about scheduling this meeting by March 27th, reach out to them to schedule one.

Component 3: Engaging Grounds

Each quarter you will participate in one Engaging Grounds experience. You may choose which experience to complete from the curated list of Engaging Grounds experiences below. Note that many options may be completed at your convenience, but Scholar Speaks (on-Grounds lectures or presentations) have a set date and time. See the calendar on the Engagements Experience website for Scholar Speaks opportunities.

The list will be updated regularly throughout the semester. While you have many options for completing this requirement, don't wait until the last minute: these opportunities are limited.

Engaging Grounds

The College Fellows have curated the following list of Engaging Grounds experiences as a representation of the myriad of wonderful opportunities you have as students at UVA. While we hope that you will participate in each before graduation, you should select and complete one for each Engagements quarter.

Submitting the EE

Attest that you have completed all components of the Engagements Experience by writing “On my honor, I have completed all components of the EE” in the Canvas assignment by May 2nd.

Notice of Non-Discrimination

The University of Virginia does not discriminate on the basis of age, color, disability, gender identity or expression, marital status, military status (which includes active duty service members, reserve service members, and dependents), national or ethnic origin, political affiliation, pregnancy (including childbirth and related conditions), race, religion, sex, sexual orientation, veteran status, and family medical or genetic information, in its programs and activities as required by Title IX of the Education Amendments of 1972, Americans with Disabilities Act of 1990, as amended, Section 504 of the Rehabilitation Act of 1973, Titles VI and VII of the Civil Rights Act of 1964, Age Discrimination Act of 1975, Governor’s Executive Order Number One (2018), and other applicable statutes and University policies. UVA prohibits sexual and gender-based harassment, including sexual assault, and other forms of interpersonal violence.

Reporting Discriminatory Conduct

Per UVA policy HRM-040, I am a Responsible Employee. If you mention prohibited conduct to me, I am required to report it. Prohibited conduct includes, but is not limited to, sexual and gender-based harassment and violence, bias and discrimination/harassment, hazing, interference with speech rights, threats or acts of violence. If you wish to report prohibited conduct online, use UVA’s Just Report It.

Accommodations

In accordance with the ADA, as amended, and Section 504 of the Rehabilitation Act, the University of Virginia offers an array of individualized accommodations and services to qualified students with disabilities. Accommodations are determined using an interactive process between the student and Student Disability Access Center staff. Contact the Student Disability Access Center with questions.

Religious Accommodations

Students who wish to request academic accommodation for a religious observance should submit their request to me by email as far in advance as possible. If you have questions or concerns about your request, you can contact the University’s Office for Equal Opportunity and Civil Rights at or 434-924-3200. Accommodations do not relieve you of the responsibility for completion of any part of the coursework you miss as the result of a religious observance. More information about religious accommodations is available here, and frequently asked questions are here.

Campus Resources

Acknowledgments

This syllabus has benefitted from syllabi by Mark Algee-Hewitt, David Bamman, Andrew Goldstone, Ryan Heuser, Lauren Klein, Laura McGrath, David Mimno, Dan Sinykin, Ted Underwood, and Melanie Walsh.


  1. Wendy Hui Kyong Chun, Discriminating Data: Correlation, Neighborhoods, and the New Politics of Recognition (Cambridge, Massachusetts: The MIT Press, 2021).

  2. Shoshana Zuboff, “You Are the Object of a Secret Extraction Operation,” The New York Times, November 12, 2021, sec. Opinion, https://www.nytimes.com/2021/11/12/opinion/facebook-privacy.html.

  3. The Facebook Files: What Next? Panel 1: The Activists, 2021, https://www.youtube.com/watch?v=YUubanGIZc0.

  4. Johanna Drucker, Visualization and Interpretation: Humanistic Approaches to Display (Cambridge, Massachusetts: The MIT Press, 2020), 43–57.

  5. Catherine D’Ignazio and Lauren F. Klein, “The Numbers Don’t Speak for Themselves,” in Data Feminism (Cambridge, Massachusetts: The MIT Press, 2020).

  6. Jorge Luis Borges, “The Library of Babel,” in Collected Fictions, trans. Andrew Hurley, Penguin Classics Deluxe Edition (New York, NY: Penguin Books, 1998).

  7. Jeffrey Pomerantz, Metadata, The MIT Press Essential Knowledge Series (Cambridge, Massachusetts; London, England: The MIT Press, 2015).

  8. Heather Krause, “Data Biographies: Getting to Know Your Data,” Global Investigative Journalism Network, March 27, 2017, https://gijn.org/2017/03/27/data-biographies-getting-to-know-your-data/.

  9. Lilly Irani, “Justice for Data Janitors,” in Think in Public: A Public Books Reader, ed. Sharon Marcus and Caitlin Zaloom, Public Books Series (New York: Columbia University Press, 2019), 23–39.

  10. Billy Perrigo et al., “Inside Facebook’s African Sweatshop,” TIME Magazine 199, no. 7/8 (February 28, 2022): 32–39.

  11. Mark Algee-Hewitt and Mark McGurl, “Between Canon and Corpus: Six Perspectives on 20th-Century Novels,” Pamphlets of the Stanford Literary Lab, no. 8 (January 2015), https://litlab.stanford.edu/LiteraryLabPamphlet8.pdf.

  12. Roopika Risam, New Digital Worlds: Postcolonial Digital Humanities in Theory, Praxis, and Pedagogy (Northwestern University Press, 2018).

  13. David I. Holmes and Judit Kardos, “Who Was the Author? An Introduction to Stylometry,” Chance 16, no. 2 (2003): 5–8.

  14. Thomas Corwin Mendenhall, “The Characteristic Curves of Composition,” Science 9, no. 214 (1887): 237–49.

  15. Joseph Weizenbaum, “ELIZA—a Computer Program for the Study of Natural Language Communication between Man and Machine,” Communications of the ACM 9, no. 1 (1966): 36–45.

  16. Dawn Archer, What’s in a Word-List? Investigating Word Frequency and Keyword Extraction (Ashgate, 2012).

  17. Xuan Le et al., “Longitudinal Detection of Dementia through Lexical and Syntactic Changes in Writing: A Case Study of Three British Novelists,” Literary and Linguistic Computing 26, no. 4 (2011): 435–61.

  18. Moya Bailey, Misogynoir Transformed: Black Women’s Digital Resistance, Intersections: Transdisciplinary Perspectives on Genders and Sexualities (New York: New York University Press, 2021).

  19. J. D. Porter, “Popularity/Prestige,” Pamphlets of the Stanford Literary Lab, no. 17 (September 2018), https://litlab.stanford.edu/LiteraryLabPamphlet17.pdf.

  20. Timothy R. Johnson, Ryan C. Black, and Justin Wedeking, “Pardon the Interruption: An Empirical Analysis of Supreme Court Justices’ Behavior During Oral Arguments,” Loy. L. Rev. 55 (2009): 331.

  21. Hanah Anderson and Matt Daniels, “Film Dialogue from 2,000 Screenplays, Broken Down by Gender and Age,” The Pudding, accessed February 4, 2022, https://pudding.cool/2017/03/film-dialogue/index.html.

  22. Ryan Heuser and Long Le-Khac, “A Quantitative Literary History of 2,958 Nineteenth-Century British Novels: The Semantic Cohort Method,” Pamphlets of the Stanford Literary Lab, no. 4 (May 2012), https://litlab.stanford.edu/LiteraryLabPamphlet4.pdf.

  23. Ted Underwood, David Bamman, and Sabrina Lee, “The Transformation of Gender in English-Language Fiction,” 2018.

  24. David M. Blei, “Probabilistic Topic Models,” Communications of the ACM 55, no. 4 (April 1, 2012): 77, https://doi.org/10.1145/2133806.2133826.

  25. Li Lucy et al., “Content Analysis of Textbooks via Natural Language Processing: Findings on Gender, Race, and Ethnicity in Texas U.S. History Textbooks,” AERA Open 6, no. 3 (July 2020): 2332858420940312, https://doi.org/10.1177/2332858420940312.

  26. Noah A. Smith, “Contextual Word Representations: Putting Words into Computers,” Communications of the ACM 63, no. 6 (May 21, 2020): 66–74, https://doi.org/10.1145/3347145.

  27. Michael Gavin, “Vector Semantics, William Empson, and the Study of Ambiguity,” Critical Inquiry 44, no. 4 (2018): 641–73.