From Language to Data
Land Acknowledgment
This class at the University of Virginia respectfully acknowledges the custodians of the land we are on today, the Monacan Nation, and pays its respects to their elders past, present, and emerging. To learn more about UVA’s past and present connections to the Monacan Nation, I recommend this website from the Office for Equal Opportunity and Civil Rights. To learn more about the Monacan people, read their history.
“Empirical and Scientific Engagement” description
This course, one of four “Engagements” courses you’ll take as a first-year student, focuses on what we can learn from examining the ways in which facts and evidence are identified, collected, analyzed, and interpreted. Both within and beyond the university, you will encounter claims about the natural and social worlds and be confronted with situations that require you to evaluate and make decisions based on evidence. Empirical methods are a crucial component of addressing and answering a broad range of essential questions. We will explore how questions and hypotheses are formulated and evaluated.
“From Language to Data” description
Targeted advertisements make people think that their phones are listening to them. Getting an ad for something you were just talking or thinking about feels uncanny. But your phone does not need to listen to you to effectively target you. Targeted advertising fuels the internet. And the successful transformation of language into data fuels targeted advertising.
In this course, we study how and why people transform language into data. What do we gain? What do we lose? We consider several types of texts—including speech, digital communications, and published texts—and many ways they get turned into data.
This course engages not only questions from academic fields like digital humanities and computational linguistics, but also practical questions of everyday life in the twenty-first century, from communicating with our friends to researching new topics.
Goals
By the end of this course, students will be able to…
Identify and articulate insights from challenging readings
Describe empirical evidence that can be derived from English-language texts
Articulate strengths and weaknesses of empirical approaches to language
Evaluate what we can know with respect to the digital cultural record
Syllabus
The official course syllabus is the version of this document currently available on our course site.
Calendar
Date | Mtg. | Guiding question | Readings | Assignment due |
---|---|---|---|---|
Mar 15 | 1 | Why transform language into data? | - | - |
Mar 20 | 2 | Why not transform language into data? | Chun [1], Zuboff [2], Haugen (22:00-33:50) [3] | My data |
Mar 22 | 3 | What is (big) data? | Drucker [4], D’Ignazio and Klein [5], Borges [6] | - |
Mar 27 | 4 | What is metadata? | Pomerantz [7], Krause [8], Irani [9], Perrigo [10] | - |
Mar 29 | 5 | CLASS CANCELLED | - | - |
Apr 3 | 6 | What is a corpus? | Algee-Hewitt and McGurl [11], Risam [12] | - |
Apr 5 | 7 | How have people transformed language into data? | Holmes and Kardos [13], Mendenhall [14], Weizenbaum [15] | - |
Apr 10 | 8 | What can counting types and tokens teach us? | Archer [16], Le et al. [17], Bailey [18] | My corpus |
Apr 12 | 9 | What can counting (with) metadata teach us? | Porter [19], Johnson [20], Anderson and Daniels [21] | - |
Apr 17 | 10 | How should we think about change over time? | Heuser and Le-Khac [22], Underwood et al. [23] | My proposal |
Apr 19 | 11 | What is a topic model? | Blei [24], Lucy et al. [25] | - |
Apr 24 | 12 | What are word embeddings? | Smith [26], Gavin [27] | - |
Apr 26 | 13 | Final project preparation in class | - | - |
May 1 | 14 | Final project presentations | - | Final presentations |
Health
If you are sick and need to miss class, email me. Students who miss class due to illness will receive excused absences and alternative assignments.
Grading
This class uses contract grading. Contract grading aspires to be more transparent and equitable than traditional grading. It should motivate you to focus on the course material instead of worrying about whether you have a 92.4% or a 92.6%.
Here is how it works. You will tell me what grade you intend to earn. Then, you will sign a contract stating that you will satisfactorily complete a specific amount of work that will earn you that grade. You can contract for an A, B, or C. If you do not satisfactorily complete enough work to fulfill the terms of your contract, you may still receive a D or F.
Read the contract for a full explanation.
If you have obligations that require you to miss specific classes (e.g. away games), please let me know about them as far in advance as possible.
Students requesting an extension due to foreseeable circumstances must email me 48 hours prior to the deadline.
- For example, food poisoning is not foreseeable. A midterm for another class is foreseeable.
I reply to emails within two business days. If it has been more than two business days, please write me again.
Office Hours
I will schedule office hours in person or virtually at a mutually convenient time.
Engagements Experience
As a student at the University of Virginia, you are part of an exciting and robust intellectual community. The goal of the Engagements Experience (EE) is to make you feel welcomed into this community, where you’ll discover new ideas and meet fascinating people. The EE is part of the Engagements program and runs alongside the Engagements course you are currently taking.
The EE must be satisfactorily completed for every contract. There are three components this spring:
Attend the Spring '23 Engagements Experience Lecture
Complete an Academic Advising Activity
Complete an Engaging Grounds experience
Component 1: Engagements Experience Lecture
Attend ONE Engagements lecture each quarter, either virtually or in person. This spring, the second event is Eze Amos’ talk about the series of photographs he took on August 11-12, 2017, of the Unite the Right rally and counter-protests. The talk will be held on April 11th at 7 pm in the Paramount Theater in downtown Charlottesville. If you are unable to attend in person, the talk will be recorded and you must watch that recording.
If you attend in person, you MUST plan to stay the entire time. This is a chance to practice scholarly expectations, which include hearing all of a talk and the Q&A.
Component 2: Academic Advising Activity
As you approach the culmination of your first year at UVA, you will have made significant progress towards the completion of the Engagements Pathway requirements. You will have taken a course in each Engagements pillar and made headway into completing your Literacy and Disciplines Requirements. The EE will help keep you on track by asking you to complete one academic advising activity during each quarter. These activities will help prepare you for coursework in the following semester.
Meet with your academic advisor
In advance of registering for classes for the Fall 2023 semester, you must attend your scheduled faculty advising meeting. If you have not received a message from your advisor about scheduling these meetings by March 27th, reach out to them to schedule something.
Component 3: Engaging Grounds
Each quarter you will participate in one Engaging Grounds experience. You may choose which experience to complete from the curated list of Engaging Grounds experiences below. Note that many options may be completed at your convenience, but Scholar Speaks (on-Grounds lectures or presentations) have a set date and time. See the calendar on the Engagements Experience website for Scholar Speak Opportunities.
The list will be updated regularly throughout the semester. While you have many options for completing this requirement, don't wait until the last second as these opportunities are finite.
Engaging Grounds
The College Fellows have curated the following list of Engaging Grounds experiences as a representation of the myriad of wonderful opportunities you have as students at UVA. While we hope that you will participate in each before graduation, you should select and complete one for each Engagements quarter.
Scholar Speak – At UVA, faculty (both at UVA and invited from other universities) offer seminars, colloquia, and other forms of disseminating their original research on a regular basis. Select and attend a lecture or presentation from the calendar of events on the Engagements Experience Website, https://gened.as.virginia.edu/engagements-experience.
Enslaved African Americans at UVA Self-Guided Tour - After downloading the "Walking Tours of Grounds" app, complete this tour.
A visit to McCormick Observatory where you participate in one of their “Public Night” programs or attend an event there hosted by the Department of Astronomy.
A visit to the piece of the Berlin Wall by Alderman Library after reading this UVA Today article. Reflect on what it means to transport this piece to another national context and a public university, in particular.
A visit to the former site of the George Rogers Clark monument. Read this UVA Today article about the statue’s history and this call for its removal. What is this space like now?
A visit to special collections where you will tour one of the exhibits and view one item from their collection in the reading room.
A visit to the UVA gardens after exploring these visualizations about UVA at the time of its founding.
A visit to the Rotunda and one of the Rotunda tours.
A visit to the Fralin Museum where you view at least one of their special, rotating exhibits, and at least one of the exhibits that are part of the permanent collection.
Submitting the EE
Attest that you have completed all elements of the Engagements Experience by writing “On my honor, I have completed all components of the EE” under the Canvas assignment by May 2nd.
Notice of Non-Discrimination
The University of Virginia does not discriminate on the basis of age, color, disability, gender identity or expression, marital status, military status (which includes active duty service members, reserve service members, and dependents), national or ethnic origin, political affiliation, pregnancy (including childbirth and related conditions), race, religion, sex, sexual orientation, veteran status, and family medical or genetic information, in its programs and activities as required by Title IX of the Education Amendments of 1972, Americans with Disabilities Act of 1990, as amended, Section 504 of the Rehabilitation Act of 1973, Titles VI and VII of the Civil Rights Act of 1964, Age Discrimination Act of 1975, Governor’s Executive Order Number One (2018), and other applicable statutes and University policies. UVA prohibits sexual and gender-based harassment, including sexual assault, and other forms of interpersonal violence.
Reporting Discriminatory Conduct
Per UVA policy HRM-040, I am a Responsible Employee. If you mention prohibited conduct to me, I am required to report it. Prohibited conduct includes, but is not limited to, sexual and gender-based harassment and violence, bias and discrimination/harassment, hazing, interference with speech rights, and threats or acts of violence. If you wish to report prohibited conduct online, use UVA’s Just Report It.
Accommodations
In accordance with the ADA, as amended, and Section 504 of the Rehabilitation Act, the University of Virginia offers an array of individualized accommodations and services to qualified students with disabilities. Accommodations are determined using an interactive process between the student and Student Disability Access Center staff. Contact the Student Disability Access Center with questions.
Religious Accommodations
Students who wish to request academic accommodation for a religious observance should submit their request to me by email as far in advance as possible. If you have questions or concerns about your request, you can contact the University’s Office for Equal Opportunity and Civil Rights at UVAEOCR@virginia.edu or 434-924-3200. Accommodations do not relieve you of the responsibility for completion of any part of the coursework you miss as the result of a religious observance. More information about religious accommodations is available here, and frequently asked questions are here.
Campus Resources
If you are struggling with mental health, UVA Counseling and Psychological Services can help. The TimelyCare app—available to UVA students who have paid comprehensive health fees with their tuition—also provides mental health support.
The UVA Writing Center will advise you on any stage of writing.
If you are struggling to manage money, a Peer Financial Counselor can help.
UVA helps students experiencing food insecurity get free nutritious food.
Charlottesville’s Sexual Assault Resource Agency responds to sexual and/or gender-based violence: 434-977-7273
Students who wish to improve their English may be interested in the Sundberg International Center’s programming.
Students who wish to improve study skills like time management may benefit from UVA’s resources for Academic Success.
Acknowledgments
This syllabus has benefitted from syllabi by Mark Algee-Hewitt, David Bamman, Andrew Goldstone, Ryan Heuser, Lauren Klein, Laura McGrath, David Mimno, Dan Sinykin, Ted Underwood, and Melanie Walsh.
1. Wendy Hui Kyong Chun, Discriminating Data: Correlation, Neighborhoods, and the New Politics of Recognition (Cambridge, Massachusetts: The MIT Press, 2021).
2. Shoshana Zuboff, “You Are the Object of a Secret Extraction Operation,” The New York Times, November 12, 2021, sec. Opinion, https://www.nytimes.com/2021/11/12/opinion/facebook-privacy.html.
3. The Facebook Files: What Next? Panel 1: The Activists, 2021, https://www.youtube.com/watch?v=YUubanGIZc0.
4. Johanna Drucker, Visualization and Interpretation: Humanistic Approaches to Display (Cambridge, Massachusetts: The MIT Press, 2020), 43–57.
5. Lauren F. Klein and Catherine D’Ignazio, “Numbers Don’t Speak For Themselves,” in Data Feminism (Cambridge, Massachusetts: The MIT Press, 2020).
6. Jorge Luis Borges, “The Library of Babel,” in Collected Fictions, trans. Andrew Hurley, Penguin Classics Deluxe Edition (New York, NY: Penguin Books, 1998).
7. Jeffrey Pomerantz, Metadata, The MIT Press Essential Knowledge Series (Cambridge, Massachusetts; London, England: The MIT Press, 2015).
8. Heather Krause, “Data Biographies: Getting to Know Your Data,” Global Investigative Journalism Network, March 27, 2017, https://gijn.org/2017/03/27/data-biographies-getting-to-know-your-data/.
9. Lilly Irani, “Justice for Data Janitors,” in Think in Public: A Public Books Reader, ed. Sharon Marcus and Caitlin Zaloom, Public Books Series (New York: Columbia University Press, 2019), 23–39.
10. Billy Perrigo et al., “Inside Facebook’s African Sweat Shop,” TIME Magazine 199, no. 7/8 (February 28, 2022): 32–39.
11. Mark Algee-Hewitt and Mark McGurl, “Between Canon and Corpus: Six Perspectives on 20th-Century Novels,” Stanford Literary Lab Pamphlets, no. 8 (January 2015), https://litlab.stanford.edu/LiteraryLabPamphlet8.pdf.
12. Roopika Risam, New Digital Worlds: Postcolonial Digital Humanities in Theory, Praxis, and Pedagogy (Northwestern University Press, 2018).
13. David I. Holmes and Judit Kardos, “Who Was the Author? An Introduction to Stylometry,” Chance 16, no. 2 (2003): 5–8.
14. Thomas Corwin Mendenhall, “The Characteristic Curves of Composition,” Science 9, no. 214 (1887): 237–49.
15. Joseph Weizenbaum, “ELIZA—a Computer Program for the Study of Natural Language Communication between Man and Machine,” Communications of the ACM 9, no. 1 (1966): 36–45.
16. Dawn Archer, What’s in a Word-List? Investigating Word Frequency and Keyword Extraction (Ashgate, 2012).
17. Xuan Le et al., “Longitudinal Detection of Dementia through Lexical and Syntactic Changes in Writing: A Case Study of Three British Novelists,” Literary and Linguistic Computing 26, no. 4 (2011): 435–61.
18. Moya Bailey, Misogynoir Transformed: Black Women’s Digital Resistance, Intersections: Transdisciplinary Perspectives on Genders and Sexualities (New York: New York University Press, 2021).
19. J. D. Porter, “Popularity/Prestige,” Pamphlets of the Stanford Literary Lab 17 (September 2018), https://litlab.stanford.edu/LiteraryLabPamphlet17.pdf.
20. Timothy R. Johnson, Ryan C. Black, and Justin Wedeking, “Pardon the Interruption: An Empirical Analysis of Supreme Court Justices’ Behavior During Oral Arguments,” Loy. L. Rev. 55 (2009): 331.
21. Hanah Anderson and Matt Daniels, “Film Dialogue from 2,000 Screenplays, Broken Down by Gender and Age,” The Pudding, accessed February 4, 2022, https://pudding.cool/2017/03/film-dialogue/index.html.
22. Ryan Heuser and Long Le-Khac, “A Quantitative Literary History of 2,958 Nineteenth-Century British Novels: The Semantic Cohort Method,” Pamphlets of the Stanford Literary Lab, no. 4 (May 2012), https://litlab.stanford.edu/LiteraryLabPamphlet4.pdf.
23. Ted Underwood, David Bamman, and Sabrina Lee, “The Transformation of Gender in English-Language Fiction,” 2018.
24. David M. Blei, “Probabilistic Topic Models,” Communications of the ACM 55, no. 4 (April 1, 2012): 77, https://doi.org/10.1145/2133806.2133826.
25. Li Lucy et al., “Content Analysis of Textbooks via Natural Language Processing: Findings on Gender, Race, and Ethnicity in Texas U.S. History Textbooks,” AERA Open 6, no. 3 (July 2020): 233285842094031, https://doi.org/10.1177/2332858420940312.
26. Noah A. Smith, “Contextual Word Representations: Putting Words into Computers,” Communications of the ACM 63, no. 6 (May 21, 2020): 66–74, https://doi.org/10.1145/3347145.
27. Michael Gavin, “Vector Semantics, William Empson, and the Study of Ambiguity,” Critical Inquiry 44, no. 4 (2018): 641–73.