Challenging Nineteenth-Century Data Legacies

Erik Fredner

2024-02-08

Recalibrating and challenging

Some humanists still view the use of data to study literature as a category error. Yet, in US fiction of the long nineteenth century, the language of probability and statistics becomes an increasingly prominent feature within the literature itself. Today, I will show how that discourse of probability computationally distinguishes postbellum from antebellum fiction better than almost any other set of words. This finding suggests that we should nuance Catherine Gallagher’s famous argument about the rise of fictionality: US fiction published after the Civil War solicits its readers’ belief not only through the probability of its representations but also through the discourse of probability.1 I will explain how this finding speaks to this seminar’s theme of recalibrating data for the humanities today and challenging nineteenth-century data legacies.

In part because statistical thinking became so significant to literature and life over the long nineteenth century, it is imperative that we continue to challenge some of the statistical imaginaries of this period. Though epitomized by eugenics, these legacies are not limited to eugenics alone. Yet at the same time, I want to caution my fellow humanists against dismissing the statistical imagination out of hand. In The Philadelphia Negro, W. E. B. Du Bois argues that statistical methods are essential to reveal the diversity of the Black population of the city, which was otherwise represented as monolithic. As Du Bois’s work suggests, there is a tension here: On the one hand, statistics are reductive. They contort complex people to make them fit into simple boxes. On the other hand, trying to understand groups without statistics may be even more reductive, substituting interpreters’ biases for real people. Challenging these legacies requires us to acknowledge this tension between reductiveness and representativeness.

In my experience, my fellow literary scholars find this tension uncomfortable. We favor complexity and irreducibility. This is evident even in discarded strains of literary studies, like biographical criticism or the heresy of paraphrase.2

There are many reasons for some humanists’ discomfort with and mistrust of data, but the one that I find most theoretically compelling is the idea that data threatens the practice of reading, the core activity and expertise of the humanities.3

In my view, successfully recalibrating these methods will involve persuading skeptics that they can be used not in lieu of but as occasions for reading.

The 1890 Census and state projects of simplification

Don’t mistake me as saying that data is an unalloyed good. To be sure, many nineteenth-century data legacies continue to haunt us and need to be exorcised. We can see a clear example of this in the interrelation between the US Census and early computing.

The census determines many aspects of our lives, from how political power is apportioned to how social categories like race and gender are recognized by the state.4

While this is a phenomenon of the long nineteenth century, it is also a contemporary one. For example, only since April of 2022 have trans and nonbinary folks been able to self-select their gender—including the new X for “unspecified or another gender identity”—on US passports.

[slide x passport]

Today, activists and allies are urging trans and nonbinary people to renew their documents before the 2024 elections, in case subsequent administrations should roll back that recognition.

By the late nineteenth century, the census had become less of a population count than “a full-fledged instrument to monitor the overall state of American society.”5 But every census presents a greater administrative challenge than the last because two things tend to grow: the population, and the number of questions to be asked of that population. In 1860, the census asked fourteen questions.

[1860 census schedule blank]

In 1890, it asked thirty.6

[1890 census schedule blank]

Sixteen additional questions may not seem like too many, but they certainly would if you tried to tabulate sixty million questionnaires manually.

The Census Bureau responded to that challenge by soliciting new technology. Before 1890, the word “computer” referred to workers, usually women, who counted and did arithmetic professionally.7

Herman Hollerith won an 1889 competition to design a tabulating machine for census schedules, which operated, as later computers would, on punch-card technology. Hollerith’s machine would ultimately cost the human computers their jobs. But additional processing speed and accuracy from the tabulating machine also made it possible for census-takers to ask those sixteen additional questions and more.8

I tell this anecdote about the census and computing to make two points: First, the increasing technological capacity in the late nineteenth century to extract and interpret data both addressed and created demand for even more data.

Second, that demand for data about people leads to some of the most harmful legacies of nineteenth-century data regimes with which we concern ourselves today. For example, the 1890 census was the first to ask respondents to identify their race using gradations of hypodescent, including the labels quadroon and octoroon. It was also the first to ask respondents about their immigration status.9 These specifications fall under the umbrella that James C. Scott has described as the state’s project of legibility and simplification.10 But they also demonstrate how that project of simplification tends to get more complex as data processing improves.

This points to the same tension that Du Bois suggests between reductiveness and representativeness. At the cost of potentially misrepresenting complex individuals, the state gains a more capacious description of the composition of a community than would be possible through other means.

Representativeness and reading

Some humanists mistrust data because it is reductive and surveillant in the ways just described. But, as I noted earlier, some also mistrust it because of a perception that it harms reading. This is not paranoid. In the essay where he coined the term distant reading, Franco Moretti presented computational approaches in opposition to reading:

“what we really need is a little pact with the devil: we know how to read texts, now let’s learn how not to read them.”11

There is much that scholars do not know about what Margaret Cohen has called “the great unread,” mostly because we haven’t read it.12 Contra Moretti, we could take this as a research problem for reading, as Nan Z. Da suggested in her argument against computational literary study:

…one million words roughly equals ten novels; one and a half billion represents about fifteen thousand novels, which at one novel a month will only take one thousand people one year to read.13

Yet Da assumes here that one thousand professional readers could organize and distribute their reading such that they could agree to say anything about thousands of different texts read by a thousand different people.

Moreover, there is also a corollary to this injunction to read comprehensively that humanists broadly believe but rarely acknowledge: No one needs to read all 41 of Horatio Alger’s novels for boys to learn enough about the rags-to-riches story.

Occasioning attention

Recalibrating how and why we use data for humanistic research requires us to short-circuit the belief that data-driven approaches reduce reading. I want to exemplify how computational approaches can lead us toward reading by briefly summarizing a key finding from my book project.

My research explores an old conclusion of US literary history in a new way: Why 1865?

1865 is the boundary line for most American literature survey courses, and divides The Norton Anthology of American Literature in two. 1865 reliably appears in job descriptions, at conferences, and in book titles. There are historical reasons: the Thirteenth Amendment passes, Lee surrenders to Grant, Booth shoots Lincoln, the Civil War ends, and Reconstruction begins. But what does all that tell us about the literature?14

Some would say that 1865 doesn’t tell us much. Christopher Hager and Cody Marrs have argued “Against 1865,” pointing out that it misrepresents the careers of major authors like Walt Whitman and Frederick Douglass who publish across that line.15

I realized that it could be informative to take the opposite approach to that of Hager and Marrs. Where they begin from the premise that 1865 must be criticized because it usually goes without saying, I wanted to know what we would learn if, instead, we were to take 1865 a little too seriously.

Corpus

To study this division, I needed a corpus with many examples before and after 1865.16

The Stanford Literary Lab holds a full-text copy of the Gale American Fiction corpus, which I have used for this project. Gale’s advantages are its period coverage; its bibliographic record; its hand-made metadata; and its scale:

[SLIDE corpus stats]

It is hard to imagine a billion words, so I provide a few comparators here, of which the eighty or so IKEA bookcases seem to me the most imaginable.

Works included in Gale come from four bibliographies of US fiction:

[bibliographies slide]

Three by Lyle Wright cover the period from 1774 to 1900, and one by Geoffrey Smith covers 1901 to 1925.

Wright and Smith share the same criteria for their selections:

[slide describing principles of selection]

These criteria usually produce results that literary historians would approve of, but not always. To take what seems like one of the most egregious examples today, Wright’s choice to exclude “juveniles” leads to the inclusion of Louisa May Alcott’s other novels and short story collections, but the exclusion of the Little Women series.

That said, my major concern in working with these bibliographies is that the corpus could be unusably racist and sexist. Gale’s metadata does not make any claims about its 8,500 unique authors’ race or gender, so we have to evaluate its representativeness on these axes by sampling.

I compared Gale’s authors to the eligible subset of the 464 authors who have ever been selected for any edition of The Norton Anthology of American Literature.

[SLIDE Gale diversity]

That subset contained 43 anthologized authors who are women and/or people of color whose publishing lives overlapped with Gale’s dates and who published prose fiction that fits Wright and Smith’s stated criteria.

Of the authors who are women and/or people of color whom we would expect to be included, 84% (36 of 43) are. Of the seven who do not appear, some do not for understandable reasons, such as the rediscovery of their eligible works after the bibliographies on which Gale is based had been published.

If Gale nevertheless overrepresents white male authors relative to the demography of the US in the period, this hardly means that women are absent from it.

[SLIDE top authors by word count]

Eight of the top twenty most prolific authors in the corpus are women, including the most prolific by far, E. D. E. N. Southworth, who published about six Prousts.

On these and other bases, I think it is accurate to say that Gale represents texts that have been repeatedly recognized as US prose fiction. Roopika Risam identifies repeated recognition as a challenge posed by the digital cultural record: Without acknowledging the compounding effects of repeated recognition, digitization today merely reproduces yesterday’s biases.17

Findings

So what does experimentally treating 1865 as a boundary line in this corpus reveal? In the interest of time, I am going to focus on only one finding.

My hypothesis was that quantitatively distinctive differences between antebellum and postbellum US fiction would not reflect common literary historical arguments such as the rise of realism.18 Instead, I expected the most significant differences to reflect changes in the ordinary usage of moderately frequent words like lunch, which becomes the common name for the midday meal during this period. I also hypothesized that industrialization would introduce a distinctive vocabulary in the postbellum period, including increasingly frequent references to things like telegraphs, telephones, and trains.

Comparing the relative frequencies of all overlapping words in the antebellum and postbellum corpora, I found some evidence for those hypotheses and others. But there were also words that appeared at the top of the list across multiple different measures of distinctiveness that had to do with probability, statistics, and representativeness. This list included normal, typical, average, chance, queer, everyday, usual, and common, among many others.
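To make the comparison concrete, here is a minimal sketch of one way to rank overlapping words by the ratio of their relative frequencies. The function names and the toy token lists are invented for illustration; this is not the project's actual pipeline, and a simple ratio is only one of many possible measures of distinctiveness.

```python
from collections import Counter

def per_million(tokens):
    """Return each word's frequency per million tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: 1_000_000 * n / total for word, n in counts.items()}

def frequency_ratios(antebellum_tokens, postbellum_tokens):
    """Rank shared words by postbellum rate divided by antebellum rate."""
    before = per_million(antebellum_tokens)
    after = per_million(postbellum_tokens)
    shared = before.keys() & after.keys()  # only words found in both periods
    ratios = {word: after[word] / before[word] for word in shared}
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)

# Toy token lists standing in for the two halves of the corpus (invented data).
antebellum = "the usual dinner was an average meal for the usual guests".split()
postbellum = "the average man took an average lunch at the usual hour".split()

ranked = frequency_ratios(antebellum, postbellum)
most_postbellum = ranked[0][0]   # word whose rate rose the most
most_antebellum = ranked[-1][0]  # word whose rate fell the most
```

Words that appear in only one period, like lunch in this toy example, drop out of the comparison, which is why the sketch restricts itself to the overlapping vocabulary.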

Unlike other distinctive terms I had predicted I would see, I had weaker intuitions as to why these would be so distinctive of the postbellum period. Queer theory and disability studies have both drawn our attention to the late nineteenth century as crucial in the constitution of their privileged categories. But importance doesn’t necessarily predict quantitative distinctiveness.19

The philosopher Ian Hacking’s work suggests that this may be a reflection of a larger process over the long nineteenth century whereby the concept of “human nature” was supplanted by the idea of “normal humanity,” which is a concept that emerges from work in probability and statistics, especially by Quetelet and Galton.20

If there is indeed something distinctively statistical and probabilistic about postbellum fiction, I wanted to know how widely distributed it is. We can answer that question using a two-step process: First, I use the list of terms already alluded to as seeds with which to create what Ryan Heuser and Long Le-Khac call a semantic field of significant correlates.21

Following Heuser and Le-Khac, I then filtered the highly correlated terms based on their inclusion or exclusion in a shared category of the OED’s Historical Thesaurus of English.
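The first step of that process can be sketched as follows, assuming that "correlates" means words whose frequency trajectories over time have a high Pearson correlation with the seeds. The threshold, the helper names, and the per-million rates below are all invented for illustration, and the thesaurus filtering step is not shown.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series has no defined correlation; treat as none
    return cov / sqrt(var_x * var_y)

def correlates(series, seeds, threshold=0.9):
    """Non-seed words whose trajectory correlates with every seed."""
    found = []
    for word, values in series.items():
        if word in seeds:
            continue
        if all(pearson(values, series[seed]) >= threshold for seed in seeds):
            found.append(word)
    return sorted(found)

# Invented per-million rates across five time slices (hypothetical data).
series = {
    "average": [10, 14, 20, 27, 35],
    "normal": [2, 4, 7, 11, 16],
    "typical": [1, 2, 4, 6, 9],
    "thou": [40, 31, 22, 15, 9],
    "dinner": [50, 52, 49, 51, 50],
}
field = correlates(series, seeds={"average", "normal"})
```

Requiring a word to correlate with every seed is a deliberately strict choice for the sketch; averaging correlations across seeds would be a looser alternative.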

[keywords correlated with probabilistic seeds]

In this case, the terms fall under the thesaurus category Relative Properties (01.16), which includes words that relate to relationship, kind, order, number, measurement, quantity, and wholeness.

We can then measure the relative frequency of this semantic field over time:

[boxplot showing proportion of frequencies per year in probability discourse]

This boxplot shows the number of probabilistic words in each five-year group per million words. Even though this research began with the artifice of treating 1865 as a historic rupture, this visualization suggests the process is more continuous than abrupt.
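The quantity behind each box can be sketched as below, assuming one rate per text and bins that start on years divisible by five. The function name, the tiny word list, and the sample texts are hypothetical.

```python
from collections import defaultdict

def rate_by_bin(texts, field, bin_width=5):
    """Per-million rate of field words for each text, grouped by period."""
    bins = defaultdict(list)
    for year, tokens in texts:
        start = year - year % bin_width  # e.g. 1867 falls in the 1865 bin
        hits = sum(1 for token in tokens if token in field)
        bins[start].append(1_000_000 * hits / len(tokens))
    return dict(sorted(bins.items()))

# Hypothetical field and texts; real texts would be whole novels.
field = {"average", "normal", "typical"}
texts = [
    (1852, "it was a dark and stormy night".split()),
    (1867, "he was an average man of normal habits".split()),
    (1869, "a typical day began with lunch".split()),
]
rates = rate_by_bin(texts, field)
```

Each list in `rates` holds the values that a single box would summarize, so a plotting library could draw one box per key.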

One way that we can evaluate the significance of this pattern to the 1865 theory is by measuring how well this semantic field can distinguish postbellum from antebellum works using logistic regression.

I trained hundreds of models to classify texts as antebellum or postbellum based on the word frequencies of subsets of their vocabularies.

[comparison of model performance: random vs. probability correlates vs. top dunning]

For each of three lists of words, I generated 100 different models from 100 equally sized random samples of training and test data, drawn with replacement from Gale. The three lists were the probabilistic semantic field already discussed; an equal number of the most distinctive terms of the postbellum period according to Dunning’s log-likelihood, a common measure of distinctiveness; and an equal number of random words. I find that, on average, the words associated with the probabilistic semantic field classify unseen texts nearly as well as the words selected to optimize for distinctiveness alone, without consideration of their semantic coherence. That latter group is the one that would include words like lunch.
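For readers unfamiliar with Dunning’s log-likelihood, here is a minimal sketch of the statistic for a single word, using the standard two-by-two contingency table of word versus all other words across two corpora. Whether the models above used this form or a simplified variant is an assumption on my part, and the counts are invented.

```python
from math import log

def dunning_g2(count_a, total_a, count_b, total_b):
    """Log-likelihood (G2) for one word across two corpora, 2x2 table form."""
    observed = [
        [count_a, total_a - count_a],  # corpus A: the word, all other words
        [count_b, total_b - count_b],  # corpus B: the word, all other words
    ]
    grand = total_a + total_b
    row_sums = [total_a, total_b]
    col_sums = [count_a + count_b, grand - count_a - count_b]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / grand
            if observed[i][j] > 0:  # 0 * log(0) is taken as 0
                g2 += observed[i][j] * log(observed[i][j] / expected)
    return 2 * g2

# Invented counts: a word seen 50 times in one million-word sample
# and 10 times in another of the same size.
score = dunning_g2(50, 1_000_000, 10, 1_000_000)
```

The score is larger the more a word's rate differs between the two corpora, but it carries no sign, so the relative frequencies must still be compared to tell which period a high-scoring word favors.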

This suggests that the probabilistic semantic field is a highly distinctive, widespread phenomenon in US prose fiction over the long nineteenth century, and that its prevalence can be used to discern whether any given work was written before or after 1865 with about 87% accuracy.
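The shape of the classification experiment described above can be sketched in a few dozen lines. This is a simplification under stated assumptions, not the models I trained: it uses a single feature per text (an overall rate of field words) rather than a vector of word frequencies, it tests each model on the texts left out of its bootstrap sample, and it runs on synthetic data with invented numbers.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)

def fit(xs, ys, lr=0.5, epochs=200):
    """Fit a one-feature logistic regression by batch gradient descent."""
    center = sum(xs) / len(xs)  # center the feature so training is stable
    w = b = 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * (x - center) + b) - y
            grad_w += err * (x - center)
            grad_b += err
        w -= lr * grad_w / len(xs)
        b -= lr * grad_b / len(xs)
    return center, w, b

def accuracy(model, xs, ys):
    """Share of texts whose predicted period matches the label."""
    center, w, b = model
    hits = sum(
        (sigmoid(w * (x - center) + b) >= 0.5) == bool(y)
        for x, y in zip(xs, ys)
    )
    return hits / len(xs)

def mean_accuracy(data, runs=10, seed=0):
    """Average held-out accuracy over repeated bootstrap resamples."""
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        picked = [rng.randrange(len(data)) for _ in data]  # with replacement
        chosen = set(picked)
        train = [data[i] for i in picked]
        test = [data[i] for i in range(len(data)) if i not in chosen]
        model = fit([x for x, _ in train], [y for _, y in train])
        scores.append(accuracy(model, [x for x, _ in test], [y for _, y in test]))
    return sum(scores) / len(scores)

# Synthetic data: one invented rate per text; 0 = antebellum, 1 = postbellum.
rng = random.Random(42)
data = [(rng.gauss(2.0, 1.0), 0) for _ in range(100)]
data += [(rng.gauss(6.0, 1.0), 1) for _ in range(100)]

average = mean_accuracy(data)
```

Averaging over many resamples, rather than reporting a single train and test split, is what allows the three word lists to be compared on their typical performance instead of on one lucky or unlucky draw.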

Reading findings

But what difference does this classificatory power make for our reading?

I make two major arguments about it. The first I have already alluded to: Probability has long been an aesthetic criterion of postbellum literature.22 My research suggests that it is not solely that the representations of the work itself are probable, but that the work increasingly authorizes its representations through the discourse of probability.

My second argument is that one of the most common ways in which this statistical and probabilistic rhetoric functions in US fiction is to authorize generalizations about people, pretending that qualitative judgments have quantitative backing. As you would expect, this is frequently used to make racist, sexist, and nationalist assertions. Yet, at the same time, it could also be argued that an increasing need to make such assertions betrays their increasing contingency.

Reading findings: Caroline and Sara

I find thinking through this latter point with W. E. B. Du Bois’s work to be especially helpful. Du Bois begins his long writing life in the last twenty-five years of the Gale corpus. Among all major American authors, Du Bois is the most insightful statistical thinker, not least because he studied and practiced statistical methods.

Early in his career, Du Bois expresses hope that the statistics he collects and publicizes will disabuse whites of their belief in the uniformity of Black experience. In The Philadelphia Negro, Du Bois argues that statistics are the best way to represent the true diversity of the Black population, which he argues that his readers would otherwise assume to be homogeneous.23

Yet later in life, Du Bois becomes more pessimistic about the persuasive power of statistical thinking because it is so often used cynically.

Du Bois’s fraught relationship to statistical thinking appears in his fiction. It is best expressed not in one novel, but rather in one character whom he rewrites across two novels. In The Quest of the Silver Fleece, her name is Caroline Wynn.24 In Dark Princess, her name is Sara Andrews.25

Caroline and Sara are almost the same: They are both young Black women of the Talented Tenth who can pass. They both work in politics in major US cities. They both exert outsized political power through their ability to use statistics to understand and sway voters. They both prepare their novels’ protagonists to become politicians by checking their idealism. They both try to marry their novels’ protagonists to secure their political power. And they both appear in the middle of their respective novels as interludes before the protagonists return to more radical political commitments made with more radical women.

What draws Caroline and Sara to our attention is that the source of their extraordinary power—their ability to accurately predict how populations think and act—is also the source of their undoing. They can read people, but they fail to read persons.

As a result, Caroline and Sara are similar in one more way: Their statistical imaginations cost both of them everything that they had worked for. They are sometimes described as the novels’ villains, but I would argue that Du Bois paints a more complex portrait of these women. They personify that tension Du Bois identifies between reductiveness and representativeness. As a result, they can be read as cautionary tales. Du Bois foreshadows this theme at the beginning of The Quest of the Silver Fleece:

“Drat statistics! … These are folks.”26

What is true of a population may be false of an individual. Knowledge of one does not predict knowledge of the other. Crucially, for Du Bois, this point goes both ways: As knowledge of populations does not give Caroline or Sara understanding of individuals in their lives, neither would having deep knowledge of individuals necessarily predict knowledge of populations.

Like the census’s tabulating machines, computational approaches can create a demand for interpretation of the new data they produce. And in 2024, demand for interpretation is no small thing. At the same time, challenging nineteenth-century data legacies requires us to learn the lessons that Du Bois started trying to teach about the tensions between reductiveness and representativeness more than a century ago.

Works cited

Abbate, Janet. Recoding Gender: Women’s Changing Participation in Computing. MIT Press, 2012.
Alonso, William, Paul Starr, and National Committee for Research on the 1980 Census, eds. The Politics of Numbers. The Population of the United States in the 1980s. New York: Russell Sage Foundation, 1987.
Alterman, Hyman. Counting People: The Census in History. 1st ed. New York: Harcourt, Brace & World, 1969.
Anderson, Margo J. The American Census: A Social History. Second Edition. New Haven: Yale University Press, 2015.
Bode, Katherine. “Why You Can’t Model Away Bias.” Modern Language Quarterly 81, no. 1 (March 2020): 95–124. https://doi.org/10.1215/00267929-7933102.
Bouk, Dan. How Our Days Became Numbered: Risk and the Rise of the Statistical Individual. University of Chicago Press, 2015.
Bouk, Daniel B. Democracy’s Data: The Hidden Stories in the U.S. Census and How to Read Them. First edition. New York: MCD ; Farrar, Straus and Giroux, 2022.
Brooks, Cleanth. The Well Wrought Urn: Studies in the Structure of Poetry. New York: Reynal & Hitchcock, 1947.
Cohen, Margaret. The Sentimental Education of the Novel. Princeton, N.J: Princeton University Press, 1999.
Cryle, Peter, and Elizabeth Stephens. Normality: A Critical Genealogy. University of Chicago Press, 2017.
D’Ignazio, Catherine, and Lauren F. Klein. Data Feminism. Strong Ideas Series. Cambridge, Massachusetts: The MIT Press, 2020.
Da, Nan Z. “The Computational Case Against Computational Literary Studies.” Critical Inquiry 45, no. 3 (March 2019): 601–39. https://doi.org/10.1086/702594.
Davis, Lennard J. Enforcing Normalcy: Disability, Deafness, and the Body. London ; New York: Verso, 1995.
Didier, Emmanuel, Theodore M. Porter, and Priya Vari Sen. America by the Numbers: Quantification, Democracy, and the Birth of National Statistics. Infrastructure Series. Cambridge, Massachusetts: The MIT Press, 2020.
Du Bois, W. E. B. The Philadelphia Negro: A Social Study. The Oxford W.E.B. Du Bois. New York, NY: Oxford University Press, 2007.
———. The Quest of the Silver Fleece. The Oxford W. E. B. Du Bois. Oxford ; New York: Oxford University Press, 2007.
Gallagher, Catherine. “The Rise of Fictionality.” In The Novel, edited by Franco Moretti, 336–63. Princeton: Princeton University Press, 2006.
Gates, Henry Louis. The Signifying Monkey: A Theory of African American Literary Criticism. Twenty-fifth anniversary edition. Oxford: Oxford University Press, 2014.
Glazener, Nancy. Reading for Realism: The History of a U.S. Literary Institution, 1850-1910. Durham: Duke University Press, 1997.
Guillory, John. Cultural Capital: The Problem of Literary Canon Formation. First edition, enlarged. Chicago: The University of Chicago Press, 2023.
———. Professing Criticism: Essays on the Organization of Literary Study. Chicago: University of Chicago Press, 2022.
———. “The Ethical Practice of Modernity: The Example of Reading.” In The Turn to Ethics, edited by Marjorie B. Garber, Beatrice Hanssen, and Rebecca L. Walkowitz, 29–46. Culture Work. New York: Routledge, 2000.
Hacking, Ian. The Taming of Chance. Ideas in Context. Cambridge: Cambridge University Press, 1990. https://doi.org/10.1017/CBO9780511819766.
Hager, Christopher, and Cody Marrs. “Against 1865: Reperiodizing the Nineteenth Century.” J19: The Journal of Nineteenth-Century Americanists 1, no. 2 (September 2013): 259–84. https://doi.org/10.1353/jnc.2013.0026.
Heuser, Ryan, and Long Le-Khac. “A Quantitative Literary History of 2,958 Nineteenth-Century British Novels: The Semantic Cohort Method.” Pamphlets of the Stanford Literary Lab, no. 4 (May 2012).
Hicks, Granville. The Great Tradition: An Interpretation of American Literature Since the Civil War. Rev. ed. New York: Biblo and Tannen, 1967.
Johnson, Samuel. The Lives of the English Poets: And a Criticism on Their Works. Dublin: Whitestone, Williams, Colles, Wilson [etc.], 1779.
Kaplan, Amy. The Social Construction of American Realism. Chicago: University of Chicago Press, 1988.
Levine, Robert S. “Reimagining 1820-1865.” In Timelines of American Literature, edited by Cody Marrs and Christopher Hager, 134–44. Baltimore: Johns Hopkins University Press, 2019.
Levitan, Kathrin. A Cultural History of the British Census: Envisioning the Multitude in the Nineteenth Century. 1st ed. Palgrave Studies in Cultural and Intellectual History. New York: Palgrave Macmillan, 2011.
Marrs, Cody, ed. American Literature in Transition, 1851-1877. Nineteenth-Century American Literature in Transition, volume 3. Cambridge ; New York, NY: Cambridge University Press, 2022.
———. Nineteenth-Century American Literature and the Long Civil War. New York: Cambridge University Press, 2015.
———. “Three Theses on Reconstruction.” American Literary History 30, no. 3 (September 2018): 407–28. https://doi.org/10.1093/alh/ajy017.
Marrs, Cody, and Christopher Hager, eds. Timelines of American Literature. Baltimore: Johns Hopkins University Press, 2019.
Moretti, Franco. “Conjectures on World Literature.” New Left Review, no. 1 (February 2000): 54–68.
Nobles, Melissa. Shades of Citizenship: Race and the Census in Modern Politics. Stanford, CA: Stanford University Press, 2000.
Prewitt, Kenneth. What Is Your Race? The Census and Our Flawed Efforts to Classify Americans. Princeton, N.J: Princeton University Press, 2013.
Risam, Roopika. New Digital Worlds: Postcolonial Digital Humanities in Theory, Praxis, and Pedagogy. Evanston, Illinois: Northwestern University Press, 2019.
Roberts, David Lindsay. Republic of Numbers: Unexpected Stories of Mathematical Americans Through History. Johns Hopkins University Press, 2019.
Rodriguez, Clara E. Changing Race: Latinos, the Census, and the History of Ethnicity in the United States. Critical America. New York: New York University Press, 2000.
Sainte-Beuve, Charles Augustin. Portraits Littéraires. Nouv. éd. Paris: Didier, 1855.
Schor, Paul. Counting Americans: How the US Census Classified the Nation. New York, NY: Oxford University Press, 2017.
Schuller, Kyla. The Biopolitics of Feeling: Race, Sex, and Science in the Nineteenth Century. Anima. Durham: Duke University Press, 2017.
Scott, James C. Seeing Like a State: How Certain Schemes to Improve the Human Condition Have Failed. Veritas paperback edition. New Haven: Yale University Press, 2020.
Soto, Michael. Measuring the Harlem Renaissance: The U.S. Census, African American Identity, and Literary Form. Amherst: University of Massachusetts Press, 2016.
States, United, Carroll D. Wright, and United States, eds. The History and Growth of the United States Census. Washington: Govt. Print. Off, 1900.
Taine, Hippolyte. History of English Literature. New York: J.W. Lovell company, 1879.
Thompson, Debra. The Schematic State: Race, Transnationalism, and the Politics of the Census. Cambridge, United Kingdom: Cambridge University Press, 2016.
Thorvaldsen, Gunnar. Censuses and Census Takers: A Global History. Routledge Studies in Modern History. London: Routledge/Taylor & Francis Group, 2018.
Thrailkill, Jane F. Affecting Fictions: Mind, Body, and Emotion in American Literary Realism. Cambridge, Mass: Harvard University Press, 2007.
US Census Bureau, Census History Staff. “1890 - History - U.S. Census Bureau.” https://www.census.gov/history/www/through_the_decades/questionnaires/1890_2.html, n.d. Accessed February 4, 2024.
———. “Index of Questions 1860.” https://www.census.gov/history/www/through_the_decades/index_of_questions/1860_1.html, December 2023.
———. “Index of Questions 1890.” https://www.census.gov/history/www/through_the_decades/index_of_questions/1890_1.html, December 2023.

  1. Catherine Gallagher, “The Rise of Fictionality,” in The Novel, ed. Franco Moretti (Princeton: Princeton University Press, 2006), 340.↩︎

  2. Samuel Johnson, The Lives of the English Poets: And a Criticism on Their Works (Dublin: Whitestone, Williams, Colles, Wilson [etc.], 1779); Charles Augustin Sainte-Beuve, Portraits Littéraires, Nouv. éd (Paris: Didier, 1855); Hippolyte Taine, History of English Literature (New York: J.W. Lovell company, 1879); Cleanth Brooks, The Well Wrought Urn: Studies in the Structure of Poetry (New York: Reynal & Hitchcock, 1947), 198–215.↩︎

  3. John Guillory, “The Ethical Practice of Modernity: The Example of Reading,” in The Turn to Ethics, ed. Marjorie B. Garber, Beatrice Hanssen, and Rebecca L. Walkowitz, Culture Work (New York: Routledge, 2000), 29–46; John Guillory, Professing Criticism: Essays on the Organization of Literary Study (Chicago: University of Chicago Press, 2022); John Guillory, Cultural Capital: The Problem of Literary Canon Formation, First edition, enlarged (Chicago: The University of Chicago Press, 2023).↩︎

  4. See, e.g., United States, Carroll D. Wright, and United States, eds., The History and Growth of the United States Census (Washington: Govt. Print. Off, 1900); Hyman Alterman, Counting People: The Census in History, 1st ed. (New York: Harcourt, Brace & World, 1969); William Alonso, Paul Starr, and National Committee for Research on the 1980 Census, eds., The Politics of Numbers, The Population of the United States in the 1980s (New York: Russell Sage Foundation, 1987); Melissa Nobles, Shades of Citizenship: Race and the Census in Modern Politics (Stanford, CA: Stanford University Press, 2000); Clara E. Rodriguez, Changing Race: Latinos, the Census, and the History of Ethnicity in the United States, Critical America (New York: New York University Press, 2000); Kathrin Levitan, A Cultural History of the British Census: Envisioning the Multitude in the Nineteenth Century, 1st ed, Palgrave Studies in Cultural and Intellectual History (New York: Palgrave Macmillan, 2011); Kenneth Prewitt, What Is Your Race? The Census and Our Flawed Efforts to Classify Americans (Princeton, N.J: Princeton University Press, 2013); Dan Bouk, How Our Days Became Numbered: Risk and the Rise of the Statistical Individual (University of Chicago Press, 2015); Margo J. Anderson, The American Census: A Social History, Second Edition (New Haven: Yale University Press, 2015); Michael Soto, Measuring the Harlem Renaissance: The U.S. Census, African American Identity, and Literary Form (Amherst: University of Massachusetts Press, 2016); Debra Thompson, The Schematic State: Race, Transnationalism, and the Politics of the Census (Cambridge, United Kingdom: Cambridge University Press, 2016); Paul Schor, Counting Americans: How the US Census Classified the Nation (New York, NY: Oxford University Press, 2017); Gunnar Thorvaldsen, Censuses and Census Takers: A Global History, Routledge Studies in Modern History (London: Routledge/Taylor & Francis Group, 2018); Emmanuel Didier, Theodore M. Porter, and Priya Vari Sen, America by the Numbers: Quantification, Democracy, and the Birth of National Statistics, Infrastructure Series (Cambridge, Massachusetts: The MIT Press, 2020); Daniel B. Bouk, Democracy’s Data: The Hidden Stories in the U.S. Census and How to Read Them, First edition (New York: MCD ; Farrar, Straus and Giroux, 2022).↩︎

  5. Anderson, The American Census, 88.↩︎

  6. Census History Staff US Census Bureau, “Index of Questions 1860” (https://www.census.gov/history/www/through_the_decades/index_of_questions/1860_1.html, December 2023); Census History Staff US Census Bureau, “Index of Questions 1890” (https://www.census.gov/history/www/through_the_decades/index_of_questions/1890_1.html, December 2023).↩︎

  7. Janet Abbate, Recoding Gender: Women’s Changing Participation in Computing (MIT Press, 2012), 12.↩︎

  8. David Lindsay Roberts, Republic of Numbers: Unexpected Stories of Mathematical Americans Through History (Johns Hopkins University Press, 2019), 104.↩︎

  9. Census History Staff US Census Bureau, “1890 - History - U.S. Census Bureau” (https://www.census.gov/history/www/through_the_decades/questionnaires/1890_2.html, n.d.), accessed February 4, 2024.↩︎

  10. James C. Scott, Seeing Like a State: How Certain Schemes to Improve the Human Condition Have Failed, Veritas paperback edition (New Haven: Yale University Press, 2020), 9–85.↩︎

  11. Franco Moretti, “Conjectures on World Literature,” New Left Review, no. 1 (February 2000): 57.↩︎

  12. Margaret Cohen, The Sentimental Education of the Novel (Princeton, N.J: Princeton University Press, 1999), 23.↩︎

  13. Nan Z. Da, “The Computational Case Against Computational Literary Studies,” Critical Inquiry 45, no. 3 (March 2019): 638–39, https://doi.org/10.1086/702594.↩︎

  14. Scholars from Granville Hicks to Amy Kaplan to Henry Louis Gates, Jr. have used 1865 to think about the ways in which the end of the Civil War upended social relations, and how that remade literature. Granville Hicks, The Great Tradition: An Interpretation of American Literature Since the Civil War, Rev. ed (New York: Biblo and Tannen, 1967); Amy Kaplan, The Social Construction of American Realism (Chicago: University of Chicago Press, 1988); Henry Louis Gates, The Signifying Monkey: A Theory of African American Literary Criticism, Twenty-fifth anniversary edition (Oxford: Oxford University Press, 2014).↩︎

  15. Christopher Hager and Cody Marrs, “Against 1865: Reperiodizing the Nineteenth Century,” J19: The Journal of Nineteenth-Century Americanists 1, no. 2 (September 2013): 259–84, https://doi.org/10.1353/jnc.2013.0026; Cody Marrs, Nineteenth-Century American Literature and the Long Civil War (New York: Cambridge University Press, 2015); Robert S. Levine, “Reimagining 1820-1865,” in Timelines of American Literature, ed. Cody Marrs and Christopher Hager (Baltimore: Johns Hopkins University Press, 2019), 134–44; Cody Marrs, “Three Theses on Reconstruction,” American Literary History 30, no. 3 (September 2018): 407–28, https://doi.org/10.1093/alh/ajy017; Cody Marrs, ed., American Literature in Transition, 1851-1877, Nineteenth-Century American Literature in Transition, volume 3 (Cambridge ; New York, NY: Cambridge University Press, 2022); Cody Marrs and Christopher Hager, eds., Timelines of American Literature (Baltimore: Johns Hopkins University Press, 2019).↩︎

  16. As Katherine Bode insists in her axiom, “you can’t model away bias.” Lauren Klein and Catherine D’Ignazio also argue that attending to the conditions of data’s production is one of the key tenets of data feminism. Katherine Bode, “Why You Can’t Model Away Bias,” Modern Language Quarterly 81, no. 1 (March 2020): 95–124, https://doi.org/10.1215/00267929-7933102; Catherine D’Ignazio and Lauren F. Klein, Data Feminism, Strong Ideas Series (Cambridge, Massachusetts: The MIT Press, 2020).↩︎

  17. Roopika Risam, New Digital Worlds: Postcolonial Digital Humanities in Theory, Praxis, and Pedagogy (Evanston, Illinois: Northwestern University Press, 2019).↩︎

  18. Nancy Glazener, Reading for Realism: The History of a U.S. Literary Institution, 1850-1910 (Durham: Duke University Press, 1997).↩︎

  19. See, e.g., Lennard J. Davis, Enforcing Normalcy: Disability, Deafness, and the Body (London ; New York: Verso, 1995); Kyla Schuller, The Biopolitics of Feeling: Race, Sex, and Science in the Nineteenth Century, Anima (Durham: Duke University Press, 2017); Peter Cryle and Elizabeth Stephens, Normality: A Critical Genealogy (University of Chicago Press, 2017).↩︎

  20. Ian Hacking, The Taming of Chance, Ideas in Context (Cambridge: Cambridge University Press, 1990), 160–70, https://doi.org/10.1017/CBO9780511819766.↩︎

  21. Ryan Heuser and Long Le-Khac, “A Quantitative Literary History of 2,958 Nineteenth-Century British Novels: The Semantic Cohort Method,” Pamphlets of the Stanford Literary Lab, no. 4 (May 2012).↩︎

  22. Jane F. Thrailkill, Affecting Fictions: Mind, Body, and Emotion in American Literary Realism (Cambridge, Mass: Harvard University Press, 2007).↩︎

  23. W. E. B. Du Bois, The Philadelphia Negro: A Social Study, The Oxford W.E.B. Du Bois (New York, NY: Oxford University Press, 2007).↩︎

  24. W. E. B. Du Bois, The Quest of the Silver Fleece, The Oxford W. E. B. Du Bois (Oxford ; New York: Oxford University Press, 2007).↩︎

  25. Du Bois, The Quest of the Silver Fleece.↩︎

  26. Du Bois, The Quest of the Silver Fleece, 7.↩︎