Advanced Data Science
Land Acknowledgment
This class at the University of Richmond respectfully acknowledges the traditional custodians of the land we are on today, the Powhatan people, and pays respect to their elders past, present, and emerging.
To learn more about the land on which the University of Richmond exists, I recommend students read the report “Knowledge of This Cannot Be Hidden” by Shelby M. Driskill and Dr. Lauranett L. Lee, which discusses both the University’s geographic connection to the Powhatan people and the presence of a burying ground for enslaved laborers on campus.
Accessibility
I strive to make this course accessible. If you encounter barriers to accessibility, please let me know as soon as possible.
Description
This course introduces advanced approaches to the analysis of data. It emphasizes what Hadley Wickham calls the “whole game” of data science: creating, importing, tidying, transforming, visualizing, and communicating about data. We will focus on data about and derived from texts in this class.
See the course catalog.
Learning Goals
By the end of this course, students will be able to…
- Collect, manipulate, tidy, visualize, and explore data using basic and advanced techniques.
- Identify specific challenges and opportunities of working with data derived from texts.
- Understand key aspects of the R programming language and the tidyverse.
- Use the RStudio integrated development environment.
- Use Quarto to create high-quality research documents and websites.
- Use programming language documentation, cookbooks, and large language models to solve programming problems.
Materials
- Class materials will be posted on Blackboard throughout the semester.
- Please bring a computer, pencil, and paper to each class.
Inclusivity
I expect you to…
- Treat your classmates with respect.
- Be patient.
- Support each other in your learning.
Help
- I reserve class time to answer questions and help with problem sets.
- I encourage you to work on problem sets and study with your classmates during and outside of class.
- Come to office hours.
Grades
Assignment | Percentage |
---|---|
Participation | 10% |
Code Interview | 10% |
Group Projects | 60% |
Final Project | 20% |
Letter Grade | Range |
---|---|
A | 93-100 |
A- | 90-92 |
B+ | 87-89 |
B | 83-86 |
B- | 80-82 |
C+ | 77-79 |
C | 73-76 |
C- | 70-72 |
D+ | 67-69 |
D | 63-66 |
D- | 60-62 |
F | 0-59 |
- I round fractional grades.
- I reserve the right to curve grades.
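As an illustration of how the weights and letter ranges above combine, here is a minimal sketch. It is not the official grade calculation: the syllabus says only that fractional grades are rounded, so rounding half up to a whole number is an assumption, and any curve or late penalty is left out. The function names and the example scores are hypothetical.

```python
# Weights from the Grades table, as whole percentages.
WEIGHTS = {
    "Participation": 10,
    "Code Interview": 10,
    "Group Projects": 60,
    "Final Project": 20,
}

# Lower bound of each letter range, highest first.
LETTER_FLOORS = [
    (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"), (67, "D+"), (63, "D"), (60, "D-"),
]


def final_score(scores):
    """Weighted average of component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(score):
    """Round half up to a whole number (an assumption), then look up the letter."""
    rounded = int(score + 0.5)
    for floor, letter in LETTER_FLOORS:
        if rounded >= floor:
            return letter
    return "F"


# Hypothetical component scores, for illustration only.
example = {
    "Participation": 100,
    "Code Interview": 86,
    "Group Projects": 89,
    "Final Project": 88,
}
total = final_score(example)
print(round(total, 1), letter_grade(total))  # prints: 89.6 A-
```

In this example the weighted score of 89.6 rounds to 90, which falls in the A- range rather than B+.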
Participation
- Participation includes preparation, attendance, and effective use of class time.
- Everyone gets three unexcused absences.
- Additional unexcused absences will harm your participation grade.
- Email me before the class you will miss to request an excused absence.
Code Interview
Near the end of the semester, you will complete a code interview. I will ask you to explain your approach and solutions to a few randomly selected problems from the notebooks assigned throughout the semester.
If you do not complete the notebooks, you cannot succeed on the code interview.
Group Projects
- Groups of two to three students will be randomly assigned for each project.
- In addition to submitting the project, group members will complete self and peer evaluations.
- Generally, groups with mutually positive peer evaluations will share a grade.
- If a group member receives a negative peer evaluation, they may receive a lower grade than the rest of the group.
- I want groups to share work equitably.
Final Project
Unlike earlier projects in the semester, final projects are individual. Your final project will result in a formal report including original code, data visualizations, explanatory and interpretive writing, and proper citations.
Late work
- Late work is penalized one letter grade (i.e., -10 points) per day.
- If you need an extension on an assignment, it never hurts to ask!
- I will be much more likely to grant extensions requested well in advance.
- If you request an extension within 24 hours of a deadline, it will not be approved except in exceptional circumstances.
Honor
This course is taught in accordance with the University of Richmond Honor Code, which can be accessed via The Honor Councils website.
If you are found to have violated the Honor Code, you will fail this course.
If you ever have any questions about whether an action would be an honor violation, re-read the syllabus. If it is still unclear, please ask me.
Generative Artificial Intelligence
Generative artificial intelligence (GenAI) programs, especially large language models (LLMs), are useful tools for coding. However, overreliance on GenAI impedes learning.
Moreover, LLMs often answer questions incorrectly or incompletely. To use these tools effectively, you must be able to recognize when that happens and know how to respond.
Prohibited Uses of GenAI
- Submitting model output, in part or in whole, as if it were your original work. This includes code or writing.
- Uploading any data used in this course (e.g., .csv files) to multimodal GenAI tools like ChatGPT.
Permitted Uses of GenAI
I ask you to do these things in this order when you can’t figure something out:
- Review the course notes.
- Talk to your classmates.
- Search for credible information online (e.g., StackOverflow).
- Talk to the Custom GPT I created for this class.
- If you use information from the Custom GPT, cite your interactions with it.
This page explains how to share a link to a ChatGPT interaction.
Schedule
The schedule outlines major topics to be covered in the course. I reserve the right to change the schedule as the semester progresses. If I do change the schedule, I will inform you as far in advance as possible.
Meeting | Date | Topic |
---|---|---|
1 | 01-13 | Introduction |
2 | 01-15 | Quarto and Markdown |
 | 01-22 | NO CLASS |
3 | 01-27 | tidyverse review |
4 | 01-29 | Tidy texts |
5 | 02-03 | Sampling and corpora |
6 | 02-05 | Relative frequencies |
7 | 02-10 | Review |
8 | 02-12 | Group Project 1 workshop |
9 | 02-17 | Sentiment analysis |
10 | 02-19 | tf-idf |
11 | 02-24 | K-nearest neighbors |
12 | 02-26 | Text classification |
13 | 03-03 | Group Project 2 workshop |
14 | 03-05 | Elastic nets |
 | 03-10 | Spring Break |
 | 03-12 | Spring Break |
15 | 03-17 | Gradient boosted trees |
16 | 03-19 | Logistic regression |
17 | 03-24 | Word embeddings |
18 | 03-26 | Modeling text data |
19 | 03-31 | Text classification |
20 | 04-02 | Topic modeling |
21 | 04-07 | Review |
22 | 04-09 | Group Project 3 workshop |
23 | 04-14 | Code Interviews |
24 | 04-16 | Workshop |
25 | 04-21 | Workshop |
26 | 04-23 | Final Project presentations |
Communication
- I respond to email within 2 business days.
- If you have not received a response after 2 business days, please write me again.
- I recommend scheduling your emails to arrive early in the morning (e.g., 7 AM).
Wellness
Health
- Please do not come to class if you are sick.
- If you are recovering from an illness, are no longer infectious, and are well enough to attend class, please be courteous and wear a mask.
Counseling and Psychological Services
Mental health is crucial for academic success. Counseling and Psychological Services at the University of Richmond supports student success and enhances student well-being by providing comprehensive clinical services to currently enrolled, full-time, degree-seeking students.
Title IX
The University of Richmond and its faculty are committed to ensuring a safe and supportive learning environment. If you disclose to me or another mandatory reporter an incident of sexual misconduct (including sexual harassment or sexual violence), I am obligated by law to share that information with the University’s Title IX Coordinator. For more information on our sexual misconduct policy, how to report, and confidential resources available to you, please visit the University’s Title IX help page.
Religious Observance
Any student may be excused from class or other assignments because of religious observance. I will make reasonable accommodations when students’ religious practices conflict with their academic responsibilities. If you will miss an academic obligation because of religious observance, you are responsible for contacting me within the first two weeks of the semester. You are also responsible for completing missed work in a timely manner.
For more information, see the University’s religious observance policy.
Resources
The University of Richmond has many resources on campus that may help you succeed.
Disability Services
The University of Richmond’s office of Disability Services strives to ensure that students with disabilities and/or temporary conditions (e.g., concussions and injuries) are provided the opportunity for full participation and equal access. Students who are experiencing a barrier to access due to a disability and/or temporary condition are encouraged to apply for accommodations by visiting disability.richmond.edu. Disability Services can be reached at disability@richmond.edu or 804-662-5001.
Once accommodations have been approved, students must 1) submit their Disability Accommodation Notice (DAN) to each of their professors via the Disability Services Student Portal, available at sl.richmond.edu/be, and 2) request a meeting with each professor to create an accommodation implementation plan. It is important to complete these steps as soon as possible because accommodations are never retroactive, and professors are permitted a reasonable amount of time for implementation. Disability Services is available to assist, as needed.
Weinstein Learning Center
The Weinstein Learning Center is your go-to destination for academic support. Their services are tailored to help you achieve your academic goals throughout your time at the University of Richmond. To learn more and view service schedules and appointment times, visit https://wlc.richmond.edu. Available services include:
Academic Skills Coaching
Meet with a professional staff member who will collaborate with you to assess and develop your academic and life skills (e.g., critical reading and thinking, information conceptualization, concentration, test preparation, time management, stress management, and more).
Content Tutoring
Peer consultants offer assistance in specific courses and subject areas. They are available for appointments (in-person and virtual) and drop-in sessions. See schedules at https://wlc.richmond.edu for supported courses and drop-in times.
English Language Learning
Attend one-on-one or group consultations, workshops, and other services focused on English, academic, and/or intercultural skills.
Quantitative and Programming Resources
Peer consultants and professional staff offer workshops or one-on-one appointments to build quantitative and programming skills and provide statistical assistance for research projects.
Speech and Communication
Prepare and practice for academic presentations, speaking engagements, and other occasions of public expression. Peer consultants offer recording, playback, and coaching for both individual and group presentations. Students can expect recommendations regarding clarity, organization, style, and delivery.
Technology Learning
Visit our student lab dedicated to supporting digital media projects. Services include camera checkout, video/audio recording assistance, use of virtual reality equipment, poster printing, 3D printing and modeling, and consultation services on a variety of software.
Writing
Assists student writers at all levels of experience, across all majors. Meet with peer consultants who can offer feedback on written work and suggest pre-writing, drafting, and revision strategies.
Boatwright Library
Students may consult librarians to assist with their research, which may be especially useful for the final project. Use the Ask a Librarian service to reach librarians by email, phone, chat, text, or in person.
Acknowledgments
The course builds on previous iterations of DSST389 taught by Lilla Orr and Taylor Arnold.
Recommended Reading
- Taylor Arnold and Lauren Tilton, Humanities Data in R: Exploring Networks, Geospatial Data, Images, and Text, 2nd ed., Quantitative Methods in the Humanities and Social Sciences (Springer International Publishing, 2024), https://doi.org/10.1007/978-3-031-62566-4.
- Peter C. Bruce, Andrew Bruce, and Peter Gedeck, Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python, 2nd ed. (O’Reilly Media, 2020).
- Kieran Joseph Healy, Data Visualization: A Practical Introduction (Princeton University Press, 2019).
- Chester Ismay, Statistical Inference via Data Science: A Modern Dive into R and the Tidyverse, 2nd ed. (CRC Press, 2025).
- Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani, An Introduction to Statistical Learning: With Applications in R, Springer Texts in Statistics (Springer US, 2021), https://doi.org/10.1007/978-1-0716-1418-1.
- Matthew Lee Jockers and Rosamond Thalken, Text Analysis with R: For Students of Literature, 2nd ed., Quantitative Methods in the Humanities and Social Sciences (Springer, 2020).
- Max Kuhn and Julia Silge, Tidy Modeling with R: A Framework for Modeling in the Tidyverse (O’Reilly Media, 2022).
- Julia Silge and David Robinson, Text Mining with R: A Tidy Approach, 1st ed. (O’Reilly, 2017).
- Edward R. Tufte, The Visual Display of Quantitative Information, 2nd ed. (Graphics Press, 2001).
- Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund, R for Data Science: Import, Tidy, Transform, Visualize, and Model Data, 2nd ed. (O’Reilly, 2023).
- Hadley Wickham and Jennifer Bryan, R Packages: Organize, Test, Document, and Share Your Code, 2nd ed. (O’Reilly, 2023).