Advanced Data Science
- Institution: University of Richmond
- Repository: github.com/erikfredner/dsst389-2024
Land Acknowledgment
This class at the University of Richmond respectfully acknowledges the traditional custodians of the land we are on today, the Powhatan people, and pays respect to their elders past, present, and emerging.
To learn more about the land on which the University of Richmond exists, I recommend students read the report “Knowledge of This Cannot Be Hidden” by Shelby M. Driskill and Dr. Lauranett L. Lee, which discusses both the University’s geographic connection to the Powhatan people and the presence of a burying ground for enslaved laborers on campus.
Accessibility
I strive to make this course accessible. If you encounter barriers to accessibility, please let me know as soon as possible.
Description
This course introduces advanced approaches to the analysis of data. It emphasizes what Hadley Wickham calls the “whole game” of data science: creating, importing, tidying, transforming, visualizing, and communicating about data. We will focus on data about and derived from texts in this class.
See the course catalog.
Learning Goals
By the end of this course, students will be able to…
- Collect, manipulate, tidy, visualize, and explore data using basic and advanced techniques.
- Identify specific challenges and opportunities of working with data derived from texts.
- Understand key aspects of the R programming language and the tidyverse.
- Use the RStudio integrated development environment.
- Use Quarto to create high-quality research documents and websites.
- Use programming language documentation, cookbooks, and large language models to solve programming problems.
Materials
- Class materials will be posted on Blackboard throughout the semester.
- Please bring a computer, pencil, and paper to each class.
Inclusivity
I expect you to…
- Treat your classmates with respect.
- Be patient.
- Support each other in your learning.
Help
- I reserve class time to answer questions and help with problem sets.
- I encourage you to work on problem sets and study with your classmates during and outside of class.
- Come to office hours.
Grades
| Assignment | Percentage |
|---|---|
| Participation | 10% |
| Code Interview | 10% |
| Group Projects | 60% |
| Final Project | 20% |

| Letter Grade | Range |
|---|---|
| A | 93-100 |
| A- | 90-92 |
| B+ | 87-89 |
| B | 83-86 |
| B- | 80-82 |
| C+ | 77-79 |
| C | 73-76 |
| C- | 70-72 |
| D+ | 67-69 |
| D | 63-66 |
| D- | 60-62 |
| F | 0-59 |
- I round fractional grades.
- I reserve the right to curve grades.
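As a rough illustration of how the two tables above combine, here is a minimal Python sketch. The function names `final_score` and `letter_grade` are hypothetical, the sample scores are invented for the example, and the round-half-up rule is an assumption: the syllabus says only that fractional grades are rounded. Any curve is left out.

```python
import math

# Weights from the Grades table (percent of the final grade).
WEIGHTS = {
    "Participation": 10,
    "Code Interview": 10,
    "Group Projects": 60,
    "Final Project": 20,
}

# Lower bound of each range from the letter-grade table, highest first.
LETTER_FLOORS = [
    (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"), (67, "D+"), (63, "D"),
    (60, "D-"), (0, "F"),
]


def final_score(scores):
    """Return the weighted average of component scores, rounded half up.

    Round-half-up is an assumption; the syllabus does not say which
    rounding rule is used.
    """
    weighted = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100
    return math.floor(weighted + 0.5)


def letter_grade(score):
    """Map a whole-number score to a letter using the table's lower bounds."""
    for floor, letter in LETTER_FLOORS:
        if score >= floor:
            return letter
    return "F"


# Invented example scores, each on a 0-100 scale.
example = {
    "Participation": 95,
    "Code Interview": 80,
    "Group Projects": 88,
    "Final Project": 91,
}
score = final_score(example)  # weighted average is 88.5, rounded to 89
print(score, letter_grade(score))  # 89 B+
```

Because Group Projects carry 60% of the grade, a change of a few points there moves the final score more than the same change in any other component.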
Participation
- Participation includes preparation, attendance, and effective use of class time.
- Everyone gets three unexcused absences.
- Additional unexcused absences will harm your participation grade.
- Email me before the class you will miss to request an excused absence.
Code Interview
Near the end of the semester, you will complete a code interview. I will ask you to explain your approach and solutions to a few randomly selected problems from the notebooks assigned throughout the semester.
If you do not complete the notebooks, you cannot succeed on the code interview.
Group Projects
- Groups of two to three students will be randomly assigned for each project.
- In addition to submitting the project, group members will complete self and peer evaluations.
- Generally, groups with mutually positive peer evaluations will share a grade.
- If a group member receives a negative peer evaluation, they may receive a lower grade than the rest of the group.
- I want groups to share work equitably.
Final Project
Unlike earlier projects in the semester, final projects are individual. Your final project will result in a formal report including original code, data visualizations, explanatory and interpretive writing, and proper citations.
Late work
- Late work is penalized one letter grade (i.e., -10 points) per day.
- If you need an extension on an assignment, it never hurts to ask!
- I will be much more likely to grant extensions requested well in advance.
- If you request an extension within 24 hours of a deadline, it will not be approved except in exceptional circumstances.
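The late penalty is simple arithmetic, sketched here in Python. The function name `apply_late_penalty` is hypothetical, and two details are assumptions rather than stated policy: lateness is counted in whole days, and a score cannot drop below zero.

```python
def apply_late_penalty(score, days_late, points_per_day=10):
    """Subtract 10 points per day late.

    Assumptions (not stated in the syllabus): days_late is a whole
    number of days, and the result is floored at zero.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    return max(0, score - points_per_day * days_late)


# An assignment that earned 92 but arrived two days late.
print(apply_late_penalty(92, 2))  # 72
```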
Honor
This course is taught in accordance with the University of Richmond Honor Code, which can be accessed via The Honor Councils website.
If you are found to have violated the Honor Code, you will fail this course.
If you ever have any questions about whether an action would be an honor violation, re-read the syllabus. If it is still unclear, please ask me.
Generative Artificial Intelligence
Generative artificial intelligence (GenAI) programs, especially large language models (LLMs), are useful tools for coding. However, overreliance on GenAI impedes learning.
Moreover, LLMs often answer questions incorrectly or incompletely. To use these technologies effectively, you must be able to recognize when that happens and know how to respond.
Prohibited Uses of GenAI
- Submitting model output, in part or in whole, as if it were your original work. This includes code or writing.
- Uploading any data used in this course (e.g., .csv files) to multimodal GenAI tools like ChatGPT.
Permitted Uses of GenAI
I ask you to do these things in this order when you can’t figure something out:
- Review the course notes.
- Talk to your classmates.
- Search for credible information online (e.g., StackOverflow).
- Talk to the Custom GPT I created for this class.
- If you use information from the Custom GPT, cite your interactions with it.
This page explains how to share a link to a ChatGPT interaction.
Schedule
The schedule outlines major topics to be covered in the course. I reserve the right to change the schedule as the semester progresses. If I do change the schedule, I will inform you as far in advance as possible.
| Meeting | Date | Topic |
|---|---|---|
| 1 | 01-13 | Introduction |
| 2 | 01-15 | Quarto and Markdown |
| | 01-22 | NO CLASS |
| 3 | 01-27 | tidyverse review |
| 4 | 01-29 | Tidy texts |
| 5 | 02-03 | Sampling and corpora |
| 6 | 02-05 | Relative frequencies |
| 7 | 02-10 | Review |
| 8 | 02-12 | Group Project 1 workshop |
| 9 | 02-17 | Sentiment analysis |
| 10 | 02-19 | tf-idf |
| 11 | 02-24 | K-nearest neighbors |
| 12 | 02-26 | Text classification |
| 13 | 03-03 | Group Project 2 workshop |
| 14 | 03-05 | Elastic nets |
| | 03-10 | Spring Break |
| | 03-12 | Spring Break |
| 15 | 03-17 | Gradient boosted trees |
| 16 | 03-19 | Logistic regression |
| 17 | 03-24 | Word embeddings |
| 18 | 03-26 | Modeling text data |
| 19 | 03-31 | Text classification |
| 20 | 04-02 | Topic modeling |
| 21 | 04-07 | Review |
| 22 | 04-09 | Group Project 3 workshop |
| 23 | 04-14 | Code Interviews |
| 24 | 04-16 | Workshop |
| 25 | 04-21 | Workshop |
| 26 | 04-23 | Final Project presentations |
Communication
- I respond to email within 2 business days.
- If you have not received a response after 2 business days, please write me again.
- I recommend scheduling your emails to arrive early in the morning (e.g., 7 AM).
Wellness
Health
- Please do not come to class if you are sick.
- If you are recovering from an illness, are no longer infectious, and are well enough to attend class, please be courteous and wear a mask.
Counseling and Psychological Services
Mental health is crucial for academic success. Counseling and Psychological Services at the University of Richmond supports student success and enhances student well-being by providing comprehensive clinical services to currently enrolled, full-time, degree-seeking students.
Title IX
The University of Richmond and its faculty are committed to ensuring a safe and supportive learning environment. If you disclose to me or another mandatory reporter an incident of sexual misconduct (including sexual harassment or sexual violence), I am obligated by law to share that information with the University’s Title IX Coordinator. For more information on our sexual misconduct policy, how to report, and confidential resources available to you, please visit the University’s Title IX help page.
Religious Observance
Any student may be excused from class or other assignments because of religious observance. I will make reasonable accommodations when students’ religious practices conflict with their academic responsibilities. If you will miss an academic obligation because of religious observance, you are responsible for contacting me within the first two weeks of the semester. You are also responsible for completing missed work in a timely manner.
For more information, see the University’s religious observance policy.
Resources
The University of Richmond has many resources on campus that may help you succeed.
Disability Services
The University of Richmond’s office of Disability Services strives to ensure that students with disabilities and/or temporary conditions (e.g., concussions and injuries) are provided opportunity for full participation and equal access. Students who are experiencing a barrier to access due to a disability and/or temporary condition are encouraged to apply for accommodations by visiting disability.richmond.edu. Disability Services can be reached at disability@richmond.edu or 804-662-5001.
Once accommodations have been approved, students must 1) submit their Disability Accommodation Notice (DAN) to each of their professors via the Disability Services Student Portal, available at sl.richmond.edu/be, and 2) request a meeting with each professor to create an accommodation implementation plan. It is important to complete these steps as soon as possible because accommodations are never retroactive, and professors are permitted a reasonable amount of time for implementation. Disability Services is available to assist as needed.
Weinstein Learning Center
The Weinstein Learning Center is your go-to destination for academic support. Their services are tailored to help you achieve your academic goals throughout your time at the University of Richmond. To learn more and view service schedules and appointment times, visit https://wlc.richmond.edu. Available services include:
Academic Skills Coaching
Meet with a professional staff member who will collaborate with you to assess and develop your academic and life skills (e.g., critical reading and thinking, information conceptualization, concentration, test preparation, time management, stress management, and more).
Content Tutoring
Peer consultants offer assistance in specific courses and subject areas. They are available for appointments (in-person and virtual) and drop-in sessions. See schedules at https://wlc.richmond.edu for supported courses and drop-in times.
English Language Learning
Attend one-on-one or group consultations, workshops, and other services focused on English, academic, and/or intercultural skills.
Quantitative and Programming Resources
Peer consultants and professional staff offer workshops or one-on-one appointments to build quantitative and programming skills and provide statistical assistance for research projects.
Speech and Communication
Prepare and practice for academic presentations, speaking engagements, and other occasions of public expression. Peer consultants offer recording, playback, and coaching for both individual and group presentations. Students can expect recommendations regarding clarity, organization, style, and delivery.
Technology Learning
Visit our student lab dedicated to supporting digital media projects. Services include camera checkout, video/audio recording assistance, use of virtual reality equipment, poster printing, 3D printing and modeling, and consultation services on a variety of software.
Writing
Assists student writers at all levels of experience, across all majors. Meet with peer consultants who can offer feedback on written work and suggest pre-writing, drafting, and revision strategies.
Boatwright Library
Students may consult librarians to assist with their research, which may be especially useful for the final project. Use the Ask a Librarian service to reach librarians by email, phone, chat, text, or in person.
Acknowledgments
The course builds on previous iterations of DSST389 taught by Lilla Orr and Taylor Arnold.
Recommended Reading
- Taylor Arnold and Lauren Tilton, Humanities Data in R: Exploring Networks, Geospatial Data, Images, and Text, 2nd ed., Quantitative Methods in the Humanities and Social Sciences (Springer International Publishing, 2024), https://doi.org/10.1007/978-3-031-62566-4.
- Peter C. Bruce, Andrew Bruce, and Peter Gedeck, Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python, 2nd ed. (O’Reilly Media, Inc., 2020).
- Kieran Joseph Healy, Data Visualization: A Practical Introduction (Princeton University Press, 2019).
- Chester Ismay, Statistical Inference via Data Science: A Modern Dive into R and the Tidyverse, 2nd ed. (CRC Press, 2025).
- Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani, An Introduction to Statistical Learning: With Applications in R, Springer Texts in Statistics (Springer US, 2021), https://doi.org/10.1007/978-1-0716-1418-1.
- Matthew Lee Jockers and Rosamond Thalken, Text Analysis with R: For Students of Literature, 2nd ed., Quantitative Methods in the Humanities and Social Sciences (Springer, 2020).
- Max Kuhn and Julia Silge, Tidy Modeling with R: A Framework for Modeling in the Tidyverse (O’Reilly Media, 2022).
- Julia Silge and David Robinson, Text Mining with R: A Tidy Approach, 1st ed. (O’Reilly, 2017).
- Edward R. Tufte, The Visual Display of Quantitative Information, 2nd ed. (Graphics Press, 2001).
- Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund, R for Data Science: Import, Tidy, Transform, Visualize, and Model Data, 2nd ed. (O’Reilly, 2023).
- Hadley Wickham and Jennifer Bryan, R Packages: Organize, Test, Document, and Share Your Code, 2nd ed. (O’Reilly, 2023).