The Process and Ethics of Using Hacked and Leaked Data (CS4139.01)

Michael Corey

The advent of big data has also brought the advent of big data leaks. As data leaks have grown in size, researching, learning from, and reporting on them has become increasingly complicated. Beyond the technical ability to analyze this data, there are questions of cybersecurity when dealing with data of unknown provenance. Finally, there is the thorny question of the ethics of using hacked and leaked data, especially how to balance the privacy of the individual, the benefits to society, and the nature of the (often sensitive) subject matter at hand.

This course will be structured around the newly published Hacks, Leaks, and Revelations, a comprehensive introduction to safely handling leaked data. The text will be heavily supplemented by additional readings on internet ethics and privacy, as well as tutorials from leading sites in the OSINT and data journalism space. Students taking this course should expect to write responses on the ethics of handling leaked data, to run analyses on example data, and to produce a final project analyzing one or more large-scale datasets.

Note: The textbook for this course covers examples analyzing data related to racism, white supremacy, and political extremism.

Learning Outcomes:
Students will:
- Learn how to safely containerize and analyze large data troves.
- Develop proficiency in using the command line, SQL, and Python to extract findings from large-scale datasets.
- Learn to articulate their views on internet privacy and how it should be balanced against the public good.
- Produce a final project deriving insights from semi-structured data.

Delivery Method: Fully in-person
Prerequisites: Students should have basic computer skills, a willingness to learn about containerization and the command line, and an interest in analyzing data. Due to the potentially disturbing nature of some of the material covered in the course, the instructor asks to speak with each prospective student before their registration is approved - contact
Course Level: 4000-level
Credits: 4
T/F 8:30AM - 10:20AM (Full-term)
Maximum Enrollment: 16
Course Frequency: One time only

Categories: 4000, All courses, Computer Science, Four Credit, Fully In-Person, New Courses, Updates