Collecting and Vetting Public Data for Research (CS4137.01)

Michael Corey

In this course we will cover the major methods for collecting and vetting public data for use in research or computing settings. The course will begin with publicly available datasets, then progress through using APIs to query data providers, web-scraping public data, and finally capturing streaming data and converting it into usable datasets.

This course will be taught in Python using Jupyter Notebooks. Students will be expected to be fluent in Python or R for data analysis before starting the course and to have undertaken basic coursework in statistics.

This course will be especially helpful for students who are preparing STEM or Social Science plan projects that require data for analysis. It may also be of interest to CS students looking to learn web-scraping and how to capture streaming data.

Learning Outcomes:
- Students will learn to collect, vet, and leverage publicly available data in support of their own research and computing goals.

- Students will gain an appreciation for the possibilities, and especially the limitations, of data used in research.

- Computing students will gain research skills, and research students will gain computing skills.

Delivery Method: Fully in-person
Prerequisites: Students who have taken at least one class in statistics or data science and at least one class in data visualization or mapping should contact faculty directly at
Course Level: 4000-level
Credits: 4
Schedule: M/Th 7:00PM-8:50PM (Full-term)
Maximum Enrollment: 16
Course Frequency: Every 2-3 years

Categories: 4000, All courses, Computer Science, Four Credit, Fully In-Person, Mathematics, Politics, Sociology, Updates