Many products of human invention — political speeches, product reviews, status updates on Twitter and Facebook, literary texts, music, and paintings — have been analyzed, not uncontroversially, as “data”.
In this graduate-level course (open to all departments, especially those in the humanities and social sciences), we will pursue two ends. First, we will investigate the landscape of modern quantitative methods for treating data as a lens onto the world, surveying a range of methods in machine learning and data analysis that leverage information produced by people in order to draw inferences (such as discerning the authorship of documents and the political position of social media users, charting the reuse of language in legislative bills, tagging the genres of songs, and extracting social networks from literary texts). Second, we will cast a critical eye on those methods and investigate the assumptions those algorithms make about the world and the data through which we see it, in order to understand their limitations and when to apply them. How and when can empirical methods support other forms of argumentation, and what are their limits?
Many of these techniques are shared among the nascent communities of practice known as “computational social science”, “computational journalism” and the “digital humanities”; this course provides foundational skills for students to conduct their own research in these areas.
No computational background is required. Homework assignments will give students a choice depending on their background: either (a) implementing and evaluating a quantitative method on a dataset (knowledge of Python is recommended), or (b) writing an analysis/critique of an algorithm and of published work that has used it. The course will be capped with a final collaborative project.