Instructors
Laura K. Nelson
Department/school
Course number
100S
Semester
Schedule
Monday and Wednesday, 10AM - 12PM
Units
4
CCN
33820

Increasingly, humanity’s cultural material is being captured and stored in the form of electronic text. From historical documents, literature and poems, diaries, political speeches, and government documents to emails, text messages, and social media, students in the humanities and social sciences now have access to immense amounts of rich and diverse text. This course will introduce students to cutting-edge ways of structuring, analyzing, and interpreting digitized text-as-data, and will do so by exploring questions fundamental to the humanities and social sciences. The ultimate goal is to encourage students to think about novel ways to apply these techniques to their own texts and research questions, and to provide the skills necessary to do so. We will use the open-source (and free!) programming language Python, and we will provide demonstration corpora relevant to both the humanities and social sciences.

Specific skills covered include structuring and pre-processing text, dictionary methods, supervised and unsupervised machine learning, word scores and word weighting, grammar-parsing and concordances, working with metadata, and crowd-based content analysis. Through a series of lectures, small group projects, and tutorials in Python, students will learn how to load, pre-process, analyze, and interpret text data using all of these methods.
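To give a flavor of two of the skills listed above, pre-processing and dictionary methods, here is a minimal sketch in Python. It is not taken from the course materials: the two short texts, the word list, and the function names `preprocess` and `dictionary_score` are all hypothetical, invented only for illustration, and the sketch uses nothing beyond the Python standard library.

```python
import re
from collections import Counter

# Hypothetical mini-corpus and dictionary, invented for illustration only.
DOCUMENTS = {
    "speech": "We will build a stronger, fairer economy. Jobs and wages matter!",
    "diary": "Rain again today. I read poems by the fire and wrote a letter.",
}
ECONOMY_TERMS = {"economy", "jobs", "wages"}


def preprocess(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def dictionary_score(tokens, terms):
    """Return the share of tokens that appear in the dictionary."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[term] for term in terms) / len(tokens)


# Score every document against the same dictionary.
scores = {
    name: dictionary_score(preprocess(text), ECONOMY_TERMS)
    for name, text in DOCUMENTS.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

In this toy case the speech scores higher than the diary entry because more of its words appear in the economy word list. A real analysis would typically rely on a larger corpus and a carefully validated dictionary.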