Advance Orientation Materials
Research assistants and others participating in WE1S can browse the following materials to familiarize themselves with the project. For research assistants, this reading is on paid time.
Materials on Project Mission and Context
- (1) WE1S Prospectus — This prospectus is a shortened distillation of the original grant proposal submitted by WE1S to the Andrew W. Mellon Foundation.
- (2) 4Humanities.org — Browse this site to learn about the umbrella organization of the WE1S project. 4Humanities is the grassroots international initiative started at UCSB in 2010 to research and create advocacy for the humanities. 4Humanities focuses especially on digital research and advocacy methods.
Materials on Project Technical Methods
The main digital humanities method used by WE1S to understand public discourse on the humanities is “topic modeling,” an important computational “machine learning” approach shared with other areas in the social sciences and the sciences. WE1S is also experimenting with other text analysis methods that extend, complement, or provide alternatives to topic modeling, including “word embedding” (or “word vectors”).
WE1S is implementing the above methods by developing a workspace and workflow management system that is innovative as a paradigm for open, reproducible research in the digital humanities but that borrows its nuts and bolts from common basic methods in the digital sciences, social sciences, and humanities. These nuts and bolts include the use of Markdown, scripting languages, serialization formats (such as JSON), Jupyter “data science” notebooks, “containerization” solutions such as Docker, and version-control repository systems such as GitHub.
It is not important that all WE1S participants be hands-on with the project’s core technical or nuts-and-bolts methods. But for basic literacy about what is involved as participants analyze the results of topic modeling or listen to demos and presentations about the project’s workspace and workflow system, some preliminary reading is useful. Participants are asked to browse the following materials designed to provide orientation for a subset of the project’s technical platform, concentrating on methods with which they are unfamiliar. The most important materials to concentrate on are those on topic modeling:
- (3) David M. Blei, “Probabilistic Topic Models” (2013) — (read only to end of p. 79, before the math begins)
- (4) Edwin Chen, “Introduction to Latent Dirichlet Allocation” (2011)
- (5) Ted Underwood, “Topic Modeling Made Just Simple Enough” (2012)
- (6) Andrew Goldstone’s interface for exploring topic models. The Signs model has some extra, later-developed features. Especially helpful in learning how to work with these models is the guide page on “Interpreting the topic model of Signs.”
- (7) John Mohr and Petko Bogdanov, “Topic Models: What They Are and Why They Matter” (2013). This article is paywalled. UCSB students have free access through the campus network or, from off campus, through the UCSB VPN or Library Proxy server. CSUN students can access the article here through the campus network or VPN. There is also an open-access manuscript version.
Word Embedding (Word Vectors)
- (8) Benjamin Schmidt, “Vector Space Models for the Digital Humanities” (2015)
- (9) Scott Kleinman, “JSON Format and Its Uses in WE1S” (2018)
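For participants who have never seen JSON, the short Python sketch below may help with basic literacy. It shows what "serialization" means in practice: a record held in a program is written out as JSON text and then read back in. The field names and values here are made up for illustration and are only an assumption; they are not the actual WE1S data format.

```python
import json

# Hypothetical record for one document. These field names and values are
# illustrative assumptions, not the actual WE1S schema.
document = {
    "title": "Example headline",
    "source": "Example newspaper",
    "year": 2018,
    "tags": ["humanities", "education"],
}

# Serialize: turn the Python dictionary into a JSON text string.
json_text = json.dumps(document, indent=2)
print(json_text)

# Deserialize: turn the JSON text back into a Python dictionary.
restored = json.loads(json_text)
print(restored["title"])
```

Because JSON is plain text, the same record can be read by people and exchanged between different tools and scripting languages.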
Jupyter Notebooks (previously called IPython Notebooks) are “an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.”
- (10) Helen Shen, “Interactive Notebooks: Sharing the Code” (2014)
- (11) See also the Jupyter Notebooks home page.
- (12) Scott Kleinman, Slides for WE1S January 26, 2018, Workshop on GitHub and Markdown. (Begin at slide for “Markdown.”)
- (13) Korbin Brown, “What Is GitHub, and What Is It Used For?” (2016)