Key Collections (with Topic Models & Visualizations)

WE1S studies a corpus of journalistic media and other documents related to the humanities, from which we derive “non-consumptive use” datasets of word frequencies, metadata, and other derived data that we use for computational modeling. From our datasets, we draw subsets that we call our “collections,” filtered by keyword, source, or document type to help us address particular research questions. (See also the metadata tags we add to our data to help analyze groups of publication sources.) Of the approximately 30 collections we have made (some only experimental), we make 19 available for exploration below through topic models accompanied by interactive visualizations.

“Cards” below provide summaries of the collections and “start page” links leading to fuller information on each collection and access to its topic models and visualizations.

We have also deposited for download in the Zenodo open-science repository our data (but no readable text for sources under copyright), model files, and visualizations, along with our tools for making these. See our collections, project production files, and tools workspace in Zenodo.

In addition, surveys of students and others we conducted at two of our project campuses—UC Santa Barbara and U. Miami—complement our big-data analysis of media documents. Anonymized results are presented as collections of survey results.

We also describe in detail selected topic models that became especially important for some of our investigations.

See also Materials Overview


WE1S Collections of News and Other Media

Each collection described in a card below represents thousands to hundreds of thousands of documents assembled in specific combinations of sources and years from WE1S's overall corpus of news, media, and social media materials (currently primarily from the U.S.). Due to copyright and other constraints, WE1S makes available for download or interactive exploration only word-frequency and other data and metadata derived from the original texts, along with topic models and visualizations of the material.

Different collections are designed to facilitate asking certain kinds of research questions, e.g., about the profile of the humanities in the media at large, in top U.S. newspapers, in college and university newspapers, in articles mentioning the humanities and/or the sciences, etc. For example, Collection 1 is the large set of most of what WE1S gathered, except social media, that mentions the humanities. Collection 32, by contrast, is a large set of materials that is approximately a level sample of articles from top U.S. newspapers (not necessarily including mentions of the humanities). Collection 14 includes articles from campus student newspapers. Collection 38 includes Reddit posts mentioning the humanities. And so on.

The descriptive cards below provide summary information about each collection and its source, its start page with a fuller description and links to topic model visualizations, and the location of its downloadable dataset and models (not including plain text). The cards also commonly include screenshots from some of WE1S's topic models of collections. (For cards describing some of these models, see below on this page.)

WE1S Collections of Survey Results

To complement its big-data analysis of news and other media mentioning the "humanities," WE1S also surveyed and held focus group meetings with students and others at two of its project campuses: UC Santa Barbara and U. Miami. The following are data collections of survey results.

WE1S Topic Models of Collections (selected)

WE1S systematically topic-modeled its "collections" to create models at various levels of topic granularity, typically 25, 50, 100, 150, 200, and 250 topics. Each model of a collection comes with a number of interactive visualizations, including those in the WE1S Topic Model Observatory. Below are cards describing a few of the specific models that have been important in WE1S's research. (These topic models are labeled according to a "shelfmark" system so that "C-14.100" means the 100-topic (or "grain") model of Collection 14.)
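
For readers who want to handle these shelfmarks programmatically (for example, when sorting downloaded model files), the following is a minimal Python sketch. It assumes every shelfmark follows the same pattern as the single example given here, "C-14.100" (the letter C, a hyphen, the collection number, a period, and the number of topics). The names `Shelfmark`, `parse_shelfmark`, `format_shelfmark`, and `TYPICAL_GRAINS` are hypothetical and are not part of any WE1S tool.

```python
import re
from typing import NamedTuple

# Assumption: the grain sizes listed on this page as "typical".
TYPICAL_GRAINS = (25, 50, 100, 150, 200, 250)

# Assumption: all shelfmarks look like "C-<collection>.<topics>",
# inferred from the one example "C-14.100".
_SHELFMARK_PATTERN = re.compile(r"^C-(\d+)\.(\d+)$")


class Shelfmark(NamedTuple):
    """A parsed shelfmark: collection number and topic count ("grain")."""
    collection: int
    topics: int


def parse_shelfmark(mark: str) -> Shelfmark:
    """Split a shelfmark such as "C-14.100" into its two numbers."""
    match = _SHELFMARK_PATTERN.match(mark.strip())
    if match is None:
        raise ValueError(f"Not a recognizable shelfmark: {mark!r}")
    return Shelfmark(collection=int(match.group(1)), topics=int(match.group(2)))


def format_shelfmark(collection: int, topics: int) -> str:
    """Build a shelfmark string from a collection number and topic count."""
    return f"C-{collection}.{topics}"


if __name__ == "__main__":
    parsed = parse_shelfmark("C-14.100")
    print(parsed.collection, parsed.topics)
    # List the shelfmarks one would expect for Collection 14 if it was
    # modeled at every typical grain size (an assumption, not a guarantee).
    print([format_shelfmark(14, grain) for grain in TYPICAL_GRAINS])
```

Keeping the collection number and the topic count as separate integers makes it straightforward to group models by collection or to compare the same collection across grain sizes.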