Key Collections (with Topic Models & Visualizations)

WE1S studies a corpus of journalistic media and other documents related to the humanities. From this corpus we derive “non-consumptive use” datasets of word frequencies, metadata, and other data that we use for computational modeling. From our datasets, we draw subsets that we call our “collections,” filtered by keyword, source, or document type, to help us address particular research questions. (See also the metadata tags we add to our data to help analyze groups of publication sources.) Of the approximately 30 collections we have made (some only experimental), we make 19 available for exploration below through topic models accompanied by interactive visualizations.

“Cards” below provide summaries of the collections and “start page” links leading to fuller information on each collection and access to its topic models and visualizations.

We have also deposited our data (but no readable text for sources under copyright), model files, and visualizations, along with our tools for making these, for download in the Zenodo open-science repository. See our collections, project production files, and tools workspace in Zenodo.

In addition, surveys of students and others we conducted at two of our project campuses—UC Santa Barbara and U. Miami—complement our big-data analysis of media documents. Anonymized results are presented as collections of survey results.

We also describe in detail selected topic models that became especially important for some of our investigations.

WE1S Collections of News and Other Media

Each collection described in a card below represents thousands to hundreds of thousands of documents assembled in specific combinations of sources and years from WE1S's overall corpus of news, media, and social media materials (currently primarily from the U.S.). Due to copyright and other constraints, WE1S makes available for download or interactive exploration only word-frequency and other data and metadata derived from the original texts, along with topic models and visualizations of the material.

Different collections are designed to facilitate particular kinds of research questions, e.g., about the profile of the humanities in the media at large, in top U.S. newspapers, in college and university newspapers, or in articles mentioning the humanities and/or the sciences. For example, Collection 1 is the large set of most of what WE1S gathered, except social media, that mentions the humanities. Collection 32, by contrast, is a large set of materials that is approximately a level sample of articles from top U.S. newspapers (not necessarily including mentions of the humanities). Collection 14 includes articles from campus student newspapers. Collection 38 includes Reddit posts mentioning the humanities. And so on.

The descriptive cards below provide summary information about each collection and its sources, its start page with a fuller description and links to topic model visualizations, and the location of its downloadable datasets and models (not including plain text). The cards also commonly include screenshots from some of WE1S's topic models of collections. (For cards describing some of these models, see below on this page.)

WE1S Collection 1 (C-1)

Collection 1: U.S. News Media, c. 1989-2019 (WE1S core collection of articles mentioning "humanities")

WE1S Collection 2 (C-2)

Collection 2: U.S. News Media, c. 1989-2019 (articles mentioning "humanities" or "liberal arts")

WE1S Collection 3 (C-3)

Collection 3: U.S. News Media, c. 1989-2019 (articles mentioning "humanities" or "the arts")

WE1S Collection 4 (C-4)

Collection 4: U.S. Top Newspapers, 1977-2018 (articles mentioning "humanities")

WE1S Collection 5 (C-5)

Collection 5: U.S. Top Newspapers, 1977-2018 (articles mentioning "humanities" or "liberal arts")

WE1S Collection 14 (C-14)

Collection 14: U.S. Student Newspapers (articles mentioning "humanities" or "liberal arts")

WE1S Collection 15 (C-15)

Collection 15: Articles mentioning "humanities" or "literature" from ProQuest’s Ethnic NewsWatch and GenderWatch

WE1S Collection 18 (C-18)

Collection 18: U.S. Student Newspapers (articles mentioning "science(s)")

WE1S Collection 20 (C-20)

Collection 20: U.S. Top Newspapers, 2000-2018 (sample of all articles)

WE1S Collection 21 (C-21)

Collection 21: U.S. Top Newspapers, 2000-2018 (articles mentioning humanities or science)

WE1S Collection 27 (C-27)

Collection 27: Full Twitter Corpus c. 2014-2019

WE1S Collection 28 (C-28)

Collection 28: Tweets containing keyword "humanities," c. 2014-2017

WE1S Collection 29 (C-29)

Collection 29: Tweets containing keyword "humanities," c. 2014-2017 (tweets aggregated by author)

WE1S Collection 32 (C-32)

Collection 32: U.S. Top Newspapers (sample of all articles)

WE1S Collection 33 (C-33)

Collection 33: Articles classified as being about the humanities or science, 1998-2018

WE1S Collection 36 (C-36)

Collection 36: Articles containing the word “humanities” but that have been classified as not being about the humanities, 1998-2018

WE1S Collection 37 (C-37)

Collection 37: Articles containing the words “science” or “sciences” but that have been classified as not being about science, 1998-2018

WE1S Collection 38 (C-38)

Collection 38: U.S. Reddit on the Humanities

WE1S Collection 39 (C-39)

Collection 39 (and Reddit Corpus-A): Reddit—Students on the Humanities

WE1S Collections of Survey Results

To complement its big-data analysis of news and other media mentioning the "humanities," WE1S also surveyed and held focus group meetings with students and others at two of its project campuses: UC Santa Barbara and U. Miami. The following are data collections of survey results.

WE1S Collection HS-1 (C-HS-1)

Collection HS-1: UCSB Undergraduate Survey (responses from 2019 UCSB undergraduate student survey)

WE1S Collection HS-2 (C-HS-2)

Collection HS-5: U. Miami Non-Undergraduate Survey (demographic data)

WE1S Topic Models of Collections (selected)

WE1S systematically topic-modeled its "collections" to create models at various levels of topic granularity, typically 25, 50, 100, 150, 200, and 250 topics. Each model of a collection comes with a number of interactive visualizations, including those in the WE1S Topic Model Observatory. Below are cards describing a few of the specific models that have been important in WE1S's research. (These topic models are labeled according to a "shelfmark" system so that "C-14.100" means the 100-topic (or "grain") model of Collection 14.)
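For readers who want to handle shelfmarks programmatically, the following is a minimal Python sketch of how a label such as "C-14.100" could be split into its collection identifier and topic count. It is an illustration only, not WE1S code: the names `Shelfmark`, `parse_shelfmark`, and `format_shelfmark` are hypothetical, and the pattern assumes every shelfmark has the form "C-<collection>.<topics>" as in the example above.

```python
import re
from typing import NamedTuple


class Shelfmark(NamedTuple):
    """Hypothetical container for the two parts of a shelfmark."""
    collection: str  # e.g. "14" for Collection 14
    topics: int      # number of topics ("grain") in the model, e.g. 100


# Assumed shape: "C-", a collection identifier, a dot, then the topic count.
_SHELFMARK_RE = re.compile(r"^C-(?P<collection>[A-Za-z0-9-]+)\.(?P<topics>\d+)$")


def parse_shelfmark(mark: str) -> Shelfmark:
    """Split a shelfmark such as 'C-14.100' into collection and topic count."""
    match = _SHELFMARK_RE.match(mark.strip())
    if match is None:
        raise ValueError(f"not a recognizable shelfmark: {mark!r}")
    return Shelfmark(match.group("collection"), int(match.group("topics")))


def format_shelfmark(collection: str, topics: int) -> str:
    """Build a shelfmark string from a collection identifier and topic count."""
    return f"C-{collection}.{topics}"


if __name__ == "__main__":
    parsed = parse_shelfmark("C-14.100")
    print(parsed.collection, parsed.topics)
    # Shelfmarks for the typical granularities of one collection.
    print([format_shelfmark("14", n) for n in (25, 50, 100, 150, 200, 250)])
```

Keeping the collection identifier as a string rather than an integer is a deliberate choice in this sketch, so that the same function would not break on non-numeric identifiers if such shelfmarks exist.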

WE1S Topic Model (C-14.100)

WE1S

A 4Humanities Project