WE1S Repositories & Deposits

Zenodo is an open-science repository for research data and related outputs. It was created through the European OpenAIRE initiative and is operated by CERN. Zenodo follows the FAIR (Findable, Accessible, Interoperable, Reusable) principles.


GitHub is a proprietary development platform commonly used by software and other project developers to develop, maintain, and distribute their code and documentation.

WE1S practices principles of research sustainability and openness by depositing its data (datasets and “collections”), tools, and lab notes in the Zenodo open-science repository.

We also distribute our code resources — for our computing “Workspace” (tools and workflow) and the Docker containerization of its computing environment — in GitHub repositories.

Below are tables of our Zenodo deposits and GitHub repos.

Glossary of terms useful for understanding WE1S deposits and repositories:
  • “Corpus / Corpora” — The total set of texts (and data about them) that WE1S works with. (Compare Collection.)
  • “Datasets” — Complete sets of data representing the WE1S corpus of texts. The data is derived from the original texts but is not itself readable as plain text. For example, the data that the WE1S Workspace generates from texts includes bags-of-words or term frequencies, n-gram counts, etc.
  • “Collection” — Derived data, topic model files, and visualization data and files representing a subset of WE1S’s datasets and corpus (e.g. just top newspapers, or student newspapers, or only newspaper articles containing both the words “humanities” and “science”, etc.).
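
As a rough illustration of the kinds of derived data named in the glossary, the sketch below computes a bag-of-words (term frequencies) and n-gram counts from a single text. It is only a minimal sketch and not the WE1S Workspace's own code: the tokenizer, the function names, and the sample sentence are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens.

    This is a deliberately naive tokenizer (an assumption for this sketch).
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def term_frequencies(text):
    """Return a bag-of-words: each token mapped to how often it occurs."""
    return Counter(tokenize(text))


def ngram_counts(text, n=2):
    """Return counts of every run of n consecutive tokens."""
    tokens = tokenize(text)
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Hypothetical sample text, not taken from the WE1S corpus.
sample = "The humanities and the sciences both shape public life."
bag = term_frequencies(sample)
bigrams = ngram_counts(sample, n=2)
```

Counts like these record how often words occur but not the order of the full text, which is why such data can represent the original articles without being readable as plain text.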

WE1S Deposits in Zenodo

Deposit Title | Type | Brief Description | Open License | DOI
Collection 1 | Data | U.S. News Media, c. 1989-2019 (WE1S core collection of articles mentioning "humanities") -- A collection of word-frequency and other data representing 82,324 unique articles mentioning "humanities" (no duplicate or close-variant documents) published mostly during 1989-2019 in 850 U.S. news sources and their associated blogs. (About 5,000 articles originate from earlier in the 1980s.) The word "humanities" occurs 134,948 times in the collection. WE1S and other researchers use this data to look for broad patterns and to help guide closer study. | CC BY-SA 4.0 | 10.5281/zenodo.4902187
Collection 2 | Data | U.S. News Media, c. 1989-2019 (articles mentioning "humanities" or "liberal arts") -- A collection of word-frequency and other data representing 94,816 unique articles mentioning "humanities" or "liberal arts" (no duplicate or close-variant documents) published mostly during 1989-2019 in 884 U.S. news sources and their associated blogs. (5,492 articles originate from earlier years going back to 1977.) WE1S and other researchers use this data to look for broad patterns and to help guide closer study. | CC BY-SA 4.0 | 10.5281/zenodo.4908882
Collection 3 | Data | U.S. News Media, c. 1989-2019 (articles mentioning "humanities" or "the arts") -- A collection of word-frequency and other data representing 108,207 unique articles mentioning "humanities" or "the arts" (no duplicate or close-variant documents) published mostly during 1989-2019 in 1,170 U.S. news sources and their associated blogs. (5,308 articles originate from earlier years going back to 1977.) WE1S and other researchers use this data to look for broad patterns and to help guide closer study. | CC BY-SA 4.0 | 10.5281/zenodo.4913688
Collection 4 | Data | U.S. Top Newspapers, 1977-2018 (articles mentioning "humanities") -- A collection of word-frequency and other data representing 28,375 unique articles mentioning "humanities" (no duplicate or close-variant documents) published from 1977 to 2018 in the 15 top-circulation U.S. news sources and their associated blogs. The word "humanities" occurs 39,852 times in 28,375 | CC BY-SA 4.0 | 10.5281/zenodo.4919794
Collection 5 | Data | U.S. Top Newspapers, 1977-2018 (articles mentioning "humanities" or "liberal arts") -- A collection of word-frequency and other data representing 30,323 unique articles mentioning "humanities" or "liberal arts" (no duplicate or close-variant documents) published from 1977 to 2018 in the 15 top-circulation U.S. news sources and their associated blogs. The word "humanities" occurs 39,890 times in 28,398 documents in the collection, while the phrase "liberal arts" occurs 2,888 times in 2,380 documents. WE1S and other researchers use this data to look for broad patterns and to help guide closer study. | CC BY-SA 4.0 | 10.5281/zenodo.4914736
Collection 14 | Data | U.S. Student Newspapers (articles mentioning "humanities" or "liberal arts") -- A collection of word-frequency and other data representing 21,182 unique articles mentioning "humanities" or "liberal arts" (no duplicates or close variants) published in 1998-2018 (primarily 2005-2018) in about 650 U.S. university and college student newspapers that are on the UWire news service. WE1S and other researchers use this data to look for broad patterns and help guide closer study. | CC BY-SA 4.0 | 10.5281/zenodo.4920178
Collection 15 | Data | Articles mentioning "humanities" or "literature" from ProQuest's Ethnic NewsWatch and GenderWatch -- A collection of word-frequency and other data representing 835 unique articles mentioning "humanities" or "literature" (no duplicate or close-variant documents) published mostly during 2016, 2018, and 2019 in 109 U.S. news sources gathered in ProQuest's Ethnic NewsWatch ("ethnic and minority press") and GenderWatch (sources gathered for "gender and women's studies, and gay, lesbian, bisexual, and transgender [GLBT] research"). WE1S and other researchers use this data to look for broad patterns and to help guide closer study. | CC BY-SA 4.0 | 10.5281/zenodo.4925152
Collection 18 | Data | U.S. Student Newspapers (articles mentioning "science(s)") -- A collection of word-frequency and other data representing 81,445 unique articles mentioning "science" or "sciences" from the UWire news service. Articles were published in 2000-2018 in 601 university and college student newspapers, mainly from the United States. There is a noticeable spike in the number of articles mentioning "science(s)" between 2017 and 2018, from 8,116 to 14,162. WE1S and other researchers can use this data to look for broad patterns and guide closer study. | CC BY-SA 4.0 | 10.5281/zenodo.4914288
Collection 20 | Data | U.S. Top Newspapers, 2000-2018 (sample of all articles) -- A collection of word-frequency and other data representing 29,183 unique articles (no duplicates or close variants) published during 2000-2018 in 15 top U.S. newspapers and their associated online blogs. WE1S and other researchers use this data to look for broad patterns and help guide closer study. | CC BY-SA 4.0 | 10.5281/zenodo.4927419
Collection 21 | Data | U.S. Top Newspapers, 2000-2018 (articles mentioning "humanities" or "science") -- A collection that contains data representing all 15,692 articles from its set of sources in these years mentioning "humanities" but only a sampling of the 388,691 articles mentioning "science" or "sciences" from those same sources and years. It downsamples "science(s)" articles (while maintaining the proportions of articles from particular sources and years) to achieve a 50/50 balance of articles related to the humanities and sciences. The purpose is to allow media discourse on the humanities to be studied alongside that on the sciences and not be buried so far down in the statistical pile that it cannot easily be seen in detail. Collection 21 is thus not a representation of the relative weight of discussion of the humanities and sciences but instead an aid to studying the fine features and structures of each. | CC BY-SA 4.0 | 10.5281/zenodo.4927745
Collection 28 | Data | Tweets containing keyword "humanities", c. 2014-2017 -- This collection of the WE1S Twitter corpus consists of 799,744 tweets containing the keyword "humanities" from authors who tweeted the term "humanities" more than once between Jan. 1, 2014, and Dec. 31, 2017. (See also C-29, which aggregates tweets by author.) | CC BY-SA 4.0 | 10.5281/zenodo.4940253
Collection 29 | Data | Tweets containing keyword "humanities", c. 2014-2017 (tweets aggregated by author) -- This collection of the WE1S Twitter corpus consists of 799,744 tweets containing the keyword "humanities" from authors who tweeted the term "humanities" more than once between Jan. 1, 2014, and Dec. 31, 2017. This version of our Twitter corpus compiles tweets by each author into single "documents" for topic-modeling analysis, resulting in 132,562 total documents. | CC BY-SA 4.0 | 10.5281/zenodo.4940259
Collection 32 | Data | U.S. Top Newspapers (sample of all articles) -- A collection of word-frequency and other data representing 204,617 unique articles (no duplicates or close variants) published during 2012-2018 in 15 top U.S. newspapers and their associated online blogs. WE1S and other researchers use this data to look for broad patterns and help guide closer study. Included is data based on an approximately 1:40 proportional balance between articles mentioning "humanities" (about 5,000) and a sample of articles on everything else (about 200,000 more or less "random" documents found through searching on common English words). In essence, the collection is a sampled representation of "everything" in these sources for these years (limited by the fact that it is not feasible to know how many articles were actually published in these publications, to determine how completely they were collected in available database repositories, or to harvest everything from such databases). | CC BY-SA 4.0 | 10.5281/zenodo.4940326
Collection 33 | Data | Articles classified as being about the humanities or the sciences from U.S. top-circulating newspapers and student newspapers, c. 1998-2018 -- A collection of word-frequency and other data representing 13,214 unique articles (no duplicate or close-variant documents) classified as being about the humanities or science published from 1998 to 2018 in 507 U.S. top-circulating and student newspapers and their associated blogs. The collection includes 2,477 articles from U.S. top-circulating newspapers and 10,737 articles from student newspapers. Using supervised classification models, 2,869 articles in the collection have been classified as being about the humanities, and 10,345 articles in the collection have been classified as being about science. WE1S and other researchers use this data to look for broad patterns and to help guide closer study. | CC BY-SA 4.0 | 10.5281/zenodo.4940725
Topic Model Interpretation Protocol | Documents | The WhatEvery1Says (WE1S) Project developed a topic-model interpretation protocol that declares standard instructions and observation steps for researchers using topic models: a transparent, documented, and understandable process for the interaction between machine learning and human interpretation. We make the Interpretation Protocol files available "as is" in their original Qualtrics survey formats (exported as QSF files for others who can import them into Qualtrics) as well as in adapted Word .docx formats (using customized versions of Word's "document properties" in each file to re-create the editable, repeated "running notes" in the original surveys). These files include instructions and references that are specific to the WE1S project and its materials. We hope that they can be forked, evolved, and adapted by other projects to develop a consensus practice of open, reproducible digital humanities research. | CC BY-SA 4.0 | 10.5281/zenodo.4940170
Lab-1 | Documents | Lab-1 is the documentation deposit for WE1S research team 1, which studied the media representation of the humanities "crisis". Included in the deposit are the team's reports and lab notes (working folders) for different periods of time during the WE1S project. | CC BY-SA 4.0 | 10.5281/zenodo.4891827
Lab-3 | Documents | Documentation deposit (reports and lab notes) of the WhatEvery1Says (WE1S) project's Team 3, the research team studying the relation between social groups and the humanities as represented in journalistic media. | CC BY-SA 4.0 | 10.5281/zenodo.4828366
Lab-4 | Documents | Documentation deposit (reports and lab notes) of the WhatEvery1Says (WE1S) project's Team 4, the research team studying the value of the humanities as represented in journalistic media. | CC BY-SA 4.0 | 10.5281/zenodo.4831043
Lab-5 | Documents | Documentation deposit (reports and lab notes) of the WhatEvery1Says (WE1S) project's Team 5, the research team studying the broader profile of the humanities in society as represented in journalistic media. | CC BY-SA 4.0 | 10.5281/zenodo.4831113
Lab-6 | Documents | Documentation deposit (reports and lab notes) of the WhatEvery1Says (WE1S) project's Team 6, the research team studying the humanities in different media, including social media. | CC BY-SA 4.0 | 10.5281/zenodo.4831165
Lab-7 | Documents | Documentation deposit (reports and lab notes) of the WhatEvery1Says (WE1S) project's Team 7, the research team studying the impact of government, funding agencies, and foundations on the humanities as perceived in journalistic media. | CC BY-SA 4.0 | 10.5281/zenodo.4830907
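
The Collection 21 entry above describes downsampling "science(s)" articles while maintaining the proportions of articles from particular sources and years. The sketch below shows one way such proportional (stratified) downsampling could work. It is an illustration under stated assumptions, not the procedure WE1S actually used: the function name, the "source" and "year" field names, and the toy records are all hypothetical.

```python
import random
from collections import defaultdict


def downsample_stratified(articles, target_size, seed=0):
    """Sample roughly target_size articles, keeping each (source, year)
    group's share of the whole. Field names are assumptions for this sketch.
    """
    rng = random.Random(seed)  # fixed seed so the sample is repeatable

    # Group the articles by their (source, year) combination.
    groups = defaultdict(list)
    for article in articles:
        groups[(article["source"], article["year"])].append(article)

    # Keep the same fraction of every group. Because of rounding, the
    # final total can differ slightly from target_size.
    fraction = target_size / len(articles)
    sample = []
    for key in sorted(groups):
        members = groups[key]
        keep = min(len(members), round(len(members) * fraction))
        sample.extend(rng.sample(members, keep))
    return sample


# Toy data (hypothetical): 10 "humanities" articles and 100 "science" articles.
humanities = [{"source": "Paper A", "year": 2001, "id": i} for i in range(10)]
science = (
    [{"source": "Paper A", "year": 2001, "id": i} for i in range(60)]
    + [{"source": "Paper B", "year": 2002, "id": i} for i in range(40)]
)

# Downsample the science articles to match the humanities count (50/50 balance).
science_sample = downsample_stratified(science, target_size=len(humanities))
balanced = humanities + science_sample
```

In this toy case the science sample keeps the 60/40 split between the two source-year groups, so the balanced set stays representative of where and when the science articles appeared.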

WE1S GitHub Repos

[TBD]