WE1S has created or adapted for its use tools and software for processing, analyzing, and visualizing topic models based on large collections of texts. This software is assembled into an open-source workflow platform we call the WE1S Workspace, which we make available to others through a containerized computing environment that users can download for deployment on their own systems (from the level of laptops up). The container has its own operating system, with RAM and storage from the container host. The host runs on virtual hardware, which in turn runs on the user’s physical hardware. (See S-14.)
The WE1S containerized computing environment and Workspace are explained in the brief cards below. To download the environment and its contained Workspace, go to our GitHub repositories [under construction]. (See also “Intellectual Property Description of WE1S Software.”)
Glossary of important terms for understanding explanatory cards on this page
- “Computing environment” — The whole computing platform containing (in replicable containers) the software and tools offered by WE1S. (See card S-1.) Users can download and install the environment on their own computers.
- “Workspace” — The data-notebook (Jupyter notebook) system within the WE1S computing environment for collecting, managing, analyzing, topic modeling, visualizing, and other operations on texts. (See card S-2; see also M-15 on Jupyter notebooks.) When initially downloaded as part of the computing environment, the Workspace includes a Jupyter notebook for initiating a project and installing the modules of software and tools for working with texts and data.
- “Project” — The folder location and file structure created by running the Jupyter notebook that initially comes with the WE1S Workspace. A project is where users work on collections of texts and data using the Workspace’s modules.
- “Module” — A specific bundle of one or more Jupyter notebooks (and supporting scripts and files) available in the WE1S Workspace after initiating a project. Each module focuses on a particular task, e.g., creating a topic model or visualizing it. There are explanatory cards for each module below on this page.
- “Template” — When a new project is created, folders for each module containing notebooks and supporting scripts or other resources are copied from a central location to your project folder. This is known as the project “template”, and the individual files are called the “template” files. Templates have version numbers so that it is clear what version of the template was used to produce a project even if the template files are updated after the project was created.
- “Corpus / Corpora” — The total set of texts (and data about them) that WE1S works with. (Compare Collection.)
- “Data” — Data representing texts in a corpus that has been derived from the original texts but is not itself readable as plain text. For example, data that the WE1S Workspace generates from texts include: bags-of-words or term frequencies, ngram counts, etc.
- “Datasets” — Complete sets of data. WE1S makes all the data it derives from its corpus available for open science in the form of six datasets deposited in the Zenodo repository. See WE1S Repositories & Deposits. (Compare Collection.)
- “Collection” — Derived data, topic model files, and visualization data and files representing a subset of WE1S’s datasets and corpus (e.g. just top newspapers, or student newspapers, or only newspaper articles containing both the words humanities and science, etc.).
- “Metadata” — Secondary data about the data being worked on in the WE1S Workspace. Metadata includes citation information about collections of texts, such as author, publication, and date. It can also include other kinds of labels or tags created by a user to help address research questions. For example, the WE1S Project labeled publication sources in some of the collections it topic modeled by geographical region, kind of publication, self-identified association with particular social groups, etc.
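To make the glossary entry for “Data” more concrete, the following is a minimal Python sketch of how a bag-of-words (term frequencies) and ngram counts can be derived from a text. It is an illustration only and is not taken from the WE1S Workspace: the function names (`tokenize`, `term_frequencies`, `ngram_counts`), the simple letters-only tokenizer, and the sample sentence are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens.

    Hypothetical helper: a real pre-processing pipeline would likely
    handle punctuation, numbers, and stop words more carefully.
    """
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(tokens):
    """Return a bag-of-words: each term mapped to its count."""
    return Counter(tokens)


def ngram_counts(tokens, n=2):
    """Return counts of each run of n consecutive tokens."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


# Sample text invented for this illustration.
doc = "The humanities and the sciences both study the human world."

tokens = tokenize(doc)
bag = term_frequencies(tokens)
bigrams = ngram_counts(tokens, n=2)

print(bag.most_common(3))
print(bigrams[("the", "humanities")])
```

Note that the derived counts record which words and word pairs occur, and how often, but the original sentence cannot be read back from them. This is the sense in which derived data is “not itself readable as plain text.”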
WE1S Computing Environment & WE1S Workspace
The following cards explain the architecture of the WE1S containerized computing environment and the WE1S Workspace that runs within it for processing, topic modeling, and visualizing collections of text (as well as some other text collection and analysis functions). Users can use the environment and Workspace to study their own collections of texts; or they can work with the WE1S collections of journalistic and other texts related to the humanities.
Topic Modeling and Other Analysis, Diagnostics, & Utility Tools (Component Modules in the WE1S Workspace)
The WE1S Workspace includes a set of tools made available as what WE1S calls "modules," each of which contains one or more related Jupyter notebooks (and associated scripts and files). These can be run separately or in workflow series. The modules described below provide some of the key WE1S tools for importing/exporting/managing texts; pre-processing texts; performing various analyses (such as counting documents or terms); topic modeling; and conducting model diagnostics.
Tools for Visualizing Topic Models (Component Modules in the WE1S Workspace)
The WE1S Workspace includes a set of tools made available as what WE1S calls "modules," each of which contains one or more related Jupyter notebooks (and associated scripts and files). These can be run separately or in workflow series. The modules described below provide some of the key WE1S tools for visualizing topic models. (Also see the WE1S modules for "Text Collection, Analysis, & Topic Modeling" in the previous section of this page. Typically, a workflow would start by using WE1S modules for topic modeling collections of texts before going on to running modules for different kinds of visualization.)
Tools for Collecting and Scraping Text from the Web & Social Media
WE1S also makes available tools it has created to harvest text from the Web and social media.
Additional tools created or adapted by WE1S.