Research Tools Overview

WE1S has created or adapted software tools for processing, analyzing, and visualizing topic models of large collections of texts. These tools are assembled into an open-source workflow platform we call the WE1S Workspace, whose “modules” (sets of related Jupyter notebooks and associated tools) we make available through a containerized computing environment that can be downloaded and deployed on your own computer. You can run these tools on our datasets or collections of data about media coverage of the humanities (part of the way we support open and reproducible digital humanities). Or you can run the tools on your own texts by starting with our Jupyter notebooks for creating a project (usertest_create_template_archive.ipynb) and importing your materials (import.ipynb). [[The WE1S Workspace and all tools below will soon be released.]]

Tools graphic (showing an abstract representation of a human interpreter observing machine learning)

See cards explaining some of our key software tools, all of which are open source or open access.

WE1S Workspace Jupyter notebooks. (See cards describing the notebooks.)

WE1S Workspace topic modeling tools

Topic modeling notebooks and related tools in the WE1S Workspace. (See descriptive cards.)

WE1S Topic Model Observatory (adapted and original visualization interfaces for the interactive exploration of topic models). See Guide to the observatory tools.

WE1S Chomp and TweetSuite tools

See cards describing the Chomp and TweetSuite tools.

Excerpt from the documentation for the WE1S Manifest Schema.

WE1S Workspace

Our Workspace is an ensemble of Jupyter notebooks that can be spun up in a user’s computer from the containerized WE1S Computing Environment. The notebooks can be used modularly or in a workflow sequence to collect, manage, analyze, topic model, visualize, and perform other operations on texts.

Topic Modeling Tools

Important modules in our Workspace include those for creating and running a topic modeling project—setting up the project; importing, exporting, or managing texts; pre-processing texts; performing various analyses (such as counting documents or terms); topic modeling; and conducting topic model diagnostics.
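
To make one of these analysis steps concrete, here is a minimal sketch of counting documents and terms in a small in-memory collection. The sample texts and the simple lowercase word tokenizer are assumptions made for illustration; the Workspace notebooks have their own pre-processing and counting code, which this sketch does not reproduce.

```python
import re
from collections import Counter

# Invented sample documents standing in for an imported collection.
documents = [
    "The humanities matter to public life.",
    "Public funding for the humanities is debated.",
]


def tokenize(text: str) -> list:
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


# Term frequency: how many times each term occurs across the collection.
term_counts = Counter(tok for doc in documents for tok in tokenize(doc))

# Document frequency: in how many documents each term occurs at least once.
doc_counts = Counter(tok for doc in documents for tok in set(tokenize(doc)))

print(len(documents))
print(term_counts["humanities"], doc_counts["public"])
```

Separating term frequency from document frequency is a common design choice because a term that is frequent overall may still be concentrated in only a few documents.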

Visualization Tools

Our Workspace also includes Jupyter notebook modules for generating interactive visualizations of topic models. We call our suite of original or adapted visualization interfaces our Topic Model Observatory. General-purpose interfaces in the Topic Model Observatory, which let users explore a topic model and its underlying materials with various degrees of freedom in looking into topics, words, and documents, include Dfr-browser, TopicBubbles, and pyLDAvis. More specialized interfaces include Metadata&D, GeoD, and DendrogramViewer. (See our Topic Model Observatory Guide.)

Tools for Collecting the Web & Social Media

We also make available Chomp, a set of Python tools designed to find and collect text from webpages on specified sites that contain search terms of interest. Unlike many web scraping tools, Chomp is designed first and foremost to take a wide sweep, working at scale and across a variety of different platforms to gather material.
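
The core idea of keeping only pages that mention search terms of interest can be sketched with the Python standard library. This is not Chomp's code or API: the class `TextExtractor`, the functions `page_text` and `matching_pages`, and the sample pages are all hypothetical names and data invented for this example, and the sketch assumes the HTML has already been fetched.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text chunks of an HTML page, skipping script and style."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def page_text(html: str) -> str:
    """Return the text content of an HTML string as one line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def matching_pages(pages: dict, search_terms: list) -> dict:
    """Return {url: text} for pages whose text contains any search term."""
    terms = [term.lower() for term in search_terms]
    results = {}
    for url, html in pages.items():
        text = page_text(html)
        if any(term in text.lower() for term in terms):
            results[url] = text
    return results


# Invented pages standing in for HTML already fetched from a site.
pages = {
    "https://example.org/a": "<html><body><p>Why the humanities matter</p></body></html>",
    "https://example.org/b": "<html><body><p>Local sports results</p></body></html>",
}

hits = matching_pages(pages, ["humanities", "liberal arts"])
print(sorted(hits))
```

A real collector working at scale would also need fetching, rate limiting, and per-site handling, none of which is shown here.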

For Twitter, we offer TweetSuite, a set of tools used to collect data from the platform and prepare it for topic modeling. See also our research blog post on our methodology for collecting materials from Reddit.

Interpretation Protocol (for Topic Models)

We developed a topic-model “interpretation protocol” that declares standard instructions and observation steps for researchers using topic models. Our goal is a transparent, documented, and understandable process for the interaction between machine learning and human interpretation. (See WE1S Bibliography of Interpretability in Machine Learning and Topic Model Interpretation.) We do not assert a definitive topic-model interpretation process (because this will differ depending on the nature of projects, materials, resources, and personnel). Instead, we offer our interpretation protocol as a paradigm to be adapted, improved, and varied by others. The protocol takes the form of survey-style questionnaires that step researchers through looking at a topic model and drawing conclusions from it.

Our Interpretation Protocol is a workflow that is modularly customizable. For example, a researcher can start by exploring a model (modules 1-2) and then choose chains of other modules for specific research purposes, e.g., “Analyze a topic” followed by “Analyze a keyword.”

We implemented the Interpretation Protocol for our project as a modular series of Qualtrics questionnaires providing instructions to researchers about what to observe in a topic model and what questions to answer (in note fields that follow the principles of Grounded Theory human reporting on data). We provide our questionnaires not just as Qualtrics files (importable by others with institutional access to Qualtrics) but also as Word documents.

Manifest Schema

To document our resources, tools, and workflow in a way that is both transparent to humans and computationally tractable, we created a “manifest” schema for our work that could be adapted by other digital humanities projects. The WE1S manifest schema is a set of recommendations, examples, and validation tools for the construction of manifest documents for the WE1S project. We use the manifest schema to define metadata for individual documents, collections, sources, and corpora, as well as topic modeling projects. (See definition of manifest in a computing sense.)

WE1S manifests are JSON documents that describe resources. They can be used as data storage and configuration files for a variety of scripted processes and tools that read the JSON format. Manifests may include metadata describing a publication, a process, a set of data, or an output of some procedure. Manifests can also describe software tools, processes, and workflows, as well as outputs such as result data, information visualizations, and interactive interfaces. Their primary intent is to help humans document and keep track of their workflow.
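
As a rough illustration of the idea, the sketch below builds a small manifest as a Python dictionary, checks it for a few required fields, and round-trips it through JSON. The field names, the `REQUIRED_FIELDS` set, and the `missing_fields` helper are assumptions made for this example, not the actual WE1S manifest schema or its validation tools; consult the schema documentation for the real properties.

```python
import json

# Hypothetical required fields, chosen for illustration only;
# the real WE1S manifest schema defines its own properties.
REQUIRED_FIELDS = {"name", "title", "description"}


def missing_fields(manifest: dict) -> list:
    """Return the required field names absent from a manifest, sorted."""
    return sorted(REQUIRED_FIELDS - set(manifest))


# An illustrative manifest describing a single source (invented values).
source_manifest = {
    "name": "example_newspaper",
    "title": "Example Newspaper",
    "description": "A hypothetical news source used for this sketch.",
}

# Manifests are plain JSON, so any JSON-aware tool can read them back.
serialized = json.dumps(source_manifest, indent=2)
restored = json.loads(serialized)

print(missing_fields(restored))           # nothing missing from the full manifest
print(missing_fields({"name": "draft"}))  # names the fields a draft still lacks
```

Because a manifest of this kind is ordinary JSON, the same file can serve both as human-readable documentation and as configuration input for scripts.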