Topic Model Interpretation Protocol

We developed a topic-model interpretation protocol that sets out standard instructions and observation steps for researchers using topic models: a transparent, documented, and understandable process for the interaction between machine learning and human interpretation. Our goal is not to assert a definitive topic-model interpretation process (which will differ across projects and materials) but to publish a paradigm that others in the digital humanities can adapt, improve, and vary. (Current version of the Interpretation Protocol: v. 2.0, June 2019.)


Rationale

Because complex data analysis can have a “black box” effect, researchers using machine-learning methods (e.g., in so-called in silico science) need not only to document technical workflows for reproducibility but also to make the steps in a workflow humanly understandable, so that results can be interpreted. Digital humanities research, of course, is rooted not only in data science but also in long-standing traditions of humanistic hermeneutics, including the critical scrutiny of how humans “read” and “interpret” materials. Digital humanists thus carry the extra burden of making visible the machine-to-human and human-to-human interpretive steps hidden in the process of interpretation: how researchers read a topic model and provide evidence to reach credible conclusions from it. Yet there are currently no best practices in the digital humanities for explaining data workflows, let alone for attending to the act of human interpretation.

Our Topic Model Interpretation Protocol consists of a modular set of survey-like questionnaires with instructions, required observation methods and waypoints, and reporting methods (using the principles of “grounded theory” iterative note-taking). Used in sequence, these protocols step a researcher through an interpretation and documentation process that yields a set of capturable notes for producing research reports.

We make the Interpretation Protocol modules available “as is” in their original Qualtrics survey format (exported as QSF files that others can import into Qualtrics) and as adapted Word .docx files (which use customized versions of Word’s “document properties” to re-create the editable, repeated “running notes” of the original surveys). These files include instructions and references specific to the WE1S project and its materials. We hope that other projects will fork, extend, and adapt them to develop a consensus practice of open, reproducible digital humanities research.


See the WE1S bibliography on “Interpretability and Explainability” in machine learning, and Alan Liu’s recorded lecture related to the WE1S Interpretation Protocol, “Humans in the Loop: Humanities Hermeneutics and Machine Learning.”


Modules shown in the flowchart:
* Module 0: User training
* Module 1: Overview of model
* Module 2: Representative topics
* Module 3.a: Analyze a topic
* Module 3.b: Analyze a cluster of topics
* Module 3.c: Analyze a keyword
* Module 4.a: Compare sets of topics (multiple topics)
* Module 4.c: Compare two keywords
* Module 5: Compare two parts of corpus
     * part to whole
     * 2 metadata sets
     * 2 time ranges
     * compare to "random" corpus
* Module 6: Compare two different topic models
* Module 7: Analysis & synthesis of interpretation results
* Module X: Make & document your own workflow
* Module Y: Document use of additional methods and tools
* Module Z: Add-on steps for collaborative interpretation
* Report Module: Instructions for writing report
* Report: use Google Doc research report template
Flowchart of existing WE1S Interpretation Protocol modules. (Modules 0 and 5 are yet to be created.)


The WE1S Interpretation Protocol consists of discrete modules that are combined in sequence or parallel to address research questions. The modules are like Lego™ or Minecraft™ blocks to be creatively snapped together. There is even a “Module X” for making and documenting improvised workflows bridging between other modules, and a “Module Y” for documenting use of additional methods and tools.

The modules are implemented as live Qualtrics surveys (available only to WE1S developers), which we share with others as QSF files that can be imported into Qualtrics at other institutions holding a license. We also share the Interpretation Protocol modules as Word versions that can be used “as is” or adapted in place of the Qualtrics surveys. The Word versions are also useful for collaboratively drafting entries for the Qualtrics surveys, which otherwise can be used by only one user or computer at a time.