In the Philosophy of Methods group, our informal experiments with this vague philosophy of “making sense” and “finding patterns” produced less-than-fruitful results. Our basic assumption was, conversely, that coherence or consistency cannot be found in a void. In other words, the structuring moment that makes a group of words meaningfully connected cannot be achieved by isolating the list as a self-contained, self-referential signifying object. Eventually, our application of literary theory to the WE1S interpretation of topic models yielded a tentative, reproducible interpretive protocol that identifies the complexity and saliency of qualitative discourse analysis in the WE1S project. As critic Kathryn Schulz remarks on the 21st-century development of large-scale computer learning and language analysis:
“Literature is an artificial universe, and the written word, unlike the natural world, can’t be counted on to obey a set of laws… The idea that truth can best be revealed through quantitative models dates back to the development of statistics (and boasts a less-than-benign legacy). And the idea that data is gold waiting to be mined; that all entities (including people) are best understood as nodes in a network; that things are at their clearest when they are least particular, most interchangeable, most aggregated – well, perhaps that is not the theology of the average lit department (yet).”
At the risk of oversimplification, we expanded this concept to consider topic modeling as the temporary reduction of the complexities of language to data, only to return results in terms of human-based language signification.
Using the philosophies of language that inform major literary theories, we noted multiple implied qualitative steps and acts of close-reading[1] throughout the WE1S process of topic modeling. While an event of discourse analysis is not quite the same as the act of close-reading, there are striking similarities between the two. First, as in any act of interpretation, the task for the WE1S scholar is to avoid overt misinterpretation, which ranges from falling short of any substantial content identification to so-called over-interpretation, i.e., reading too much into the (topic) “text.” Not coincidentally, this has traditionally been the crucial node of major attempts at the rigorous, i.e., literary-theory-oriented, reading of texts, from the formal close-reading procedures enacted by the New Critics to the major post-Saussurean linguistic perspectives of contemporary literary theory.[2]
Furthermore, stop words play a role in the creation of a topic model: the modelers’ interpretation of language and their chosen general discourses affect the selection of omitted words. This interpretation alone can significantly alter the later interpretation of the collected data. We should also consider whether stop words play a different role in topic modeling than in other forms of text mining and distant reading, and, more generally, how “omitted” words can alter the interpretation of data collection across tools.
Considering these qualitative facets of the project, humanities scholarship can likely be the topic modeler’s best resource in WE1S discourse analysis. In our exploration, we treated the document collection as a text and operated according to the following protocol.
Devised Interpretive Protocol
- If more than 50% of a topic model’s document collection is about topics X, Y, and Z, we might reasonably state that the overall “text” is about X, Y, and Z. This follows a basic tenet of “close-reading” – establishing the major themes of a text.
- Assuming this, WE1S topic modelers should access the List view of the DFR Browser, order the topics by proportion of corpus, and look at the most prominent percentages.
- Select the top topics, starting from the highest percentage, until their cumulative share reaches “50%+” of the total (for example, 10 topics of a 50-topic model will often reach the “50%+” total).
- Next, “read” each of the topic word lists present in the “50%+” group. Before you label the topics, reflect on your research aims and on the context of your topic model and discourse analysis: is your project focused on labor and economy, gender and sexuality, history, abstract concepts or the material, or something else? What discourse are you modeling? Understand your own intent.
- After understanding your research focus, devise single topic labels using a qualitative, literary theory-focused agenda. For example, a feminist reading may look for issues of gender, sexuality, and inequality in the single topic word list and build its labels accordingly. Another example: a structuralist reading of the list of words featured in a topic would look for terms that create symmetry, contrasts, and patterns, whereas a post-structuralist reading of the same list would look for terms that create conflicts, disunity, and fragmentation. Also, pay attention to topics that seem out of place – they may be rich with possibility. Of course, different readings may yield slightly (but not radically) different labels, but every label should stay focused on the chosen agenda.
- Mark topics with colors, trying to use the same color for similar topics, where “similar” means belonging to the same semantic field. This step can perhaps be automated, as suggested in recent studies.
- The topics should then fall into three or four major areas, which allows for manageable comparisons between different topic models.
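The selection and grouping steps of the protocol can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the WE1S tooling: the function names `select_majority_topics` and `group_by_field`, the topic ids, the weights, and the labels are all hypothetical, and the topic weights are assumed to have been copied by hand from the List view. The labels themselves still come from a human reader; the code only does the bookkeeping.

```python
from collections import defaultdict


def select_majority_topics(proportions, threshold=0.5):
    """Pick the largest topics until their combined share exceeds `threshold`.

    `proportions` maps a topic id to its weight in the corpus (any
    non-negative numbers; they are normalized here, so they need not sum to 1).
    Returns (selected, cumulative): a list of (topic_id, share) pairs ordered
    from largest to smallest, and the combined share of that list.
    """
    total = sum(proportions.values())
    if total <= 0:
        raise ValueError("topic proportions must sum to a positive number")

    # Same ordering as sorting the topics by proportion of corpus.
    ranked = sorted(proportions.items(), key=lambda item: item[1], reverse=True)

    selected = []
    cumulative = 0.0
    for topic_id, weight in ranked:
        if cumulative > threshold:  # the "50%+" mark has been passed
            break
        share = weight / total
        selected.append((topic_id, share))
        cumulative += share
    return selected, cumulative


def group_by_field(selected, fields):
    """Group selected topic ids by a hand-assigned semantic field label."""
    groups = defaultdict(list)
    for topic_id, _share in selected:
        # Topics the reader has not labeled yet are kept visible, not dropped.
        groups[fields.get(topic_id, "unlabeled")].append(topic_id)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical weights for a five-topic model, for illustration only.
    example = {"topic_07": 30, "topic_12": 25, "topic_03": 20,
               "topic_21": 15, "topic_40": 10}
    chosen, share = select_majority_topics(example)
    # Hypothetical labels assigned by a human reader after the "reading" step.
    labels = {"topic_07": "labor and economy",
              "topic_12": "gender and sexuality"}
    print(chosen, round(share, 2))
    print(group_by_field(chosen, labels))
```

The threshold is a parameter rather than a fixed 50% so that a group can test how sensitive its reading is to the cutoff. Note that a set of topics adding up to exactly 50% does not stop the selection in this sketch, which follows the “more than 50%” wording of the first step.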
Our attempt at a theory-based reading of specific topics in various topic models has been quite successful in securing the co-presence of subjective human-based reading and objective guidelines that might channel free-form reading into a (possibly standardizable) sequence of steps, i.e., a protocol. That being said, can some components of this procedure be automated by means of computer-based processes? We address this specific aspect of our protocol in a separate, dedicated blog post: “Distributed human-machinic meaning creation.”
Abrams, Meyer Howard, and Geoffrey Harpham. A Glossary of Literary Terms. Cengage Learning, 2011.
Eagleton, Terry. Literary Theory: An Introduction. John Wiley & Sons, 1983.
_____. The Event of Literature. Yale University Press, 2012.
Lau, Jey Han, et al. “Automatic labelling of topic models.” Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1. Association for Computational Linguistics, 2011.
Schulz, Kathryn. “What Is Distant Reading?” The New York Times, 24 June 2011.
[1] M.H. Abrams describes close reading in A Glossary of Literary Terms as follows: “The distinctive procedure of a New Critic is explication, or close reading: the detailed analysis of the complex interrelations and ambiguities (multiple meanings) of the verbal and figurative components within a work” (Abrams 181).
[2] Terry Eagleton’s Literary Theory: An Introduction and The Event of Literature, for example, are hallmark texts of literary criticism and close reading.